Overview
| Metric | Value |
|---|---|
| Rows | 97,009 |
| Columns | 25 |
| Missing Values | 0 total (0%) |
| Duplicate Rows | 8 |
| Memory Usage | 27.11 MB |
Data Quality
Missing Values by Column
No missing values detected
Outlier Summary
| Column | Outliers | % | Lower Bound | Upper Bound |
|---|---|---|---|---|
| age_difference | 22,490 | 23.18% | 0.00 | 0.00 |
| total_products | 1,042 | 1.07% | 4.50 | 16.50 |
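The bounds in the outlier table are consistent with the common 1.5 × IQR (Tukey fence) rule: the quartiles reported for total_products (Q1 = 9, Q3 = 12) give 9 - 1.5 × 3 = 4.5 and 12 + 1.5 × 3 = 16.5. The report does not state which rule it uses, so the sketch below is an assumption, and the helper names `iqr_bounds` and `cap` are hypothetical.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap(values, lower, upper):
    """Clip each value into [lower, upper] (winsorizing-style capping)."""
    return [min(max(v, lower), upper) for v in values]


# Quartiles reported for total_products: Q1 = 9, Q3 = 12.
lower, upper = iqr_bounds(9.0, 12.0)
print(lower, upper)  # 4.5 16.5

# Quartiles reported for age_difference: Q1 = Q3 = 0, so the fences collapse
# to (0.0, 0.0) and every nonzero value is flagged as an outlier.
print(iqr_bounds(0.0, 0.0))  # (0.0, 0.0)

# Toy values (not taken from the dataset).
print(cap([4, 10, 17], lower, upper))  # [4.5, 10, 16.5]
```

Under this rule a column whose first and third quartiles coincide flags every value that differs from them, which would explain the 23.18% outlier rate for age_difference.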
Distributions
Box plot statistics per column (distribution and box plot charts not reproduced):
| Column | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|
| shopping_pt | 3.00 | 5.00 | 7.00 | 8.00 | 12.50 |
| record_type | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| day | 0.00 | 1.00 | 2.00 | 3.00 | 6.00 |
| location | 10001.00 | 10937.00 | 12031.00 | 13429.00 | 16580.00 |
| group_size | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| homeowner | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
| car_age | 0.00 | 3.00 | 8.00 | 12.00 | 25.50 |
| risk_factor | 1.00 | 2.00 | 3.00 | 3.00 | 4.00 |
| age_oldest | 18.00 | 29.00 | 44.00 | 60.00 | 75.00 |
| age_youngest | 16.00 | 26.00 | 40.00 | 57.00 | 75.00 |
| married_couple | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| C_previous | 1.00 | 1.00 | 3.00 | 3.00 | 4.00 |
| duration_previous | 0.00 | 2.00 | 5.00 | 9.00 | 15.00 |
| A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| B | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 |
| C | 1.00 | 1.00 | 2.00 | 3.00 | 4.00 |
| D | 1.00 | 2.00 | 3.00 | 3.00 | 3.00 |
| E | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 |
| F | 0.00 | 0.00 | 1.00 | 2.00 | 3.00 |
| G | 1.00 | 2.00 | 2.00 | 3.00 | 4.00 |
| cost | 518.00 | 605.00 | 634.00 | 663.00 | 750.00 |
| age_difference | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| total_products | 5.00 | 9.00 | 10.00 | 12.00 | 16.00 |
Summary Statistics
| Column | Count | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|
| shopping_pt | 97,009 | 6.86 | 2.00 | 3.00 | 7.00 | 12.50 |
| record_type | 97,009 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 |
| day | 97,009 | 2.08 | 1.47 | 0.00 | 2.00 | 6.00 |
| location | 97,009 | 12272.87 | 1564.61 | 10001.00 | 12031.00 | 16580.00 |
| group_size | 97,009 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 |
| homeowner | 97,009 | 0.55 | 0.50 | 0.00 | 1.00 | 1.00 |
| car_age | 97,009 | 8.12 | 5.53 | 0.00 | 8.00 | 25.50 |
| risk_factor | 97,009 | 2.72 | 0.92 | 1.00 | 3.00 | 4.00 |
| age_oldest | 97,009 | 45.18 | 17.39 | 18.00 | 44.00 | 75.00 |
| age_youngest | 97,009 | 42.68 | 17.49 | 16.00 | 40.00 | 75.00 |
| married_couple | 97,009 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| C_previous | 97,009 | 2.46 | 1.03 | 1.00 | 3.00 | 4.00 |
| duration_previous | 97,009 | 6.08 | 4.68 | 0.00 | 5.00 | 15.00 |
| A | 97,009 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 |
| B | 97,009 | 0.48 | 0.50 | 0.00 | 0.00 | 1.00 |
| C | 97,009 | 2.29 | 1.00 | 1.00 | 2.00 | 4.00 |
| D | 97,009 | 2.52 | 0.71 | 1.00 | 3.00 | 3.00 |
| E | 97,009 | 0.46 | 0.50 | 0.00 | 0.00 | 1.00 |
| F | 97,009 | 1.17 | 0.95 | 0.00 | 1.00 | 3.00 |
| G | 97,009 | 2.28 | 0.89 | 1.00 | 2.00 | 4.00 |
| cost | 97,009 | 634.52 | 42.37 | 518.00 | 634.00 | 750.00 |
| age_difference | 97,009 | 2.50 | 7.38 | 0.00 | 0.00 | 59.00 |
| total_products | 97,009 | 10.19 | 2.50 | 4.00 | 10.00 | 17.00 |
Correlations
Pearson Correlation Matrix
(Heatmap not reproduced; color scale runs from -1 to +1.)
Top Correlated Pairs
| Variable 1 | Variable 2 | Correlation |
|---|---|---|
| age_oldest | age_youngest | 0.9105 |
| C_previous | C | 0.6998 |
| C | total_products | 0.6734 |
| D | total_products | 0.6364 |
| C | D | 0.6083 |
| G | total_products | 0.5428 |
| E | total_products | 0.5169 |
| C_previous | total_products | 0.4664 |
| F | total_products | 0.4607 |
| C_previous | D | 0.4432 |
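For reference, the Pearson coefficient behind each pair can be computed as sketched below. This is a plain-Python illustration rather than the report's own implementation; the function name `pearson` and the toy values are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


# Toy values (not taken from the dataset).
age_oldest = [25, 40, 60, 75]
age_youngest = [24, 38, 55, 75]
r = pearson(age_oldest, age_youngest)
print(round(r, 4))
```

Columns with zero standard deviation in the summary statistics (record_type, group_size, married_couple, A) have no defined Pearson correlation, which is why the sketch returns NaN for them.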
Cramer's V (Categorical Associations)
| Variable 1 | Variable 2 | Cramer's V |
|---|---|---|
| state | car_value | 0.0474 |
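Cramer's V measures association between two categorical columns on a 0 to 1 scale, derived from the chi-squared statistic of their contingency table. The sketch below uses the uncorrected formula V = sqrt(chi2 / (n * (min(rows, cols) - 1))); whether the report applies a bias correction is not stated, so treat this as an assumption. The function name `cramers_v` and the toy labels are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramer's V for two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # association is undefined when one variable is constant
    return math.sqrt(chi2 / (n * (k - 1)))


# Toy labels (not taken from the dataset).
state = ["NY", "NY", "FL", "FL"]
car_value = ["e", "e", "g", "g"]
print(cramers_v(state, car_value))              # 1.0 (perfect association)
print(cramers_v(state, ["e", "g", "e", "g"]))   # 0.0 (independent)
```

On this scale, the reported value of 0.0474 for state and car_value indicates a very weak association.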
Data Preprocessing
| Metric | Value |
|---|---|
| Missing Values Filled | 36,018 |
| Outliers Capped | 80,985 |
| Features Engineered | 2 |
| Columns Processed | 23 |
Missing Values: Before vs After
Imputation Log
| Column | Strategy | Fill Value | Count |
|---|---|---|---|
| risk_factor | median | 3 | 34,346 |
| C_previous | median | 3 | 836 |
| duration_previous | median | 5 | 836 |
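The median strategy in the log can be sketched as follows: compute the median of the observed entries and substitute it for each missing one. This is an illustration under the assumption that missing entries are represented as `None`; the helper name `impute_median` and the toy column are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values.

    Returns (filled_values, fill_value, n_filled).
    """
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    filled = [fill if v is None else v for v in values]
    return filled, fill, len(values) - len(observed)


# Toy column (not taken from the dataset); None marks a missing entry.
risk_factor = [1, None, 3, 4, None, 3]
filled, fill, count = impute_median(risk_factor)
print(filled)       # [1, 3.0, 3, 4, 3.0, 3]
print(fill, count)  # 3.0 2
```

The three counts in the log (34,346 + 836 + 836) sum to the 36,018 filled values reported above.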
Engineered Features
| Feature | Definition | Unique Values |
|---|---|---|
| age_difference | age_oldest - age_youngest | 60 |
| total_products | Sum of product selections A-G | 14 |
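The two engineered features follow directly from their definitions. A minimal sketch, assuming each record is available as a dictionary keyed by column name; the helper name `add_engineered_features` and the toy record are hypothetical.

```python
PRODUCT_COLUMNS = ["A", "B", "C", "D", "E", "F", "G"]


def add_engineered_features(row):
    """Return a copy of row with age_difference and total_products added."""
    out = dict(row)
    out["age_difference"] = row["age_oldest"] - row["age_youngest"]
    out["total_products"] = sum(row[c] for c in PRODUCT_COLUMNS)
    return out


# Toy record (values chosen for illustration only).
row = {"age_oldest": 46, "age_youngest": 42,
       "A": 1, "B": 0, "C": 2, "D": 2, "E": 1, "F": 2, "G": 2}
result = add_engineered_features(row)
print(result["age_difference"], result["total_products"])  # 4 10
```

Because total_products is a sum of A through G, its high correlations with C, D, G, E, and F in the correlation table are expected by construction.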
Purchase Prediction Model
| Best Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Random Forest | 70.5% | 69.2% | 70.5% | 60.4% |
Model Comparison (5-Fold Cross-Validation)
| Model | Accuracy | Precision | Recall | F1 | ROC-AUC | CV Mean | CV Std |
|---|---|---|---|---|---|---|---|
| Random Forest (best) | 70.5% | 69.2% | 70.5% | 60.4% | 61.4% | 69.9% | 0.0019 |
| Gradient Boosting | 70.3% | 66.4% | 70.3% | 62.7% | 61.1% | 69.3% | 0.0046 |
| XGBoost | 69.3% | 64.5% | 69.3% | 63.8% | 60.1% | 68.1% | 0.0049 |
| Logistic Regression | 69.8% | 64.8% | 69.8% | 59.5% | 59.8% | 69.9% | 0.0011 |
Confusion Matrix
Target: Plan Changed (last quote vs purchase)
| Actual (rows) / Predicted (columns) | No Change (0) | Changed (1) |
|---|---|---|
| No Change (0) | 2,636 | 31 |
| Changed (1) | 1,097 | 60 |
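The headline metrics can be reproduced from the four confusion-matrix counts if precision, recall, and F1 are averaged across the two classes weighted by class support. The report does not name its averaging method, so that choice is an assumption, although it matches the reported figures; the function name `weighted_metrics` is hypothetical.

```python
def weighted_metrics(tn, fp, fn, tp):
    """Accuracy and support-weighted precision/recall/F1 for a 2x2 matrix."""
    total = tn + fp + fn + tp
    support = {0: tn + fp, 1: fn + tp}    # actual counts per class
    predicted = {0: tn + fn, 1: fp + tp}  # predicted counts per class
    correct = {0: tn, 1: tp}

    precision = recall = f1 = 0.0
    for cls in (0, 1):
        p = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        r = correct[cls] / support[cls] if support[cls] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        w = support[cls] / total          # class weight = share of actual rows
        precision += w * p
        recall += w * r
        f1 += w * f
    return {"accuracy": (tn + tp) / total,
            "precision": precision, "recall": recall, "f1": f1}


# Counts read from the confusion matrix: actual rows, predicted columns.
m = weighted_metrics(tn=2636, fp=31, fn=1097, tp=60)
print({k: round(100 * v, 1) for k, v in m.items()})
# {'accuracy': 70.5, 'precision': 69.2, 'recall': 70.5, 'f1': 60.4}
```

The weighted averages hide a large class imbalance in performance: only 60 of the 1,157 actual "Changed" cases (about 5.2%) are predicted correctly, so most of the accuracy comes from predicting "No Change".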