Insurance Product Purchase Prediction

Overview

Rows: 97,009
Columns: 25
Missing Values: 0% (0 total)
Duplicate Rows: 8
Memory Usage: 27.11 MB

Data Quality

Missing Values by Column

No missing values detected

Outlier Summary

Column          Outliers  %       Lower Bound  Upper Bound
age_difference  22,490    23.18%  0.00         0.00
total_products  1,042     1.07%   4.50         16.50
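
The bounds in the outlier table are consistent with Tukey's 1.5 × IQR fences applied to the box plot quartiles reported under Distributions (for total_products, Q1 = 9 and Q3 = 12). The report does not state which rule it used, so the sketch below is an assumption that happens to reproduce both rows. The names iqr_bounds and count_outliers and the multiplier k are illustrative, not taken from the report.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return (lower, upper) fences; k = 1.5 is an assumed multiplier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, lower, upper):
    """Count values strictly outside the [lower, upper] fences."""
    return sum(1 for v in values if v < lower or v > upper)


# Quartiles taken from the box plot statistics in this report.
total_products_bounds = iqr_bounds(9.0, 12.0)  # matches 4.50 / 16.50
age_difference_bounds = iqr_bounds(0.0, 0.0)   # matches 0.00 / 0.00
```

With Q1 = Q3 = 0, every nonzero age_difference falls outside the fences, which would explain why 23.18% of rows are flagged for that column.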

Distributions

A distribution chart and a box plot were generated for each numeric column. Box plot statistics:

Column             Min       Q1        Med       Q3        Max
shopping_pt        3.00      5.00      7.00      8.00      12.50
record_type        1.00      1.00      1.00      1.00      1.00
day                0.00      1.00      2.00      3.00      6.00
location           10001.00  10937.00  12031.00  13429.00  16580.00
group_size         1.00      1.00      1.00      1.00      1.00
homeowner          0.00      0.00      1.00      1.00      1.00
car_age            0.00      3.00      8.00      12.00     25.50
risk_factor        1.00      2.00      3.00      3.00      4.00
age_oldest         18.00     29.00     44.00     60.00     75.00
age_youngest       16.00     26.00     40.00     57.00     75.00
married_couple     0.00      0.00      0.00      0.00      0.00
C_previous         1.00      1.00      3.00      3.00      4.00
duration_previous  0.00      2.00      5.00      9.00      15.00
A                  1.00      1.00      1.00      1.00      1.00
B                  0.00      0.00      0.00      1.00      1.00
C                  1.00      1.00      2.00      3.00      4.00
D                  1.00      2.00      3.00      3.00      3.00
E                  0.00      0.00      0.00      1.00      1.00
F                  0.00      0.00      1.00      2.00      3.00
G                  1.00      2.00      2.00      3.00      4.00
cost               518.00    605.00    634.00    663.00    750.00
age_difference     0.00      0.00      0.00      0.00      0.00
total_products     5.00      9.00      10.00     12.00     16.00

Summary Statistics

Column             Count   Mean      Std      Min       Median    Max
shopping_pt        97,009  6.86      2.00     3.00      7.00      12.50
record_type        97,009  1.00      0.00     1.00      1.00      1.00
day                97,009  2.08      1.47     0.00      2.00      6.00
location           97,009  12272.87  1564.61  10001.00  12031.00  16580.00
group_size         97,009  1.00      0.00     1.00      1.00      1.00
homeowner          97,009  0.55      0.50     0.00      1.00      1.00
car_age            97,009  8.12      5.53     0.00      8.00      25.50
risk_factor        97,009  2.72      0.92     1.00      3.00      4.00
age_oldest         97,009  45.18     17.39    18.00     44.00     75.00
age_youngest       97,009  42.68     17.49    16.00     40.00     75.00
married_couple     97,009  0.00      0.00     0.00      0.00      0.00
C_previous         97,009  2.46      1.03     1.00      3.00      4.00
duration_previous  97,009  6.08      4.68     0.00      5.00      15.00
A                  97,009  1.00      0.00     1.00      1.00      1.00
B                  97,009  0.48      0.50     0.00      0.00      1.00
C                  97,009  2.29      1.00     1.00      2.00      4.00
D                  97,009  2.52      0.71     1.00      3.00      3.00
E                  97,009  0.46      0.50     0.00      0.00      1.00
F                  97,009  1.17      0.95     0.00      1.00      3.00
G                  97,009  2.28      0.89     1.00      2.00      4.00
cost               97,009  634.52    42.37    518.00    634.00    750.00
age_difference     97,009  2.50      7.38     0.00      0.00      59.00
total_products     97,009  10.19     2.50     4.00      10.00     17.00

Correlations

Pearson Correlation Matrix

[Heatmap: Pearson correlation matrix, color scale from -1 to +1]
Axes (rows and columns): shopping_pt, record_type, day, location, group_size, homeowner, car_age, risk_factor, age_oldest, age_youngest, married_couple, C_previous, duration_pre…, A, B, C, D, E, F, G, cost, age_difference, total_products

Top Correlated Pairs

Variable 1  Variable 2      Correlation
age_oldest  age_youngest    0.9105
C_previous  C               0.6998
C           total_products  0.6734
D           total_products  0.6364
C           D               0.6083
G           total_products  0.5428
E           total_products  0.5169
C_previous  total_products  0.4664
F           total_products  0.4607
C_previous  D               0.4432
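
A ranking like the one in the table can be produced by computing the Pearson coefficient for every pair of numeric columns and sorting. The sketch below is illustrative only: the helper names pearson and top_pairs are hypothetical, and ranking by absolute value is an assumption (the report does not say how pairs were ordered). Columns with zero standard deviation in the summary statistics (record_type, group_size, married_couple, A) have no defined correlation, so the sketch skips them.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # constant column: correlation is undefined
    return cov / math.sqrt(var_x * var_y)


def top_pairs(columns, k=10):
    """Rank column pairs by absolute correlation, skipping undefined ones.

    `columns` maps a column name to its list of values.
    """
    scored = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if not math.isnan(r):
            scored.append((a, b, r))
    return sorted(scored, key=lambda t: abs(t[2]), reverse=True)[:k]
```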

Cramer's V (Categorical Associations)

Variable 1  Variable 2  Cramer's V
state       car_value   0.0474
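
Cramer's V rescales the chi-squared statistic of a contingency table to the range 0 to 1, so a value of 0.0474 indicates a very weak association between state and car_value. The sketch below shows the standard uncorrected formula on a table of counts. Whether the report applied a bias correction is not stated, and the function name cramers_v is hypothetical. It assumes every row and column total is nonzero and that the table has at least two rows and two columns.

```python
import math


def cramers_v(table):
    """Uncorrected Cramer's V for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))
```

A perfectly associated 2 x 2 table such as [[10, 0], [0, 10]] gives 1.0, and a table with identical rows gives 0.0.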

Data Preprocessing

Missing Values Filled: 36,018
Outliers Capped: 80,985
Features Engineered: 2
Columns Processed: 23
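
Several maxima in the summary statistics sit exactly on a 1.5 × IQR upper fence (shopping_pt: Q3 = 8, IQR = 3, max 12.50; car_age: Q3 = 12, IQR = 9, max 25.50), which suggests that capping clips values to those fences. That is an inference from the numbers, not something the report states. A minimal sketch under that assumption, with the hypothetical helper cap_to_bounds:

```python
def cap_to_bounds(values, lower, upper):
    """Clip each value into [lower, upper]; return the clipped list and how many changed."""
    capped = [min(max(v, lower), upper) for v in values]
    changed = sum(1 for before, after in zip(values, capped) if before != after)
    return capped, changed


# shopping_pt fences from the report's quartiles (Q1 = 5, Q3 = 8), assuming k = 1.5.
q1, q3 = 5.0, 8.0
lower, upper = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)

# Toy values, not taken from the dataset: only 13 lies above the upper fence.
capped, changed = cap_to_bounds([3, 7, 13], lower, upper)
```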

Missing Values: Before vs After

Imputation Log

Column             Strategy  Fill Value  Count
risk_factor        median    3           34,346
C_previous         median    3           836
duration_previous  median    5           836
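
The three counts in the log add up to the 36,018 filled values reported above (34,346 + 836 + 836). Median imputation itself can be sketched as follows. Representing missing entries as None and the helper name impute_median are assumptions made for illustration.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values.

    Returns (filled_values, fill_value, number_filled).
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    return filled, fill, len(values) - len(observed)


# Toy column, not taken from the dataset.
filled, fill_value, count = impute_median([1, None, 3, 3, None])
```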

Engineered Features

age_difference (60 unique values): age_oldest - age_youngest

total_products (14 unique values): sum of product selections A-G
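
Both features follow directly from their definitions. The sketch below applies them to a single record stored as a dictionary; the function name engineer_features and the sample values are hypothetical.

```python
PRODUCT_COLUMNS = ("A", "B", "C", "D", "E", "F", "G")


def engineer_features(row):
    """Return a copy of row with age_difference and total_products added."""
    out = dict(row)
    out["age_difference"] = row["age_oldest"] - row["age_youngest"]
    out["total_products"] = sum(row[c] for c in PRODUCT_COLUMNS)
    return out


# Illustrative record, not taken from the dataset.
sample = {"age_oldest": 44, "age_youngest": 40,
          "A": 1, "B": 0, "C": 2, "D": 3, "E": 0, "F": 1, "G": 2}
enriched = engineer_features(sample)
```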

Purchase Prediction Model

Best Model: Random Forest
Accuracy: 70.5%
Precision: 69.2%
Recall: 70.5%
F1 Score: 60.4%

Model Comparison (5-Fold Cross-Validation)

Model                 Accuracy  Precision  Recall  F1     ROC-AUC  CV Mean  CV Std
Random Forest (best)  70.5%     69.2%      70.5%   60.4%  61.4%    69.9%    0.0019
Gradient Boosting     70.3%     66.4%      70.3%   62.7%  61.1%    69.3%    0.0046
XGBoost               69.3%     64.5%      69.3%   63.8%  60.1%    68.1%    0.0049
Logistic Regression   69.8%     64.8%      69.8%   59.5%  59.8%    69.9%    0.0011
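
CV Mean and CV Std summarize the scores obtained on the five held-out folds. The report does not say how the folds were drawn or which metric was scored, so the sketch below assumes plain contiguous, unshuffled folds and a population standard deviation. The names kfold_indices and summarize_scores are hypothetical.

```python
from statistics import mean, pstdev


def kfold_indices(n_samples, n_splits=5):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def summarize_scores(fold_scores):
    """Return (mean, population standard deviation) of per-fold scores."""
    return mean(fold_scores), pstdev(fold_scores)
```

Each model would be fit on the train indices of a fold and scored on its test indices; the five resulting scores feed summarize_scores.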

Confusion Matrix

Target: Plan Changed (last quote vs purchase)

                      Predicted
Actual          No Change (0)  Changed (1)
No Change (0)   2,636          31
Changed (1)     1,097          60
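
The headline metrics can be checked against the confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes, and that precision, recall, and F1 are averaged with weights proportional to each class's support. The report states neither assumption; the function name weighted_metrics is hypothetical.

```python
def weighted_metrics(cm):
    """Accuracy and support-weighted precision, recall, and F1.

    `cm` is a square confusion matrix: rows = actual, columns = predicted.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for i in range(n_classes):
        support = sum(cm[i])                   # actual count of class i
        predicted = sum(row[i] for row in cm)  # predicted count of class i
        p = cm[i][i] / predicted if predicted else 0.0
        r = cm[i][i] / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Counts from the confusion matrix in this report.
cm = [[2636, 31], [1097, 60]]
accuracy, precision, recall, f1 = weighted_metrics(cm)
```

Under these assumptions the function reproduces the reported 70.5% accuracy, 69.2% precision, 70.5% recall, and 60.4% F1. The same counts show that only 60 of the 1,157 actual plan changes were predicted as changes, a recall of about 5.2% for the Changed class.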

Feature Importance