Analysis Overview
Analysis overview and configuration
| Parameter | Value |
|---|---|
| n_rounds | 150 |
| max_depth | 6 |
| learning_rate | 0.1 |
| subsample | 0.8 |
| colsample_bytree | 0.8 |
| early_stopping | 20 |
| threshold | 0.5 |
| test_size | 0.2 |
| n_top_countries | 8 |
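The configuration above maps directly onto xgboost's training API. A minimal sketch, assuming the standard xgboost parameter names (the original run's exact invocation is not shown in this report):

```python
# Hyperparameters from the configuration table, in the dict form
# xgboost's training API expects. Parameter names are the standard
# xgboost ones; the original script's exact code is an assumption.
params = {
    "objective": "binary:logistic",
    "eval_metric": ["logloss", "auc"],
    "max_depth": 6,
    "learning_rate": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}
N_ROUNDS, EARLY_STOPPING, THRESHOLD, TEST_SIZE = 150, 20, 0.5, 0.2

# Typical invocation (requires xgboost and prepared DMatrix objects):
# import xgboost as xgb
# bst = xgb.train(params, dtrain, num_boost_round=N_ROUNDS,
#                 evals=[(dtrain, "train"), (dtest, "test")],
#                 early_stopping_rounds=EARLY_STOPPING)
```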
Purpose
This XGBoost analysis predicts high-value retail transactions using 13 features across 48,548 observations. The model incorporates SHAP explainability to understand feature contributions, enabling both predictive accuracy and interpretability for business decision-making.
Key Findings
- Near-Perfect Performance Metrics: AUC-ROC, precision, recall, and F1-score all round to 1.0 and accuracy reaches 99.98%, with only 2 false positives and 0 false negatives across 9,709 test observations.
- Dominant Features: qty_capped (gain=0.60, SHAP=5.45) and log_unit_price (gain=0.38, SHAP=3.32) drive ~98% of model decisions; the remaining 11 features contribute negligibly.
- Balanced Dataset: Class distribution is nearly even (49.8% positive vs. 50.2% negative), eliminating class-imbalance concerns.
- Full-Length Training: Loss plateaued by round 150 (the configured maximum, so the early-stopping patience of 20 never triggered) with learning rate 0.1 and max depth 6.
Interpretation
The model achieves exceptional predictive power by isolating two transaction-level attributes, quantity and unit price, as primary value indicators. Geographic and temporal features (country, hour of day, day of week) contribute negligibly to the classification.
Data preprocessing and column mapping
Purpose
This section documents the data cleaning and preparation phase that precedes the XGBoost classification model. Understanding preprocessing quality is critical because data loss and transformation decisions directly impact model training stability, generalization performance, and the reliability of business conclusions drawn from the analysis.
Key Findings
- Retention Rate: 97.1% - A high proportion of the original dataset was preserved, indicating minimal data loss during cleaning
- Rows Removed: 1,452 observations (2.9%) were excluded, suggesting moderate filtering for data quality issues
- Final Dataset Size: 48,548 rows provided sufficient volume for training (38,839) and testing (9,709) with balanced class distribution (49.8% positive cases)
Interpretation
The preprocessing retained nearly all observations, which supports the model's ability to achieve perfect classification metrics (AUC-ROC = 1.0, Accuracy = 1.0). The 1,452 removed rows likely contained missing values, outliers, or invalid entries that could have introduced noise. This conservative cleaning approach preserved statistical power while maintaining data integrity, enabling the model to learn robust patterns from the 13 features without excessive information loss.
Context
The train-test split details are not explicitly documented in the preprocessing section, though the overall metrics confirm an 80/20 allocation. The high retention rate combined with near-perfect model performance suggests the cleaning step removed genuinely invalid records rather than informative observations.
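The retention and split figures quoted in this section are mutually consistent, as a quick arithmetic check shows (row counts are taken from the report; the 80/20 convention from the stated test_size):

```python
# Arithmetic check of the preprocessing figures: removed + retained rows
# reconstruct the original dataset, and an 80/20 split of the retained
# rows reproduces the reported train/test sizes.
rows_removed = 1_452
rows_after = 48_548
rows_before = rows_after + rows_removed        # 50,000 original rows
retention = rows_after / rows_before           # 97.1%
test_rows = int(rows_after * 0.2)              # floor gives the reported 9,709
train_rows = rows_after - test_rows            # 38,839
print(f"retention={retention:.1%}, train={train_rows}, test={test_rows}")
```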
Executive Summary
Executive summary — XGBoost classification results
| finding | value |
|---|---|
| Model Performance | AUC=1.000 (excellent) |
| Top Predictive Feature | qty_capped |
| Classification Threshold | 0.5 (Accuracy: 99.98%) |
| Training Convergence | Best round: 150 |
| Class Balance | 49.8% high-value transactions |
| Generalization | Model generalizes well (train AUC: 1, test AUC: 1). |
Recommendation: Focus marketing and inventory on transactions featuring 'qty_capped' characteristics. Use SHAP slide to identify the most actionable business levers for targeting high-value customers.
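Operationally, the recommendation amounts to scoring incoming transactions and flagging those that clear the report's 0.5 threshold. A hypothetical sketch (function and variable names are illustrative, not from the original pipeline):

```python
# Hypothetical post-scoring step: convert predicted probabilities into
# high-value flags at the report's 0.5 decision threshold.
THRESHOLD = 0.5

def flag_high_value(probs, threshold=THRESHOLD):
    """Return True for each score at or above the decision threshold."""
    return [p >= threshold for p in probs]

flags = flag_high_value([0.91, 0.12, 0.50])
print(flags)   # [True, False, True]
```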
Purpose
This section synthesizes the XGBoost classification model's performance on transaction value prediction. The analysis evaluates whether the model successfully identifies high-value transactions and is ready for operational deployment, directly supporting revenue optimization and customer targeting objectives.
Key Findings
- AUC-ROC: 1.000 – Perfect discrimination between high and low-value transactions across all classification thresholds
- Accuracy: 99.98% – 9,707 of 9,709 test transactions classified correctly, with only 2 false positives and 0 false negatives
- Precision & Recall: 0.9996 and 1.000 respectively – the 2 false positives barely dent precision, and every high-value case is captured
- Feature Dominance: qty_capped (gain=0.60, SHAP=5.45) and log_unit_price (gain=0.38, SHAP=3.32) drive ~98% of predictive power
- Model Stability: Train and test AUC both equal 1.0, indicating no detectable overfitting across 150 boosting rounds
Interpretation
The model achieves exceptional predictive performance with near-perfect separation of transaction classes. The zero false negative rate (no missed high-value transactions) and minimal false positive rate (2 of 9,709 test cases) support the revenue-optimization and targeting objectives this summary addresses.
Feature Importance (Gain)
XGBoost feature importance by normalized Gain
Purpose
This section identifies which features contribute most to the model's decision-making through gain-based importance. Gain measures the information value each feature provides when splitting data in the boosting trees. Understanding feature importance reveals which transaction attributes are most predictive of high-value versus low-value classifications.
Key Findings
- qty_capped dominance: 60.3% of total gain—overwhelmingly the strongest predictor of transaction value classification
- log_unit_price secondary importance: 38% gain, the second-most influential feature with comparable coverage (0.37) and frequency (0.37)
- Geographic features negligible: Country-based features (Cyprus, Netherlands, France, Germany, Spain, Portugal) contribute near-zero gain (≤0.0002 each), indicating geographic location does not meaningfully distinguish transaction value
- Temporal features minimal: hour_of_day and day_of_week show minimal gain (0.008 and 0.002), suggesting timing is not a strong classifier
Interpretation
The model relies almost exclusively on quantity and unit price to classify transactions. The extreme concentration in qty_capped (60.3%) indicates this single feature carries the majority of predictive power. The near-zero contributions from geographic and temporal features suggest the transaction value classification is fundamentally driven by product-level characteristics rather than when or where transactions occur. This aligns with the model's near-perfect performance (AUC-ROC = 1.000).
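The normalized shares discussed above are simply each feature's raw gain divided by the total, as xgboost reports it via `Booster.get_score(importance_type="gain")`. A sketch with illustrative raw values (assumed, chosen only to reproduce the reported ordering):

```python
# Normalizing raw per-feature gain into shares that sum to 1, the form
# shown in this report's importance table. Raw values are illustrative
# assumptions, not the model's actual get_score() output.
def normalize_gain(raw_gain):
    total = sum(raw_gain.values())
    return {feat: g / total for feat, g in raw_gain.items()}

shares = normalize_gain({"qty_capped": 60.3, "log_unit_price": 37.9,
                         "hour_of_day": 0.76, "day_of_week": 0.22})
top = max(shares, key=shares.get)
print(top)   # qty_capped
```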
SHAP Feature Importance
SHAP (Shapley) feature importance — model-agnostic explanation
Purpose
SHAP values provide model-agnostic explanations of how individual features drive predictions, accounting for feature correlations. This section reveals which variables most strongly influence the XGBoost classifier's decisions to classify transactions as high-value or low-value, complementing tree-based gain metrics with a theoretically sound attribution method.
Key Findings
- qty_capped (Mean Abs SHAP: 5.45): Dominates prediction influence with 61% normalized importance, far exceeding all other features and serving as the primary decision driver
- log_unit_price (Mean Abs SHAP: 3.32): Secondary predictor with 37% normalized importance, showing consistent predictive power
- Remaining Features: hour_of_day, country_United_Kingdom, and day_of_week contribute minimally (≤0.08 SHAP); eight features show near-zero impact
- Concentration Pattern: Two features account for ~98% of total predictive influence, indicating a highly focused decision boundary
Interpretation
The model's near-perfect performance (AUC=1.0, accuracy 99.98%) is driven almost entirely by transaction quantity and unit price. These features create a clear separation between high-value and low-value transactions, while temporal and geographic dimensions provide negligible marginal contribution. This aligns with the balanced class distribution (49.8% positive vs. 50.2% negative) reported for the dataset.
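The ~98% two-feature concentration follows directly from the mean |SHAP| column of this report's feature table. The TreeExplainer call that would produce such values is sketched in comments (requires the `shap` package; `bst` and `X_test` are assumed names):

```python
# Reproducing the two-feature SHAP concentration from the report's
# mean |SHAP| values. The computation that yields them would be roughly:
#   import shap, numpy as np
#   mean_abs = np.abs(shap.TreeExplainer(bst).shap_values(X_test)).mean(axis=0)
mean_abs_shap = {"qty_capped": 5.445, "log_unit_price": 3.317,
                 "hour_of_day": 0.0754, "country_United_Kingdom": 0.0536,
                 "day_of_week": 0.0339}
total = sum(mean_abs_shap.values())
top_two = (mean_abs_shap["qty_capped"] + mean_abs_shap["log_unit_price"]) / total
print(f"top-two share: {top_two:.0%}")   # 98%
```

Only the five leading features are included here; the remaining near-zero entries would shift the share by well under a percentage point.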
Learning Curves
Training vs test log-loss by boosting round
Purpose
This section tracks model performance improvement across 150 boosting iterations, showing how log-loss decreases as the XGBoost ensemble adds sequential trees. Learning curves validate that the model generalizes well by comparing training and test performance, ensuring the model hasn't overfit despite achieving perfect classification metrics.
Key Findings
- Best Round: 150 - Training used all 150 configured rounds; early stopping (patience 20) never triggered, indicating the loss was still at or near its optimum at the final iteration
- Train AUC: 1.000 - Training set achieved perfect discrimination between classes
- Test AUC: 1.000 - Test set matched training performance, demonstrating strong generalization
- Curve Convergence: Train and test curves align closely throughout iterations, with both reaching near-zero loss by round 150, indicating minimal overfitting risk
Interpretation
The model exhibits exceptional learning dynamics: initial log-loss of ~0.65 on training data drops rapidly within the first few iterations, stabilizing near zero by round 150. The parallel trajectory of train and test curves suggests the model learned generalizable patterns rather than memorizing training data. Perfect AUC scores on both sets indicate the classifier achieves flawless separation of high-value and low-value transactions.
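The quoted starting log-loss is consistent with theory: for a near-balanced binary target, a constant 0.5 prediction scores the class-entropy log-loss, ln 2 ≈ 0.693, which the first few boosting rounds pull down immediately. A quick check using this report's class rate:

```python
import math

# Round-zero log-loss for a balanced binary target: the entropy of the
# class distribution. With p = 0.498 (this report's positive rate), the
# value is essentially ln 2, matching the ~0.65-0.69 curve start above.
p = 0.498
baseline_logloss = -(p * math.log(p) + (1 - p) * math.log(1 - p))
print(round(baseline_logloss, 3))   # 0.693
```

The per-round curves themselves would come from passing `evals_result` to `xgb.train`, e.g. `results["test"]["logloss"]` (variable names assumed).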
Context
These results assume the test set is representative of production data and that the 48,548 samples retained after preprocessing are sufficient for reliable curve estimation
ROC Curve
ROC curve — AUC = 1.000
Purpose
This section evaluates the XGBoost model's ability to discriminate between high-value and low-value transactions across all classification thresholds. The ROC curve and AUC metric directly measure classification performance, which is central to assessing whether the model reliably identifies transaction patterns for business decision-making.
Key Findings
- AUC-ROC: 1.000 — Perfect discrimination between positive and negative classes across all thresholds
- Train AUC: 1.000 — Training and test performance are identical, indicating no overfitting
- Accuracy at Threshold 0.5: 99.98% — 9,707 of 9,709 test samples correctly classified (4,831 true positives, 4,876 true negatives, 2 false positives, 0 false negatives)
- F1 Score: 0.9998 — Near-perfect balance between precision and recall
Interpretation
The model achieves near-perfect performance with only 2 misclassifications on the test set. The ROC curve reaches the top-left corner (TPR=1, FPR≈0), indicating the model separates classes almost perfectly across thresholds. The alignment between train and test AUC suggests the model generalizes well without overfitting, despite relying on only 2 dominant predictors among the 13 features.
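What AUC = 1.000 asserts can be made concrete: AUC is the probability that a randomly chosen positive outranks a randomly chosen negative, so it hits 1.0 exactly when the score distributions never overlap. A toy illustration on synthetic scores (not the actual test set):

```python
# Rank-based definition of AUC: fraction of positive/negative pairs in
# which the positive receives the higher score (ties count half).
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1 for p in pos for n in neg if p > n)
    ties = sum(1 for p in pos for n in neg if p == n)
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Perfectly separated scores, as this report's ROC describes:
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # 1.0
```

In practice this is `sklearn.metrics.roc_auc_score(y_test, y_prob)` on the model's predicted probabilities.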
Confusion Matrix
Confusion matrix — classification results at chosen threshold
Purpose
The confusion matrix quantifies classification performance at the 0.5 decision threshold, showing how well the XGBoost model distinguishes between high-value and low-value transactions. This section is critical for assessing whether the model's predictive accuracy translates into reliable real-world decision-making for revenue classification.
Key Findings
- True Positives (TP): 4,831 high-value transactions correctly identified (49.8% of test set)
- True Negatives (TN): 4,876 low-value transactions correctly rejected (50.2% of test set)
- False Positives (FP): 2 low-value cases misclassified as high-value (0.02% error rate)
- False Negatives (FN): 0 high-value cases missed (perfect recall)
- Precision & Recall: 0.9996 and 1.0 respectively, indicating the 2 false alarms are negligible and no positive case is missed
Interpretation
The model achieves near-perfect classification with only 2 false positives across 9,709 test cases. The zero false negatives mean no revenue-generating transactions are missed, while the minimal false positive rate prevents unnecessary resource allocation to low-value customers. This exceptional performance suggests the model has learned highly discriminative patterns from quantity and unit price.
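The headline metrics follow mechanically from the four counts above; recomputing them also shows why precision is 0.9996 before rounding rather than exactly 1.0:

```python
# Deriving accuracy, precision, recall, and F1 from the confusion-matrix
# counts quoted in this section.
tp, tn, fp, fn = 4_831, 4_876, 2, 0
total = tp + tn + fp + fn                      # 9,709 test rows
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"acc={accuracy:.4f} prec={precision:.4f} rec={recall:.4f} f1={f1:.4f}")
# acc=0.9998 prec=0.9996 rec=1.0000 f1=0.9998
```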
Model Performance Metrics
Complete classification performance metrics
| metric | value |
|---|---|
| AUC-ROC | 1.000 |
| Accuracy | 0.9998 |
| Precision | 0.9996 |
| Recall | 1.000 |
| F1 Score | 0.9998 |
| Best Round | 150 |
| Train AUC | 1.000 |
| Threshold | 0.5 |
Feature importance by gain, cover, frequency, and mean |SHAP|
| feature | gain | cover | frequency | mean_abs_shap |
|---|---|---|---|---|
| qty_capped | 0.6026 | 0.3928 | 0.3386 | 5.445 |
| log_unit_price | 0.3792 | 0.3676 | 0.3671 | 3.317 |
| hour_of_day | 0.0076 | 0.1245 | 0.1495 | 0.0754 |
| country_United_Kingdom | 0.0062 | 0.0411 | 0.0197 | 0.0536 |
| day_of_week | 0.0022 | 0.0334 | 0.0907 | 0.0339 |
| country_EIRE | 0.001 | 0.0174 | 0.0069 | 0.0072 |
| month_num | 0.0006 | 0.0135 | 0.0168 | 0.0113 |
| country_Cyprus | 0.0002 | 0.0035 | 0.0015 | 0.0004 |
| country_Netherlands | 0.0002 | 0.0005 | 0.0018 | 0.0012 |
| country_France | 0.0001 | 0.0025 | 0.0022 | 0.0007 |
| country_Germany | 0.0001 | 0.0027 | 0.0026 | 0.0008 |
| country_Spain | 0.0001 | 0.0005 | 0.0026 | 0.0007 |
Purpose
This section summarizes the XGBoost classifier's predictive performance across all key evaluation metrics at a 0.5 decision threshold. It provides a comprehensive view of how well the model distinguishes between high-value and low-value transactions, serving as the primary indicator of model quality and reliability for deployment decisions.
Key Findings
- AUC-ROC: 1.000 – Perfect discrimination between classes; the model separates positive and negative cases with no overlap across all probability thresholds
- Accuracy: 99.98% – 9,707 of 9,709 test predictions are correct, with 2 false positives and 0 false negatives
- Precision & Recall: 0.9996 and 1.000 – the 2 false positives barely reduce precision, and no true high-value case is missed
- Feature Dominance: qty_capped (gain=0.60, SHAP=5.45) and log_unit_price (gain=0.38, SHAP=3.32) drive nearly all predictive power; the remaining 11 features contribute negligibly
Interpretation
The model exhibits exceptional performance across all standard classification metrics, indicating near-perfect separation of transaction value classes. The confusion matrix shows 4,876 true negatives and 4,831 true positives, with only 2 false positives and 0 false negatives across the 9,709-row test set.