Analysis Overview
Analysis overview and configuration
| Parameter | Value |
|---|---|
| prediction_horizon_days | 365 |
| discount_rate | 0.15 |
| holdout_days | 90 |
| min_transactions | 1 |
Purpose
This analysis applies the BG/NBD (Beta-Geometric/Negative Binomial Distribution) and Gamma-Gamma models to predict customer lifetime value (CLV) for Test Company. The framework combines purchase frequency/recency patterns with spending behavior to segment customers and forecast future value, enabling data-driven retention and resource allocation strategies.
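To illustrate how the two models combine, the sketch below multiplies expected monthly purchases (the BG/NBD side) by expected spend per purchase (the Gamma-Gamma side) and discounts each month to present value. It is a minimal sketch, not the pipeline's code. `discounted_clv` and `cum_expected_txns` are hypothetical names, and treating the configured discount_rate of 0.15 as an annual rate converted to a monthly one is an assumption.

```python
from typing import Callable


def discounted_clv(
    cum_expected_txns: Callable[[float], float],
    expected_spend: float,
    horizon_months: int = 12,
    annual_rate: float = 0.15,
) -> float:
    """Sum of monthly expected revenue, discounted to present value.

    cum_expected_txns(m): expected purchases in the first m months (e.g. from BG/NBD).
    expected_spend: expected value per purchase (e.g. from Gamma-Gamma).
    Assumption: annual_rate is annual and is converted to an equivalent monthly rate.
    """
    monthly_rate = (1.0 + annual_rate) ** (1.0 / 12.0) - 1.0
    clv = 0.0
    for month in range(1, horizon_months + 1):
        # Purchases expected during this month only.
        txns = cum_expected_txns(month) - cum_expected_txns(month - 1)
        clv += expected_spend * txns / (1.0 + monthly_rate) ** month
    return clv


if __name__ == "__main__":
    # Toy input: 2 purchases per month at $10 each, no discounting.
    print(discounted_clv(lambda m: 2.0 * m, 10.0, annual_rate=0.0))  # 240.0
```

With a positive rate, each later month counts for less, so the same toy input yields slightly under $240.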
Key Findings
- Model Architecture: BG/NBD captures purchase dynamics (r=1.01, alpha=2.76) while Gamma-Gamma models spending variability (p=4.73, q=1202.75)
- Customer Distribution: 50% classified as "Loyal" (6 customers, avg_palive=0.97), with 12 total customers spanning CLV range of ~$99,718–$100,759
- Recency Dominance: 75% of observations fall in the 0–30 day recency window, and P(Alive) falls sharply with time since last purchase; expected transactions, by contrast, stay flat (~328) across frequency bins, so the forecasts barely respond to purchase history
- Segment Concentration: Champions and Potential segments each represent 16.7% of customers; "Lost" segment shows critically low palive (0.09)
- Validation Gap: Prediction errors range 4,605–49,096%, with one-time buyers showing 32,702% error, indicating model struggles with
Data preprocessing and column mapping
Purpose
This section documents the data cleaning and filtering process applied before modeling. The 51.2% retention rate indicates substantial data reduction, which is critical to understand when evaluating model reliability and the representativeness of downstream CLV, P(Alive), and segmentation analyses.
Key Findings
- Retention Rate: 51.2% (41 of 80 rows retained) - Nearly half the initial observations were removed during preprocessing, suggesting either aggressive filtering criteria or significant data quality issues in the raw dataset
- Rows Removed: 39 observations eliminated without documented justification or filter specifications
- Train/Test Split: Not specified - No explicit train/test allocation is documented, limiting transparency on model validation methodology
- Transformation Details: Filters applied are not enumerated, preventing assessment of whether removals were systematic or arbitrary
Interpretation
The substantial data loss raises concerns about sample bias and model generalizability. Only 41 of 80 rows survived preprocessing, and the models downstream work with just 12 customers, so the BG/NBD and Gamma-Gamma fits may not reflect the broader customer population. The lack of documented filtering criteria makes it impossible to determine whether removed records were outliers, incomplete cases, or systematically different customer segments. This directly impacts confidence in the CLV predictions and P(Alive) estimates presented in the analysis.
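Row counts and customer counts differ because both models consume one summary record per customer rather than raw transaction rows. Below is a minimal sketch of that summarization, assuming rows of `(customer_id, purchase_date, amount)`. The function name `summarize` and its output keys are hypothetical. Averaging daily totals over all purchase days is a simplification; some implementations average repeat purchases only.

```python
from collections import defaultdict
from datetime import date


def summarize(transactions, observation_end):
    """Collapse (customer_id, purchase_date, amount) rows into one record per customer.

    frequency       = repeat purchase days (distinct purchase days minus 1)
    t_x             = days from first to last purchase (BG/NBD convention)
    T               = days from first purchase to observation_end
    days_since_last = days from last purchase to observation_end
    mean_spend      = average total spend per purchase day (a simplification)
    """
    daily = defaultdict(lambda: defaultdict(float))
    for customer_id, day, amount in transactions:
        daily[customer_id][day] += amount  # same-day orders count as one purchase

    summary = {}
    for customer_id, totals in daily.items():
        days = sorted(totals)
        first, last = days[0], days[-1]
        summary[customer_id] = {
            "frequency": len(days) - 1,
            "t_x": (last - first).days,
            "T": (observation_end - first).days,
            "days_since_last": (observation_end - last).days,
            "mean_spend": sum(totals.values()) / len(days),
        }
    return summary


if __name__ == "__main__":
    rows = [
        ("A", date(2024, 1, 1), 50.0),
        ("A", date(2024, 1, 11), 30.0),
        ("A", date(2024, 1, 11), 20.0),
        ("B", date(2024, 2, 1), 40.0),
    ]
    # Four rows collapse to two customer records.
    print(summarize(rows, observation_end=date(2024, 3, 1)))
```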
Context
The validation data shows extremely high
Executive Summary
Executive summary of customer lifetime value analysis
| Finding | Value |
|---|---|
| Mean P(Alive) (expected share of customers still active) | 85.4% |
| Avg predicted CLV (next 365d) | $100,098.53 |
| Total predicted future revenue | $1,201,182 |
| Top 20% revenue concentration | 25.1% |
| Repeat purchase rate | 83.3% |
| Model fitted | BG/NBD + Gamma-Gamma |
Key Findings:
• 85.4% of customers are estimated to still be active
• Average CLV is $100,098.53 (median $100,098.54)
• Top 20% of customers account for 25.1% of predicted revenue
• 83.3% of customers have made at least one repeat purchase
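The 85.4% figure matches the customer-weighted mean of the segment-level P(Alive) averages, not a count of customers above 0.5. The sketch below computes both quantities using only the segment table's counts and averages; the helper names are hypothetical. Because no individual P(Alive) can exceed 1, a segment of n customers averaging p cannot contain anyone below n·p − (n − 1). That bound is enough to show that 10 of 12 customers sit at or above 0.5.

```python
# (segment, customer_count, avg_palive) from the Customer Segments table
SEGMENTS = [
    ("At Risk", 1, 0.372),
    ("Champions", 2, 0.999),
    ("Lost", 1, 0.089),
    ("Loyal", 6, 0.966),
    ("Potential", 2, 0.996),
]


def mean_p_alive(segments):
    """Customer-weighted mean of segment-average P(Alive)."""
    total = sum(n for _, n, _ in segments)
    return sum(n * p for _, n, p in segments) / total


def count_at_least(segments, threshold=0.5):
    """Customers guaranteed to have P(Alive) >= threshold.

    In a segment of n customers averaging p, the lowest member is at least
    n*p - (n - 1), since the other n - 1 members contribute at most 1 each.
    """
    count = 0
    for _, n, p in segments:
        if n * p - (n - 1) >= threshold:
            count += n
    return count


if __name__ == "__main__":
    print(f"{mean_p_alive(SEGMENTS):.1%}")  # 85.4%
    print(count_at_least(SEGMENTS), "of 12")  # 10 of 12
```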
Recommendations:
• Invest heavily in retaining Champions segment — highest predicted value customers
• Launch win-back campaigns for At Risk customers with P(Alive) 0.3-0.5
• Use CLV rankings to prioritize customer service and loyalty program resources
• Focus acquisition on customer profiles similar to Champions segment
Purpose
This executive summary synthesizes a customer lifetime value (CLV) analysis that uses the BG/NBD and Gamma-Gamma probabilistic models to assess the predicted revenue potential of a 12-customer cohort over the next 365 days. The analysis directly addresses the business objective of quantifying customer value and identifying retention priorities.
Key Findings
- Total Predicted CLV: $1,201,182 — represents the aggregate revenue expected from the analyzed customer base
- Customer Viability: Mean P(Alive) of 85.4%, with 10 of 12 customers at P(Alive) ≥ 0.5, indicating strong overall cohort health and low churn risk
- Repeat Purchase Behavior: 83.3% repeat rate — demonstrates established customer loyalty and transaction consistency
- Revenue Concentration: Top 20% of customers generate 25.1% of predicted revenue; value is spread almost evenly, with no dependency on a few high-value customers
- Average CLV: $100,098.53 per customer — reflects consistent, high-value customer profiles with minimal variance (sd=$281.95)
Interpretation
The model outputs describe a healthy, stable customer base. The high repeat rate and high mean P(Alive) suggest that the cohort has low churn risk. However, the validation data reveals substantial prediction errors (4,605–49,096%), indicating that the model's transaction-level forecasts are unreliable despite producing reasonable aggregate
CLV Key Metrics
Key customer lifetime value metrics from the BG/NBD model
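The top 20% of customers account for 25.1% of total predicted revenue, far from the typical 80/20 value concentration: if the top 20% is taken as the top 3 of 12 customers, a perfectly even split would already give them 25%. The repeat purchase rate is 83.3%, the share of customers with at least one return visit.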
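Both metrics can be recomputed from customer-level outputs. Below is a minimal sketch with hypothetical helper names. Rounding the top-20% group size up (3 of 12 customers) is an assumption about how the report sized the group. The repeat-rate example uses the validation table's group sizes with representative frequencies (3 for the 3-4 group, 5 for 5+).

```python
import math


def top_share(values, top_pct=20):
    """Share of the total held by the top `top_pct` percent of values (group size rounded up)."""
    ranked = sorted(values, reverse=True)
    k = max(1, math.ceil(len(ranked) * top_pct / 100))
    return sum(ranked[:k]) / sum(ranked)


def repeat_rate(frequencies):
    """Share of customers with at least one repeat purchase (frequency >= 1)."""
    return sum(1 for f in frequencies if f >= 1) / len(frequencies)


if __name__ == "__main__":
    # Twelve identical CLVs: the top 3 hold exactly a quarter of the total.
    print(top_share([100.0] * 12))  # 0.25
    # Group sizes from the validation table: 2, 3, 2, 3, 2 customers.
    freqs = [0] * 2 + [1] * 3 + [2] * 2 + [3] * 3 + [5] * 2
    print(f"{repeat_rate(freqs):.1%}")  # 83.3%
```

Because a perfectly even split already gives 0.25, the reported 25.1% signals almost no concentration.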
Purpose
This section quantifies the predicted financial value of the customer base over the next 365 days using the BG/NBD and Gamma-Gamma probabilistic models. It serves as the core output of the CLV analysis, enabling assessment of total revenue potential, customer health, and value concentration across the portfolio.
Key Findings
- Total Predicted Revenue: $1,201,182 — aggregate CLV across all 12 customers over the next year
- Average CLV: $100,098.5 — consistent with median, indicating symmetric value distribution with no extreme outliers
- Customer Vitality: 85.4% of customers estimated alive — strong baseline health, though 14.6% churn risk exists
- Repeat Purchase Rate: 83.3% — demonstrates robust customer retention and engagement patterns
- Revenue Concentration: Top 20% of customers generate 25.1% of revenue; this is close to an even split rather than a Pareto-style skew
Interpretation
The customer base exhibits healthy fundamentals with high repeat engagement and strong predicted revenue. The alignment between mean and median CLV ($100,098.5) suggests a stable, homogeneous customer value profile without significant outliers. The 85.4% alive probability reflects recent transaction activity and engagement patterns captured in the BG/NBD model, while the 83.3% repeat rate validates the model's ability to
Expected Transactions Matrix
Expected future transactions by customer frequency and recency
Purpose
This frequency-recency matrix quantifies expected future transaction volume across customer segments defined by their purchase history and engagement recency. It serves as a predictive lens for identifying which customers are most likely to remain active, enabling data-driven segmentation for retention and engagement strategies within the broader CLV and customer lifecycle analysis.
Key Findings
- Expected Transactions Range: 326.77–330.18 transactions over 365 days, with minimal variance (SD=1.01) across all segments
- Recency Dominance: 75% of observations fall in the 0–30 day window, indicating most active customers purchased very recently
- Frequency Distribution: Segments span 0–7 purchase frequencies, with frequency=1 appearing most frequently (25% of rows)
- Minimal Differentiation: The narrow range and low standard deviation suggest predicted transaction volumes are remarkably consistent across frequency-recency combinations
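It helps to write out the quantity this matrix reports to see why a flat ~328 is suspicious. Below is an independent sketch of the BG/NBD conditional expectation of purchases in (T, T + t] given x, t_x and T, following Fader, Hardie and Lee (2005). The Gauss hypergeometric function is evaluated by its power series, which is valid for |z| < 1; that condition holds here because z = t/(alpha + T + t). The function names are hypothetical and this is not the report's code.

```python
def hyp2f1(a, b, c, z, tol=1e-12, max_terms=100_000):
    """Gauss hypergeometric function 2F1(a, b; c; z) via its power series (|z| < 1)."""
    term = total = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def bgnbd_expected_txns(t, x, t_x, T, r, alpha, a, b):
    """Expected purchases in (T, T + t] under BG/NBD, given x repeat purchases,
    last purchase at t_x and customer age T (all in the same time unit as alpha)."""
    k = (alpha + T) / (alpha + T + t)
    hyp = hyp2f1(r + x, b + x, a + b + x - 1, t / (alpha + T + t))
    numerator = (a + b + x - 1) / (a - 1) * (1 - k ** (r + x) * hyp)
    # Dropout can only happen after a repeat purchase, so the term vanishes for x = 0.
    dropout_term = a / (b + x - 1) * ((alpha + T) / (alpha + t_x)) ** (r + x) if x > 0 else 0.0
    return numerator / (1 + dropout_term)


if __name__ == "__main__":
    # Illustrative parameters and histories only; not the report's fit.
    params = dict(r=0.243, alpha=4.414, a=0.793, b=2.426)
    for x in (0, 2, 6):
        print(x, round(bgnbd_expected_txns(30, x, 30, 40, **params), 3))
```

For short horizons, this expression behaves to first order in t like t·(r + x)/(alpha + T) divided by (1 + dropout_term). Cells with different frequency x should therefore not all return the same value.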
Interpretation
The near-uniform expected transaction predictions (mean=328.1) across diverse frequency-recency combinations are counterintuitive and suggest that the BG/NBD model may be producing undifferentiated forecasts. Typically, high-frequency recent purchasers should show substantially higher expected transactions than dormant or infrequent buyers. The concentration of data in the 0–30 day recency band reflects a customer base dominated
P(Alive) Probability Matrix
Probability each customer segment is still active
Purpose
This section quantifies the probability that each customer remains active and engaged in their purchase cycle using the BG/NBD probabilistic model. P(Alive) is essential for distinguishing genuinely churned customers from those temporarily inactive, enabling targeted retention strategies and accurate lifetime value predictions.
Key Findings
- Overall P(Alive): mean 0.88 (median 1.0) in this matrix, versus a customer-level mean of 0.854 (85.4%), indicating a healthy customer base with strong retention signals
- Recency Effect: P(Alive) drops sharply with time since last purchase (0.23 at 120–150 days vs. 0.81–1.0 at 0–30 days), demonstrating recency as the dominant churn predictor
- Frequency Amplification: Higher purchase frequency goes with high P(Alive) (frequency 4–7 customers maintain P(Alive) = 1.0 within 0–30 days); because this evidence comes only from the most recent band, it does not by itself show that frequent buyers stay resilient at longer recency
- Uncertainty Zone: Customers with P(Alive) between 0.3 and 0.7 represent re-engagement opportunities; the data shows minimal representation in this range, suggesting a bimodal distribution of active vs. churned segments
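The recency effect follows directly from the BG/NBD P(Alive) expression (Fader, Hardie and Lee, 2005). Here t_x is the time of the last purchase measured from the first, so time since last purchase corresponds to T − t_x. The sketch below is independent of the report's code: `bgnbd_p_alive` is a hypothetical name, and the customer histories in the example are illustrative. The parameters are the report's fitted values.

```python
def bgnbd_p_alive(x, t_x, T, r, alpha, a, b):
    """P(customer still alive at T | x repeat purchases, last purchase at t_x) under BG/NBD.

    With x = 0 the model returns 1, because BG/NBD only lets a customer drop out
    immediately after a repeat purchase.
    """
    if x == 0:
        return 1.0
    odds_dead = a / (b + x - 1) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_dead)


if __name__ == "__main__":
    FIT = dict(r=1.009, alpha=2.759, a=0.0025, b=0.0121)  # Model Parameters table
    # Same customer age and purchase count, different time since last purchase.
    recent = bgnbd_p_alive(3, 95, 100, **FIT)  # last purchase 5 units ago
    stale = bgnbd_p_alive(3, 20, 100, **FIT)   # last purchase 80 units ago
    print(round(recent, 3), round(stale, 3))
```

Holding x and T fixed, a longer gap T − t_x raises the odds term, so P(Alive) can only fall as time since last purchase grows.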
Interpretation
The heatmap reveals that rec
CLV Distribution
Distribution of predicted customer lifetime values
Purpose
This section quantifies how predicted customer lifetime value distributes across the customer base, revealing concentration patterns in revenue potential. Understanding CLV distribution is essential for resource allocation, as it identifies whether value is concentrated among a few high-value customers or spread evenly across the base.
Key Findings
- Mean vs. Median CLV: Both approximately $100,098, indicating a remarkably symmetric distribution rather than the typical right-skew observed in e-commerce. This suggests relatively homogeneous customer value.
- Customer Concentration: The second bin (CLV $99,926–$100,134) contains the largest customer segment with 5 customers, representing 41.7% of the base.
- Top-Tier Representation: Only 1 customer occupies the highest CLV bracket ($100,551–$100,759); the remaining 11 customers (91.7%) fall in the four lower bins, showing limited extreme value variation.
- Distribution Range: CLV spans only $1,041 across all customers ($99,718–$100,759), a narrow band relative to absolute values.
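The bin boundaries quoted above are consistent with five equal-width bins between the minimum and maximum CLV, each $208.20 wide. Below is a minimal sketch under that assumption; the bin count is inferred from the quoted ranges, and the helper names are hypothetical.

```python
from bisect import bisect_right


def equal_width_edges(lo, hi, bins=5):
    """Edges of `bins` equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def bin_index(value, edges):
    """Index of the bin containing value; the top bin is closed on the right."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


if __name__ == "__main__":
    edges = equal_width_edges(99_718, 100_759)
    print([round(e) for e in edges])  # [99718, 99926, 100134, 100343, 100551, 100759]
```

Closing the top bin on the right ensures that the maximum CLV ($100,759) is counted in the highest bracket rather than falling outside it.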
Interpretation
The near-identical mean and median CLV contradicts typical e-commerce patterns where high-value customers drive disproportionate revenue. This dataset exhibits unusual homogeneity in predicted lifetime value, suggesting either a mature, stable customer base with consistent purchasing behavior
Top Customers by CLV
Highest-value customers ranked by predicted CLV
Purpose
This section ranks all 12 customers in the analyzed base by predicted lifetime value (CLV) over the next 365 days to guide resource allocation for retention and engagement strategies. The visualization combines CLV predictions with P(Alive) probability, a measure of churn risk, to highlight which customers warrant priority investment in loyalty programs, early access campaigns, and dedicated support.
Key Findings
- Predicted CLV Range: $99,718–$100,759 (mean: $100,099) — remarkably tight clustering indicates homogeneous customer value despite behavioral differences
- P(Alive) Distribution: Mean 0.85, median 0.99, with sharp negative skew (−1.4); most customers show high retention probability, but 2 customers (Customer-2, Customer-10) fall below 0.4
- Frequency-Recency Mismatch: High-frequency customers (7 transactions) have zero recency, while low-frequency customers (1 transaction) show 145-day gaps, indicating distinct engagement patterns
- Spend Variability: Average spend ranges from $225 to $442 (sd: $65), roughly a twofold spread that the tight CLV range does not reflect
Interpretation
The BG/NBD and Gamma-Gamma models predict similar CLV across all 12 customers despite divergent behavioral profiles.
Customer Segments
Customer segmentation by CLV tier and activity status
| segment_name | customer_count | avg_clv ($) | total_clv ($) | avg_palive | pct_customers (%) |
|---|---|---|---|---|---|
| At Risk | 1 | 100,300 | 100,300 | 0.372 | 8.3 |
| Champions | 2 | 100,500 | 201,000 | 0.999 | 16.7 |
| Lost | 1 | 99,850 | 99,850 | 0.089 | 8.3 |
| Loyal | 6 | 100,100 | 600,500 | 0.966 | 50.0 |
| Potential | 2 | 99,730 | 199,500 | 0.996 | 16.7 |
Purpose
This section segments the customer base into five behavioral groups based on predicted Customer Lifetime Value (CLV) and probability of being alive (P(Alive)), enabling targeted retention and growth strategies. Understanding segment composition reveals where value is concentrated and which customers face churn risk, directly supporting the BG/NBD and Gamma-Gamma modeling objectives.
Key Findings
- Loyal Segment Dominance: 50% of customers (6 of 12) classified as Loyal, generating $600,518.50 in total CLV with 0.97 average P(Alive)—the largest value pool with strong retention signals
- Champions Concentration: 2 customers (16.7%) represent top-tier value at $100,506.18 average CLV with near-perfect P(Alive) of 0.999
- At Risk & Lost: Combined 2 customers (16.7%) show churn signals (P(Alive) < 0.5), with one At Risk customer still holding $100,338.08 CLV
- Potential Growth: 2 customers (16.7%) with lower CLV ($99,732.67 avg) but high P(Alive) of 0.996 represent expansion opportunities
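The report does not state its segment rules. The sketch below shows one hypothetical rule set that reproduces the table's labels from each segment's averages. The 0.3 to 0.5 At Risk band borrows the win-back range from the recommendations; the 0.3 Lost cutoff and the CLV tier cutoffs ($99,800 and $100,400) are assumptions for illustration only.

```python
def assign_segment(p_alive, clv, clv_low, clv_high):
    """Hypothetical segment rules; every threshold here is an assumption.

    P(Alive) below 0.3 -> Lost; 0.3 up to 0.5 -> At Risk (the win-back band
    named in the recommendations). Remaining customers are split into CLV
    tiers using caller-supplied cutoffs.
    """
    if p_alive < 0.3:
        return "Lost"
    if p_alive < 0.5:
        return "At Risk"
    if clv >= clv_high:
        return "Champions"
    if clv <= clv_low:
        return "Potential"
    return "Loyal"


if __name__ == "__main__":
    # Segment averages from the table, with illustrative CLV cutoffs.
    examples = [(0.372, 100_338), (0.999, 100_506), (0.089, 99_850), (0.966, 100_086), (0.996, 99_733)]
    for p_alive, clv in examples:
        print(p_alive, clv, assign_segment(p_alive, clv, clv_low=99_800, clv_high=100_400))
```

Checking P(Alive) before CLV means churn status overrides value tier, which matches the table placing a $100,338 customer in At Risk rather than a value tier.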
Interpretation
The segmentation reveals a healthy portfolio skewed toward active
Model Validation
Model calibration vs holdout validation
| frequency_group | actual_transactions | predicted_transactions | n_customers | error_pct (%) |
|---|---|---|---|---|
| 0 (one-time) | 1 | 328 | 2 | 32,702 |
| 1 repeat | 0.667 | 328 | 3 | 49,096 |
| 2 repeats | 3 | 328.2 | 2 | 10,840 |
| 3-4 repeats | 4.333 | 327.1 | 3 | 7,448 |
| 5+ repeats | 7 | 329.4 | 2 | 4,605 |
Purpose
This section validates the BG/NBD model's predictive accuracy by comparing forecasted transactions against actual holdout-period observations across customer frequency groups. Close alignment between predicted and actual values would confirm that the model reliably captures customer purchase behavior, which is essential for accurate CLV estimation and segmentation.
Key Findings
- Prediction Consistency: Predicted transactions cluster tightly around 327–329 across all frequency groups, even though actual holdout transactions range from 0.67 to 7; the model barely differentiates between groups
- Error Pattern by Frequency: One-time and single-repeat customers exhibit the most extreme error rates (32,702% and 49,096%); high-frequency customers (5+ repeats) show the lowest error (4,605%), which still means predictions about 47 times the actual count
- Sample Size Constraint: Only 2–3 customers per frequency group limits statistical reliability of error estimates
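The table does not state its error formula. The sketch below assumes error_pct = |predicted − actual| / actual × 100, which reproduces the reported errors up to rounding of the table inputs. It also prints the absolute gap and the overprediction ratio for each group. `VALIDATION` and `error_pct` are hypothetical names.

```python
# (frequency_group, actual_transactions, predicted_transactions) from the validation table
VALIDATION = [
    ("0 (one-time)", 1.0, 328.0),
    ("1 repeat", 0.667, 328.0),
    ("2 repeats", 3.0, 328.2),
    ("3-4 repeats", 4.333, 327.1),
    ("5+ repeats", 7.0, 329.4),
]


def error_pct(actual, predicted):
    """Absolute percentage error relative to the actual holdout count."""
    return abs(predicted - actual) / actual * 100.0


if __name__ == "__main__":
    for group, actual, predicted in VALIDATION:
        gap = predicted - actual
        ratio = predicted / actual
        print(f"{group:<14} error {error_pct(actual, predicted):>8.0f}%  gap {gap:6.1f}  ratio {ratio:6.1f}x")
```

Every group shows an absolute gap between roughly 322 and 327 transactions, so the percentage errors are driven by the near-constant forecast rather than by small baselines.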
Interpretation
Error falls as frequency rises, but no group is predicted accurately: every group is overpredicted by roughly 322–327 transactions. The extreme percentage errors therefore come from a near-constant forecast of about 328, not from small absolute differences magnified by low baseline actuals. Sparse histories still make individual-level predictions for one-time buyers inherently uncertain, but uniform overprediction across all groups points to a problem in the fitted model itself rather than to sparse data alone.
Context
Validation results align with CLV distribution and segment summary findings, where Loyal and Champions segments (higher frequency) show stable CLV estimates. The small sample sizes per group suggest
Model Parameters
BG/NBD + Gamma-Gamma model parameter estimates
| model | parameter_name | estimate | interpretation |
|---|---|---|---|
| BG/NBD | r (purchase rate shape) | 1.009 | Shape of purchase rate Gamma distribution |
| BG/NBD | alpha (purchase rate scale) | 2.759 | Scale of purchase rate Gamma distribution |
| BG/NBD | a (dropout shape) | 0.0025 | First shape parameter of dropout Beta distribution |
| BG/NBD | b (dropout shape) | 0.0121 | Second shape parameter of dropout Beta distribution |
| Gamma-Gamma | p (spend shape) | 4.733 | Individual spend variability shape |
| Gamma-Gamma | q (pop spend shape) | 1203 | Population spend heterogeneity shape |
| Gamma-Gamma | v (spend scale) |  | Scale of spending distribution |
Purpose
This section presents the estimated parameters of a BG/NBD + Gamma-Gamma probabilistic model, which quantifies customer purchase behavior and spending patterns across the entire customer base. These parameters form the foundation for predicting customer lifetime value (CLV), probability of being alive (P(alive)), and expected transaction counts—all critical metrics visible in the heatmaps and customer segments throughout the analysis.
Key Findings
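- BG/NBD Purchase Rate (r=1.01, alpha=2.76): r/alpha ≈ 0.37 is the mean purchase rate per model time unit. Heterogeneity is set by the shape r, and r ≈ 1 implies a coefficient of variation of about 1/√r ≈ 1, so customers vary substantially in their baseline purchase propensity.
- BG/NBD Dropout Parameters (a≈0, b=0.01): Both shapes are far below 1, which makes the Beta dropout distribution strongly U-shaped. The model treats each customer as either almost never dropping out after a purchase or dropping out almost immediately (mean dropout probability a/(a+b) ≈ 0.17), consistent with the bimodal P(Alive) pattern in the heatmap.
- Gamma-Gamma Spend Shape (p=4.73, q=1202.75): The very high q means that customers' underlying mean spend varies little across the population, so each customer's expected spend is shrunk almost entirely toward the population mean. This helps explain the narrow CLV distribution (99.7K–100.8K range) despite observed average spend ranging from $225 to $442.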
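The quantities cited in these findings can be computed directly from the table's estimates. In the Gamma-Gamma model, a customer's expected spend is a weighted average of the population mean and their own observed average m_x. The weight on m_x is p·x / (p·x + q − 1), where x is the number of repeat purchases. This is a minimal sketch; `data_weight` is a hypothetical name, and the time unit of the purchase rate is whatever unit the fit used.

```python
import math

# Estimates copied from the Model Parameters table.
r, alpha = 1.009, 2.759   # BG/NBD: purchase rate ~ Gamma(shape r, rate alpha)
a, b = 0.0025, 0.0121     # BG/NBD: dropout probability ~ Beta(a, b)
p, q = 4.733, 1203.0      # Gamma-Gamma shape parameters

mean_rate = r / alpha          # mean purchases per model time unit
rate_cv = 1 / math.sqrt(r)     # coefficient of variation of purchase rates
mean_dropout = a / (a + b)     # average probability of dropping out after a purchase


def data_weight(x):
    """Weight the Gamma-Gamma conditional mean spend puts on a customer's own
    average spend after x repeat purchases; the rest goes to the population mean."""
    return p * x / (p * x + q - 1)


if __name__ == "__main__":
    print(f"mean rate {mean_rate:.3f}, rate CV {rate_cv:.3f}, mean dropout {mean_dropout:.3f}")
    for x in (1, 3, 7):
        print(f"x={x}: weight on own spend {data_weight(x):.3f}")
```

Even at x = 7, the most repeat purchases observed in the top-customers section, less than 3% of the weight falls on the customer's own spend. This is why the wide spread in observed spend barely moves individual CLV.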
Interpretation
The model captures two distinct customer dimensions: purchase frequency (governed by BG/NBD) and monetary value (governed by Gamma-Gamma).