Analysis overview and configuration
| Parameter | Value |
|---|---|
| position_adjustment | true |
| confidence_level | 0.95 |
| min_impressions | 10 |
| decision_threshold | 0.05 |
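The configuration above can be expressed as a plain dictionary (a sketch; the report's actual configuration format is not shown):

```python
# Analysis configuration, mirroring the table above. The comments describe
# how each parameter is used later in the report.
CONFIG = {
    "position_adjustment": True,   # normalize CTR by expected CTR at each position
    "confidence_level": 0.95,      # width of the reported confidence intervals
    "min_impressions": 10,         # floor for a page to enter the analysis
    "decision_threshold": 0.05,    # minimum lift treated as practically meaningful
}

print(CONFIG["confidence_level"])
```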
This analysis evaluates whether title changes improved click-through rates (CTR) across 7 pages, with statistical adjustment for average position shifts. The test compares a control group (12 initial observations) against a treatment group (7 initial observations), examining whether position-adjusted CTR improvements are statistically significant and practically meaningful.
While the treatment variant demonstrates a substantial adjusted CTR improvement and consistent position gains, the analysis lacks the statistical power to confirm that most improvements are real rather than random variation. The single promoted page shows genuine significance, but the remaining six pages need more data before their verdicts can move beyond keep-running.
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 19 |
| Final Rows | 7 |
| Rows Removed | 12 |
| Retention Rate | 36.8% |
This section documents the data filtering applied before statistical analysis of the A/B test results. The 63.2% removal rate reflects quality control measures necessary to ensure only valid page-level comparisons enter the analysis. Understanding retention is critical because it directly impacts the reliability of conclusions drawn about treatment effectiveness across the tested pages.
The aggressive filtering ensures statistical validity by excluding underpowered comparisons. With only 7 pages retained, the analysis focuses on pages meeting minimum data quality standards. This explains the low average statistical power (0.165) observed across tests—even retained pages have limited treatment impressions (mean: 238 vs. control: 1,362), creating inherent power constraints that directly contribute to the 85.7% inconclusive verdict rate.
No train/test split is applied: this is an observational before/after comparison, so all retained pages enter the significance testing directly.
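The impression-floor filter described above can be sketched as follows, assuming each row carries per-variant impression counts; the field names are illustrative, and `min_impressions=10` comes from the configuration table:

```python
def filter_pages(rows, min_impressions=10):
    """Keep only pages where both variants clear the impression floor."""
    return [
        r for r in rows
        if r["control_impressions"] >= min_impressions
        and r["treatment_impressions"] >= min_impressions
    ]

# Hypothetical rows: only the first clears the floor in both variants.
rows = [
    {"page": "a.html", "control_impressions": 1303, "treatment_impressions": 282},
    {"page": "b.html", "control_impressions": 40, "treatment_impressions": 3},
]
print(filter_pages(rows))  # only a.html survives
```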
| finding | value |
|---|---|
| Overall Verdict | 1 promote, 0 rollback, 0 neutral, 6 keep running |
| Pages Tested | 7 |
| Statistically Significant | 1 (14.3%) |
| Win Rate | 100.0% (1/1) |
| Biggest Winner | linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html (+1.44x) |
| Biggest Loser | support-vector-machine-svm-practical-guide-for-data-driven-decisions.html (-0.31x) |
| Average Adjusted CTR Lift | 0.27x |
| Inconclusive (Need More Data) | 6 (85.7%) |
This section synthesizes the overall A/B testing results across 7 pages to assess whether the SEO title optimization strategy achieved its business objective. It provides decision-makers with a clear bottom-line assessment of test performance, statistical confidence, and readiness for deployment.
The experiment identified one clear winner: the product bundle affinity tutorial achieved a 72% adjusted CTR lift with a Bonferroni-adjusted p-value of 0.012. However, 86% of tested pages remain inconclusive due to low statistical power (average power: 0.165), suggesting insufficient sample sizes or effect sizes too small to detect reliably. The 100% win rate reflects only one significant result, limiting confidence in the broader strategy's effectiveness.
The treatment group received substantially fewer impressions (1,669 total vs. 9,534 for control), so even large observed lifts carry wide uncertainty.
Position-Adjusted CTR Lift per Page with 95% Confidence Intervals
This forest plot visualizes position-adjusted CTR lift across 7 tested pages, isolating the treatment effect from natural position-based click variations. By adjusting for search position, the analysis reveals whether observed CTR changes reflect genuine content improvements or simply result from ranking shifts. This is critical for understanding whether the treatment causally improved user engagement.
The experiment reveals mixed results when controlling for position effects. While the average lift appears positive, 86% of pages remain inconclusive—their confidence intervals encompass zero, meaning observed differences could plausibly be due to chance. The single significant winner (Shopify tutorial) demonstrates the treatment can work, but most pages require additional data to confirm whether improvements are real or statistical artifacts.
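A confidence interval that spans zero is what makes a page inconclusive here. As a minimal sketch, a 95% Wald interval on the raw CTR difference between variants looks like this (the report's intervals are on the position-adjusted scale, so its exact bounds differ):

```python
import math

def ctr_diff_ci(clicks_c, imp_c, clicks_t, imp_t, z=1.96):
    """95% Wald CI for the difference in CTR (treatment minus control)."""
    p_c = clicks_c / imp_c
    p_t = clicks_t / imp_t
    diff = p_t - p_c
    se = math.sqrt(p_c * (1 - p_c) / imp_c + p_t * (1 - p_t) / imp_t)
    return diff - z * se, diff + z * se

# LDA page, raw counts from the comparison table below:
lo, hi = ctr_diff_ci(2, 1082, 3, 195)
print(lo <= 0 <= hi)  # an interval spanning zero means inconclusive at 95%
```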
Side-by-Side Control vs Treatment Metrics (Before Position Adjustment)
| page | control_impressions | control_clicks | control_ctr | control_position | treatment_impressions | treatment_clicks | treatment_ctr | treatment_position | raw_ctr_lift | position_change |
|---|---|---|---|---|---|---|---|---|---|---|
| articles/arima-practical-guide-for-data-driven-decisions.html | 1303 | 2 | 0.0015 | 15.91 | 282 | 0 | 0 | 10.56 | -0.0015 | -5.35 |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | 919 | 1 | 0.0011 | 12.73 | 291 | 1 | 0.0034 | 10.66 | 0.0023 | -2.07 |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | 1082 | 2 | 0.0018 | 15.63 | 195 | 3 | 0.0154 | 11.61 | 0.0136 | -4.02 |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | 1567 | 1 | 0.0006 | 9.22 | 422 | 1 | 0.0024 | 8.24 | 0.0018 | -0.98 |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | 1639 | 0 | 0 | 6.21 | 262 | 0 | 0 | 8.27 | 0 | 2.06 |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | 1331 | 3 | 0.0023 | 13.54 | 176 | 0 | 0 | 11.14 | -0.0023 | -2.4 |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | 1693 | 0 | 0 | 9.1 | 41 | 1 | 0.0244 | 6.26 | 0.0244 | -2.84 |
This section presents raw, unadjusted performance metrics to establish a baseline comparison between control and treatment periods. It serves as the foundation for understanding why position-adjusted analysis is necessary: the treatment variant achieved a higher raw CTR averaged across pages (0.65% vs 0.10%), but this difference may be partially or entirely attributable to improved search ranking (average position improved by 2.23) rather than title quality alone.
The raw metrics suggest treatment outperforms, but this comparison conflates two distinct factors: title quality and search ranking. Since treatment pages ranked higher, they received more favorable visibility, making it impossible to isolate whether improved CTR stems from stronger titles or simply from greater visibility. The position-adjusted analysis in the following sections exists to separate these two effects.
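The derived columns in the table above follow directly from click and impression counts; a sketch (field names are illustrative):

```python
def raw_metrics(c_clicks, c_imp, c_pos, t_clicks, t_imp, t_pos):
    """Compute the raw comparison columns for one page."""
    c_ctr = c_clicks / c_imp
    t_ctr = t_clicks / t_imp
    return {
        "control_ctr": round(c_ctr, 4),
        "treatment_ctr": round(t_ctr, 4),
        "raw_ctr_lift": round(t_ctr - c_ctr, 4),
        "position_change": round(t_pos - c_pos, 2),  # negative = improved rank
    }

# LDA row from the table above:
print(raw_metrics(2, 1082, 15.63, 3, 195, 11.61))
```

Note that rounding before subtracting (as the table appears to do for `raw_ctr_lift`) can differ in the last digit from subtracting first.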
Position vs CTR Scatter with Expected CTR Curve
This scatter plot isolates the relationship between search position and click-through rate to determine whether the treatment's CTR improvements stem from better positioning or from title/content changes that drive clicks independent of position. By overlaying observed data against the expected CTR curve, it reveals whether the treatment variant is outperforming or underperforming industry benchmarks at its achieved positions.
The treatment's +0.27x adjusted CTR lift appears largely attributable to improved search positioning rather than superior title or content relevance. With treatment pages ranking ~2.2 positions higher on average, they naturally capture more clicks according to industry expectations. Only the promoted tutorial page (0.72x lift) shows meaningful divergence from the curve, suggesting genuine title/content superiority beyond positional advantage.
Low absolute click volumes (no page exceeds 3 clicks in either variant) mean each plotted point carries wide sampling uncertainty, so deviations from the expected curve should be read cautiously.
Position Adjustment Calculation Breakdown
| page | variant | raw_ctr | position_val | expected_ctr | adjusted_ctr | adjustment_factor |
|---|---|---|---|---|---|---|
| articles/arima-practical-guide-for-data-driven-decisions.html | control | 0.0015 | 15.91 | 0.0059 | 0.2558 | 0.2558 |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | control | 0.0011 | 12.73 | 0.0081 | 0.136 | 0.136 |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | control | 0.0018 | 15.63 | 0.0061 | 0.2971 | 0.2971 |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | control | 0.0006 | 9.22 | 0.015 | 0.04 | 0.04 |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | control | 0 | 6.21 | 0.034 | 0 | 0 |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | control | 0.0023 | 13.54 | 0.0075 | 0.3058 | 0.3058 |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | control | 0 | 9.1 | 0.015 | 0 | 0 |
| articles/arima-practical-guide-for-data-driven-decisions.html | treatment | 0 | 10.56 | 0.0096 | 0 | 0 |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | treatment | 0.0034 | 10.66 | 0.0095 | 0.3565 | 0.3565 |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | treatment | 0.0154 | 11.61 | 0.0089 | 1.736 | 1.736 |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | treatment | 0.0024 | 8.24 | 0.02 | 0.12 | 0.12 |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | treatment | 0 | 8.27 | 0.02 | 0 | 0 |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | treatment | 0 | 11.14 | 0.0092 | 0 | 0 |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | treatment | 0.0244 | 6.26 | 0.034 | 0.7176 | 0.7176 |
This section isolates the title/content quality impact from ranking position effects by normalizing actual CTR against industry benchmarks for each position. Since the treatment variant achieved better average positions (9.53 vs 11.76), adjusted CTR reveals whether improved clicks stem from better rankings alone or from genuinely stronger title performance. This is critical for evaluating whether the treatment represents a true quality improvement.
The adjustment mechanism shows that treatment pages do not merely benefit from ranking improvements: the LDA article attracts clicks at 1.74x the rate expected for its position (versus 0.30x for its control), and the Shopify tutorial reaches 0.72x against a control of zero.
Significance Testing Results with P-Values and Power Analysis
| page | p_value | adjusted_p_value | ci_lower | ci_upper | power | is_significant | sample_size_adequate |
|---|---|---|---|---|---|---|---|
| articles/arima-practical-guide-for-data-driven-decisions.html | 1 | 1 | -0.0052 | 0.0021 | 0.1536 | False | True |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | 0.9749 | 1 | -0.007 | 0.0117 | 0.0949 | False | True |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | 0.0305 | 0.2137 | -0.007 | 0.034 | 0.3621 | False | True |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | 0.8958 | 1 | -0.0046 | 0.008 | 0.1046 | False | True |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | 1 | 1 | 0 | 0 | 0 | False | True |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | 1 | 1 | -0.0071 | 0.0025 | 0.1447 | False | True |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | 0.0017 | 0.012 | -0.0353 | 0.0841 | 0.295 | True | True |
This section evaluates whether observed treatment effects are statistically reliable or likely due to chance. Using two-proportion z-tests with Bonferroni correction, it controls for false positives when testing multiple pages simultaneously. Understanding statistical significance is critical for distinguishing genuine improvements from noise in the experiment.
The low power (16.5%) means the experiment lacks sufficient sample size to reliably detect treatment effects across most pages. Six of seven pages remain inconclusive—not because the treatment failed, but because the data volume is insufficient to distinguish signal from noise. The single significant result (Shopify tutorial, adjusted p = 0.012) passed the corrected threshold, but the broader pattern suggests most observed differences are statistically indistinguishable from zero.
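A minimal sketch of the named procedure: a two-proportion z-test per page, with a Bonferroni correction across the seven pages. This is a generic implementation on raw click counts, not the report's exact code (which tests on the adjusted scale, so its p-values differ):

```python
import math

def two_prop_z(clicks_c, imp_c, clicks_t, imp_t):
    """Two-sided two-proportion z-test; returns the p-value."""
    p_c, p_t = clicks_c / imp_c, clicks_t / imp_t
    p_pool = (clicks_c + clicks_t) / (imp_c + imp_t)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / imp_c + 1 / imp_t))
    if se == 0:
        return 1.0  # no clicks in either variant: no evidence either way
    z = (p_t - p_c) / se
    # two-sided p-value via the normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Shopify tutorial row: 0/1693 control clicks vs 1/41 treatment clicks
p = two_prop_z(0, 1693, 1, 41)
print(p, bonferroni([p] * 7)[0])
```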
Per-Page Recommendations with Estimated Click Uplift
| page | verdict | adjusted_ctr_lift | p_value | estimated_monthly_click_uplift |
|---|---|---|---|---|
| articles/arima-practical-guide-for-data-driven-decisions.html | keep_running | -0.2558 | 1 | 0 |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | keep_running | 0.2205 | 1 | 0 |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | keep_running | 1.438 | 0.2137 | 0 |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | keep_running | 0.08 | 1 | 0 |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | keep_running | 0 | 1 | 0 |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | keep_running | -0.3058 | 1 | 0 |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | promote | 0.7176 | 0.012 | 30 |
This section synthesizes per-page experiment results into clear verdicts based on statistical significance and effect size. It translates raw statistical findings into actionable categories—promote, rollback, neutral, or keep running—enabling stakeholders to make informed decisions about which title treatments should be deployed, reverted, or extended with additional data collection.
The experiment reveals a highly imbalanced power distribution: one clear winner emerged from statistical testing, but the majority of pages remain underpowered. The wide range of adjusted CTR lifts (-0.31x to +1.44x) underscores this variability; until additional impressions accumulate, most verdicts must remain keep_running.
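The verdict rules implied by the table above can be sketched as follows, assuming the configured decision_threshold of 0.05 applies to adjusted CTR lift and significance is judged on the Bonferroni-adjusted p-value:

```python
def verdict(adjusted_p, lift, alpha=0.05, threshold=0.05):
    """Map a page's test results to one of the four verdict categories."""
    if adjusted_p < alpha:           # statistically significant
        if lift > threshold:
            return "promote"
        if lift < -threshold:
            return "rollback"
        return "neutral"             # significant but practically negligible
    return "keep_running"            # inconclusive: gather more data

print(verdict(0.012, 0.7176))   # Shopify tutorial -> "promote"
print(verdict(0.2137, 1.438))   # LDA article -> "keep_running"
```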