Demo · Advertising · Marketing Spend · Linear Regression
Overview

Analysis overview and configuration

Analysis Type: Linear Regression
Company: Digital Marketing Agency
Objective: Build a multiple linear regression model to predict sales revenue from multi-channel advertising spend (TikTok, Facebook, Google Ads)
Analysis Date: 2026-03-02
Processing Id: test_1772483613
Total Observations: 0
Parameter | Value
confidence_level | 0.95
include_interaction_terms | FALSE
model_selection_method | forward
diagnostic_plots | TRUE
vif_threshold | 10
cv_folds | 5
cv_seed | 42
cooks_d_threshold | 0.5
include_prediction_intervals | TRUE
include_standardized_coefs | TRUE
heteroscedasticity_test | breusch_pagan
alpha | 0.05
Interpretation

Analysis Insights: Multi-Channel Advertising ROI Prediction

Purpose

This analysis builds a predictive model for sales revenue based on advertising spend across three digital channels (TikTok, Facebook, Google Ads). The objective is to quantify each channel's contribution to revenue and enable data-driven budget allocation decisions for the marketing agency.

Key Findings

  • R-squared (0.782): The model explains 78.2% of revenue variance, indicating strong predictive power with meaningful explanatory value across the three channels
  • All Predictors Significant: All three advertising channels show p-values reported as 0.0000 (p < 0.0001), confirming statistically significant relationships with revenue
  • Google Ads ROI (1.216): Highest coefficient—each unit of Google spend generates the strongest revenue impact, followed by Facebook (0.488) and TikTok (0.360)
  • TikTok as Most Important: Despite lowest raw coefficient, TikTok shows the highest standardized coefficient (0.633), indicating greatest relative importance when accounting for variable scaling
  • Model Stability: Cross-validation R² of 0.764 vs. training R² of 0.782, together with a train-to-CV RMSE ratio of 0.995, suggests minimal overfitting and consistent performance across data folds

Interpretation

The model reveals a nuanced channel hierarchy: while Google

Data preprocessing and column mapping

Initial Rows: 0
Final Rows: 0
Rows Removed: 0
Retention Rate: 100%
Interpretation

Purpose

This section documents the data preprocessing pipeline for the marketing ROI analysis. It shows that no data rows were processed during the preprocessing stage, which is inconsistent with the main analysis that evaluated 200 observations. This discrepancy suggests the preprocessing metadata may not have been properly captured or the pipeline documentation is incomplete.

Key Findings

  • Initial Rows: 0 - No input data recorded in preprocessing logs
  • Final Rows: 0 - No output data documented after cleaning
  • Retention Rate: 100% - Perfect retention rate, but meaningless given zero rows
  • Data Quality: No transformations, filtering, or quality checks are documented despite the main analysis using 200 complete observations

Interpretation

The preprocessing section shows zero rows processed, yet the regression model successfully analyzed 200 observations with no rows removed (initial_rows = 200, rows_removed = 0). This indicates either the preprocessing documentation failed to capture the actual data pipeline, or the data was loaded directly without formal preprocessing steps. The 100% retention rate is technically accurate but uninformative—it reflects that no data was explicitly removed, not that preprocessing was thorough.
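
The zero-row retention figure can be made explicit with a small guard. This is a minimal sketch, not the report's own pipeline: the function name `retention_rate` is hypothetical, and returning `None` for an empty input is one possible convention for the undefined 0/0 case.

```python
from typing import Optional


def retention_rate(initial_rows: int, final_rows: int) -> Optional[float]:
    """Percentage of rows kept, or None when there were no input rows."""
    if initial_rows < 0 or final_rows < 0:
        raise ValueError("row counts must be non-negative")
    if initial_rows == 0:
        return None  # 0 / 0 is undefined, so no percentage is reported
    return 100.0 * final_rows / initial_rows


# Values logged by the preprocessing section (0 rows in, 0 rows out).
print(retention_rate(0, 0))
# Values reported by the main analysis (200 rows in, none removed).
print(retention_rate(200, 200))
```

Under this convention the preprocessing log would show a missing retention rate instead of 100%, which makes the metadata gap visible rather than hiding it behind a plausible number.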

Context

This metadata gap creates uncertainty about data quality decisions, missing value handling, and feature engineering applied before modeling. Given the main analysis shows heteroscedasticity issues and non-normal residuals, understanding preprocessing choices would be critical

Executive Summary

Executive summary with key findings and business recommendations

r_squared: 0.7823
best_channel: Google
significant_predictors: 3

Metric | Value
Model Performance | R² = 78.2% (explains 78.2% of sales variance)
Best Channel | Google with ROI = $1.22 per ad dollar
Significant Channels | 3 of 3 channels statistically significant
Model Quality | Strong
Bottom Line: Regression model explains 78.2% of sales variance across 3 advertising channels. Google delivers highest marginal ROI at $1.22 of sales per ad dollar. 3 channels show statistically significant effects (p < 0.05).

Key Findings:
• Model fit: R² = 0.782 (Adjusted R² = 0.779)
• Prediction accuracy: RMSE = $1,256.97 (typical prediction error)
• Statistical significance: F-test p-value < 0.0001 (reported as 0.0000)
• ✓ No multicollinearity: Max VIF = 1.02

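Recommendation: Reallocate budget to Google (highest ROI). Use coefficient estimates to optimize spend allocation across channels. The diagnostic tests for normality, homoscedasticity, and autocorrelation all fail, so treat the coefficient standard errors and confidence intervals as approximate.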
Interpretation

Purpose

This executive summary synthesizes the advertising channel analysis to assess whether the regression model reliably explains sales performance and supports budget allocation decisions. The 78.2% variance explained indicates strong predictive power, but diagnostic concerns require careful interpretation before deployment.

Key Findings

  • R-squared (0.782): Model explains 78.2% of sales variance, indicating solid explanatory power across the three advertising channels
  • Google Ads ROI ($1.22): Highest marginal return per advertising dollar, significantly outperforming Facebook ($0.49) and TikTok ($0.36)
  • Statistical Significance: All three predictors are statistically significant (p < 0.001) with no multicollinearity concerns (Max VIF = 1.02)
  • Model Diagnostics: Heteroscedasticity detected (Breusch-Pagan p = 0.003) and non-normal residuals (Shapiro-Wilk p < 0.001) violate regression assumptions
  • Prediction Accuracy: RMSE of $1,257 represents ~12% error relative to mean sales ($10,668)

Interpretation

The model demonstrates strong predictive capability with all channels showing reliable, significant effects. Google's superior ROI suggests budget reallocation potential. However, violated diagnostic assumptions—

Visualization

Model Fit

How well does the model predict sales? Actual vs predicted values with model performance metrics

Interpretation

Purpose

This section evaluates how accurately the regression model predicts sales outcomes by comparing actual values against model predictions. Understanding model fit is essential for assessing whether the three marketing channels (TikTok, Facebook, Google Ads) reliably explain sales variation and whether predictions are trustworthy for business decisions.

Key Findings

  • R² = 0.782: The model explains 78.2% of sales variance, indicating strong predictive power across the 200 observations
  • Adjusted R² = 0.779: The small gap from R² shows that the penalty for using three predictors is negligible at this sample size
  • RMSE = $1,256.97: Average prediction error represents approximately 11.8% of mean sales ($10,668), a reasonable margin for forecasting
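  • F-statistic p-value ≈ 0: Confirms that the three channels are jointly significant, i.e. the model as a whole explains sales better than an intercept-only model; the significance of each individual channel comes from the per-coefficient tests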
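
The adjusted R² and the relative error quoted above can be reproduced from the reported figures. A minimal sketch, assuming 200 observations and 3 predictors as stated in the report; the function name is illustrative.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Figures taken from the report.
adj_r2 = adjusted_r_squared(0.7823, n_obs=200, n_predictors=3)
relative_rmse = 1256.97 / 10668  # RMSE as a share of mean sales

print(round(adj_r2, 3))
print(round(100 * relative_rmse, 1))
```

With only three predictors and 200 rows, the correction factor (199/196) is close to 1, which is why R² and adjusted R² nearly coincide.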

Interpretation

The model demonstrates solid predictive capability, with actual sales clustering reasonably close to predicted values. The near-identical R² and adjusted R² values indicate the model complexity is appropriate: no unnecessary predictors inflate performance artificially. Residuals average zero, as they must for least squares with an intercept, and the median of -$244 shows that more than half of the observations are overpredicted, offset by fewer but larger underpredictions. Overall bias is minimal.

Context

This fit assessment assumes linear relationships between marketing spend and sales.

Visualization

Channel ROI

Which advertising channels drive the most sales per dollar spent? Coefficient estimates with confidence intervals

Interpretation

Purpose

This section quantifies the sales impact of each advertising channel by estimating marginal ROI—the dollars of sales generated per dollar spent. All three channels show statistically significant effects, meaning their contributions to sales are reliable and not due to chance. These coefficients directly address the core business question of which channels deliver the strongest financial returns.

Key Findings

  • Google Ads ROI (1.22): Highest coefficient with 95% CI [1.01, 1.42]—generates $1.22 in sales per ad dollar spent
  • Facebook ROI (0.49): Mid-range coefficient with CI [0.42, 0.56]—generates $0.49 per ad dollar
  • TikTok ROI (0.36): Lowest coefficient with CI [0.32, 0.40]—generates $0.36 per ad dollar
  • Statistical Significance: All p-values are reported as 0.0000 (p < 0.0001), indicating extremely strong evidence that each channel's effect is nonzero
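
Because the model is additive with no interaction terms, the coefficients above translate directly into the predicted effect of moving budget between channels. The sketch below is illustrative only: the function names are hypothetical, the range built from confidence-interval endpoints is a crude bracket rather than a formal interval, and it assumes the coefficients can be read causally within the observed spend range.

```python
# Point estimates and 95% confidence intervals as reported.
COEF = {"TikTok": 0.360, "Facebook": 0.488, "Google Ads": 1.216}
CI95 = {
    "TikTok": (0.32, 0.40),
    "Facebook": (0.42, 0.56),
    "Google Ads": (1.01, 1.42),
}


def predicted_sales_change(amount: float, source: str, target: str) -> float:
    """Predicted change in sales when `amount` dollars move from source to target."""
    return amount * (COEF[target] - COEF[source])


def crude_range(amount: float, source: str, target: str) -> tuple:
    """Pessimistic and optimistic brackets using the interval endpoints."""
    low = amount * (CI95[target][0] - CI95[source][1])
    high = amount * (CI95[target][1] - CI95[source][0])
    return low, high


print(predicted_sales_change(1000, "TikTok", "Google Ads"))
print(crude_range(1000, "TikTok", "Google Ads"))
```

Shifting $1,000 from TikTok to Google Ads gives a predicted gain of about $856 in sales, bracketed roughly between $610 and $1,100 under these assumptions.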

Interpretation

Google Ads shows a substantially higher marginal return than both social channels: about 2.5× Facebook's coefficient and about 3.4× TikTok's. The confidence intervals are tight, none crosses zero, and none overlaps another channel's, so the ranking of raw coefficients is stable rather than a statistical artifact. This reflects the model's R² of 0.782, meaning these three channels explain 78% of

Visualization

Residual Diagnostics

Are model assumptions satisfied? Residual plots check for homoscedasticity and linearity

Interpretation

Purpose

This section evaluates whether the linear regression model satisfies two critical assumptions: homoscedasticity (constant variance across fitted values) and linearity (random scatter around zero). Violations of these assumptions undermine model reliability and suggest the relationship between predictors and outcomes may not be adequately captured by the linear specification.

Key Findings

  • Residual Range: -2,185 to +3,915 with median of -244, indicating asymmetric distribution around zero
  • Standardized Residuals: Range from -1.73 to +3.11, with one observation exceeding ±3 standard deviations (potential outlier)
  • Residual Skewness: 0.58 indicates a right-skewed distribution: a minority of large positive residuals, where actual sales exceed predictions by a wide margin
  • Heteroscedasticity Detected: Breusch-Pagan test (p=0.003) confirms non-constant variance across fitted values

Interpretation

The residual plot reveals violated homoscedasticity assumptions. The positive skew and median offset from zero indicate the model systematically underpredicts at certain fitted value ranges. The standardized residual exceeding ±3σ represents a notable outlier. These violations, confirmed by the failed diagnostic tests, suggest the linear model may misspecify the relationship between marketing

Visualization

Normality Check

Are residuals normally distributed? QQ plot validates normality assumption required for inference

Interpretation

Purpose

The QQ plot assesses whether residuals follow a normal distribution—a critical assumption for valid p-values and confidence intervals in regression. Deviations from the 45° reference line indicate non-normality, which can undermine the reliability of statistical inference for the marketing channel ROI model.

Key Findings

  • Shapiro-Wilk Test: W = 0.9630, p-value < 0.0001 (reported as 0.0000) - Normality assumption is rejected; residuals significantly deviate from normal distribution
  • Residual Skewness: 0.58 - Positive skew indicates right-tail heaviness; more large positive residuals than expected under normality
  • Tail Behavior: Sample residuals range from -2,185 to 3,915 (standardized: -1.73 to +3.11), an asymmetric spread compared with the symmetric ±2.81 range of the theoretical normal quantiles
  • Pattern Observed: Upper tail deviations suggest the model systematically underpredicts high-value outcomes
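
The ±2.81 theoretical range is what a QQ plot produces for 200 points under a common plotting-position rule. A minimal sketch, assuming positions of (i - 0.5) / n; the report does not state which rule its plotting code uses, so that choice is an assumption.

```python
from statistics import NormalDist


def theoretical_quantiles(n: int) -> list:
    """Standard normal quantiles at plotting positions (i - 0.5) / n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


q = theoretical_quantiles(200)
# Smallest and largest theoretical quantiles for a sample of 200.
print(round(q[0], 2), round(q[-1], 2))
```

Comparing these symmetric bounds with the standardized residual range of -1.73 to +3.11 shows the asymmetry: the lower tail stops well short of the theoretical minimum while the upper tail overshoots the maximum.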

Interpretation

The residuals are non-normal, particularly in the upper tail. OLS coefficient estimates do not require normal residuals, but the exact t-based p-values and confidence intervals do, so the reported p-values (all 0.0000) and 95% confidence intervals for the TikTok, Facebook, and Google Ads coefficients should be treated as approximate. The positive skew combined with heteroscedast

Data Table

Multicollinearity Check

Are predictors highly correlated? VIF (Variance Inflation Factor) detects multicollinearity that inflates coefficient uncertainty

Test | Statistic | p-value | Result
Normality (Shapiro-Wilk) | 0.9630 | 0.0000 | Fail
Homoscedasticity (Breusch-Pagan) | 9.1101 | 0.0025 | Fail
Autocorrelation (Durbin-Watson) | 1.2206 | N/A | Fail
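
The Durbin-Watson row has no p-value, so it helps to see what the statistic measures. The sketch below computes it from a residual sequence and converts a value into the approximate lag-1 autocorrelation via DW ≈ 2(1 - ρ). The function names are illustrative, the report's pass/fail cutoff is not stated, and the statistic is only meaningful if the rows are in a natural order such as time.

```python
def durbin_watson(residuals: list) -> float:
    """Sum of squared successive differences divided by sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def implied_lag1_autocorrelation(dw: float) -> float:
    """Approximate lag-1 autocorrelation from the relation DW ~ 2 * (1 - rho)."""
    return 1.0 - dw / 2.0


# Toy residuals that alternate in sign give a value well above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# The reported statistic implies moderate positive autocorrelation.
print(round(implied_lag1_autocorrelation(1.2206), 2))
```

A value near 2 indicates no first-order autocorrelation; the reported 1.2206 corresponds to a lag-1 correlation of roughly 0.39 under this approximation.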
Interpretation

Purpose

This section evaluates whether predictors (TikTok, Facebook, Google Ads) are highly correlated with each other—a condition called multicollinearity that inflates coefficient uncertainty and reduces model reliability. VIF quantifies this relationship, with values above 10 indicating problematic correlation that compromises statistical inference.

Key Findings

  • Max VIF: 1.018 - All three advertising channels have VIF values near 1.0, well below the critical threshold of 10, indicating negligible correlation between predictors
  • VIF Warning: FALSE - No multicollinearity alert triggered, confirming predictors are sufficiently independent
  • Predictor Independence: TikTok, Facebook, and Google Ads spending patterns are distinct and non-redundant in explaining outcome variation
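
The VIF for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the others. A short sketch of the relationship in both directions; the helper names are illustrative, and the inputs are the threshold and maximum VIF quoted in the report.

```python
def vif(r_squared_j: float) -> float:
    """Variance inflation factor for a predictor with auxiliary R-squared r_squared_j."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("auxiliary R-squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


def implied_r_squared(vif_value: float) -> float:
    """Auxiliary R-squared that corresponds to a given VIF."""
    return 1.0 - 1.0 / vif_value


# Share of a channel's spend variance explained by the other two channels.
print(round(implied_r_squared(1.018), 4))
# The same quantity at the configured vif_threshold of 10.
print(round(implied_r_squared(10), 2))
```

A maximum VIF of 1.018 means the other two channels explain under 2% of any one channel's spend variance, against 90% at the threshold of 10.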

Interpretation

The extremely low VIF values (all ≤1.02) show that spend on the three advertising channels is nearly uncorrelated in the dataset. This strengthens confidence in the coefficient estimates: each channel's coefficient (TikTok: 0.36, Facebook: 0.49, Google Ads: 1.22) reflects its own partial effect rather than variance shared with the other channels. The model can therefore separate the individual channel contributions cleanly.

Context

While multicollinearity is not a concern, the diagnostic tests reveal violations in normality (Shap

Visualization

Influential Points

Which observations have outsized influence on the model? Cook's Distance and leverage identify problematic data points

Interpretation

Purpose

This section identifies observations that disproportionately affect model coefficients and predictions. By detecting influential points and high-leverage cases, we can assess whether the model's estimates are robust or driven by a small number of unusual data points. This is critical for validating the reliability of the marketing channel ROI estimates.

Key Findings

  • Influential Count: 0 observations — No points with Cook's Distance > 0.5, indicating no individual observations are distorting coefficient estimates
  • High-Leverage Count: 8 observations — Points with extreme predictor values that could potentially affect fitted values, though not currently exerting undue influence
  • Maximum Cook's Distance: 0.051 — Well below the 0.5 threshold, confirming minimal overall influence from any single observation
  • Leverage Range: 0.01–0.06 (mean 0.02) — Distributed across the predictor space with no extreme outliers
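
Cook's Distance combines the size of a residual with the leverage of its observation. The sketch below uses the standard formula in terms of an internally standardized residual, and assumes 4 estimated parameters (intercept plus three channels) and the common 2p/n rule of thumb for high leverage; the report does not state its leverage cutoff. The example pairing of a 3.11 residual with mean leverage is hypothetical, not a row from the data.

```python
def cooks_distance(standardized_residual: float, leverage: float, n_params: int) -> float:
    """Cook's Distance from a standardized residual and a hat-matrix leverage value."""
    return (standardized_residual ** 2 / n_params) * leverage / (1.0 - leverage)


n_obs, n_params = 200, 4                  # assumed: intercept + 3 channels
mean_leverage = n_params / n_obs          # leverages always average p / n
high_leverage_cutoff = 2 * mean_leverage  # rule of thumb, an assumption here

print(mean_leverage, high_leverage_cutoff)
# Hypothetical point: largest standardized residual at average leverage.
print(round(cooks_distance(3.11, 0.02, n_params), 3))
```

The mean leverage of 0.02 matches the figure in the report. Even a residual above three standard deviations produces a distance far below the 0.5 threshold unless it also sits at high leverage.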

Interpretation

The model demonstrates strong stability: zero influential points means the TikTok, Facebook, and Google Ads coefficients are not driven by outliers. The 8 high-leverage observations represent unusual combinations of predictor values but do not distort estimates because their residuals remain moderate. This validates that the ROI estimates (TikTok: 0.36, Facebook: 0.49,

Visualization

Cross-Validation

How well does the model generalize to new data? Cross-validation assesses out-of-sample performance

Interpretation

Purpose

This section evaluates whether the marketing ROI model generalizes reliably to new, unseen data. Cross-validation partitions the dataset into five folds, trains on four, and tests on the remaining one, rotating until each fold has served once as the test set. This reveals whether the model's strong training performance (R² = 0.782) holds up when applied to data it hasn't encountered, which is critical for real-world deployment.

Key Findings

  • Overfit Ratio: 0.995 (train RMSE $1256.97 vs CV RMSE $1263.04) — Ratio near 1.0 indicates minimal overfitting; the model performs nearly identically on held-out data as on training data
  • CV R²: 0.764 — Explains 76.4% of variance in unseen folds, only 1.8 percentage points below training R² (0.782), confirming stable predictive power
  • Fold Consistency: RMSE ranges from $1072.76 to $1402.97 across folds (SD = $142.46), showing moderate variability but no systematic degradation pattern
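
The fold rotation and the overfit ratio can be sketched as follows. The helper names are hypothetical, and although the sketch reuses cv_folds = 5 and cv_seed = 42 from the configuration, Python's random generator will not reproduce the fold assignment of the tool that produced the report.

```python
import random


def k_fold_indices(n_obs: int, k: int, seed: int) -> list:
    """Shuffle row indices once and deal them into k roughly equal folds."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(folds: list):
    """Yield (train, test) index lists so each fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


folds = k_fold_indices(200, k=5, seed=42)
for train, test in train_test_splits(folds):
    print(len(train), len(test))

# Overfit ratio from the reported RMSE values: training over cross-validated.
overfit_ratio = 1256.97 / 1263.04
print(round(overfit_ratio, 3))
```

A ratio below 1 means the held-out error is larger than the training error; a value this close to 1 is what the report reads as minimal overfitting.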

Interpretation

The model demonstrates excellent generalization. The negligible gap between training and cross-validation metrics (0.5% difference in RMSE) suggests the three marketing channels (TikTok

Visualization

Prediction Intervals

What is the uncertainty around individual predictions? 95% prediction intervals quantify forecast precision

Interpretation

Purpose

Prediction intervals quantify uncertainty around individual forecasts by establishing lower and upper bounds where actual values are expected to fall. This section evaluates whether the model's uncertainty estimates are well-calibrated—critical for risk assessment and decision-making when deploying the marketing ROI model in production environments.

Key Findings

  • Coverage Rate (97%): Actual values fall within the 95% prediction intervals 97% of the time, slightly above the nominal 95% target, so the intervals are well calibrated and marginally conservative.
  • Interval Width (~$5,058 average): Consistent width across predictions (range $5,029–$5,147) reflects stable uncertainty estimation relative to the mean prediction of $10,668.
  • Containment: At 97% coverage, about 194 of the 200 observations fall within their respective intervals and about 6 fall outside, with no systematic pattern in the misses, suggesting the model's uncertainty quantification is reliable across the prediction range.
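
The reported interval width is close to what the standard OLS prediction-interval formula gives. The sketch below is a consistency check under stated assumptions, not the report's code: it assumes the RMSE was computed as sqrt(SSE / n), that 4 parameters were estimated, and it hard-codes 1.972 as an approximate 97.5th percentile of the t distribution with 196 degrees of freedom.

```python
import math


def residual_standard_error(rmse: float, n_obs: int, n_params: int) -> float:
    """Convert an RMSE computed as sqrt(SSE / n) into sqrt(SSE / (n - p))."""
    return rmse * math.sqrt(n_obs / (n_obs - n_params))


def interval_width(sigma: float, leverage: float, t_crit: float) -> float:
    """Full width of a two-sided prediction interval at the given leverage."""
    return 2.0 * t_crit * sigma * math.sqrt(1.0 + leverage)


def coverage_rate(actual: list, lower: list, upper: list) -> float:
    """Share of actual values that fall inside their [lower, upper] interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actual, lower, upper))
    return hits / len(actual)


T_CRIT = 1.972  # assumed approximation of the t quantile, 196 degrees of freedom
sigma = residual_standard_error(1256.97, n_obs=200, n_params=4)

# Width at the mean leverage of 0.02 reported in the influence diagnostics.
print(round(interval_width(sigma, leverage=0.02, t_crit=T_CRIT)))
# Number of observations implied by 97% coverage of 200 rows.
print(round(0.97 * 200))
```

Because the width depends on the data only through the leverage term, it barely changes across observations, which explains the narrow reported range of widths.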

Interpretation

The 97% coverage rate indicates that the model produces trustworthy uncertainty bounds. Predictions are neither overconfident (which would yield coverage below 95%) nor markedly conservative (coverage far above 95%). The small standard deviation of interval widths ($23.69) shows that the intervals are nearly constant in width, which follows from the constant-variance assumption built into standard prediction intervals rather than from the data. This calibration supports using the model for business decisions requiring probabilistic forecasts of sales.

Visualization

Feature Importance

Which channel has the strongest impact? Standardized coefficients enable fair comparison across different spend scales

Interpretation

Purpose

This section identifies which marketing channel drives the strongest relative impact on sales outcomes by comparing standardized effect sizes. Standardized coefficients normalize for differences in spend scale across channels, enabling fair comparison of true influence regardless of budget magnitude. This directly addresses the core business question: which channel delivers the most efficient return per unit of variation in spending?

Key Findings

  • Most Important Predictor: TikTok with standardized coefficient of 0.633 - the highest relative impact among three channels
  • Relative Effect Ranking: TikTok (0.633) > Facebook (0.453) > Google Ads (0.392) in terms of standardized influence
  • Interpretation Scale: A one standard deviation increase in TikTok spending produces a 0.633 standard deviation change in sales, compared to 0.453 for Facebook and 0.392 for Google Ads
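
A standardized coefficient is the raw coefficient multiplied by the ratio of the predictor's standard deviation to the outcome's. Dividing the reported standardized values by the raw ones therefore recovers each channel's spend spread relative to sales. A minimal sketch using the rounded figures from the report, so the ratios are approximate.

```python
RAW = {"TikTok": 0.360, "Facebook": 0.488, "Google Ads": 1.216}
STANDARDIZED = {"TikTok": 0.633, "Facebook": 0.453, "Google Ads": 0.392}


def standardized_coefficient(raw_coef: float, sd_x: float, sd_y: float) -> float:
    """Raw slope rescaled to standard-deviation units of predictor and outcome."""
    return raw_coef * sd_x / sd_y


# Implied sd(spend) / sd(sales) for each channel.
sd_ratio = {ch: STANDARDIZED[ch] / RAW[ch] for ch in RAW}
for channel, ratio in sd_ratio.items():
    print(channel, round(ratio, 2))

# How much more widely TikTok spend varies than Google Ads spend.
print(round(sd_ratio["TikTok"] / sd_ratio["Google Ads"], 1))
```

On these figures TikTok spend varies roughly five times as widely as Google Ads spend, which is why TikTok ranks first on the standardized scale despite having the smallest per-dollar coefficient.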

Interpretation

Despite Google Ads having the largest raw coefficient (1.22), TikTok has the strongest standardized effect. A standardized coefficient scales the raw coefficient by the spread of the predictor, so this result indicates that TikTok spend varies much more widely across observations than Google Ads spend, not that TikTok returns more per dollar. Typical swings in TikTok spend therefore account for more of the variation in sales than typical swings in the other channels, making it the most influential channel in the model

Visualization

Heteroscedasticity

Is variance constant across fitted values? Scale-Location plot detects heteroscedasticity that violates regression assumptions

Interpretation

Purpose

This section evaluates whether prediction error variance remains constant across all fitted values—a core assumption of linear regression. The Scale-Location plot visualizes this relationship, while the Breusch-Pagan test provides statistical confirmation. Detecting heteroscedasticity is critical because it undermines the reliability of confidence intervals and hypothesis tests, even when predictions appear accurate.

Key Findings

  • Breusch-Pagan Test Statistic: 9.11 with p-value = 0.0025 - Statistically significant evidence of non-constant variance (p < 0.05)
  • Heteroscedasticity Status: DETECTED - The regression violates the homoscedasticity assumption
  • Smooth Line Trend: Ranges from 0.78 to 0.94 across fitted values, indicating slight variance reduction at higher predictions rather than a dramatic pattern
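
The Breusch-Pagan p-value comes from comparing the test statistic with a chi-square distribution. The reported pair (9.11, 0.0025) is consistent with 1 degree of freedom, as in a variant that tests variance against the fitted values; that reading is inferred from the numbers and is not stated in the report. For 1 degree of freedom the upper tail has a closed form using the complementary error function.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


p_value = chi2_sf_df1(9.1101)
print(round(p_value, 4))
print(p_value < 0.05)  # compared with the configured alpha of 0.05
```

The same function reproduces the familiar 5% critical value check: a statistic of 3.84 gives a tail probability of about 0.05.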

Interpretation

The model exhibits heteroscedasticity, meaning the spread of prediction errors is not constant across the range of fitted values. While the trend is modest (smooth line variation of ±0.08 around mean 0.87), it is statistically significant. This suggests that uncertainty in sales predictions may be systematically higher or lower depending on the predicted sales level, potentially affecting the precision of confidence intervals around channel-specific ROI estimates (TikTok: 0
