Analysis overview and configuration
| Parameter | Value |
|---|---|
| alternative | two.sided |
| confidence_level | 0.95 |
| continuity_correction | TRUE |
This analysis compares total ad exposure between two marketing groups (ad vs. psa) using the Mann-Whitney U test, a non-parametric method chosen because the data violate normality assumptions. The objective is to determine whether exposure levels differ between the groups, given their unequal sample sizes and skewed distributions.
The analysis provides statistical evidence (p = 0.0079) that ad exposure differs between the groups. The ad group has the higher median exposure (79 vs. 35), with the Hodges-Lehmann difference estimated at 32 (95% CI: 8 to 63).
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 500 |
| Final Rows | 500 |
| Rows Removed | 0 |
| Retention Rate | 100% |
This section documents the data preprocessing pipeline for the Mann-Whitney U test comparing ad and psa groups. Perfect data retention (100%) indicates no rows were removed during cleaning, meaning all 500 observations proceeded directly to statistical analysis without filtering or exclusion.
The complete retention of all 500 rows suggests either exceptionally clean source data or minimal preprocessing requirements. This matters for the Mann-Whitney U test results, because the full sample (475 ad, 25 psa) directly informed the statistical comparison. Since no rows were removed, filtering introduced no selection bias, and the original group imbalance (95% ad vs. 5% psa) that characterizes the dataset is preserved.
The lack of a train/test split indicates this was an inferential comparison of two groups rather than predictive modeling. The severe group imbalance (19:1 ratio) persisted through preprocessing. With only 25 psa observations, the robustness of the significant p-value (0.008) may be limited despite the medium effect size observed.
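The retention and imbalance figures quoted in this section follow from simple arithmetic on the reported counts. A minimal sketch, with hypothetical variable names chosen only for illustration:

```python
# Counts as reported in the preprocessing summary above
initial_rows, final_rows = 500, 500
n_ad, n_psa = 475, 25

retention_rate = final_rows / initial_rows   # share of rows kept after cleaning
ad_share = n_ad / (n_ad + n_psa)             # share of observations in the ad group
imbalance_ratio = n_ad / n_psa               # ad observations per psa observation

print(f"retention={retention_rate:.0%}, ad share={ad_share:.0%}, "
      f"ratio={imbalance_ratio:.0f}:1")
```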
| Finding | Value |
|---|---|
| Statistical Significance | Yes (p=0.0079) |
| Effect Size | Medium (r=-0.315) |
| ad Median | 79.00 (IQR: 129.50) |
| psa Median | 35.00 (IQR: 94.00) |
| Median Difference (H-L) | 32.00 (95% CI: 8.00 to 63.00) |
| Sample Sizes | n1=475, n2=25 |
This analysis compares two groups (ad and psa) using a Mann-Whitney U test to determine whether meaningful differences exist between them. The test was selected because both groups violated normality assumptions, making it the appropriate non-parametric alternative to a t-test. Understanding whether these groups differ statistically and practically is essential for informed decision-making.
The ad group shows higher values than the psa group across the quartiles. The Hodges-Lehmann estimate of the difference is 32 (95% CI: 8 to 63).
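For readers unfamiliar with the Hodges-Lehmann estimate, the two-sample version is the median of all pairwise differences between observations in the two groups. The sketch below illustrates that definition on made-up values (not the report's data). The function name is hypothetical, and it is an assumption that the report's tool computes the estimate this way.

```python
from itertools import product
from statistics import median

def hodges_lehmann(x, y):
    """Median of all pairwise differences x_i - y_j."""
    return median(a - b for a, b in product(x, y))

# Made-up illustrative values, not the report's data
ad_demo = [10, 40, 90, 200]
psa_demo = [5, 30, 60]

shift = hodges_lehmann(ad_demo, psa_demo)
median_gap = median(ad_demo) - median(psa_demo)
print(shift, median_gap)
```

The estimate is generally not equal to the difference between the two group medians, which is why the table can show an estimate of 32 alongside medians of 79 and 35.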
Visual comparison of distributions between two groups
This distribution comparison visualizes how measurements differ between the ad and psa groups, revealing their underlying data shapes. It serves as critical visual evidence supporting the choice of non-parametric testing, since both groups violate normality assumptions (Shapiro-Wilk p-values < 0.001).
The overlapping histograms show that the ad group has substantially greater dispersion and right-skewness than psa. This visual pattern is consistent with the Mann-Whitney U test result (p=0.008), which indicates a statistically significant difference between the groups. The Hodges-Lehmann estimate of 32 units is the estimated location shift between the groups, computed as the median of all pairwise differences.
Median and interquartile range comparison between groups
This section visualizes the distribution and central tendency of values across two groups (ad and psa) using box plots. It provides a clear visual comparison of medians, spread, and outliers, which is essential for understanding whether the groups differ meaningfully in their typical values and variability.
The box plot comparison reveals that the ad group consistently exhibits higher values than psa across the distribution. The ad median (79) is more than double the psa median (35), and the wider IQR in ad reflects greater variability in the middle 50% of observations. This visual evidence aligns with the Mann-Whitney U test result (p=0.008), confirming a statistically significant difference between groups with a medium effect size.
Rank positions showing the basis of the U statistic
The rank distribution reveals how the Mann-Whitney U test assigns ranks to observations from both groups when combined. This visualization demonstrates the foundation of the statistical test: if one group systematically occupies higher or lower ranks, it indicates a meaningful difference in central tendency between the groups, independent of the original scale.
The negative rank-biserial correlation (-0.315) indicates the psa group concentrates in lower rank positions, meaning psa observations tend to have smaller original values than ad observations. This rank-based difference, combined with the significant p-value (0.008), indicates a statistically significant shift in the distribution's location. The Mann-Whitney U statistic (7807.5) quantifies this rank separation, providing evidence that the groups differ beyond random variation.
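The reported rank-biserial correlation can be reproduced from the reported U statistic and sample sizes using one common convention, r = 1 - 2U / (n1 * n2). The sign depends on which group's U is used and on the convention chosen. It is an assumption that the report's tool used this exact formula, although it does match the reported value.

```python
def rank_biserial_from_u(u, n1, n2):
    """Rank-biserial correlation under the convention r = 1 - 2U / (n1 * n2)."""
    return 1 - 2 * u / (n1 * n2)

# Values taken from the report: U = 7807.5, n1 = 475 (ad), n2 = 25 (psa)
r = rank_biserial_from_u(7807.5, 475, 25)
print(round(r, 3))  # -0.315
```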
Rank-based testing
Mann-Whitney U test statistics and p-value
| Metric | Value |
|---|---|
| Mann-Whitney U | 7807.50 |
| P-Value | 0.0079 |
| Rank-Biserial Correlation | -0.315 |
| Hodges-Lehmann Estimate | 32.000 |
| 95% CI Lower | 8.000 |
| 95% CI Upper | 63.000 |
This section presents the Mann-Whitney U test results, a non-parametric statistical test appropriate for comparing two independent groups with non-normal distributions. It determines whether the ad and psa groups have statistically significantly different distributions, which is essential for validating whether observed differences are genuine rather than due to random variation.
The p-value of 0.0079 provides strong evidence that the ad and psa groups have statistically different distributions. This finding aligns with the descriptive statistics showing the ad group has a higher median (79 vs. 35) and greater spread. The Mann-Whitney U test was appropriately chosen because both groups failed normality tests (Shapiro-Wilk p-values near 0), making it more reliable than parametric alternatives.
The severe sample size imbalance (475 ad vs. 25 psa observations) should be considered when interpreting results.
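To make the mechanics of the test concrete, the sketch below computes a two-sided Mann-Whitney U test with midranks for ties, a tie-corrected normal approximation, and an optional continuity correction, mirroring the `two.sided` and `continuity_correction` settings in the configuration table. It is an assumption that the report's tool used the normal approximation rather than an exact method. The function name and demo values are hypothetical.

```python
import math
from collections import Counter

def mann_whitney_u(x, y, continuity=True):
    """Return (U for x, two-sided p-value) using a normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(list(x) + list(y))

    # Assign midranks: tied values share the average of the ranks they span.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j

    rank_sum_x = sum(rank_of[v] for v in x)
    u = rank_sum_x - n1 * (n1 + 1) / 2

    # Variance of U under the null hypothesis, with tie correction.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all values identical: no evidence of a difference

    diff = abs(u - n1 * n2 / 2)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    z = diff / math.sqrt(variance)
    p_value = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return u, p_value

# Made-up illustrative values, not the report's data
u_demo, p_demo = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_demo, round(p_demo, 3))
```

For samples this small an exact p-value would normally be preferred. The approximation is shown only because it scales to the sample sizes in this report.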
Effect size and practical significance assessment
This section quantifies the practical magnitude of the difference between the ad and psa groups beyond statistical significance. While the p-value (0.008) confirms a difference exists, effect size metrics reveal how large that difference is in real-world terms, which is essential for assessing whether the finding has meaningful practical importance.
The statistically significant p-value is paired with a medium effect size, meaning the ad group (median=79) genuinely differs from the psa group (median=35) in practical terms. The 32-unit median difference represents a meaningful gap, though the wide confidence interval (8–63) reflects uncertainty due to the small psa sample (n=25) and high variability in both groups.
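The "medium" label is consistent with Cohen's conventional cutoffs for correlation-type effect sizes (0.1 small, 0.3 medium, 0.5 large). The sketch below applies those cutoffs to the absolute value of the rank-biserial correlation. Applying them to this particular statistic, and the function name itself, are assumptions rather than the report's documented method.

```python
def label_effect_size(r):
    """Label |r| using Cohen's conventional cutoffs: 0.1, 0.3, 0.5."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "medium"
    if magnitude >= 0.1:
        return "small"
    return "negligible"

label = label_effect_size(-0.315)  # reported rank-biserial correlation
print(label)  # medium
```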
Descriptive statistics for each group
| Group | N | Median | Q1 | Q3 | IQR | Min | Max | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|
| ad | 475 | 79 | 35.5 | 165 | 129.5 | 1 | 1328 | 134.9 | 165 |
| psa | 25 | 35 | 11 | 105 | 94 | 1 | 334 | 76.68 | 94.83 |
This section provides descriptive statistics for each group, emphasizing median and interquartile range (IQR) rather than mean and standard deviation. This approach is essential here because both groups failed normality tests (Shapiro-Wilk p < 0.001), making median-based metrics more robust and interpretable for non-normal distributions.
The ad group shows higher values than the psa group at every quartile. The Mann-Whitney U test (p = 0.008) indicates this difference is statistically significant. The negative rank-biserial correlation (−0.315, medium effect) indicates psa values tend to rank lower, supporting the hypothesis that these groups differ meaningfully.
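The preference for median and IQR over mean and SD can be illustrated with a small sketch. It uses made-up values containing one extreme observation, not the report's data. The quartile method (`inclusive` interpolation) is an assumption, since quartile definitions differ between tools and the report does not state which one it uses.

```python
from statistics import mean, median, quantiles

def robust_summary(values):
    """Median, quartiles, and IQR using inclusive linear interpolation."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "q1": q1, "q3": q3, "iqr": q3 - q1}

# Made-up right-skewed sample with one extreme value
demo = [1, 2, 3, 4, 100]
summary = robust_summary(demo)
print(summary, "mean:", mean(demo))
```

The single extreme value pulls the mean far above the median, while the median and IQR stay close to the bulk of the data.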
Shapiro-Wilk normality tests justifying non-parametric approach
| Group | Shapiro-Wilk W | p-value | Normality |
|---|---|---|---|
| ad | 0.6908 | <0.0001 | Rejected |
| psa | 0.7760 | 0.0001 | Rejected |
This section validates the statistical method choice for comparing the two groups (ad vs. psa). Since parametric tests assume normally distributed data, the Shapiro-Wilk normality test determines whether a non-parametric Mann-Whitney U test is appropriate. This justification is critical for ensuring the validity of the significance findings reported in the overall analysis.
Both the ad group (n=475) and psa group (n=25) exhibit significant departures from normality, as evidenced by p-values far below the 0.05 threshold. This non-normality is consistent with the observed positive skewness (1.0) and right-tailed distributions visible in the boxplot data, where maximum values substantially exceed medians. The Mann-Whitney U test (p=0.008) is therefore the appropriate choice for comparing these groups.
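The method-selection logic described here can be sketched as a simple rule: if either group rejects normality at the chosen threshold, use the rank-based test. The function name, return labels, and the 0.05 default are illustrative assumptions. The ad p-value is entered as it appears in the table, rounded to four decimals.

```python
def choose_two_sample_test(shapiro_p_values, alpha=0.05):
    """Pick a rank-based test if any group rejects normality at level alpha."""
    normality_rejected = any(p < alpha for p in shapiro_p_values.values())
    return "mann-whitney-u" if normality_rejected else "t-test"

# Shapiro-Wilk p-values as reported (rounded) in the table above
choice = choose_two_sample_test({"ad": 0.0000, "psa": 0.0001})
print(choice)  # mann-whitney-u
```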