Overview

Analysis Overview

Analysis overview and configuration

Analysis Type: T Test

Company: Educational Research Institute

Objective: Compare math scores between male and female students using an independent samples t-test

Analysis Date: 2026-03-12

Processing Id: test_1773376459

Total Observations: 1000

Parameter	Value
alternative	two.sided
confidence_level	0.95
significance_level	0.05
var_equal	FALSE

Interpretation

Purpose

This analysis compares math scores between male and female students using an independent samples t-test on 1,000 observations. The objective is to determine whether statistically significant differences exist in academic performance between genders, providing evidence-based insights into student achievement patterns.

Key Findings

Mean Difference: -5.095 points (female minus male; females score lower than males on average)
Statistical Significance: p < 0.001 (displayed as 0.0000 after rounding), so the difference is statistically significant at the 0.05 level
Effect Size (Cohen's d): -0.341, a small practical effect despite statistical significance
95% Confidence Interval: -6.947 to -3.243 points; the interval excludes zero
Sample Composition: 518 females vs. 482 males with comparable variability (SD ~15 points each)

Interpretation

Male students score significantly higher in math than female students on average, with males averaging 68.73 compared to 63.63 for females. While the roughly 5-point difference is statistically significant (t = -5.398, df = 997.98), the small effect size indicates modest practical significance. Both groups have the same interquartile range (IQR = 20) and similar standard deviations, although observed scores span 0–100 for females and 27–100 for males.

Data preprocessing and column mapping

Initial Rows: 1000

Final Rows: 1000

Rows Removed: 0

Retention Rate: 100%

Interpretation

Purpose

This section documents the data preprocessing pipeline for a comparative analysis examining differences between female and male groups. Perfect data retention (100%) indicates no rows were removed during cleaning, meaning the full dataset of 1,000 observations proceeded to statistical testing without data loss or exclusion criteria applied.

Key Findings

Retention Rate: 100% (1,000 of 1,000 rows retained) - No observations were filtered or removed during preprocessing
Rows Removed: 0 - The dataset required no cleaning interventions, suggesting either high initial data quality or minimal validation criteria
Sample Composition: Balanced groups (518 female, 482 male) were preserved intact through the pipeline

Interpretation

The complete retention of all observations supports the validity of the subsequent Welch's t-test results, which compared mean values across both groups. With no data loss, the statistical power and representativeness of the analysis remain uncompromised. The balanced group sizes (approximately 52% female, 48% male) were maintained, enabling fair comparison of the -5.095 mean difference observed between groups.
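
The retention and group-share figures quoted above are simple ratios. The sketch below reproduces them from the row counts in this report; the function names are illustrative and are not taken from the pipeline that produced the report.

```python
def retention_rate(initial_rows, final_rows):
    """Percentage of rows that survived preprocessing."""
    if initial_rows <= 0:
        raise ValueError("initial_rows must be positive")
    return 100.0 * final_rows / initial_rows


def group_shares(counts):
    """Percentage of the total sample contributed by each group."""
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}


# Row counts and group sizes as reported in this section.
print(retention_rate(1000, 1000))
print(group_shares({"female": 518, "male": 482}))
```

A retention rate of 100% only says that no rows were dropped; it does not by itself show that the values in those rows were validated.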

Context

No train/test split was applied, indicating this analysis focused on descriptive comparison rather than predictive modeling. The absence of documented transformations suggests the raw values (0–100 scale) were analyzed directly, though the Shapiro-Wilk

Executive Summary

Executive summary of t-test results

Finding	Value
Statistical Significance	Yes (p < 0.001)
Effect Size	Small (d=-0.341)
Female Mean	63.63 (SD: 15.49)
Male Mean	68.73 (SD: 14.36)
Mean Difference (95% CI)	-5.095 (95% CI: -6.947 to -3.243)
Sample Sizes	n1=518 (female), n2=482 (male)

Bottom Line: There is a statistically significant difference between female and male math scores (p < 0.001). The effect size is small (Cohen's d = -0.341), indicating a small practical difference.

Key Findings:
• Compared 518 observations from female vs 482 from male
• Means: 63.63 vs 68.73 (difference: -5.095, female minus male)
• Small effect (Cohen's d: -0.341)
• Used Welch's t-test

Recommendation: The difference is statistically significant, but the small effect size (Cohen's d = -0.341) means its practical importance should be weighed before taking action based on this group difference.

Interpretation

Purpose

This analysis compares math scores between female and male students using Welch's independent samples t-test. The findings address whether the two groups differ on average and how large that difference is, which informs whether any targeted strategy is warranted.

Key Findings

Statistical Significance: p < 0.001 (displayed as 0.0000) - A difference this large would be very unlikely if the population means were equal
Mean Difference: Males score 5.1 points higher than females (68.73 vs 63.63 on a 0-100 scale)
Effect Size: Cohen's d = -0.341 - While statistically significant, the practical magnitude is small
Sample Balance: 518 females and 482 males provide robust statistical power with no data loss
Normality Caveat: Both groups show slight deviations from normality (Shapiro-Wilk p < 0.05), though with roughly 500 observations per group the t-test is generally robust to this violation

Interpretation

The analysis finds a statistically significant difference between groups (p < 0.001). Males score approximately 5 points higher on average. However, the small effect size (Cohen's d = -0.341) indicates this difference represents modest practical separation. The 95% confidence interval (-6.95 to -3.24) excludes zero.

Visualization

Distribution Comparison

Visual comparison of distributions between two groups

Interpretation

Purpose

This density overlay visualization compares the distribution shapes and central tendencies between female and male groups. It provides a visual foundation for understanding whether observed differences are driven by shifts in the entire distribution or concentrated in specific regions, complementing the statistical test results.

Key Findings

Mean Difference: Males score 5.1 points higher (68.73 vs. 63.63), representing a rightward shift in the male distribution
Spread Similarity: Both groups show comparable variability (SD: 15.49 female, 14.36 male), indicating consistent dispersion across groups
Distribution Shape: Both distributions appear approximately symmetric (skew ≈ -0.08), suggesting the difference is primarily a location shift rather than shape distortion
Overlap Pattern: Substantial curve overlap indicates considerable within-group variation relative to between-group differences

Interpretation

The density curves reveal that while males have a significantly higher mean (p < 0.001), the distributions overlap considerably. This aligns with the small effect size (Cohen's d = -0.341), indicating the practical magnitude of difference is modest despite statistical significance. The similar spread of the two curves suggests roughly homogeneous variances; Welch's t-test does not require equal variances, so the result holds either way.
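
To put a rough number on "overlap considerably", the sketch below treats each group as a normal distribution with the reported mean and SD. That is an assumption made for illustration only (the Shapiro-Wilk results elsewhere in this report flag slight non-normality), so the outputs are approximations and are not figures from the report.

```python
from statistics import NormalDist

# Normal approximations built from the reported group means and SDs.
female = NormalDist(mu=63.63, sigma=15.49)
male = NormalDist(mu=68.73, sigma=14.36)

# Overlapping coefficient: shared area under the two density curves (0 to 1).
overlap = female.overlap(male)

# Probability that a randomly chosen male score exceeds a randomly chosen
# female score, using the distribution of the difference of two independent
# normal variables.
difference = NormalDist(
    mu=male.mean - female.mean,
    sigma=(male.variance + female.variance) ** 0.5,
)
p_male_higher = 1.0 - difference.cdf(0.0)

print(f"overlap coefficient: {overlap:.3f}")
print(f"P(male score > female score): {p_male_higher:.3f}")
```

Under these assumptions most of the area under the two curves is shared, and a randomly chosen male outscores a randomly chosen female only somewhat more often than half the time, which is what a small Cohen's d looks like in practice.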

Context

The visual representation assumes kernel density estimation accuracy. The range extension (−

Visualization

Box Plot Comparison

Means and spread comparison between groups via box plots

Interpretation

Purpose

This section visualizes the distribution and central tendency of values across gender groups through box plots. It provides an intuitive way to compare group differences in location, spread, and variability—essential for understanding whether observed differences are meaningful or attributable to natural variation.

Key Findings

Mean Difference: Males score 5.1 points higher (68.73 vs. 63.63), a statistically significant gap (p < 0.001)
Spread Consistency: Both groups show similar variability (SD: 15.49 for females, 14.36 for males), with identical interquartile ranges (IQR = 20)
Distribution Shape: Both groups display symmetric distributions (skew ≈ 0.02) across the 0–100 scale, with comparable medians (65 vs. 69)

Interpretation

The box plots reveal that while males score higher on average, the distributions largely overlap, indicating substantial within-group variation. The small effect size (Cohen's d = -0.341) confirms that despite statistical significance, the practical difference is modest. Female scores span the full 0–100 range and male scores span 27–100, suggesting scores vary considerably within each gender.

Context

These visual comparisons complement the Welch's t-test results. Note that both groups violated normality assumptions (Shapiro-

Visualization

Normality Diagnostics (QQ Plot)

QQ plots and Shapiro-Wilk tests to assess normality assumption

Interpretation

Purpose

This section evaluates whether the data meets the normality assumption required for valid t-test inference. Normality diagnostics are critical because violations can affect the reliability of p-values and confidence intervals, particularly with smaller samples. Understanding departures from normality helps contextualize the robustness of the group comparison findings.

Key Findings

Shapiro-Wilk p-value (Female): 0.0035 - Statistically significant departure from normality; the female group distribution deviates from a normal curve
Shapiro-Wilk p-value (Male): 0.0380 - Marginal but significant departure from normality; the male group shows slight non-normal behavior
Variance Equality (F-test): p = 0.0902 - No significant evidence of unequal variances at the 0.05 level; Welch's t-test (var_equal = FALSE) does not assume equal variances in any case
QQ Plot Pattern: Sample values show slight deviations at distribution tails, consistent with bounded data (0–100 range)
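
As a rough check on how close the two variances are, the sketch below computes the variance ratio that an F-test of equal variances is based on, using the reported SDs. It does not reproduce the p-value, since the F distribution is not available in the Python standard library; the function name is illustrative.

```python
def variance_ratio(sd_a, sd_b):
    """Ratio of the larger sample variance to the smaller one (always >= 1)."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


# Reported SDs: 15.49 (female) and 14.36 (male).
ratio = variance_ratio(15.49, 14.36)
print(f"variance ratio: {ratio:.3f}")
```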

Interpretation

Both groups exhibit statistically significant departures from normality, though the deviations are modest. The F-test does not reject equal variances (p > 0.05), and Welch's t-test remains valid whether or not the variances are equal; with roughly 500 observations per group, the test is also robust to moderate normality violations. The significant gender difference (t = -5

Visualization

Effect Size

Cohen's d effect size and practical significance assessment

Interpretation

Purpose

This section quantifies the practical significance of the observed difference between female and male groups. While statistical significance (p < 0.001) indicates the difference is unlikely to be due to sampling variation alone, effect size measures whether that difference is meaningful in practical terms. Cohen's d standardizes the difference relative to variability, enabling comparison across studies and contexts.

Key Findings

Cohen's d: -0.341 (Small) - The difference falls within the "small" range (0.2–0.5), indicating modest practical significance despite strong statistical evidence
Mean Difference: -5.095 units (95% CI: -6.947 to -3.243) - Males scored approximately 5 points higher on average, with high confidence the true difference lies between 3.2 and 6.9 units
Confidence Interval: The narrow CI excludes zero, consistent with the significant test result, and indicates the mean difference is estimated precisely

Interpretation

The statistically significant t-test result is tempered by a small effect size, meaning the groups differ reliably but not dramatically. Males average 5 points higher than females, but this 5-point gap represents only about one-third of a standard deviation—a clinically or practically modest distinction. The tight confidence interval confirms precision in estimation despite the small magnitude.
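
The "one-third of a standard deviation" reading can be checked from the summary statistics. The sketch below assumes Cohen's d was computed with the pooled standard deviation (the report does not state which variant was used), and the "negligible" label for |d| below 0.2 is an added convention, not something the report defines.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def magnitude_label(d):
    """Conventional thresholds (0.2, 0.5, 0.8) applied to |d|."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Female group first, so a negative d means females score lower.
d = cohens_d(63.63, 15.49, 518, 68.73, 14.36, 482)
print(f"d = {d:.3f} ({magnitude_label(d)})")
```

Because the inputs are the rounded means and SDs from the summary table, the result can differ from the reported value in the last decimal place.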

Context

Effect size complements p-values by addressing "how much

Data Table

Test Results

t-test statistics, p-value, and detailed results table

Metric	Value
t-statistic	-5.3980
Degrees of Freedom	997.98
p-value	< 0.0001 (8.42e-08)
Mean Difference	-5.095
95% CI Lower	-6.947
95% CI Upper	-3.243
Cohen's d	-0.341
Effect Magnitude	Small

Interpretation

Purpose

This section presents the statistical hypothesis test results comparing values between female and male groups. It determines whether observed differences are statistically significant or likely due to random variation, providing the quantitative foundation for rejecting or failing to reject the null hypothesis of equal population means.

Key Findings

t-statistic: -5.398 - Indicates males score approximately 5.4 standard errors higher than females, with the negative sign reflecting the direction of difference
p-value: 0.0000 (8.42e-08) - A difference at least this large would be extremely unlikely if the two population means were equal
Degrees of Freedom: 997.98 - Reflects the large sample size (n=1000) providing robust statistical power
Significance: TRUE - Result meets the conventional α=0.05 threshold for statistical significance

Interpretation

Welch's t-test shows a statistically significant difference between groups. With a p-value far below 0.05, we reject the null hypothesis that female and male means are equal. The mean difference of -5.095 points (95% CI: -6.947 to -3.243) indicates males score higher on average. However, Cohen's d of -0.341 reveals this difference is practically small in magnitude, suggesting statistical significance does not necessarily imply large real-world impact.
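
The test statistic and degrees of freedom can be approximately reproduced from the group summaries. The sketch below applies the Welch formulas to the rounded means and SDs in this report. As an explicit simplification, the confidence interval and p-value use the standard normal distribution in place of the t distribution (reasonable here only because df is close to 1000), so they will differ slightly from the reported values.

```python
import math
from statistics import NormalDist


def welch_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b, confidence=0.95):
    """Welch's t statistic, Welch-Satterthwaite df, and a normal-approximation CI."""
    var_term_a = sd_a ** 2 / n_a  # squared standard error contributed by group a
    var_term_b = sd_b ** 2 / n_b  # squared standard error contributed by group b
    se = math.sqrt(var_term_a + var_term_b)
    diff = mean_a - mean_b
    t_stat = diff / se
    df = (var_term_a + var_term_b) ** 2 / (
        var_term_a ** 2 / (n_a - 1) + var_term_b ** 2 / (n_b - 1)
    )
    # Approximation: normal quantile and tail area instead of Student's t.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z * se, diff + z * se)
    p_approx = 2 * NormalDist().cdf(-abs(t_stat))
    return {"t": t_stat, "df": df, "se": se, "ci": ci, "p_approx": p_approx}


# Female group first, matching the sign convention of the reported difference.
result = welch_from_summary(63.63, 15.49, 518, 68.73, 14.36, 482)
for name, value in result.items():
    print(name, value)
```

The fractional degrees of freedom (997.98 rather than 998) come from the Welch-Satterthwaite adjustment, which is what var_equal = FALSE selects.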

Context

Data Table

Summary Statistics

Descriptive statistics for each group

Group	N	Mean	SD	Median	IQR	Min	Max
female	518	63.63	15.49	65	20	0	100
male	482	68.73	14.36	69	20	27	100

Interpretation

Purpose

This section provides descriptive statistics for each group to establish baseline characteristics before statistical comparison. By reporting both mean and median alongside standard deviation, it enables assessment of central tendency and spread—critical for understanding whether the groups differ systematically and whether the data meet assumptions for parametric testing.

Key Findings

Female Group (n=518): Mean=63.63, SD=15.49, Median=65 — slightly lower central tendency with comparable variability
Male Group (n=482): Mean=68.73, SD=14.36, Median=69 — approximately 5-point higher mean with marginally tighter spread
Distributional Symmetry: Both groups show near-zero skewness (0.02), indicating symmetric distributions despite Shapiro-Wilk test violations

Interpretation

The 5.1-point mean difference (males higher) forms the basis for the subsequent t-test comparison. Both groups exhibit similar spread (SD ~15), and the F-test (p=0.090) does not reject equal variances. Median values closely track means, suggesting minimal outlier influence despite non-normality flags. This consistency between mean and median strengthens confidence in the parametric test results.
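
For readers who want to build a table like this from raw scores, the sketch below computes the same columns with the Python standard library. The score lists are hypothetical placeholders, not the study data, and the quartile method (the default of `statistics.quantiles`) is an assumption that may differ from the one used to produce the reported IQR.

```python
import statistics


def summarize(scores):
    """Descriptive statistics matching the columns of the summary table."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
    return {
        "n": len(scores),
        "mean": statistics.fmean(scores),
        "sd": statistics.stdev(scores),  # sample SD (n - 1 denominator)
        "median": statistics.median(scores),
        "iqr": q3 - q1,
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical scores for illustration only.
sample = {
    "female": [50, 60, 70, 80, 90],
    "male": [55, 65, 75, 85, 95],
}
for group, scores in sample.items():
    print(group, summarize(scores))
```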

Context

Non-normality detected via Shapiro-Wilk tests (p<0.05) reflects sensitivity