Analysis Overview
Analysis overview and configuration
| Parameter | Value |
|---|---|
| n_components | 4 |
| scale_data | TRUE |
| variance_threshold | 0.8 |
Purpose
This PCA analysis reduces 4 marketing and sales features to 2 principal components to identify the primary dimensions of variation in the dataset. By compressing the feature space while retaining 75.4% of total variance, the analysis enables simpler visualization and interpretation of marketing spend patterns, at the cost of setting aside the remaining 24.6%.
Key Findings
- Variance Explained (PC1 + PC2): 75.4%. Two components capture three-quarters of all variation, which supports the dimensionality reduction, although it falls slightly short of the configured 0.8 variance threshold
- PC1 Dominance: 48% of variance, driven primarily by feature_4 (−0.70 loading) and feature_1 (−0.56 loading); this is the strongest axis of differentiation
- PC2 Contribution: 27.4% of variance, defined by feature_3 (−0.80 loading) and feature_2 (0.60 loading)
- Data Quality: 100% retention across 200 observations; no missing values compromised the analysis
Interpretation
The analysis distills marketing spend and sales variation into two interpretable dimensions. PC1 appears to capture a scale or intensity factor (features 1 and 4 both load negatively, so they tend to move together), while PC2 represents a contrast between feature_3 and feature_2. The 75.4% cumulative variance falls 4.6 percentage points short of the configured variance_threshold of 0.8.
Data Preprocessing
Data preprocessing and column mapping
Purpose
This section documents data quality and retention outcomes from preprocessing for the PCA analysis. Retention matters here because standard PCA requires a complete feature matrix: rows with missing values must be dropped or imputed before the variance structure across the 200 marketing observations can be computed.
Key Findings
- Retention Rate: 100% (200/200 rows preserved) - All observations successfully passed quality checks with no exclusions
- Rows Removed: 0 - No data loss occurred during cleaning or standardization procedures
- Data Completeness: Full dataset available for PCA computation across all 4 marketing features
- Train/Test Split: Not applicable - PCA is unsupervised and operates on the complete dataset without partitioning
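A complete-case retention check of this kind can be sketched in NumPy, assuming the features are held in a 2-D array with missing values encoded as NaN. The helper `retention_summary` and the random stand-in data are hypothetical, not the tool that produced this report.

```python
import numpy as np

def retention_summary(X: np.ndarray) -> dict:
    """Count rows with no missing values (complete cases) in a 2-D feature array."""
    complete = ~np.isnan(X).any(axis=1)  # True where every feature in the row is present
    kept = int(complete.sum())
    total = X.shape[0]
    return {"kept": kept, "removed": total - kept, "retention_pct": 100.0 * kept / total}

# Hypothetical stand-in data: 200 rows x 4 features with no missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
print(retention_summary(X))  # {'kept': 200, 'removed': 0, 'retention_pct': 100.0}
```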
Interpretation
The perfect retention rate indicates that the marketing spend and sales dataset contained no missing values or anomalies that triggered removal, so all 200 observations contribute to the principal component calculations. The 75.4% cumulative variance explained by PC1 and PC2 is therefore based on the complete dataset rather than on imputed or filtered data.
Context
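While 100% retention is favorable, note that the features are standardized during PCA execution (scale_data=TRUE), so each feature contributes on an equal footing regardless of its original units. The lack of a train/test split reflects PCA's unsupervised nature; however, it also means no independent validation of the component structure on held-out data was performed.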
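The standardization implied by scale_data=TRUE, followed by the eigendecomposition that PCA performs, can be sketched with NumPy. This is an illustration under assumptions, not the pipeline behind this report: the function name `pca_scaled` and the random stand-in data are hypothetical, and the report's tool may compute the components differently (for example via SVD).

```python
import numpy as np

def pca_scaled(X: np.ndarray):
    """PCA on standardized data via eigendecomposition of the correlation matrix.

    Returns eigenvalues (descending), loadings (one column per component), and scores.
    """
    # Center each column and scale it to unit sample variance (ddof=1).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # For standardized data, the covariance matrix equals the correlation matrix.
    R = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # re-sort to descending
    eigvals, loadings = eigvals[order], eigvecs[:, order]
    scores = Z @ loadings                  # project observations onto the components
    return eigvals, loadings, scores

# Hypothetical correlated stand-in for 200 observations of 4 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
eigvals, loadings, scores = pca_scaled(X)
print(eigvals.sum())  # approximately 4.0: scaled eigenvalues sum to the number of features
```

`numpy.linalg.eigh` is used because the correlation matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.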
Executive Summary
Executive summary of PCA findings
| Finding | Value |
|---|---|
| Features Analyzed | 4 |
| Recommended Components | 2 |
| Variance Captured | 75.4% |
| PC1 Variance | 48% |
| Observations Used | 200 |
Key Findings:
• PC1 alone captures 48% of variance, making it the dominant dimension in the data
• The first 2 components explain 75.4% combined
• 2 components selected via the Kaiser criterion (eigenvalue > 1); the 0.8 variance threshold would require 3 components (97.1%)
• 4 features reduced to 2 dimensions, a 50% dimensionality reduction
Recommendation: Use the top 2 components as input features for downstream models (clustering, classification, regression). Review the loadings heatmap to give each component a meaningful business name.
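One way to turn this recommendation into a feature matrix for downstream models is to standardize the data and keep the first two component scores. The sketch below uses NumPy's SVD on hypothetical random data standing in for the 200 x 4 marketing matrix; it is an illustrative assumption, not the report's implementation.

```python
import numpy as np

# Hypothetical stand-in for the 200 x 4 feature matrix.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each column

# Thin SVD of the standardized matrix: Z = U @ diag(S) @ Vt, singular values descending.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
eigvals = S**2 / (Z.shape[0] - 1)  # component variances (eigenvalues of the correlation matrix)

k = 2
features = Z @ Vt[:k].T            # 200 x 2 matrix of PC1/PC2 scores for downstream models
print(features.shape)              # (200, 2)
```

When scoring new observations, reuse the column means, standard deviations, and `Vt` from the fitted data rather than re-standardizing the new batch on its own statistics.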
Purpose
This PCA analysis successfully reduced 4 marketing spend and sales features into 2 principal components, achieving the stated objective of identifying key dimensions of variation in the dataset. The analysis enables downstream modeling with simplified feature space while retaining meaningful variance structure.
Key Findings
- Variance Captured: 75.4% across 2 components, just below the typical 80% threshold but acceptable for dimensionality reduction
- PC1 Dominance: First component alone explains 48% of total variance, indicating one strong underlying dimension drives most variation
- Dimensionality Reduction: 4 features compressed to 2 components (50% reduction), setting aside 24.6% of total variance
- Feature Contributions: PC1 heavily weighted by feature_4 (−0.70) and feature_1 (−0.56); PC2 driven by feature_3 (−0.80) and feature_2 (0.60)
Interpretation
The analysis reveals that marketing spend and sales data vary mainly along two dimensions. PC1 represents a scale or magnitude dimension: its strongest loadings share the same sign, so those features tend to rise and fall together (the overall sign of a component is arbitrary, so the negative sign itself carries no meaning). PC2 captures a distinct orthogonal pattern. Together, these components preserve three-quarters of the original variance, making them suitable inputs for clustering or predictive modeling, provided the discarded 24.6% is not critical to the downstream task.
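The information set aside by keeping two components can be made concrete: for standardized data, the squared error of reconstructing the data from the first k components, divided by n - 1, equals the sum of the discarded eigenvalues. A minimal NumPy sketch on hypothetical random data:

```python
import numpy as np

# Hypothetical stand-in for a 200 x 4 feature matrix with correlated columns.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize, as scale_data=TRUE implies

# Eigendecomposition of the correlation matrix, sorted by descending eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], eigvecs[:, order]

k = 2
Vk = V[:, :k]
Z_hat = Z @ Vk @ Vk.T  # rank-k reconstruction from the first k components
lost = ((Z - Z_hat) ** 2).sum() / (Z.shape[0] - 1)

print(np.isclose(lost, eigvals[k:].sum()))    # True: error equals the discarded eigenvalues
print(1 - eigvals[:k].sum() / eigvals.sum())  # share of variance discarded
```

Applied to this report's eigenvalues, the discarded share is (0.865 + 0.117) / 4, or about 24.6%.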
Context
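PCA assumes linear relationships and requires standardized inputs when features are measured on different scales; here, scale_data=TRUE applies that standardization.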
Scree Plot
Variance explained by each principal component
Purpose
The scree plot visualizes the variance contribution of each principal component, helping identify which dimensions capture the most meaningful variation in marketing spend and sales data. It is the main tool for judging how much information is retained when moving from the 4 original features to fewer principal components.
Key Findings
- PC1 Variance: 48% - The first component alone captures nearly half of all variation, indicating a dominant underlying dimension in the marketing data
- Top 2 Components: 75.4% cumulative variance - Two components together retain three-quarters of total information, suggesting substantial dimensionality reduction is possible
- Eigenvalue Decline: Eigenvalues fall from 1.92 (PC1) to 1.10 (PC2), level off at 0.87 (PC3), then drop to 0.12 (PC4). PC3 and PC4 together contribute 24.6%, most of it from PC3 (21.6%), so the curve shows no single clean elbow after PC2
Interpretation
The scree plot shows variance concentrated in the first two components, consistent with the recommendation to retain PC1 and PC2. This pattern suggests the four original features share substantial correlation and can be compressed into two uncorrelated dimensions, although PC3 still carries 21.6% of the variance; only PC4 (2.9%) is clearly marginal.
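The shape of the scree curve can also be read numerically from the eigenvalues in the Component Summary table. This plain-Python sketch computes the absolute drop and the ratio between successive eigenvalues; it assumes nothing beyond those four reported values.

```python
eigenvalues = [1.92, 1.097, 0.865, 0.117]  # PC1..PC4, from the Component Summary table

# Absolute drop between consecutive components (PC1->PC2, PC2->PC3, PC3->PC4).
drops = [round(a - b, 3) for a, b in zip(eigenvalues, eigenvalues[1:])]
# Ratio of each eigenvalue to the previous one; small ratios mark steep falls.
ratios = [round(b / a, 2) for a, b in zip(eigenvalues, eigenvalues[1:])]

print(drops)   # [0.823, 0.232, 0.748]
print(ratios)  # [0.57, 0.79, 0.14]
```

The steep steps are PC1 to PC2 and PC3 to PC4, with a much flatter step between PC2 and PC3.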
Context
PCA assumes linear relationships among the input features.
PC Score Plot
Observations projected onto the first two principal components
Purpose
This score plot projects 200 marketing observations onto the first two principal components, revealing the underlying structure of variation in marketing spend and sales data. By visualizing observations in reduced dimensional space, the plot identifies natural groupings, outliers, and patterns that would be invisible when examining the original four features individually.
Key Findings
- Axes Variance: 75.4% - The first two components capture three-quarters of total variance, indicating strong dimensionality reduction without substantial information loss
- PC1 Range: -3.57 to 2.69 (SD=1.39) - Captures 48% of variance; shows left-skewed distribution with one notable negative outlier (row 197 at -3.0)
- PC2 Range: -1.4 to 2.64 (SD=1.05) - Captures 27.4% of variance; more symmetric distribution suggesting balanced secondary variation
- Observation Spread: Points distributed across all quadrants with no obvious clustering, indicating continuous variation rather than discrete market segments
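For standardized PCA, the sample variance of each component's scores equals its eigenvalue, so the score standard deviations listed above can be cross-checked against the eigenvalues in the Component Summary table. A minimal plain-Python sketch:

```python
import math

eigenvalues = {"PC1": 1.92, "PC2": 1.097}  # from the Component Summary table

# The sample SD of a component's scores is the square root of its eigenvalue.
sds = {pc: round(math.sqrt(lam), 2) for pc, lam in eigenvalues.items()}
print(sds)  # {'PC1': 1.39, 'PC2': 1.05}
```

Both values match the reported score SDs (1.39 and 1.05).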
Interpretation
The relatively even scatter across PC space suggests marketing spend and sales metrics vary continuously across the 200 observations rather than forming distinct clusters. The left skew in PC1 reflects extreme negative scores such as row 197; such points may be data errors or genuinely atypical observations, and they merit inspection.
Variable Loadings
Contribution of each original variable to each principal component
Purpose
The loadings heatmap reveals how each of the 4 original marketing features contributes to the principal components. By identifying which variables load strongly together on the same component, this section enables you to assign business meaning to the mathematical dimensions—transforming abstract PCs into interpretable marketing dimensions that explain variation in spend and sales patterns.
Key Findings
- PC1 Dominance: Feature_4 (-0.70) and Feature_1 (-0.56) drive PC1 most strongly, suggesting these variables move together and represent the primary axis of variation (48% of total variance)
- PC2 Contrast: Feature_3 (-0.80) and Feature_2 (0.60) load oppositely on PC2, indicating they represent a contrasting dimension capturing 27.4% of variance
- Feature_1 Versatility: Loads meaningfully across PC1 (-0.56), PC3 (0.66), and PC4 (0.51), showing it contributes to multiple dimensions
- Loading Range: Values span -0.80 to 0.66, and every feature has an absolute loading of at least 0.60 on at least one component
Interpretation
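The shared negative sign of Feature_4 and Feature_1 on PC1 suggests these metrics tend to increase together; because the overall sign of a component is arbitrary, only the relative signs within a component are interpretable. PC2's opposing loadings reveal a contrast: observations with high PC2 scores tend to have above-average feature_2 values and below-average feature_3 values.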
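Because an eigenvector multiplied by -1 is still an eigenvector, the overall sign of a principal component is arbitrary, and different software may report it flipped. The NumPy sketch below demonstrates this on hypothetical random data.

```python
import numpy as np

# Hypothetical standardized data: 200 observations x 4 features.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 4))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)

R = np.cov(Z, rowvar=False)          # correlation matrix of the standardized data
eigvals, eigvecs = np.linalg.eigh(R)
v, lam = eigvecs[:, -1], eigvals[-1]  # eigenvector and eigenvalue of PC1 (largest)

# Both v and -v satisfy R @ v = lam * v.
print(np.allclose(R @ v, lam * v), np.allclose(R @ (-v), lam * (-v)))  # True True

# Flipping the sign flips every score but leaves the explained variance unchanged.
s_pos, s_neg = Z @ v, Z @ (-v)
print(np.isclose(s_pos.var(ddof=1), s_neg.var(ddof=1)))  # True
```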
Cumulative Variance
Cumulative variance explained as more components are added
Purpose
This section quantifies the trade-off between dimensionality reduction and information retention. It demonstrates how many principal components are needed to capture meaningful variance in the marketing spend and sales data, guiding decisions about which components to retain for downstream analysis or visualization.
Key Findings
- PC1 Alone: Captures 48% of variance, insufficient on its own to represent the data structure
- PC1 + PC2 Combined: Captures 75.4% of total variance, slightly short of the 80% threshold but the recommended balance point
- Adding PC3: Raises cumulative variance to 97.1% (+21.6 percentage points) and clears the 80% threshold, but keeping three components cuts the dimensionality reduction from 50% to 25%
- Threshold Gap: The 4.6 percentage point shortfall from the 80% target reflects a practical trade-off between parsimony and completeness
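The cumulative percentages and the 4.6-point gap can be reproduced from the eigenvalues in the Component Summary table. A minimal plain-Python sketch:

```python
eigenvalues = [1.92, 1.097, 0.865, 0.117]  # PC1..PC4, from the Component Summary table
total = sum(eigenvalues)                   # about 4 for 4 standardized features

cumulative = []
running = 0.0
for lam in eigenvalues:
    running += 100 * lam / total           # add this component's share of variance
    cumulative.append(round(running, 1))

gap = round(80 - cumulative[1], 1)         # shortfall of PC1 + PC2 from the 80% target
print(cumulative)  # [48.0, 75.4, 97.1, 100.0]
print(gap)         # 4.6
```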
Interpretation
The analysis reveals that two components effectively summarize three-quarters of the variation in the original four features. This suggests the underlying marketing and sales metrics share substantial covariance—likely reflecting common business drivers. While the 75.4% figure falls modestly below the 80% threshold, retaining only two dimensions reduces the feature space by 50% while preserving most meaningful variation, making it suitable for visualization and interpretation of marketing dynamics.
Component Summary
Summary statistics for each principal component
| Component | Eigenvalue | Variance Explained | Cumulative Variance | Recommended |
|---|---|---|---|---|
| PC1 | 1.92 | 48% | 48% | ✓ Retain |
| PC2 | 1.097 | 27.4% | 75.4% | ✓ Retain |
| PC3 | 0.865 | 21.6% | 97.1% | |
| PC4 | 0.117 | 2.9% | 100% | |
Purpose
This section identifies which principal components merit retention based on statistical criteria. It shows how much variance each component captures and whether it meets the Kaiser criterion (eigenvalue > 1 for scaled data). This directly supports the marketing analytics objective by determining how many dimensions are needed to represent the key variation in marketing spend and sales data.
Key Findings
- Recommended Components: 2 components, retaining 75.4% of total variance; this meets the Kaiser criterion but not the 0.8 variance threshold, which is first reached at PC3 (97.1%)
- PC1 Eigenvalue: 1.92 (largest); captures 48% of variance on its own, indicating a dominant dimension of variation
- PC2 Eigenvalue: 1.10; contributes an additional 27.4%, bringing cumulative variance to 75.4%
- Variance Drop-off: PC3 (eigenvalue 0.87) and PC4 (eigenvalue 0.12) fall below the Kaiser threshold, although PC3 still accounts for 21.6% of variance
Interpretation
The two-component solution efficiently summarizes the marketing dataset's structure. PC1 represents nearly half the total variation, while PC2 adds substantial explanatory power. Together, they capture three-quarters of the data's variance while setting aside PC3 and PC4, which together hold 24.6%. This 2D representation enables simplified visualization and analysis of marketing spend-sales relationships, with the caveat that PC3 is not negligible.
Context
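The Kaiser criterion (retain components with eigenvalue > 1) suits scaled data: each standardized feature has variance 1, so the eigenvalues sum to the number of features (here 4), and a component with an eigenvalue below 1 explains less variance than a single original feature.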
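The two selection rules mentioned in this report can disagree, and a small sketch makes the comparison explicit. The helper names `n_components_kaiser` and `n_components_threshold` are hypothetical; the eigenvalues come from the table above, and 0.8 is the configured variance_threshold.

```python
def n_components_kaiser(eigenvalues, cutoff=1.0):
    """Kaiser criterion: count components whose eigenvalue exceeds the cutoff."""
    return sum(1 for lam in eigenvalues if lam > cutoff)

def n_components_threshold(eigenvalues, threshold=0.8):
    """Smallest k whose cumulative share of total variance reaches the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(eigenvalues)

eigenvalues = [1.92, 1.097, 0.865, 0.117]  # PC1..PC4, from the Component Summary table
print(n_components_kaiser(eigenvalues))     # 2
print(n_components_threshold(eigenvalues))  # 3
```

The Kaiser rule yields the two retained components; the 0.8 variance rule would keep three.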
Top Variable Loadings
Top variable loadings per component (top 3 by absolute value)
| Component | Variable | Loading | Absolute Loading |
|---|---|---|---|
| PC1 | Sales | -0.6984 | 0.6984 |
| PC1 | TikTok | -0.556 | 0.556 |
| PC1 | | -0.3783 | 0.3783 |
| PC2 | Google Ads | -0.7986 | 0.7986 |
| PC2 | | 0.6 | 0.6 |
| PC2 | Sales | -0.0482 | 0.0482 |
| PC3 | TikTok | 0.6595 | 0.6595 |
| PC3 | | -0.6028 | 0.6028 |
| PC3 | Google Ads | -0.447 | 0.447 |
| PC4 | Sales | -0.7129 | 0.7129 |
| PC4 | TikTok | 0.5058 | 0.5058 |
| PC4 | | 0.3653 | 0.3653 |
Purpose
This section identifies which original features most strongly define each principal component by ranking variables by absolute loading magnitude. High absolute loadings (near ±1) reveal the core drivers of variation in each dimension, enabling interpretation of what each PC represents in the marketing spend and sales context.
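The ranking rule described here (sort each component's loadings by absolute value and keep the top three) can be sketched with NumPy. The function `top_loadings` and the small loading matrix in the usage example are hypothetical illustrations, not the report's data.

```python
import numpy as np

def top_loadings(loadings: np.ndarray, names: list[str], k: int = 3) -> dict:
    """For each component (column), return the k variables with the largest |loading|."""
    result = {}
    for j in range(loadings.shape[1]):
        col = loadings[:, j]
        idx = np.argsort(-np.abs(col))[:k]  # row indices by descending absolute loading
        result[f"PC{j + 1}"] = [(names[i], round(float(col[i]), 4)) for i in idx]
    return result

# Hypothetical 3-variable x 2-component loading matrix.
L = np.array([[0.9, 0.1], [0.2, -0.8], [-0.3, 0.5]])
print(top_loadings(L, ["var_a", "var_b", "var_c"], k=2))
# {'PC1': [('var_a', 0.9), ('var_c', -0.3)], 'PC2': [('var_b', -0.8), ('var_c', 0.5)]}
```

Ranking by absolute value keeps strongly negative loadings, which matter as much as positive ones, while the signed value is kept for interpretation.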
Key Findings
- PC1 Dominance: feature_4 (−0.70) and feature_1 (−0.56) are the primary drivers, both with negative loadings, indicating they move inversely with PC1 scores
- PC2 Structure: feature_3 (−0.80) shows the strongest single loading across all components, with feature_2 (0.60) providing contrasting positive direction
- feature_2 Consistency: Appears in top 3 for all four components (loading range: −0.60 to 0.60), suggesting it contributes to multiple dimensions of variation
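- Loading Strength: The mean absolute loading of 0.53 across the listed top-3 entries indicates moderate-to-strong contributions; every feature reaches an absolute loading of at least 0.60 on some component, although individual loadings can be near zero (Sales on PC2: −0.05)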
Interpretation
The analysis reveals that marketing spend and sales variation is primarily captured by feature_4 and feature_1 (PC1: 48% variance), with feature_3 providing orthogonal contrast (PC2: 27.4% variance). The consistent appearance of feature_2 across components suggests its variance is spread across several dimensions rather than concentrated in one.