Principal Component Analysis (PCA)

PCA using R's prcomp that automatically retains the first 3 components. Includes cos² quality metrics, IQR-based outlier detection, and feature-contribution analysis based on squared loadings.

What Makes This Powerful

Quality Metrics (Cos²)

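Calculates a cos² value for each observation: the ratio of its squared distance from the origin in the retained PCs to its squared distance in the full feature space. This shows how well the observation is represented in PC space. Quality ratings: Poor (<0.5), Fair (0.5-0.7), Good (0.7-0.9), Excellent (>0.9).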
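A minimal NumPy sketch of the cos² calculation and rating, offered as an illustration rather than the tool's own R code. It assumes the matrix passed in is exactly what the PCA step receives (already standardized if scale_data=true), reproduces prcomp(center=FALSE, scale=FALSE) as a plain SVD, and treats the ratio as one of squared distances. The helper names cos2_quality and quality_rating are hypothetical, and so is the handling of the boundary values 0.5, 0.7 and 0.9, which the ratings above leave ambiguous.

```python
import numpy as np


def cos2_quality(Z, max_components=3):
    """Per-observation cos^2 for an uncentered PCA of Z (hypothetical helper)."""
    Z = np.asarray(Z, dtype=float)
    k = min(max_components, Z.shape[1])
    # prcomp(center=FALSE, scale=FALSE) amounts to an SVD of Z itself.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:k].T
    # Squared distance from the origin in the retained PCs divided by the
    # squared distance in the full space. Assumes no row is exactly the zero vector.
    return np.sum(scores ** 2, axis=1) / np.sum(Z ** 2, axis=1)


def quality_rating(c):
    """Map a cos^2 value to a label. Assumed: 0.5 and 0.7 go up a band, 0.9 stays Good."""
    if c < 0.5:
        return "Poor"
    if c < 0.7:
        return "Fair"
    if c <= 0.9:
        return "Good"
    return "Excellent"


# Usage: rate every observation of a random 50 x 6 matrix.
rng = np.random.default_rng(0)
labels = [quality_rating(c) for c in cos2_quality(rng.normal(size=(50, 6)))]
```

With fewer than 3 features all components are retained, so every cos² equals 1 and every observation rates Excellent.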

Outlier Detection

IQR-based outlier detection on each observation's Euclidean distance from the origin in PC space. Observations whose distance exceeds Q3 + 1.5 × IQR of all distances are flagged, and the outlier count and percentage are reported.
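A short sketch of the IQR rule on PC-space distances, written in NumPy as a stand-in for the tool's R code. The function name iqr_outliers is hypothetical. It assumes a strict ">" comparison against the threshold. np.percentile's default linear interpolation matches R's default quantile type 7.

```python
import numpy as np


def iqr_outliers(scores):
    """Flag rows whose distance from the origin exceeds Q3 + 1.5 * IQR (hypothetical helper).

    scores: (n, k) matrix of PC scores for the retained components.
    Returns (boolean mask, outlier count, outlier percentage).
    """
    d = np.linalg.norm(np.asarray(scores, dtype=float), axis=1)  # Euclidean distance
    q1, q3 = np.percentile(d, [25, 75])
    threshold = q3 + 1.5 * (q3 - q1)
    is_outlier = d > threshold
    return is_outlier, int(is_outlier.sum()), 100.0 * float(is_outlier.mean())
```

For distances 1, 2, 3, 4 and 100, Q1 = 2 and Q3 = 4, so the threshold is 7 and only the last observation (20% of the data) is flagged.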

Component Analysis

Retains a fixed 3 components (fewer if there are fewer than 3 features). Squared loadings give each feature's contribution to each component. A scree plot shows the eigenvalues and the cumulative variance explained.
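A sketch of squared-loading contributions in NumPy, assuming the loadings are the unit-length rotation vectors that prcomp returns. In that case the squared loadings in each component sum to 1, so multiplying by 100 gives percentage contributions. The helper feature_contributions is hypothetical. Squaring also removes the arbitrary sign that SVD attaches to each component.

```python
import numpy as np


def feature_contributions(Z, feature_names, max_components=3):
    """Percentage contribution of each feature to each retained PC (hypothetical helper)."""
    Z = np.asarray(Z, dtype=float)
    k = min(max_components, Z.shape[1])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)  # uncentered, as with center=FALSE
    loadings = Vt[:k].T                  # (p, k); columns are orthonormal unit vectors
    contrib = 100.0 * loadings ** 2      # each column sums to 100 %
    return {name: contrib[i] for i, name in enumerate(feature_names)}


# Usage: contributions for four features of random data.
rng = np.random.default_rng(0)
contrib = feature_contributions(rng.normal(size=(20, 4)), ["a", "b", "c", "d"])
```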

What You Need to Provide

Numeric features required

Provide a features array of numeric column names. Set scale_data=true (the default) to standardize each feature to mean 0 and standard deviation 1. Complete-case analysis removes every row that has any missing value.
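The preparation step can be sketched in NumPy as follows. This is an assumption-based stand-in for R's behaviour, not the tool's code: missing values are taken to be NaN, and scaling follows R's scale() defaults (subtract column means, divide by the sample standard deviation with n - 1). The helper prepare is hypothetical. A constant column would give 0/0 here, just as it yields NaN in R.

```python
import numpy as np


def prepare(X, scale_data=True):
    """Complete-case filtering plus optional R-style standardization (hypothetical helper).

    Returns the prepared matrix and the number of rows dropped.
    """
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)   # keep rows with no missing values
    Xc = X[complete]
    if scale_data:
        # Equivalent to R's scale(): center on column means, divide by sample SD (ddof=1).
        Xc = (Xc - Xc.mean(axis=0)) / Xc.std(axis=0, ddof=1)
    return Xc, int((~complete).sum())
```

For the rows [1, 2], [NaN, 3], [3, 4] and [5, 6], one row is dropped and both remaining columns standardize to [-1, 0, 1].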

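The algorithm runs prcomp with center=FALSE, scale=FALSE, after pre-scaling if requested. Centering therefore happens only inside the pre-scaling step. With scale_data=false the decomposition is uncentered, and PC1 can end up mostly tracking the feature means rather than the main direction of variance. It automatically keeps min(3, n_features) components and computes cos² quality metrics and IQR-based outlier detection.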
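For readers outside R, here is a NumPy sketch of what prcomp(center=FALSE, scale=FALSE) computes. The SVD is taken of the input matrix as given, the component standard deviations are the singular values divided by sqrt(n - 1), and the scores are the data projected onto the rotation. The function name prcomp_uncentered is hypothetical, and the truncation to min(3, n_features) components follows the description above.

```python
import numpy as np


def prcomp_uncentered(Z, max_components=3):
    """Mimic prcomp(Z, center=FALSE, scale=FALSE) and keep min(3, p) components.

    Returns (sdev for all components, rotation p x k, scores n x k).
    """
    Z = np.asarray(Z, dtype=float)
    n, p = Z.shape
    k = min(max_components, p)
    _, d, Vt = np.linalg.svd(Z, full_matrices=False)  # no centering is applied here
    sdev = d / np.sqrt(max(1, n - 1))                  # same scaling prcomp uses
    rotation = Vt[:k].T                                # loadings, one column per PC
    scores = Z @ rotation                              # PC scores
    return sdev, rotation, scores


# Usage on random data with 5 features: 3 components are retained.
rng = np.random.default_rng(1)
Z = rng.normal(size=(10, 5))
sdev, rotation, scores = prcomp_uncentered(Z)
```

Passing raw data with large column means to this function (the scale_data=false path) gives a first rotation column that points roughly along the mean vector, which illustrates the centering caveat above.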

Input layout: rows = observations, columns = features

Quick Specs

Components: Fixed at 3 (or fewer)
Scaling: Optional (default true)
Quality: Cos² metrics included
Outliers: IQR-based detection

How We Reduce and Interpret

From scaling to biplots

1. Data Preparation

Complete case analysis (remove NA rows). Optional scaling using scale() function. Apply prcomp with center=FALSE, scale=FALSE.

2. Component Analysis

Calculate the variance explained by each component (the eigenvalues, i.e. the squared prcomp sdev values), along with the proportion and cumulative proportion of variance. Keep min(3, n_features) components and extract their loadings and PC scores.
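The variance table behind the scree plot can be sketched in NumPy like this, assuming the same uncentered decomposition as above. Eigenvalues are the squared singular values divided by n - 1, which equals prcomp's sdev squared. The helper variance_explained is hypothetical.

```python
import numpy as np


def variance_explained(Z, max_components=3):
    """Eigenvalues, proportion and cumulative proportion of variance (hypothetical helper)."""
    Z = np.asarray(Z, dtype=float)
    n, p = Z.shape
    d = np.linalg.svd(Z, compute_uv=False)   # singular values only, descending
    eigenvalues = d ** 2 / max(1, n - 1)     # = sdev ** 2 in prcomp terms
    proportion = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(proportion)
    k = min(max_components, p)               # number of components retained
    return eigenvalues, proportion, cumulative, k


eig, prop, cum, k = variance_explained(
    np.array([[3.0, 0.0], [0.0, 1.0], [-3.0, 0.0], [0.0, -1.0]])
)
```

For this small example the eigenvalues are 6 and 2/3, so PC1 explains 90% of the variance, the cumulative share reaches 100% at PC2, and both components are kept.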

3. Quality & Outliers

Calculate cos² for quality as a ratio of squared distances from the origin (retained PCs over the original, possibly scaled, feature space). Flag outliers by applying the IQR rule to Euclidean distances in PC space. Build biplot data from PC1 and PC2.
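Written out, and assuming the standard squared-distance definition of cos² plus outlier distances measured over the K retained components, the two quantities are as follows. Here z_{ij} is the matrix entry passed to prcomp, t_{ik} is the score of observation i on component k, and K = min(3, p).

```latex
\cos^2_i = \frac{\sum_{k=1}^{K} t_{ik}^2}{\sum_{j=1}^{p} z_{ij}^2},
\qquad
d_i = \sqrt{\sum_{k=1}^{K} t_{ik}^2},
\qquad
\text{outlier}_i \iff d_i > Q_3(d) + 1.5\,\bigl(Q_3(d) - Q_1(d)\bigr)
```

When K = p, the numerator and denominator of the cos² ratio coincide, so every observation has cos² = 1.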

Why This Analysis Matters

PCA with added quality metrics (cos²) and outlier detection, giving interpretable dimensionality reduction with a fixed number of retained components.

Cos² values quantify how well each observation is represented by the retained components. IQR-based outlier detection flags observations that lie unusually far from the origin in PC space. Fixing the number of components at 3 simplifies interpretation. Squared loadings show each feature's contribution to each component.

Note: Uses prcomp with center=FALSE, scale=FALSE (after optional pre-scaling). Components fixed at min(3, n_features). Complete case analysis removes any rows with missing values. Quality ratings: Poor/Fair/Good/Excellent based on cos² thresholds.
