How to Use Customer Cohort Retention Analysis in Stripe: Step-by-Step Tutorial
Introduction to Customer Cohort Retention Analysis
Customer retention is one of the most critical metrics for any subscription-based or recurring-revenue business using Stripe. An overall retention rate gives you only a snapshot; customer cohort retention analysis reveals the deeper story: which groups of customers stay loyal over time, and what characteristics drive long-term engagement and revenue.
A cohort is simply a group of customers who share a common characteristic during a specific time period—most commonly, customers who made their first purchase in the same month. By tracking how these cohorts behave over time, you can answer crucial questions:
- Are customers acquired in January more loyal than those acquired in June?
- How long does the average customer stay subscribed?
- Which acquisition channels or campaigns produce the most loyal customers?
- Is our retention improving or declining over time?
- What percentage of revenue comes from customers in their first year versus later years?
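As a minimal illustration of the cohort definition above, the sketch below groups a few customers by the month of their first purchase. The customer IDs and dates are invented for the example; real values would come from your Stripe data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical first-purchase dates (not real Stripe data).
first_purchases = {
    "cus_A": date(2024, 1, 5),
    "cus_B": date(2024, 1, 28),
    "cus_C": date(2024, 2, 3),
}

def cohort_label(d: date) -> str:
    """Return the cohort key (YYYY-MM) for a first-purchase date."""
    return f"{d.year:04d}-{d.month:02d}"

# Group customer IDs under the month in which they first purchased.
cohorts = defaultdict(list)
for customer_id, first_purchase in first_purchases.items():
    cohorts[cohort_label(first_purchase)].append(customer_id)

for label, members in sorted(cohorts.items()):
    print(label, members)
```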
This tutorial will walk you through performing customer cohort retention analysis using your Stripe data, from data extraction to actionable insights. Whether you're a product manager, data analyst, or founder, you'll learn how to build retention tables that reveal what truly drives customer loyalty in your business.
Prerequisites and Data Requirements
Before diving into cohort retention analysis, ensure you have the following prerequisites in place:
Technical Requirements
- Stripe Account Access: You need read access to your Stripe dashboard and API keys (preferably restricted keys with read-only permissions)
- Data Analysis Tool: Python with pandas, SQL database, or a BI tool like Tableau or Looker
- API Knowledge: Basic understanding of REST APIs and JSON (if extracting data programmatically)
- Time Range: At least 6-12 months of customer data for meaningful cohort analysis
Data Requirements
Your cohort retention analysis will need the following Stripe data objects:
- Customers: Customer creation dates, metadata, and attributes
- Subscriptions: Subscription start dates, status, cancellation dates, and plan information
- Invoices/Payments: Payment history to track active revenue-generating periods
- Events: Customer lifecycle events (optional but helpful for detailed analysis)
Knowledge Requirements
Familiarity with these concepts will help you interpret results more effectively:
- Basic cohort analysis principles
- Retention rate calculations
- Customer lifecycle stages
- Understanding of statistical significance in A/B testing to validate cohort differences
Step 1: Extract Stripe Customer Data
The first step in cohort retention analysis is extracting the necessary customer and subscription data from Stripe. You can do this through the Stripe API or by exporting data from the dashboard.
Option A: Using Stripe API (Recommended for Automation)
Here's a Python example using the Stripe library to extract customer and subscription data:
import stripe
import pandas as pd
from datetime import datetime

# Initialize Stripe with your API key
stripe.api_key = 'sk_test_your_api_key_here'

# Function to extract all customers, following cursor-based pagination
def get_all_customers():
    customers = []
    has_more = True
    starting_after = None
    while has_more:
        if starting_after:
            response = stripe.Customer.list(limit=100, starting_after=starting_after)
        else:
            response = stripe.Customer.list(limit=100)
        customers.extend(response.data)
        has_more = response.has_more
        if has_more:
            starting_after = response.data[-1].id
    return customers

# Function to extract subscription data for each customer.
# status='all' matters here: by default the list endpoint leaves out canceled
# subscriptions, which would hide exactly the churn this analysis measures.
def get_customer_subscriptions(customer_id):
    subscriptions = stripe.Subscription.list(customer=customer_id, status='all', limit=100)
    return subscriptions.data

# Extract all customers
print("Extracting customer data...")
all_customers = get_all_customers()

# Build dataset (one row per subscription).
# Note: datetime.fromtimestamp() converts to this machine's local timezone.
customer_data = []
for customer in all_customers:
    subscriptions = get_customer_subscriptions(customer.id)
    for sub in subscriptions:
        customer_data.append({
            'customer_id': customer.id,
            'customer_created': datetime.fromtimestamp(customer.created),
            'subscription_id': sub.id,
            'subscription_start': datetime.fromtimestamp(sub.created),
            'subscription_status': sub.status,
            'subscription_canceled_at': datetime.fromtimestamp(sub.canceled_at) if sub.canceled_at else None,
            # sub.plan is only set for single-plan subscriptions. The amount is in the
            # smallest currency unit (e.g. cents) and is not normalized to a monthly figure.
            'plan_id': sub.plan.id if sub.plan else None,
            'mrr': sub.plan.amount / 100 if sub.plan and sub.plan.amount else 0
        })

# Create DataFrame
df = pd.DataFrame(customer_data)
print(f"Extracted {len(df):,} subscription records from {len(all_customers):,} customers")
df.to_csv('stripe_customer_cohort_data.csv', index=False)
print("Data saved to stripe_customer_cohort_data.csv")
Expected Output
Extracting customer data...
Extracted 1,247 subscription records from 1,089 customers
Data saved to stripe_customer_cohort_data.csv
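Stripe returns fields such as `created` as Unix epoch seconds. Calling `datetime.fromtimestamp` without a timezone converts them to the local time of the machine running the script, so a customer who signed up close to midnight at the end of a month can land in a different cohort depending on where the script runs. A minimal sketch of a timezone-explicit conversion follows; the helper name `to_utc` is made up for this example.

```python
from datetime import datetime, timezone

def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime; pass None through."""
    if epoch_seconds is None:
        return None
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# 1706745599 is 2024-01-31 23:59:59 UTC, the last second of January.
signup = to_utc(1706745599)
print(signup.isoformat())
print(signup.strftime("%Y-%m"))
```

Whichever timezone you pick, using the same one for every timestamp keeps the cohort boundaries consistent.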
Option B: Using Stripe Dashboard Export
For smaller datasets or one-time analysis:
- Log into your Stripe Dashboard
- Navigate to Customers → Export
- Select "All customers" and download CSV
- Navigate to Billing → Subscriptions → Export
- Download subscription data CSV
- Join these datasets using customer_id
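The join in the last step might look like the sketch below. The column names are assumptions for illustration, not Stripe's actual export headers, so check your own files and adjust. Small in-memory frames stand in for the two CSVs.

```python
import pandas as pd

# Stand-ins for pd.read_csv('customers.csv') and pd.read_csv('subscriptions.csv').
# Column names are assumed for this example.
customers = pd.DataFrame({
    "customer_id": ["cus_A", "cus_B", "cus_C"],
    "customer_created": ["2024-01-05", "2024-01-28", "2024-02-03"],
})
subscriptions = pd.DataFrame({
    "subscription_id": ["sub_1", "sub_2", "sub_3"],
    "customer_id": ["cus_A", "cus_A", "cus_B"],
    "status": ["canceled", "active", "active"],
})

# A left join keeps customers who never subscribed; indicator=True flags them.
joined = customers.merge(subscriptions, on="customer_id", how="left", indicator=True)
never_subscribed = joined.loc[joined["_merge"] == "left_only", "customer_id"].tolist()

print(joined[["customer_id", "subscription_id", "status"]])
print("Customers without subscriptions:", never_subscribed)
```

A left join is used here so that customers without any subscription stay visible; an inner join would silently drop them.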
Step 2: Build Cohort Retention Tables
Now that you have your Stripe data extracted, the next step is to transform it into cohort retention tables. This involves grouping customers by their acquisition month and tracking their activity over subsequent months.
Define Your Cohort and Retention Criteria
First, decide on your cohort definition and what "retained" means for your business:
- Cohort Definition: Month of first subscription (most common) or first payment
- Retention Definition: Active subscription, made a payment, or generated revenue in a given month
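To make the retention definition concrete, here is one possible reading of "active subscription in a given month": the subscription started on or before the last day of the month and had not ended before the first day. This is a sketch of that single definition only; the function name and signature are illustrative.

```python
import calendar
from datetime import date
from typing import Optional

def active_in_month(start: date, end: Optional[date], year: int, month: int) -> bool:
    """True if a subscription running from start to end (None = still running)
    overlaps the given calendar month."""
    month_start = date(year, month, 1)
    month_end = date(year, month, calendar.monthrange(year, month)[1])
    started = start <= month_end
    not_ended = end is None or end >= month_start
    return started and not_ended

# A subscription that started mid-January and was canceled in early March.
print(active_in_month(date(2024, 1, 15), date(2024, 3, 2), 2024, 2))  # True
print(active_in_month(date(2024, 1, 15), date(2024, 3, 2), 2024, 4))  # False
```

Under this definition a customer who cancels partway through a month still counts as retained for that month. A payment-based definition would need invoice dates instead.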
Python Implementation
import pandas as pd
import numpy as np

# Load your extracted data
df = pd.read_csv('stripe_customer_cohort_data.csv')
df['customer_created'] = pd.to_datetime(df['customer_created'])
df['subscription_start'] = pd.to_datetime(df['subscription_start'])
df['subscription_canceled_at'] = pd.to_datetime(df['subscription_canceled_at'])

# Create cohort month (month of first subscription)
df['cohort_month'] = df.groupby('customer_id')['subscription_start'].transform('min').dt.to_period('M')

# For each customer, determine active months
def get_active_months(row):
    start = row['subscription_start']
    end = row['subscription_canceled_at'] if pd.notna(row['subscription_canceled_at']) else pd.Timestamp.now()
    # Generate monthly periods between start and end
    return pd.period_range(start=start.to_period('M'), end=end.to_period('M'), freq='M')

# Create a row for each customer-month combination
customer_months = []
for _, row in df.iterrows():
    active_months = get_active_months(row)
    for month in active_months:
        customer_months.append({
            'customer_id': row['customer_id'],
            'cohort_month': row['cohort_month'],
            'active_month': month
        })
df_activity = pd.DataFrame(customer_months)

# Calculate cohort period (months since acquisition)
df_activity['cohort_period'] = (df_activity['active_month'] - df_activity['cohort_month']).apply(lambda x: x.n)

# Build retention table
cohort_counts = df_activity.groupby(['cohort_month', 'cohort_period'])['customer_id'].nunique().reset_index()
cohort_counts.columns = ['cohort_month', 'cohort_period', 'customers']

# Pivot to create retention matrix
retention_matrix = cohort_counts.pivot(index='cohort_month', columns='cohort_period', values='customers')

# Calculate retention percentages
cohort_sizes = retention_matrix.iloc[:, 0]
retention_pct = retention_matrix.divide(cohort_sizes, axis=0) * 100
print("\nCohort Retention Percentages:")
print(retention_pct.round(1))
Expected Output
Cohort Retention Percentages:
cohort_period      0     1     2     3     4     5     6
cohort_month
2024-01        100.0  85.3  78.4  72.1  68.9  65.2  62.8
2024-02        100.0  87.2  80.1  75.3  71.4  68.0  64.5
2024-03        100.0  88.5  82.7  77.9  73.6  70.1   NaN
2024-04        100.0  86.9  81.2  76.5  72.8   NaN   NaN
2024-05        100.0  89.1  83.4  78.2   NaN   NaN   NaN
2024-06        100.0  87.8  82.0   NaN   NaN   NaN   NaN
This table shows the percentage of customers from each cohort who remained active in subsequent months. For example, 85.3% of customers acquired in January 2024 were still active in month 1 (February), and 62.8% were still active in month 6 (July).
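The same row can also be read period over period: dividing each month's retention by the previous month's shows what share of the customers still active at the start of a month made it through. A small sketch using the January 2024 row from the table above:

```python
# January 2024 cohort from the table above (percent of the original cohort).
jan_2024 = [100.0, 85.3, 78.4, 72.1, 68.9, 65.2, 62.8]

# Share of the previous month's survivors who were still active one month later.
step_retention = [
    round(current / previous * 100, 1)
    for previous, current in zip(jan_2024, jan_2024[1:])
]

for period, pct in enumerate(step_retention, start=1):
    print(f"Month {period - 1} -> {period}: {pct}% kept, {round(100 - pct, 1)}% lost")
```

In this row the first month accounts for the largest single loss, and every later step keeps more than 90% of the remaining customers.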
Step 3: Interpret Your Cohort Retention Results
Understanding your cohort retention data is where the real value emerges. Let's break down how to read your retention tables and extract actionable insights.
Key Metrics to Analyze
1. Cohort Retention Curves
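Look at how retention declines over time for each cohort. As a rough rule of thumb (actual figures vary widely with market segment, price point, and billing interval), a healthy SaaS business often shows: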
- Month 1 retention: 80-90% (strong onboarding)
- Month 3 retention: 70-80% (product-market fit)
- Month 6 retention: 60-75% (long-term value)
- Month 12 retention: 50-70% (loyal customer base)
2. Cohort Performance Comparison
Compare retention rates across different cohorts to identify trends:
# Calculate average retention by cohort period across all cohorts
# (mean() skips the NaN cells of cohorts that are still too young)
avg_retention_by_period = retention_pct.mean(axis=0)
print("\nAverage Retention by Month:")
print(avg_retention_by_period.round(1))

# Identify best and worst performing cohorts at Month 3
month_3_retention = retention_pct[3].dropna().sort_values(ascending=False)
print("\nMonth 3 Retention by Cohort:")
print(month_3_retention.round(1))

# Calculate retention improvement/decline, using only cohorts old enough to
# have Month 3 data (the newest cohorts are still NaN at that point).
# With fewer than six such cohorts the two groups overlap.
month_3_by_cohort = retention_pct[3].dropna().sort_index()
recent_cohorts = month_3_by_cohort.iloc[-3:].mean()
older_cohorts = month_3_by_cohort.iloc[:3].mean()
improvement = ((recent_cohorts - older_cohorts) / older_cohorts) * 100
print(f"\nRetention trend: {improvement:+.1f}% change in Month 3 retention")
3. Retention Curve Shape Analysis
The shape of your retention curve reveals important insights:
- Steep initial drop, then flattening: Normal pattern; focus on improving onboarding to reduce early churn
- Gradual, consistent decline: Suggests lack of ongoing value; improve feature adoption and engagement
- Improving retention over time: Excellent sign; recent product improvements or market positioning are working
- Declining retention over time: Warning signal; investigate changes in customer quality, product issues, or market competition
Advanced Analysis: Segment Your Cohorts
For deeper insights, segment your cohorts by additional dimensions:
# Add segmentation by plan type, acquisition channel, etc.
# plan_id is already in the extracted data; other dimensions (such as
# acquisition channel) would have to come from customer metadata
# Example: Retention by plan tier
for plan in df['plan_id'].dropna().unique():
    plan_data = df[df['plan_id'] == plan]
    # Repeat cohort retention calculation for this segment
    print(f"\nRetention for {plan}:")
    # ... (use same cohort calculation logic on plan_data)
This segmentation helps answer questions like:
- Do enterprise customers have better retention than self-service customers?
- Which acquisition channels bring the most loyal customers?
- Do annual plans have better retention than monthly plans?
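One way to avoid copying the cohort logic for every segment is to wrap it in a function and call it per group. The sketch below assumes an activity table with one row per customer per active month plus a `plan_id` column; the function name `retention_table` and the tiny dataset are made up for illustration.

```python
import pandas as pd

def retention_table(activity: pd.DataFrame) -> pd.DataFrame:
    """Return retention percentages indexed by cohort_month, one column per cohort_period."""
    counts = (
        activity.groupby(["cohort_month", "cohort_period"])["customer_id"]
        .nunique()
        .unstack("cohort_period")
    )
    # Period 0 holds the cohort size, so divide each row by that column.
    return counts.divide(counts[0], axis=0) * 100

# Tiny illustrative dataset: one row per customer per active month.
activity = pd.DataFrame({
    "customer_id":   ["a", "a", "b", "c", "c", "d", "d"],
    "plan_id":       ["basic", "basic", "basic", "pro", "pro", "pro", "pro"],
    "cohort_month":  ["2024-01"] * 7,
    "cohort_period": [0, 1, 0, 0, 1, 0, 1],
})

for plan, segment in activity.groupby("plan_id"):
    print(f"\nRetention for {plan}:")
    print(retention_table(segment).round(1))
```

In this toy data one of the two basic customers is still active in month 1, while both pro customers are.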
For more advanced techniques on evaluating differences between cohorts, consider applying principles from A/B testing statistical significance to ensure observed differences are meaningful.
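As a rough sketch of such a check, a two-proportion z-test compares the retention of two cohorts at the same period. It assumes reasonably large cohorts and independent customers; the counts below are hypothetical and the function name is illustrative.

```python
import math

def two_proportion_z_test(retained_a, size_a, retained_b, size_b):
    """Two-sided z-test for a difference between two retention proportions.
    Returns (z statistic, p-value)."""
    p_a = retained_a / size_a
    p_b = retained_b / size_b
    # Pooled proportion under the null hypothesis of equal retention.
    pooled = (retained_a + retained_b) / (size_a + size_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / size_a + 1 / size_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: month-3 retention of 150/200 vs. 165/200.
z, p = two_proportion_z_test(150, 200, 165, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts the 7.5-point gap yields a p-value above 0.05, so cohorts of this size would not be enough to call the difference significant at the usual 5% level.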
Step 4: Visualize Your Cohort Data
Visual representation makes cohort retention patterns immediately apparent. Here are effective visualization techniques:
Retention Heatmap
import matplotlib.pyplot as plt
import seaborn as sns
# Create heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(retention_pct, annot=True, fmt='.1f', cmap='RdYlGn',
            cbar_kws={'label': 'Retention %'}, vmin=0, vmax=100)
plt.title('Customer Cohort Retention Analysis - Stripe', fontsize=16, fontweight='bold')
plt.xlabel('Months Since Acquisition', fontsize=12)
plt.ylabel('Cohort (Acquisition Month)', fontsize=12)
plt.tight_layout()
plt.savefig('cohort_retention_heatmap.png', dpi=300)
plt.show()
Retention Curves Line Chart
# Plot retention curves for each cohort
plt.figure(figsize=(12, 6))
for cohort in retention_pct.index:
    plt.plot(retention_pct.columns, retention_pct.loc[cohort], marker='o', label=str(cohort))
plt.xlabel('Months Since Acquisition', fontsize=12)
plt.ylabel('Retention Rate (%)', fontsize=12)
plt.title('Cohort Retention Curves Over Time', fontsize=16, fontweight='bold')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('cohort_retention_curves.png', dpi=300)
plt.show()
These visualizations make it easy to spot trends, outliers, and patterns that might not be obvious in raw numbers.
Step 5: Automate Your Analysis with MCP Analytics
While building cohort retention analysis manually is educational, maintaining these analyses over time can be time-consuming. MCP Analytics provides automated cohort retention analysis specifically designed for Stripe data.
Ready to Automate Your Cohort Analysis?
Get instant cohort retention insights from your Stripe data without writing code. MCP Analytics automatically:
- Extracts and processes your Stripe customer and subscription data
- Builds cohort retention tables with customizable time periods
- Generates heatmaps and retention curves automatically
- Segments cohorts by plan, channel, and custom attributes
- Updates in real-time as your Stripe data changes
- Provides statistical significance testing for cohort differences
For businesses looking for comprehensive analytics coverage, explore our Stripe Customer Cohort Retention service for dedicated support and custom analysis.
Troubleshooting Common Issues
Issue 1: Incomplete or Missing Customer Data
Symptom: Your cohort table has unexpectedly low customer counts or missing cohorts.
Solution:
- Verify your Stripe API key has read access to all necessary resources
- Check whether you're hitting API rate limits (by default Stripe allows roughly 100 requests per second in live mode and fewer in test mode; some endpoints have lower limits)
- Ensure pagination is working correctly when fetching large customer lists
- Confirm that test mode vs. live mode data matches your expectations
# Add error handling and logging
import logging

logging.basicConfig(level=logging.INFO)

def get_all_customers_safe():
    customers = []
    has_more = True
    starting_after = None
    try:
        while has_more:
            response = stripe.Customer.list(limit=100, starting_after=starting_after)
            customers.extend(response.data)
            has_more = response.has_more
            if has_more:
                starting_after = response.data[-1].id
            logging.info(f"Fetched {len(customers)} customers so far...")
    except stripe.error.RateLimitError:
        logging.error("Rate limit exceeded. Wait and retry.")
        raise
    except stripe.error.APIError as e:
        logging.error(f"API error: {e}")
        raise
    return customers
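The "wait and retry" advice can be sketched as a small exponential backoff wrapper. This is a generic helper, not part of the Stripe library; the names `call_with_backoff` and `FakeRateLimitError` are made up, and a fake call stands in for the API so the example runs without network access.

```python
import time

def call_with_backoff(func, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception of type retry_on, wait and retry.
    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))

# Demonstration with a fake call that fails twice, then succeeds.
class FakeRateLimitError(Exception):
    pass

calls = {"count": 0}

def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError("slow down")
    return "page of customers"

# Record the delays instead of actually sleeping.
delays = []
result = call_with_backoff(flaky_fetch, FakeRateLimitError, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

With the Stripe client, `func` could be something like `lambda: stripe.Customer.list(limit=100)` and `retry_on` would be `stripe.error.RateLimitError`.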
Issue 2: Retention Percentages Over 100%
Symptom: Some cohort periods show retention rates above 100%.
Solution: This occurs when customers with multiple subscriptions are counted multiple times. Ensure you're counting unique customers, not subscriptions:
# Use nunique() instead of count()
cohort_counts = df_activity.groupby(['cohort_month', 'cohort_period'])['customer_id'].nunique().reset_index()
Issue 3: NaN Values in Recent Cohorts
Symptom: Recent cohorts show NaN for later time periods.
Solution: This is expected—recent cohorts haven't existed long enough to have data for later periods. When calculating averages, use .dropna() or focus on cohorts with sufficient maturity.
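A related caveat is that averages for later periods rest on fewer cohorts. The sketch below counts how many cohorts contribute to each period and keeps only the averages backed by a minimum number of them. The table values and the `MIN_COHORTS` threshold are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative retention table in the same shape as retention_pct.
retention = pd.DataFrame(
    [[100.0, 85.0, 78.0, 72.0],
     [100.0, 87.0, 80.0, np.nan],
     [100.0, 89.0, np.nan, np.nan]],
    index=["2024-01", "2024-02", "2024-03"],
    columns=[0, 1, 2, 3],
)

MIN_COHORTS = 2  # assumed threshold: ignore periods backed by fewer cohorts

cohorts_per_period = retention.notna().sum(axis=0)
avg_retention = retention.mean(axis=0)  # NaN values are skipped by default
avg_mature = avg_retention[cohorts_per_period >= MIN_COHORTS]

print(cohorts_per_period.to_dict())
print(avg_mature.round(1).to_dict())
```

Here period 3 is dropped from the averages because only one cohort has reached it.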
Issue 4: Inconsistent Retention Definitions
Symptom: Retention numbers don't align with business expectations or other reports.
Solution: Clearly define "active" for your business:
- Active subscription status in Stripe
- Successful payment in the period
- Revenue generated (excludes trial users)
- Engagement activity (requires integration with product analytics)
Issue 5: Subscription Status Complexity
Symptom: Customers with "past_due" or "unpaid" status are being counted inconsistently.
Solution: Define which subscription statuses count as "retained":
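# Define active statuses
ACTIVE_STATUSES = ['active', 'trialing']
# Or include grace period statuses (pick one definition, not both)
# ACTIVE_STATUSES = ['active', 'trialing', 'past_due']
# Filter subscriptions that count as retained right now.
# Note: status reflects the current state only, so keep canceled rows (and
# their dates) when building the historical cohort table.
df_filtered = df[df['subscription_status'].isin(ACTIVE_STATUSES)]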
Understanding these nuances is similar to challenges faced in other analytical approaches like Accelerated Failure Time (AFT) modeling, where defining the event of interest precisely is critical.
Next Steps and Advanced Techniques
Once you've mastered basic cohort retention analysis, consider these advanced techniques to deepen your insights:
1. Revenue Cohort Analysis
Instead of tracking customer count retention, track revenue retention:
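# Calculate MRR retention instead of customer retention.
# df_activity has no 'mrr' column, so rebuild the rows per subscription-month with MRR attached
revenue_rows = [(row['cohort_month'], month, row['mrr'])
                for _, row in df.iterrows() for month in get_active_months(row)]
df_revenue = pd.DataFrame(revenue_rows, columns=['cohort_month', 'active_month', 'mrr'])
df_revenue['cohort_period'] = (df_revenue['active_month'] - df_revenue['cohort_month']).apply(lambda x: x.n)
cohort_revenue = df_revenue.groupby(['cohort_month', 'cohort_period'])['mrr'].sum()
# Proceed with same pivot and percentage calculation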
2. Predictive Retention Modeling
Use machine learning to predict which customers are at risk of churning. Techniques like AdaBoost can identify complex patterns in customer behavior that signal churn risk.
3. Multi-Dimensional Cohort Analysis
Segment cohorts by multiple dimensions simultaneously:
- Acquisition channel × Plan tier
- Geography × Industry vertical
- Company size × Use case
4. Integrate with Product Analytics
Combine Stripe retention data with product usage metrics to understand the relationship between feature adoption and retention. Modern AI-first data analysis pipelines can help automate these cross-platform insights.
5. Cohort-Based Forecasting
Use retention curves to forecast future revenue:
# Project future revenue based on retention patterns
def forecast_cohort_revenue(cohort_size, avg_mrr, retention_curve):
    months = len(retention_curve)
    forecast = []
    for month in range(months):
        retained_customers = cohort_size * (retention_curve[month] / 100)
        revenue = retained_customers * avg_mrr
        forecast.append(revenue)
    return forecast
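The same curve also gives a quick estimate of the revenue one newly acquired customer generates over the observed horizon: add up the retention percentages to get the expected number of active months, then multiply by average MRR. The sketch below uses the January 2024 row from the retention table; the average MRR of 50 is a made-up figure and the function name is illustrative.

```python
def revenue_per_customer(avg_mrr, retention_curve):
    """Expected revenue from one acquired customer over the months covered by
    retention_curve (percentages, starting at 100 for month 0)."""
    expected_active_months = sum(pct / 100 for pct in retention_curve)
    return expected_active_months * avg_mrr

# January 2024 row from the retention table; avg_mrr of 50 is hypothetical.
jan_2024 = [100.0, 85.3, 78.4, 72.1, 68.9, 65.2, 62.8]
print(round(revenue_per_customer(50, jan_2024), 2))
```

Because the curve stops at month 6, this figure covers only the first seven months and understates full lifetime value.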
6. Automate Reporting
Set up automated dashboards and alerts:
- Weekly cohort retention reports sent to stakeholders
- Alerts when retention drops below thresholds
- Automated cohort comparison reports
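A threshold alert can start as a simple comparison between the latest retention figures and the minimums you care about. The sketch below is one possible shape for it; the function name, thresholds, and values are hypothetical, and delivering the messages (email, chat, and so on) is left out.

```python
def retention_alerts(latest_retention, thresholds):
    """Return a message for every period whose retention is below its threshold.
    Both arguments map cohort period (months since acquisition) to a percentage."""
    alerts = []
    for period, minimum in sorted(thresholds.items()):
        actual = latest_retention.get(period)
        # Periods the cohort has not reached yet are skipped.
        if actual is not None and actual < minimum:
            alerts.append(
                f"Month {period} retention {actual:.1f}% is below the {minimum:.1f}% threshold"
            )
    return alerts

# Hypothetical thresholds and the latest observed values for one cohort.
thresholds = {1: 85.0, 3: 75.0}
latest = {1: 87.8, 3: 72.4}

for message in retention_alerts(latest, thresholds):
    print(message)
```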
Resources for Continued Learning
- MCP Analytics Stripe Cohort Retention Tool - Automated analysis
- Stripe Cohort Retention Service - Professional support and custom analysis
- Stripe API Documentation - Official reference for data extraction
- Cohort Analysis Best Practices - Industry benchmarks and standards
Conclusion
Customer cohort retention analysis is one of the most powerful analytical techniques for understanding the long-term health of your subscription business. By tracking how different groups of customers behave over time, you can:
- Identify which acquisition periods or channels produce the most loyal customers
- Measure the impact of product improvements on retention
- Forecast future revenue with greater accuracy
- Spot early warning signs of retention problems
- Make data-driven decisions about customer acquisition costs and lifetime value
While this tutorial walked you through the manual process of building cohort retention analysis from Stripe data, remember that automated solutions like MCP Analytics can save significant time while providing deeper, more frequent insights.
Start analyzing your Stripe cohort retention today, and use the insights to build a more sustainable, predictable revenue model for your business.