How to Use Customer Cohort Retention Analysis in Stripe: Step-by-Step Tutorial

Category: Stripe Analytics | Reading Time: 12 minutes

Introduction to Customer Cohort Retention Analysis

Customer retention is one of the most critical metrics for any subscription-based or recurring-revenue business using Stripe. An overall retention rate gives you a snapshot; customer cohort retention analysis reveals the deeper story: which groups of customers stay loyal over time, and what characteristics drive long-term engagement and revenue.

A cohort is simply a group of customers who share a common characteristic during a specific time period, most commonly customers who made their first purchase in the same month. By tracking how these cohorts behave over time, you can answer crucial questions, such as when customers tend to churn and whether newer cohorts retain better than older ones.

This tutorial will walk you through performing customer cohort retention analysis using your Stripe data, from data extraction to actionable insights. Whether you're a product manager, data analyst, or founder, you'll learn how to build retention tables that reveal what truly drives customer loyalty in your business.

Prerequisites and Data Requirements

Before diving into cohort retention analysis, ensure you have the following prerequisites in place:

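Technical Requirements

  • Python 3 with the stripe, pandas, matplotlib, and seaborn packages installed (all are imported in the code below)
  • A Stripe secret API key for the API route, or Dashboard access for the CSV export route

Data Requirements

Your cohort retention analysis will need the following Stripe data objects:

  • Customers: the customer ID and creation timestamp
  • Subscriptions: the subscription ID, creation timestamp, status, cancellation timestamp, and plan (ID and amount)

Knowledge Requirements

Familiarity with these concepts will help you interpret results more effectively:

  • Stripe subscription statuses such as active, trialing, past_due, unpaid, and canceled
  • Monthly recurring revenue (MRR) and churn
  • Basic pandas operations: grouping, pivoting, and date handling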

Step 1: Extract Stripe Customer Data

The first step in cohort retention analysis is extracting the necessary customer and subscription data from Stripe. You can do this through the Stripe API or by exporting data from the dashboard.

Option A: Using Stripe API (Recommended for Automation)

Here's a Python example using the Stripe library to extract customer and subscription data:

import stripe
import pandas as pd
from datetime import datetime

# Initialize Stripe with your API key
stripe.api_key = 'sk_test_your_api_key_here'

# Function to extract all customers
def get_all_customers():
    customers = []
    has_more = True
    starting_after = None

    while has_more:
        if starting_after:
            response = stripe.Customer.list(limit=100, starting_after=starting_after)
        else:
            response = stripe.Customer.list(limit=100)

        customers.extend(response.data)
        has_more = response.has_more

        if has_more:
            starting_after = response.data[-1].id

    return customers

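# Function to extract subscription data for each customer
def get_customer_subscriptions(customer_id):
    # status='all' is required here: by default the API omits canceled
    # subscriptions, which would hide the churn this analysis measures
    subscriptions = stripe.Subscription.list(customer=customer_id, status='all', limit=100)
    return subscriptions.data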

# Extract all customers
print("Extracting customer data...")
all_customers = get_all_customers()

# Build dataset
customer_data = []
for customer in all_customers:
    subscriptions = get_customer_subscriptions(customer.id)

    for sub in subscriptions:
        customer_data.append({
            'customer_id': customer.id,
            'customer_created': datetime.fromtimestamp(customer.created),
            'subscription_id': sub.id,
            'subscription_start': datetime.fromtimestamp(sub.created),
            'subscription_status': sub.status,
            'subscription_canceled_at': datetime.fromtimestamp(sub.canceled_at) if sub.canceled_at else None,
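            'plan_id': sub.plan.id if sub.plan else None,
            # plan.amount can be None (e.g. tiered pricing); this figure also ignores the billing interval
            'mrr': (sub.plan.amount or 0) / 100 if sub.plan else 0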
        })

# Create DataFrame
df = pd.DataFrame(customer_data)
print(f"Extracted {len(df):,} subscription records from {len(all_customers):,} customers")
df.to_csv('stripe_customer_cohort_data.csv', index=False)
print("Data saved to stripe_customer_cohort_data.csv")

Expected Output

Extracting customer data...
Extracted 1,247 subscription records from 1,089 customers
Data saved to stripe_customer_cohort_data.csv
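The script above records the plan amount as MRR, which is only a monthly figure when the plan bills monthly. The following is a minimal sketch of normalizing it, assuming you also extract the plan's interval and interval_count fields. The helper name monthly_amount and the conversion factors (52/12 weeks and 365/12 days per month) are illustrative choices, and the division by 100 assumes a currency with two decimal places.

```python
# Hypothetical helper: convert a plan's billing amount to an approximate monthly figure.
# Assumes amounts are in the currency's smallest unit (e.g. cents), as Stripe reports them.
PERIODS_PER_MONTH = {
    "day": 365 / 12,
    "week": 52 / 12,
    "month": 1,
    "year": 1 / 12,
}

def monthly_amount(amount_cents, interval, interval_count=1, quantity=1):
    """Return approximate MRR in major currency units for one subscription."""
    if amount_cents is None:  # tiered or metered plans may have no flat amount
        return 0.0
    billing_periods_per_month = PERIODS_PER_MONTH[interval] / interval_count
    return amount_cents / 100 * billing_periods_per_month * quantity

# A plan billed at 120.00 per year and a plan billed at 30.00 every 3 months
# both come out at roughly 10.00 per month.
print(round(monthly_amount(12000, "year"), 2))
print(round(monthly_amount(3000, "month", 3), 2))
```

Without a step like this, an annual plan would be counted at twelve times its monthly value in any revenue-based cohort table.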

Option B: Using Stripe Dashboard Export

For smaller datasets or one-time analysis:

  1. Log into your Stripe Dashboard
  2. Navigate to Customers → Export
  3. Select "All customers" and download CSV
  4. Navigate to Billing → Subscriptions → Export
  5. Download subscription data CSV
  6. Join these datasets using customer_id
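The join in the last step can be sketched with pandas as follows. The two small DataFrames stand in for the exported CSV files, and their column names are assumptions for illustration; check the headers in your own exports and rename them before merging.

```python
import pandas as pd

# Toy stand-ins for the two dashboard exports. Real exports have more columns,
# and the column names below are assumptions, not Stripe's actual headers.
customers = pd.DataFrame({
    "customer_id": ["cus_A", "cus_B", "cus_C"],
    "customer_created": ["2024-01-05", "2024-01-20", "2024-02-11"],
})
subscriptions = pd.DataFrame({
    "subscription_id": ["sub_1", "sub_2", "sub_3"],
    "customer_id": ["cus_A", "cus_A", "cus_B"],
    "subscription_start": ["2024-01-05", "2024-03-01", "2024-01-21"],
})

# Inner join: one row per subscription, so customers without a subscription
# (cus_C here) drop out, matching the shape of the API-based extract.
# validate raises an error if a customer ID appears twice in the customers file.
joined = subscriptions.merge(customers, on="customer_id", how="inner",
                             validate="many_to_one")

print(joined[["customer_id", "subscription_id", "customer_created"]])
```

With real files, replace the two DataFrame literals with pd.read_csv calls on the downloaded exports.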

Step 2: Build Cohort Retention Tables

Now that you have your Stripe data extracted, the next step is to transform it into cohort retention tables. This involves grouping customers by their acquisition month and tracking their activity over subsequent months.

Define Your Cohort and Retention Criteria

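First, decide on your cohort definition and what "retained" means for your business. The implementation below uses these definitions:

  • Cohort: the calendar month of a customer's first subscription
  • Retained: the customer has a subscription that had started, and had not been canceled before, the month in question (subscriptions without a cancellation date count as active through today)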

Python Implementation

import pandas as pd

# Load your extracted data
df = pd.read_csv('stripe_customer_cohort_data.csv')
df['customer_created'] = pd.to_datetime(df['customer_created'])
df['subscription_start'] = pd.to_datetime(df['subscription_start'])
df['subscription_canceled_at'] = pd.to_datetime(df['subscription_canceled_at'])

# Create cohort month (month of first subscription)
df['cohort_month'] = df.groupby('customer_id')['subscription_start'].transform('min').dt.to_period('M')

# For each customer, determine active months
def get_active_months(row):
    start = row['subscription_start']
    end = row['subscription_canceled_at'] if pd.notna(row['subscription_canceled_at']) else pd.Timestamp.now()

    # Generate monthly periods between start and end
    return pd.period_range(start=start.to_period('M'), end=end.to_period('M'), freq='M')

# Create a row for each customer-month combination
customer_months = []
for _, row in df.iterrows():
    active_months = get_active_months(row)
    for month in active_months:
        customer_months.append({
            'customer_id': row['customer_id'],
            'cohort_month': row['cohort_month'],
            'active_month': month
        })

df_activity = pd.DataFrame(customer_months)

# Calculate cohort period (months since acquisition)
df_activity['cohort_period'] = (df_activity['active_month'] - df_activity['cohort_month']).apply(lambda x: x.n)

# Build retention table
cohort_counts = df_activity.groupby(['cohort_month', 'cohort_period'])['customer_id'].nunique().reset_index()
cohort_counts.columns = ['cohort_month', 'cohort_period', 'customers']

# Pivot to create retention matrix
retention_matrix = cohort_counts.pivot(index='cohort_month', columns='cohort_period', values='customers')

# Calculate retention percentages
cohort_sizes = retention_matrix.iloc[:, 0]
retention_pct = retention_matrix.divide(cohort_sizes, axis=0) * 100

print("\nCohort Retention Percentages:")
print(retention_pct.round(1))

Expected Output

Cohort Retention Percentages:
cohort_period    0      1      2      3      4      5      6
cohort_month
2024-01       100.0   85.3   78.4   72.1   68.9   65.2   62.8
2024-02       100.0   87.2   80.1   75.3   71.4   68.0   64.5
2024-03       100.0   88.5   82.7   77.9   73.6   70.1   NaN
2024-04       100.0   86.9   81.2   76.5   72.8   NaN    NaN
2024-05       100.0   89.1   83.4   78.2   NaN    NaN    NaN
2024-06       100.0   87.8   82.0   NaN    NaN    NaN    NaN

This table shows the percentage of customers from each cohort who remained active in subsequent months. For example, 85.3% of customers acquired in January 2024 were still active in month 1 (February), and 62.8% were still active in month 6 (July).
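To check the logic on data small enough to verify by hand, the same calculation can be wrapped in a function. This is a sketch under stated assumptions: the function name build_retention_table and the as_of parameter are introduced here for illustration (as_of replaces the call to the current time so the result is reproducible), and the four customers are made up.

```python
import pandas as pd

def build_retention_table(subs, as_of):
    """Cohort x period retention percentages from subscription rows.

    subs needs customer_id, subscription_start, subscription_canceled_at
    (datetime columns). as_of is the end date for subscriptions still open.
    """
    subs = subs.copy()
    first_start = subs.groupby("customer_id")["subscription_start"].transform("min")
    subs["cohort"] = first_start.dt.to_period("M")
    subs["start_m"] = subs["subscription_start"].dt.to_period("M")
    subs["end_m"] = subs["subscription_canceled_at"].fillna(as_of).dt.to_period("M")

    # One row per customer per active month, with months since acquisition.
    rows = []
    for r in subs.itertuples(index=False):
        for month in pd.period_range(r.start_m, r.end_m, freq="M"):
            period = (month.year - r.cohort.year) * 12 + (month.month - r.cohort.month)
            rows.append((r.customer_id, str(r.cohort), period))
    activity = pd.DataFrame(rows, columns=["customer_id", "cohort_month", "cohort_period"])

    counts = (activity.groupby(["cohort_month", "cohort_period"])["customer_id"]
                      .nunique()
                      .unstack("cohort_period"))
    return counts.div(counts[0], axis=0) * 100

# Made-up data: two customers acquired in January, two in February.
subs = pd.DataFrame({
    "customer_id": ["cus_A", "cus_B", "cus_C", "cus_D"],
    "subscription_start": pd.to_datetime(
        ["2024-01-10", "2024-01-20", "2024-02-03", "2024-02-14"]),
    "subscription_canceled_at": pd.to_datetime(
        ["2024-03-15", None, "2024-02-25", None]),
})
table = build_retention_table(subs, as_of=pd.Timestamp("2024-04-30"))
print(table.round(1))
```

In this toy run the January cohort drops to 50% in month 3 (cus_A canceled in March) and the February cohort drops to 50% in month 1 (cus_C canceled in February). Note the convention inherited from the code above: a customer still counts as active in the month they cancel.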

Step 3: Interpret Your Cohort Retention Results

Understanding your cohort retention data is where the real value emerges. Let's break down how to read your retention tables and extract actionable insights.

Key Metrics to Analyze

1. Cohort Retention Curves

Look at how retention declines over time for each cohort. A healthy SaaS business typically shows:

2. Cohort Performance Comparison

Compare retention rates across different cohorts to identify trends:

# Calculate average retention by cohort period across all cohorts
avg_retention_by_period = retention_pct.mean(axis=0)

print("\nAverage Retention by Month:")
print(avg_retention_by_period.round(1))

# Identify best and worst performing cohorts at Month 3
month_3_retention = retention_pct[3].dropna().sort_values(ascending=False)
print("\nMonth 3 Retention by Cohort:")
print(month_3_retention.round(1))

# Calculate retention improvement/decline using only cohorts that have Month 3 data
month_3 = retention_pct[3].dropna()
recent_cohorts = month_3.iloc[-3:].mean()
older_cohorts = month_3.iloc[:3].mean()
improvement = ((recent_cohorts - older_cohorts) / older_cohorts) * 100
# Note: with fewer than six such cohorts, the "recent" and "older" groups overlap

print(f"\nRetention trend: {improvement:+.1f}% change in Month 3 retention")

3. Retention Curve Shape Analysis

The shape of your retention curve reveals important insights:
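One way to put a number on curve shape, offered as a sketch rather than a standard method, is to divide each period's retention by the previous period's. A curve that flattens shows this ratio rising toward 100%. The values below are the 2024-01 row of the example table above; the ratio is only an approximation of true period-to-period survival, because customers who cancel and later resubscribe can re-enter a later period.

```python
# Retention percentages for one cohort (the 2024-01 row of the example table).
curve = [100.0, 85.3, 78.4, 72.1, 68.9, 65.2, 62.8]

# Period-over-period ratio: retention in period n relative to period n-1.
step_retention = [curr / prev * 100 for prev, curr in zip(curve, curve[1:])]

for period, pct in enumerate(step_retention, start=1):
    print(f"Month {period - 1} -> {period}: {pct:.1f}% kept")

# The weakest step shows where the curve loses the most customers.
weakest_step = step_retention.index(min(step_retention)) + 1
print(f"Largest relative loss happens going into month {weakest_step}")
```

For this cohort the largest relative loss is the first step (month 0 to month 1), and later steps keep a higher share, which is the flattening pattern.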

Advanced Analysis: Segment Your Cohorts

For deeper insights, segment your cohorts by additional dimensions:

# Add segmentation by plan type, acquisition channel, etc.
# Assuming you have this metadata in your original Stripe data

# Example: Retention by plan tier
# df already has a plan_id on every subscription row, so no self-merge is needed
# (merging df with itself would produce plan_id_x / plan_id_y columns instead)
for plan in df['plan_id'].dropna().unique():
    plan_data = df[df['plan_id'] == plan]
    # Repeat cohort retention calculation for this segment
    print(f"\nRetention for {plan}:")
    # ... (use same cohort calculation logic)
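The elided per-segment step can also be done in one pass by adding the segment column to the groupby. The sketch below assumes a customer-month activity table like df_activity that additionally carries a plan_id column; the plan names and rows are made up for illustration.

```python
import pandas as pd

# Toy customer-month activity rows, shaped like df_activity plus a plan_id column.
# Customers a and b are on "basic", c and d on "pro"; all joined in 2024-01.
activity = pd.DataFrame({
    "customer_id":   ["a", "a", "a", "b", "b", "c", "c", "c", "d"],
    "plan_id":       ["basic"] * 5 + ["pro"] * 4,
    "cohort_month":  ["2024-01"] * 9,
    "cohort_period": [0, 1, 2, 0, 1, 0, 1, 2, 0],
})

# Same counting logic as the main table, with plan_id as an extra grouping key.
counts = (activity.groupby(["plan_id", "cohort_month", "cohort_period"])["customer_id"]
                  .nunique()
                  .unstack("cohort_period"))
retention_by_plan = counts.div(counts[0], axis=0) * 100

print(retention_by_plan.round(1))
```

One design decision to make explicitly: a customer who changes plans will appear in more than one segment unless you assign each customer a single plan, for example the plan at signup.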

This segmentation helps answer questions like whether retention differs between plan tiers or acquisition channels.

For more advanced techniques on evaluating differences between cohorts, consider applying principles from A/B testing statistical significance to ensure observed differences are meaningful.
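As a minimal sketch of such a check, a two-proportion z-test compares the retention rates of two cohorts at the same period. This is one common choice among several, it assumes cohorts large enough for the normal approximation, and the function name and the counts in the example are hypothetical.

```python
import math

def two_proportion_z_test(retained_a, size_a, retained_b, size_b):
    """Two-sided z-test for a difference between two retention rates."""
    p_a, p_b = retained_a / size_a, retained_b / size_b
    pooled = (retained_a + retained_b) / (size_a + size_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / size_a + 1 / size_b))
    if se == 0:  # both cohorts retained everyone or no one
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical Month 3 counts: 130 of 180 retained vs. 156 of 200 retained.
z, p = two_proportion_z_test(130, 180, 156, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the gap between the two cohorts is unlikely to be noise. When comparing many cohort pairs, remember that some differences will look significant by chance alone.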

Step 4: Visualize Your Cohort Data

Visual representation makes cohort retention patterns immediately apparent. Here are effective visualization techniques:

Retention Heatmap

import matplotlib.pyplot as plt
import seaborn as sns

# Create heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(retention_pct, annot=True, fmt='.1f', cmap='RdYlGn',
            cbar_kws={'label': 'Retention %'}, vmin=0, vmax=100)
plt.title('Customer Cohort Retention Analysis - Stripe', fontsize=16, fontweight='bold')
plt.xlabel('Months Since Acquisition', fontsize=12)
plt.ylabel('Cohort (Acquisition Month)', fontsize=12)
plt.tight_layout()
plt.savefig('cohort_retention_heatmap.png', dpi=300)
plt.show()

Retention Curves Line Chart

# Plot retention curves for each cohort
plt.figure(figsize=(12, 6))
for cohort in retention_pct.index:
    plt.plot(retention_pct.columns, retention_pct.loc[cohort], marker='o', label=str(cohort))

plt.xlabel('Months Since Acquisition', fontsize=12)
plt.ylabel('Retention Rate (%)', fontsize=12)
plt.title('Cohort Retention Curves Over Time', fontsize=16, fontweight='bold')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('cohort_retention_curves.png', dpi=300)
plt.show()

These visualizations make it easy to spot trends, outliers, and patterns that might not be obvious in raw numbers.

Step 5: Automate Your Analysis with MCP Analytics

While building cohort retention analysis manually is educational, maintaining these analyses over time can be time-consuming. MCP Analytics provides automated cohort retention analysis specifically designed for Stripe data.

Ready to Automate Your Cohort Analysis?

Get instant cohort retention insights from your Stripe data without writing code. MCP Analytics automatically:

  • Extracts and processes your Stripe customer and subscription data
  • Builds cohort retention tables with customizable time periods
  • Generates heatmaps and retention curves automatically
  • Segments cohorts by plan, channel, and custom attributes
  • Updates in real-time as your Stripe data changes
  • Provides statistical significance testing for cohort differences

For businesses looking for comprehensive analytics coverage, explore our Stripe Customer Cohort Retention service for dedicated support and custom analysis.

Troubleshooting Common Issues

Issue 1: Incomplete or Missing Customer Data

Symptom: Your cohort table has unexpectedly low customer counts or missing cohorts.

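Solution: Log progress while paginating and let API errors surface instead of silently returning a partial list. Also check the subscription query: Stripe's subscription list endpoint omits canceled subscriptions unless you pass status='all', which removes churned customers from every cohort.

# Add error handling and logging
import logging
import stripe
logging.basicConfig(level=logging.INFO)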

def get_all_customers_safe():
    customers = []
    has_more = True
    starting_after = None

    try:
        while has_more:
            response = stripe.Customer.list(limit=100, starting_after=starting_after)
            customers.extend(response.data)
            has_more = response.has_more

            if has_more:
                starting_after = response.data[-1].id

            logging.info(f"Fetched {len(customers)} customers so far...")

    except stripe.error.RateLimitError as e:
        logging.error("Rate limit exceeded. Wait and retry.")
        raise
    except stripe.error.APIError as e:
        logging.error(f"API error: {e}")
        raise

    return customers
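The function above logs a rate-limit error and re-raises it, so the "wait and retry" part is left to the caller. A generic retry wrapper with exponential backoff could look like the sketch below. The name call_with_backoff and its parameters are hypothetical, and the demo uses a fake exception so it runs without network access.

```python
import time

def call_with_backoff(fetch, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on an exception of type retry_on, wait and try again.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)

# Demo with a stand-in for a rate-limited API call that succeeds on the third try.
class FakeRateLimit(Exception):
    pass

calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimit()
    return "page"

delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_fetch, retry_on=FakeRateLimit, sleep=delays.append)
print(result, delays)  # prints: page [1.0, 2.0]
```

With the Stripe library, the call inside the pagination loop would be wrapped along the lines of call_with_backoff(lambda: stripe.Customer.list(limit=100, starting_after=starting_after), retry_on=stripe.error.RateLimitError).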

Issue 2: Retention Percentages Over 100%

Symptom: Some cohort periods show retention rates above 100%.

Solution: This occurs when customers with multiple subscriptions are counted multiple times. Ensure you're counting unique customers, not subscriptions:

# Use nunique() instead of count()
cohort_counts = df_activity.groupby(['cohort_month', 'cohort_period'])['customer_id'].nunique().reset_index()

Issue 3: NaN Values in Recent Cohorts

Symptom: Recent cohorts show NaN for later time periods.

Solution: This is expected—recent cohorts haven't existed long enough to have data for later periods. When calculating averages, use .dropna() or focus on cohorts with sufficient maturity.
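A sketch of the "sufficient maturity" approach: keep only cohorts that have data for a chosen period, then average over that fixed set so every column covers the same cohorts. The table and the min_periods threshold below are made up for illustration.

```python
import pandas as pd

nan = float("nan")

# Toy retention table with the same triangular NaN pattern as the example output.
retention_pct = pd.DataFrame(
    [[100.0, 85.0, 78.0, 72.0],
     [100.0, 87.0, 80.0, nan],
     [100.0, 88.0, nan, nan]],
    index=["2024-01", "2024-02", "2024-03"],
    columns=[0, 1, 2, 3],
)

# Keep cohorts observed for at least `min_periods` months after acquisition.
min_periods = 2
mature = retention_pct[retention_pct[min_periods].notna()]

# Average only the periods every mature cohort has reached (columns 0..min_periods).
avg_mature = mature.loc[:, :min_periods].mean()
print(avg_mature)
```

Here the 2024-03 cohort is excluded, so the month 1 average is 86.0 rather than the 86.7 you would get by averaging all three cohorts. The plain column mean mixes different sets of cohorts in each column, which can make the averaged curve look bumpier or smoother than any real cohort.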

Issue 4: Inconsistent Retention Definitions

Symptom: Retention numbers don't align with business expectations or other reports.

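Solution: Clearly define "active" for your business and apply the same definition in every report. For example, decide whether a customer counts as active in the month they cancel, and whether trialing customers are included.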

Issue 5: Subscription Status Complexity

Symptom: Customers with "past_due" or "unpaid" status are being counted inconsistently.

Solution: Define which subscription statuses count as "retained":

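# Define active statuses
ACTIVE_STATUSES = ['active', 'trialing']
# Or include grace period statuses (uncomment to use instead):
# ACTIVE_STATUSES = ['active', 'trialing', 'past_due']

# Filter subscriptions
# Note: subscription_status is the status at extraction time, not a month-by-month history
df_filtered = df[df['subscription_status'].isin(ACTIVE_STATUSES)]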

Understanding these nuances is similar to challenges faced in other analytical approaches like Accelerated Failure Time (AFT) modeling, where defining the event of interest precisely is critical.

Next Steps and Advanced Techniques

Once you've mastered basic cohort retention analysis, consider these advanced techniques to deepen your insights:

1. Revenue Cohort Analysis

Instead of tracking customer count retention, track revenue retention:

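# Calculate MRR retention instead of customer retention
# df_activity carries no MRR, so attach each customer's MRR summed over their
# subscriptions (a simplification that ignores plan changes over time)
customer_mrr = df.groupby('customer_id', as_index=False)['mrr'].sum()
df_revenue = (df_activity.drop_duplicates(['customer_id', 'active_month'])
              .merge(customer_mrr, on='customer_id'))
cohort_revenue = df_revenue.groupby(['cohort_month', 'cohort_period'])['mrr'].sum()
# Proceed with same pivot and percentage calculation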
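The pivot and percentage step for revenue can be sketched on a toy table. It assumes you have one row per customer per active month with the MRR in effect that month; the column layout and all values are made up for illustration.

```python
import pandas as pd

# Toy customer-month rows with the MRR in effect each month.
# Customer a upgrades from 50 to 100 in month 2; customer b leaves after month 1.
customer_months = pd.DataFrame({
    "customer_id":   ["a", "a", "a", "b", "b", "c", "c", "c"],
    "cohort_month":  ["2024-01"] * 8,
    "cohort_period": [0, 1, 2, 0, 1, 0, 1, 2],
    "mrr":           [50, 50, 100, 20, 20, 30, 30, 30],
})

# Sum MRR per cohort and period, then express it relative to period 0.
revenue = (customer_months.groupby(["cohort_month", "cohort_period"])["mrr"]
                          .sum()
                          .unstack("cohort_period"))
revenue_retention = revenue.div(revenue[0], axis=0) * 100

print(revenue_retention.round(1))
```

In this example month 2 shows 130% even though one of three customers has left, because the upgrade outweighs the lost revenue. Unlike customer retention, revenue retention above 100% is a legitimate result rather than a counting error.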

2. Predictive Retention Modeling

Use machine learning to predict which customers are at risk of churning. Techniques like AdaBoost can identify complex patterns in customer behavior that signal churn risk.

3. Multi-Dimensional Cohort Analysis

Segment cohorts by multiple dimensions simultaneously, such as plan tier and acquisition channel.

4. Integrate with Product Analytics

Combine Stripe retention data with product usage metrics to understand the relationship between feature adoption and retention. Modern AI-first data analysis pipelines can help automate these cross-platform insights.

5. Cohort-Based Forecasting

Use retention curves to forecast future revenue:

# Project future revenue based on retention patterns
def forecast_cohort_revenue(cohort_size, avg_mrr, retention_curve):
    months = len(retention_curve)
    forecast = []
    for month in range(months):
        retained_customers = cohort_size * (retention_curve[month] / 100)
        revenue = retained_customers * avg_mrr
        forecast.append(revenue)
    return forecast
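A usage sketch for the forecast, with hypothetical inputs. The function is restated as a list comprehension so the block runs on its own; it iterates over the curve's values, so it accepts either a plain list or a row of the retention table with missing values dropped.

```python
def forecast_cohort_revenue(cohort_size, avg_mrr, retention_curve):
    """Same logic as the function above, written as a list comprehension."""
    return [cohort_size * (pct / 100) * avg_mrr for pct in retention_curve]

# Hypothetical inputs: 200 new customers at 49.00 average MRR, and an
# assumed average retention curve for months 0 through 3.
curve = [100.0, 87.0, 81.0, 75.0]
forecast = forecast_cohort_revenue(200, 49.0, curve)

for month, revenue in enumerate(forecast):
    print(f"Month {month}: {revenue:,.2f}")
print(f"Total over {len(curve)} months: {sum(forecast):,.2f}")
```

The forecast only extends as far as the observed curve. Projecting further requires an assumption about how the curve continues, for example holding the last observed step-to-step retention constant.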

6. Automate Reporting

Set up automated dashboards and alerts so that the retention tables refresh on a schedule and a drop in a recent cohort is noticed early.

Conclusion

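Customer cohort retention analysis is one of the most powerful analytical techniques for understanding the long-term health of your subscription business. By tracking how different groups of customers behave over time, you can see when customers tend to churn, compare retention across cohorts and segments, and forecast revenue from observed retention curves.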

While this tutorial walked you through the manual process of building cohort retention analysis from Stripe data, remember that automated solutions like MCP Analytics can save significant time while providing deeper, more frequent insights.

Start analyzing your Stripe cohort retention today, and use the insights to build a more sustainable, predictable revenue model for your business.