Step 1: Understand Fraud Pattern Detection Fundamentals

Fraud in payment systems follows predictable patterns. Understanding these patterns is the first step toward effective detection and prevention.

Common Fraud Patterns in Stripe Transactions

1. Card Testing (Carding)

Fraudsters use automated bots to test stolen card numbers by making small purchases. Key indicators include:

  • Bursts of small charges (often under $5) in rapid succession
  • A high ratio of declined to successful attempts
  • Many distinct card numbers originating from a single IP address or device

2. Velocity Abuse

Abnormally high transaction frequency often indicates fraudulent activity. Watch for:

  • Several charges on the same card fingerprint within minutes
  • Many transactions tied to one IP address or email in a short window

3. Geographic Anomalies

Mismatches in geographic data often suggest fraud. Watch for:

  • A card issued in one country paired with an IP address in another
  • Billing and shipping addresses in different countries
  • Orders touching three or more countries across card, IP, and billing data

4. Amount Pattern Irregularities

Transaction amounts that deviate from your normal patterns. Watch for:

  • Statistical outliers far above your typical order value
  • Suspiciously round amounts ($50, $100, $500)
  • Unusually large first-time purchases from new customers

Expected Outcome

After completing this step, you should be able to list 3-5 specific fraud patterns most relevant to your business model and understand the key indicators for each pattern.

Step 2: Access and Export Your Stripe Transaction Data

To analyze fraud patterns effectively, you need comprehensive transaction data. Stripe provides multiple methods for accessing this information.

Method 1: Using the Stripe Dashboard

  1. Log into your Stripe Dashboard at dashboard.stripe.com
  2. Navigate to Payments in the left sidebar
  3. Click the Export button in the top right
  4. Select your date range (recommend starting with the last 90 days)
  5. Choose Detailed export to include all available fields
  6. Select CSV format and click Export

Method 2: Using the Stripe API

For programmatic access and automated analysis, use the Stripe API:

import stripe
import pandas as pd
from datetime import datetime, timedelta

stripe.api_key = 'sk_test_your_key_here'  # prefer loading this from an environment variable

# Retrieve charges from the last 90 days
ninety_days_ago = int((datetime.now() - timedelta(days=90)).timestamp())

charges = []
starting_after = None

while True:
    # stripe-python drops parameters set to None, so one call handles both
    # the first page and every subsequent page
    batch = stripe.Charge.list(
        limit=100,
        created={'gte': ninety_days_ago},
        starting_after=starting_after
    )

    charges.extend(batch.data)

    if not batch.has_more:
        break

    starting_after = batch.data[-1].id

# Convert to DataFrame for analysis
# Convert to DataFrame for analysis.
# Notes: Stripe does not expose the customer IP on the Charge object, so this
# assumes you stored it in metadata['ip_address'] at charge creation;
# outcome.risk_score is only populated on accounts with Stripe Radar.
def card_field(charge, field):
    """Safely read a card attribute; non-card payment methods return None."""
    pmd = charge.payment_method_details
    if pmd and pmd.type == 'card':
        return pmd.card.get(field)
    return None

df = pd.DataFrame([{
    'id': charge.id,
    'amount': charge.amount / 100,  # Convert from cents
    'currency': charge.currency,
    'created': datetime.fromtimestamp(charge.created),
    'customer': charge.customer,
    'email': charge.billing_details.email if charge.billing_details else None,
    'ip': charge.metadata.get('ip_address'),
    'status': charge.status,
    'risk_score': charge.outcome.get('risk_score') if charge.outcome else None,
    'card_country': card_field(charge, 'country'),
    'card_fingerprint': card_field(charge, 'fingerprint'),
} for charge in charges])

df.to_csv('stripe_transactions.csv', index=False)
print(f"Exported {len(df):,} transactions")
print("File saved: stripe_transactions.csv")
print(f"Columns: {len(df.columns)}")
print(f"Date range: {df['created'].min():%Y-%m-%d} to {df['created'].max():%Y-%m-%d}")

Expected Output

You should now have a CSV file containing your transaction data with all relevant fields. A successful export will show:

Exported 1,247 transactions
File saved: stripe_transactions.csv
Columns: 11
Date range: 2024-09-27 to 2024-12-27

Verification

Open your exported file and verify it contains:

  • All 11 columns written by the script: id, amount, currency, created, customer, email, ip, status, risk_score, card_country, and card_fingerprint
  • Amounts in major currency units (dollars, not cents)
  • A created range covering the full 90 days you requested

Step 3: Analyze Transaction Velocity Patterns

Velocity analysis identifies unusually rapid transaction sequences that often indicate fraud. Legitimate customers rarely make multiple purchases within minutes, while fraudsters testing cards or exploiting accounts do.

Calculate Velocity Metrics

Use this Python script to identify velocity-based fraud patterns:

import pandas as pd
from datetime import timedelta

# Load your transaction data
df = pd.read_csv('stripe_transactions.csv')
df['created'] = pd.to_datetime(df['created'])
df = df.sort_values('created')

# Analyze velocity by card fingerprint
def calculate_velocity(df, group_by, time_window_minutes):
    """
    Calculate transaction velocity for a given grouping
    """
    results = []

    for group_value, group_df in df.groupby(group_by):
        group_df = group_df.sort_values('created')

        for idx, row in group_df.iterrows():
            window_start = row['created'] - timedelta(minutes=time_window_minutes)
            window_transactions = group_df[
                (group_df['created'] >= window_start) &
                (group_df['created'] <= row['created'])
            ]

            velocity = len(window_transactions)

            if velocity > 1:  # Only flag multiple transactions
                results.append({
                    'group_type': group_by,
                    'group_value': group_value,
                    'transaction_id': row['id'],
                    'timestamp': row['created'],
                    'velocity': velocity,
                    'time_window': time_window_minutes,
                    'total_amount': window_transactions['amount'].sum()
                })

    return pd.DataFrame(results)

# Check velocity patterns
velocity_by_card = calculate_velocity(df, 'card_fingerprint', 60)  # 60-minute window
velocity_by_ip = calculate_velocity(df, 'ip', 60)
velocity_by_email = calculate_velocity(df, 'email', 120)  # 2-hour window

# Flag high-risk velocity patterns
high_risk_card_velocity = velocity_by_card[velocity_by_card['velocity'] >= 3]
high_risk_ip_velocity = velocity_by_ip[velocity_by_ip['velocity'] >= 5]
high_risk_email_velocity = velocity_by_email[velocity_by_email['velocity'] >= 3]

print(f"Suspicious card velocity patterns: {len(high_risk_card_velocity)}")
print(f"Suspicious IP velocity patterns: {len(high_risk_ip_velocity)}")

# Save all flagged transactions (card, IP, and email) for review
pd.concat([high_risk_card_velocity,
           high_risk_ip_velocity,
           high_risk_email_velocity]).to_csv('velocity_fraud_flags.csv', index=False)

Expected Output

Suspicious card velocity patterns: 23
Suspicious IP velocity patterns: 17

Sample rows from velocity_fraud_flags.csv:
card_fingerprint: Qx7d...kL9p | velocity: 4 transactions in 60 min | total: $187.50
ip: 192.168.1.45 | velocity: 6 transactions in 60 min | total: $423.00
email: test@example.com | velocity: 3 transactions in 120 min | total: $299.99

Interpreting Velocity Results

When reviewing velocity patterns, consider these thresholds (they mirror the flags in the script above; tune them against your own baseline):

  • 3+ charges on one card fingerprint within 60 minutes: likely card testing
  • 5+ charges from one IP address within 60 minutes: likely bot-driven activity
  • 3+ charges tied to one email within 2 hours: possible account abuse
  • Any cluster whose total amount far exceeds your average order value: manual review

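If your export is large, the nested loop in the script above can be slow. A vectorized sketch of the same trailing-window count, using pandas time-based rolling windows (column names as in the Step 2 export):

import pandas as pd

df = pd.read_csv('stripe_transactions.csv')
df['created'] = pd.to_datetime(df['created'])
df = df.sort_values('created')

# For every charge, count charges on the same card fingerprint in the
# trailing 60 minutes (current charge included)
velocity = (
    df.set_index('created')
      .groupby('card_fingerprint')['amount']
      .rolling('60min')
      .count()
      .reset_index(name='velocity_60min')
)

print(f"Charges at or above 3-per-hour on one card: "
      f"{(velocity['velocity_60min'] >= 3).sum()}")
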
Fixed thresholds are a useful starting point; to weigh these signals statistically and adapt them over time, supervised machine-learning models such as AdaBoost can classify transactions using the velocity features computed above.
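
As an illustration, a minimal AdaBoost sketch might look like the following. It assumes a labeled is_fraud column (e.g., built from your dispute history) in a hypothetical labeled_transactions.csv, with feature columns like those produced in this guide; nothing here is Stripe-specific:

import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# labeled_transactions.csv is hypothetical: exported features plus an
# is_fraud label derived from your dispute/refund history
df = pd.read_csv('labeled_transactions.csv')
features = ['amount', 'velocity_60min', 'risk_score']
X = df[features].fillna(0)
y = df['is_fraud']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = AdaBoostClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))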

Step 4: Identify Geographic Anomalies

Geographic mismatches are strong fraud indicators. Fraudsters often use stolen cards from one country while operating from another, creating detectable geographic inconsistencies.

Analyze Geographic Patterns

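The analysis below expects ip_country, billing_country, and shipping_country columns that the Step 2 export does not produce. Billing and shipping countries can be pulled from your Customer and PaymentIntent records; for ip_country, one common approach is an offline GeoIP lookup. Here is a sketch using the geoip2 package with a MaxMind GeoLite2 database (both are assumptions, not Stripe features):

import geoip2.database
import geoip2.errors
import pandas as pd

df = pd.read_csv('stripe_transactions.csv')

# GeoLite2-Country.mmdb is MaxMind's free country database, downloaded separately
reader = geoip2.database.Reader('GeoLite2-Country.mmdb')

def ip_to_country(ip):
    if pd.isna(ip):
        return None
    try:
        return reader.country(ip).country.iso_code
    except (geoip2.errors.AddressNotFoundError, ValueError):
        return None

df['ip_country'] = df['ip'].apply(ip_to_country)

# Persist so the script below picks up the new column
df.to_csv('stripe_transactions.csv', index=False)

With the ip_country column in place (and billing/shipping countries merged from your own records), run the main analysis:
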
import pandas as pd

df = pd.read_csv('stripe_transactions.csv')

# Define high-risk country combinations (customize for your business)
high_risk_combinations = [
    {'card_country': 'US', 'ip_country': 'NG'},  # US card, Nigeria IP
    {'card_country': 'GB', 'ip_country': 'RU'},  # UK card, Russia IP
    {'card_country': 'CA', 'ip_country': 'RO'},  # Canada card, Romania IP
]

def check_geographic_risk(row):
    """
    Assign risk scores based on geographic mismatches
    """
    risk_score = 0
    risk_factors = []

    # Card country vs IP country mismatch (row.get returns None when a
    # column is missing, so these checks degrade gracefully)
    if pd.notna(row.get('card_country')) and pd.notna(row.get('ip_country')):
        if row['card_country'] != row['ip_country']:
            risk_score += 2
            risk_factors.append('card_ip_mismatch')

    # Billing country vs shipping country mismatch
    if pd.notna(row.get('billing_country')) and pd.notna(row.get('shipping_country')):
        if row['billing_country'] != row['shipping_country']:
            risk_score += 1
            risk_factors.append('billing_shipping_mismatch')

    # Check against high-risk combinations
    for combo in high_risk_combinations:
        if (row.get('card_country') == combo['card_country'] and
            row.get('ip_country') == combo['ip_country']):
            risk_score += 5
            risk_factors.append('high_risk_country_combo')

    # Multiple country mismatch (card, IP, billing all different)
    countries = set([
        row.get('card_country'),
        row.get('ip_country'),
        row.get('billing_country')
    ])
    if len([c for c in countries if pd.notna(c)]) >= 3:
        risk_score += 3
        risk_factors.append('multiple_country_mismatch')

    return pd.Series({
        'geo_risk_score': risk_score,
        'geo_risk_factors': ', '.join(risk_factors)
    })

# Apply geographic risk analysis
df[['geo_risk_score', 'geo_risk_factors']] = df.apply(check_geographic_risk, axis=1)

# Flag high-risk transactions
high_geo_risk = df[df['geo_risk_score'] >= 3].copy()
high_geo_risk = high_geo_risk.sort_values('geo_risk_score', ascending=False)

print(f"Transactions with geographic risk factors: {len(high_geo_risk)}")
print(f"\nTop geographic risk patterns:")
print(high_geo_risk[['id', 'amount', 'card_country', 'ip_country',
                      'geo_risk_score', 'geo_risk_factors']].head(10))

# Save flagged transactions
high_geo_risk.to_csv('geographic_fraud_flags.csv', index=False)

Expected Output

Transactions with geographic risk factors: 47

Top geographic risk patterns:
id                  amount  card_country  ip_country  geo_risk_score  geo_risk_factors
ch_1ABC...          299.99  US           NG          7               high_risk_country_combo, card_ip_mismatch
ch_2DEF...          189.50  GB           RU          7               high_risk_country_combo, card_ip_mismatch
ch_3GHI...          450.00  US           RO          5               multiple_country_mismatch, card_ip_mismatch
ch_4JKL...          125.75  CA           BR          3               card_ip_mismatch, billing_shipping_mismatch

Verification

Review the flagged transactions and verify:

  • The flagged country pairs reflect real mismatches in the data, not missing values
  • Legitimate cross-border customers (travelers, VPN users) are not dominating the flags
  • geographic_fraud_flags.csv is sorted with the highest geo_risk_score first

Step 5: Review Amount and Behavior Patterns

Transaction amounts and purchasing behaviors often reveal fraud. Fraudsters typically exhibit different spending patterns than legitimate customers.

Detect Amount Anomalies

import pandas as pd
import numpy as np
from scipy import stats

df = pd.read_csv('stripe_transactions.csv')
df['created'] = pd.to_datetime(df['created'])
df = df.sort_values('created')  # cumcount() below depends on chronological order

# Calculate statistical baselines
mean_amount = df['amount'].mean()
std_amount = df['amount'].std()
median_amount = df['amount'].median()

print(f"Transaction Amount Statistics:")
print(f"Mean: ${mean_amount:.2f}")
print(f"Median: ${median_amount:.2f}")
print(f"Std Dev: ${std_amount:.2f}")

# Identify statistical outliers (Z-score method)
df['amount_zscore'] = np.abs(stats.zscore(df['amount']))
amount_outliers = df[df['amount_zscore'] > 3]  # 3 standard deviations

print(f"\nAmount outliers (>3 std dev): {len(amount_outliers)}")

# Check for round-number fraud pattern
def is_round_number(amount):
    """Detect suspiciously round numbers"""
    return amount in [10, 25, 50, 100, 200, 250, 500, 1000]

df['is_round_amount'] = df['amount'].apply(is_round_number)
round_amount_transactions = df[df['is_round_amount'] == True]

print(f"Round-number transactions: {len(round_amount_transactions)}")

# Analyze per-customer behavior
customer_behavior = df.groupby('customer').agg({
    'amount': ['mean', 'std', 'min', 'max', 'count'],
    'created': ['min', 'max']
}).reset_index()

customer_behavior.columns = ['customer', 'avg_amount', 'std_amount',
                              'min_amount', 'max_amount', 'transaction_count',
                              'first_transaction', 'last_transaction']

# Flag customers with unusual behavior
customer_behavior['amount_variance_ratio'] = (
    customer_behavior['std_amount'] / customer_behavior['avg_amount']
)

# High variance ratio indicates inconsistent spending
high_variance_customers = customer_behavior[
    (customer_behavior['amount_variance_ratio'] > 2) &
    (customer_behavior['transaction_count'] >= 3)
]

print(f"\nCustomers with unusual spending variance: {len(high_variance_customers)}")

# Identify first-time high-value purchases (common fraud pattern)
df['customer_transaction_number'] = df.groupby('customer').cumcount() + 1
first_transaction_high_value = df[
    (df['customer_transaction_number'] == 1) &
    (df['amount'] > mean_amount + (2 * std_amount))
]

print(f"First-time transactions with unusually high amounts: {len(first_transaction_high_value)}")

# Save behavioral fraud flags
behavior_flags = pd.concat([
    amount_outliers[['id', 'customer', 'amount', 'amount_zscore']].assign(flag_type='amount_outlier'),
    first_transaction_high_value[['id', 'customer', 'amount']].assign(flag_type='first_transaction_high_value')
])

behavior_flags.to_csv('behavior_fraud_flags.csv', index=False)
print(f"\nBehavioral fraud flags saved: {len(behavior_flags)} transactions")

Expected Output

Transaction Amount Statistics:
Mean: $127.45
Median: $89.99
Std Dev: $98.32

Amount outliers (>3 std dev): 18
Round-number transactions: 142
Customers with unusual spending variance: 12
First-time transactions with unusually high amounts: 23

Behavioral fraud flags saved: 41 transactions

Key Behavioral Indicators

  • Amounts more than 3 standard deviations from your mean are statistical outliers worth review
  • Clusters of round amounts ($50, $100, $500) are a common card-testing signature
  • A variance ratio above 2 for an established customer signals erratic, possibly compromised, spending
  • A first-ever purchase far above your average order value is a classic stolen-card pattern

Step 6: Implement Automated Fraud Detection

Manual analysis is valuable for understanding patterns, but automated detection provides real-time protection. Integrate your findings into an automated system.

Create a Fraud Scoring System

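The scoring script below reads stripe_transactions_enriched.csv, which none of the earlier steps write out as a single file. One way to assemble it, sketched with the column names used in Steps 3 and 4 (adjust to whatever you actually saved):

import pandas as pd

base = pd.read_csv('stripe_transactions.csv')
velocity = pd.read_csv('velocity_fraud_flags.csv')
geo = pd.read_csv('geographic_fraud_flags.csv')

# Keep each transaction's highest observed windowed velocity
velocity = (velocity.rename(columns={'transaction_id': 'id',
                                     'velocity': 'velocity_60min'})
                    [['id', 'velocity_60min']]
                    .groupby('id', as_index=False).max())

enriched = base.merge(velocity, on='id', how='left')
enriched = enriched.merge(geo[['id', 'geo_risk_score']], on='id', how='left')

# Step 5 computed amount_zscore, is_round_amount, and
# customer_transaction_number in memory; recompute or merge them here too
enriched.to_csv('stripe_transactions_enriched.csv', index=False)

With the merged file in place, the scoring script:
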
import pandas as pd

def calculate_fraud_score(transaction):
    """
    Comprehensive fraud scoring combining all detection methods
    Returns score 0-100 (higher = more suspicious)
    """
    score = 0
    risk_factors = []

    # Velocity risk (0-25 points)
    if transaction.get('velocity_60min', 0) >= 5:
        score += 25
        risk_factors.append('extreme_velocity')
    elif transaction.get('velocity_60min', 0) >= 3:
        score += 15
        risk_factors.append('high_velocity')

    # Geographic risk (0-30 points)
    if transaction.get('geo_risk_score', 0) >= 5:
        score += 30
        risk_factors.append('high_geo_risk')
    elif transaction.get('geo_risk_score', 0) >= 3:
        score += 15
        risk_factors.append('medium_geo_risk')

    # Amount risk (0-20 points)
    if transaction.get('amount_zscore', 0) > 3:
        score += 20
        risk_factors.append('amount_outlier')
    elif transaction.get('amount_zscore', 0) > 2:
        score += 10
        risk_factors.append('high_amount')

    # Behavioral risk (0-25 points)
    if transaction.get('customer_transaction_number', 0) == 1:
        if transaction.get('amount', 0) > 500:
            score += 25
            risk_factors.append('high_first_purchase')
        elif transaction.get('amount', 0) > 200:
            score += 15
            risk_factors.append('elevated_first_purchase')

    # Additional risk factors (bonus points). cvc_check and zip_check map to
    # Stripe's payment_method_details.card.checks fields (cvc_check and
    # address_postal_code_check) and need to be added to your export
    if transaction.get('cvc_check') == 'fail':
        score += 10
        risk_factors.append('cvc_fail')

    if transaction.get('zip_check') == 'fail':
        score += 10
        risk_factors.append('zip_fail')

    if transaction.get('is_round_amount'):
        score += 5
        risk_factors.append('round_amount')

    # Cap at 100
    score = min(score, 100)

    return {
        'fraud_score': score,
        'risk_factors': ', '.join(risk_factors),
        'risk_level': 'HIGH' if score >= 70 else 'MEDIUM' if score >= 40 else 'LOW'
    }

# Apply to all transactions. stripe_transactions_enriched.csv is the merged
# file assembled in the sketch above, carrying the Step 3-5 columns
df = pd.read_csv('stripe_transactions_enriched.csv')
fraud_scores = df.apply(calculate_fraud_score, axis=1, result_type='expand')
df = pd.concat([df, fraud_scores], axis=1)

# Summary by risk level
print("Fraud Risk Summary:")
print(df['risk_level'].value_counts())
print(f"\nHigh-risk transactions requiring review: {len(df[df['risk_level'] == 'HIGH'])}")

# Save prioritized review list
high_risk = df[df['risk_level'] == 'HIGH'].sort_values('fraud_score', ascending=False)
high_risk[['id', 'customer', 'amount', 'fraud_score', 'risk_factors']].to_csv(
    'fraud_review_queue.csv',
    index=False
)

print("\nTop 5 highest-risk transactions:")
print(high_risk[['id', 'amount', 'fraud_score', 'risk_factors']].head())

Expected Output

Fraud Risk Summary:
LOW       1,156
MEDIUM      78
HIGH        13

High-risk transactions requiring review: 13

Top 5 highest-risk transactions:
id              amount  fraud_score  risk_factors
ch_1ABC...      299.99  85          extreme_velocity, high_geo_risk, cvc_fail
ch_2DEF...      850.00  80          high_first_purchase, high_geo_risk, amount_outlier
ch_3GHI...      125.00  75          extreme_velocity, medium_geo_risk, round_amount
ch_4JKL...      499.99  70          high_geo_risk, elevated_first_purchase, zip_fail
ch_5MNO...      200.00  70          high_velocity, high_geo_risk, round_amount

Integrate with Stripe Radar

For automated blocking, configure Stripe Radar rules based on your findings:

  1. Navigate to Radar → Rules in your Stripe Dashboard
  2. Create custom rules based on your risk thresholds:
    • Block if velocity exceeds 5 transactions per hour from same card
    • Review if card country differs from IP country
    • Block specific high-risk country combinations
    • Review first-time transactions over $500
  3. Set up webhooks to receive fraud alerts in real time (a minimal receiver is sketched after this list)
  4. Configure automatic email notifications for high-risk transactions

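For step 3 in the list above, a minimal webhook receiver might look like this sketch (Flask is an assumption; radar.early_fraud_warning.created is the Stripe event emitted when card networks report likely fraud):

import os

import stripe
from flask import Flask, abort, request

app = Flask(__name__)
stripe.api_key = os.environ['STRIPE_API_KEY']
endpoint_secret = os.environ['STRIPE_WEBHOOK_SECRET']  # from your webhook settings

@app.route('/stripe/webhooks', methods=['POST'])
def handle_webhook():
    # Verify the signature so forged events are rejected
    try:
        event = stripe.Webhook.construct_event(
            request.data,
            request.headers.get('Stripe-Signature'),
            endpoint_secret
        )
    except (ValueError, stripe.error.SignatureVerificationError):
        abort(400)

    if event['type'] == 'radar.early_fraud_warning.created':
        warning = event['data']['object']
        # Route the charge into your review queue, alert your team, etc.
        print(f"Early fraud warning for charge {warning['charge']}")

    return '', 200
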
For businesses looking to implement more advanced, AI-driven fraud detection systems, exploring AI-first data analysis pipelines can provide sophisticated pattern recognition capabilities that evolve with emerging fraud tactics.