Overview

Analysis overview and configuration

Analysis Type: Traffic Analysis
Company: Network Operations
Objective: Analyze network traffic patterns, identify top sources and destinations, detect anomalous activity, and profile protocol distribution across the capture period
Analysis Date: 2026-03-01
Processing ID: test_1772383435
Total Observations: 8000
Parameters
top_n: 10
anomaly_threshold: 2
significance_level: 0.05
min_group_size: 5
time_granularity: auto
Interpretation

Network Traffic Analysis Insights

Purpose

This analysis examines 8,000 network events to characterize traffic composition, identify dominant sources and protocols, and detect anomalous activity patterns. Understanding these patterns is essential for network operations to baseline normal behavior, allocate resources efficiently, and flag potential security concerns or operational issues.

Key Findings

  • TCP Dominance: TCP accounts for 77.35% of all events and TLSv1.3 for a further 17.52%, so the two protocols together make up 94.87% of traffic. The remaining 14 protocol types comprise only 5.13% combined.
  • Extreme Source Concentration: The top source (192.167.7.162) generates 29.19% of all traffic; top 20% of sources account for 94.14% of events—a strong Pareto distribution indicating highly skewed traffic patterns.
  • Destination Asymmetry: 192.167.7.162 is the destination of 68.47% of all events (5,478 of 8,000), suggesting it functions as a primary hub or aggregation point rather than as one of many evenly used endpoints.
  • Low Protocol Diversity: Shannon entropy of 1.12 bits (28% of maximum) confirms traffic is concentrated in few protocol types rather than uniformly distributed.
  • Temporal Anomaly: One time period was detected with a z-score of 2.52, flagging a statistically unusual traffic spike.

Data preprocessing and column mapping

Initial Rows: 8000
Final Rows: 8000
Rows Removed: 0
Retention Rate: 100%
Interpretation

Purpose

This section evaluates the data preprocessing pipeline's effectiveness in preparing the network traffic dataset for analysis. A 100% retention rate indicates no data loss occurred during cleaning, which is critical for maintaining the integrity of statistical tests and pattern detection across the 8,000 events.

Key Findings

  • Retention Rate: 100% (8,000 of 8,000 rows retained) - No observations were removed during preprocessing, suggesting either pristine data quality or that only minimal validation criteria were applied
  • Rows Removed: 0 - Complete dataset preservation indicates no missing values, duplicates, or outliers were flagged for exclusion
  • Data Integrity: Full dataset availability enables robust statistical testing, including chi-square independence tests (p < 0.001) and Kruskal-Wallis analysis across 16 event types

Interpretation

The perfect retention rate reflects a dataset that required minimal cleaning intervention. This is advantageous for maintaining statistical power in hypothesis testing and preserving the natural distribution of network traffic patterns. However, the absence of any filtering suggests either the source data was exceptionally clean or preprocessing thresholds were lenient. The complete dataset enabled reliable detection of the strong source-protocol association (Cramér's V = 0.729) and identification of temporal anomalies.

Context

No train-test split information is provided, which limits visibility into the validation methodology.

Executive Summary

Executive summary of network traffic analysis

total_events: 8000
unique_sources: 259
num_event_types: 16
dominant_event: TCP
dominant_event_pct: 77.35
pareto_ratio: 94.14
anomaly_count: 1
Finding 1: Dataset contains 8000 events across 16 event types from 259 unique sources.
Finding 2: TCP is the dominant event type at 77.35%.
Finding 3: Top 20% of sources account for 94.14% of all events (Pareto concentration).
Finding 4: Chi-square test confirms event types are NOT uniformly distributed (p < 0.05).
Finding 5: Shannon entropy = 1.12 bits (28% of max) indicates low diversity.
Finding 6: One anomalous time period detected (z-score threshold = 2).
Bottom Line: Analyzed 8000 traffic events across 16 event types from 259 unique sources. TCP dominates at 77.35%.

Recommendation: Investigate the top sources generating most traffic for optimization or security concerns. Review the anomalous period for potential incidents.
Interpretation

Purpose

This analysis examines 8,000 network traffic events to understand traffic composition, concentration patterns, and anomalies. The findings reveal how traffic is distributed across protocols, sources, and time periods—critical for assessing network efficiency and identifying potential security or operational concerns.

Key Findings

  • TCP Dominance: 77.35% of all events are TCP protocol, indicating highly skewed traffic composition toward a single protocol type
  • Pareto Concentration: Top 20% of sources generate 94.14% of events, demonstrating extreme traffic concentration among few originators
  • Low Protocol Diversity: Shannon entropy of 1.12 bits (28% of maximum) confirms traffic lacks diversity across the 16 available event types
  • Statistical Significance: The chi-square goodness-of-fit test (p < 0.05) confirms that the protocol distribution is far from uniform and highly structured
  • Temporal Anomaly: One time period flagged with z-score of 2.52, indicating a statistically unusual traffic spike

Interpretation

The traffic profile exhibits extreme concentration in both protocol type and source distribution. TCP's overwhelming prevalence combined with 94% of traffic originating from just 20% of sources suggests either legitimate centralized operations or potential network bottlenecks. The low entropy and the significant goodness-of-fit test show that this pattern is far from uniform and reflects systematic network behavior. The single detected anomaly warrants investigation.

Visualization

Protocol Distribution

Frequency distribution of event types (protocols, log levels, etc.)

Interpretation

Purpose

This section identifies which event types (protocols) dominate the traffic dataset and measures the diversity of protocol distribution. Understanding protocol dominance is critical for assessing network composition, identifying potential bottlenecks, and detecting anomalous traffic patterns that deviate from expected protocol mixes.

Key Findings

  • TCP Dominance: 77.35% of all 8,000 events are TCP, establishing it as the overwhelmingly prevalent protocol in the network.
  • Secondary Protocol: TLSv1.3 accounts for 17.52%, creating a two-tier distribution where these protocols represent 94.87% of all traffic.
  • Event Type Diversity: 16 distinct event types exist, but distribution is highly skewed with 14 remaining protocols collectively representing only 5.13% of events.
  • Shannon Entropy (1.12 bits): This is 28% of the maximum possible entropy for 16 event types (log2 16 = 4 bits), indicating low diversity: traffic is concentrated rather than evenly distributed across protocol types.
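The entropy figures above can be reproduced from raw event labels. The sketch below is a minimal illustration, assuming entropy is computed in bits over the observed event-type counts and that "% of maximum" means dividing by log2 of the number of observed types (log2 16 = 4 bits here, and 1.12 / 4 = 0.28). The function name `shannon_entropy` is hypothetical, not part of the analysis tool.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Return (entropy_bits, normalized_entropy) for a sequence of category labels.

    Assumption: normalization divides by log2(k), where k is the number of
    distinct categories actually observed.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    # H = -sum(p * log2 p) over observed categories (every p > 0 here)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    k = len(counts)
    # A single category has zero entropy; avoid dividing by log2(1) = 0
    normalized = h / math.log2(k) if k > 1 else 0.0
    return h, normalized


if __name__ == "__main__":
    # Toy event log with probabilities 1/2, 1/4, 1/8, 1/8
    demo = ["TCP"] * 4 + ["TLSv1.3"] * 2 + ["DNS", "ARP"]
    h, norm = shannon_entropy(demo)
    print(f"entropy = {h:.2f} bits, normalized = {norm:.3f}")
    # Reported figures: 1.12 bits over 16 event types
    print(f"reported normalized entropy = {1.12 / math.log2(16):.2f}")
```

Normalizing by the number of observed types means the percentage depends on how many protocols appear in the capture; a capture with more rare protocols has a higher maximum and therefore a lower normalized value for the same mix.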

Interpretation

The traffic profile exhibits extreme protocol concentration, with TCP and TLS accounting for nearly 95% of events. This low-entropy distribution suggests a homogeneous network environment dominated by standard web/application traffic. The remaining 14 protocol types (ARP, DNS, ICMP, HTTP, etc.) appear as minor contributors, indicating either specialized use

Visualization

Top Sources (Pareto)

Top sources ranked by event frequency with Pareto cumulative analysis

Interpretation

Purpose

This section identifies which sources dominate traffic activity across the network. Understanding source concentration is critical for traffic analysis because it reveals whether communication patterns are distributed evenly or heavily skewed toward a few high-volume talkers. This informs network monitoring priorities and helps detect anomalous behavior.

Key Findings

  • Unique Sources: 259 total sources identified, but traffic is highly concentrated
  • Top Source Dominance: 192.167.7.162 generates 29.19% of all events (2,335 events), making it a critical focal point
  • Pareto Concentration: Top 20% of sources account for 94.14% of traffic, indicating extreme concentration
  • Distribution Skew: Mean traffic per top-10 source is 634.8 events, but the median is only 398.5, showing a right-skewed distribution with a few outliers

Interpretation

The traffic profile exhibits classic Pareto behavior: a small minority of sources drive the vast majority of network activity. The single dominant source (192.167.7.162) alone represents nearly one-third of all events, while the top 10 sources collectively account for approximately 79% of traffic. This concentration suggests either legitimate hub-and-spoke architecture or potential network bottlenecks requiring investigation.
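The Pareto ratio can be computed by ranking sources by event count and summing the share held by the top 20%. A minimal sketch follows, assuming "top 20%" is rounded up to a whole number of sources (for 259 sources, ceil(51.8) = 52); the tool's actual rounding rule is not stated. The function names `pareto_share` and `top_n_summary` are hypothetical.

```python
import math
import statistics
from collections import Counter


def pareto_share(sources, top_fraction=0.2):
    """Percentage of events produced by the top `top_fraction` of distinct sources.

    Assumption: the number of top sources is rounded up, with a minimum of one.
    """
    counts = sorted(Counter(sources).values(), reverse=True)
    k = max(1, math.ceil(top_fraction * len(counts)))
    return 100.0 * sum(counts[:k]) / sum(counts)


def top_n_summary(sources, n=10):
    """Mean and median event count among the n busiest sources."""
    top = [c for _, c in Counter(sources).most_common(n)]
    return statistics.mean(top), statistics.median(top)


if __name__ == "__main__":
    # Toy capture: 10 sources with 100 events in total
    demo = (["a"] * 50 + ["b"] * 20 + ["c"] * 10 + ["d"] * 5 + ["e"] * 5
            + ["f"] * 4 + ["g"] * 3 + ["h", "i", "j"])
    print(f"top 20% share: {pareto_share(demo):.2f}%")
    print("top-10 mean/median:", top_n_summary(demo))
```

As in the report (634.8 versus 398.5), a mean well above the median among the top sources is the signature of a few very large talkers.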

Context

This analysis assumes all 259 sources are legitimate.

Visualization

Temporal Patterns

Event frequency over time showing temporal patterns and traffic spikes

Interpretation

Purpose

This section identifies when network traffic peaks across a 1200-minute observation window divided into 21 time bins. Understanding temporal patterns reveals whether traffic is concentrated in specific periods or distributed evenly, which is critical for capacity planning, anomaly detection, and identifying potential security incidents tied to specific timeframes.

Key Findings

  • Peak Activity (Bin_21): 687 events at 1200 minutes—the highest concentration, representing 8.6% of total traffic
  • Minimum Activity (Bin_2): 8 events at 60 minutes—the lowest point, indicating a significant trough
  • Mean Event Count: 380.95 events per bin with standard deviation of 302.18, showing substantial variability across periods
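  • Distribution Pattern: The per-bin counts have a slight negative skew (-0.11), meaning they are nearly symmetric around the mean. Skewness describes the shape of the count distribution, not when traffic occurs; the timing pattern comes from the peak in the final bin (Bin_21) and the trough in Bin_2.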
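The per-bin statistics above can be sketched as follows. The code assumes event times are minute offsets from the start of the capture and uses fixed 60-minute bins; the report's time_granularity is "auto", so the real bin width is an assumption, although 60-minute bins over minutes 0 to 1200 would give the 21 bins reported. It also shows why skewness says nothing about timing: reversing the order of the bins leaves the skewness unchanged. The function names `bin_counts` and `moment_skewness` are hypothetical.

```python
import random
import statistics
from collections import Counter


def bin_counts(minutes, width=60):
    """Count events in fixed-width bins; bin i covers [i*width, (i+1)*width)."""
    last = int(max(minutes) // width)
    counts = Counter(int(m // width) for m in minutes)
    # Include empty bins so quiet periods appear as zeros
    return [counts.get(i, 0) for i in range(last + 1)]


def moment_skewness(counts):
    """Population moment skewness g1 = m3 / m2**1.5 (counts must not all be equal)."""
    n = len(counts)
    mean = sum(counts) / n
    m2 = sum((c - mean) ** 2 for c in counts) / n
    m3 = sum((c - mean) ** 3 for c in counts) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    random.seed(0)
    # Synthetic event times whose density rises toward the end of a 1200-minute window
    minutes = [1200 * random.random() ** 0.5 for _ in range(8000)]
    counts = bin_counts(minutes)
    print(len(counts), "bins; mean", round(statistics.mean(counts), 2),
          "sd", round(statistics.stdev(counts), 2))
    # Skewness ignores bin order: the late-heavy series and its reversal match
    print(round(moment_skewness(counts), 4), round(moment_skewness(counts[::-1]), 4))
```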

Interpretation

Traffic is not uniformly distributed across time. The sharp spike in the final bin (Bin_21) and the deep trough early in the window (Bin_2) indicate non-random temporal clustering. This concentration pattern aligns with the overall dataset's low entropy (1.12 bits) and high Pareto concentration (94.14%), suggesting that both traffic sources and timing are heavily skewed. The variability (SD = 302.18) is substantial relative to the mean of 380.95 events per bin, a coefficient of variation of about 0.79.

Visualization

Top Destinations

Top destinations ranked by event frequency

Interpretation

Purpose

This section identifies which network destinations receive the highest volume of traffic events, revealing concentration patterns that indicate critical service endpoints or potential attack surfaces. Understanding destination concentration helps assess network dependencies and vulnerability exposure across the infrastructure.

Key Findings

  • Primary Destination Concentration: 192.167.7.162 receives 68.47% of all events (5,478 of 8,000), indicating extreme centralization of traffic flow to a single endpoint
  • Top 10 Destinations: Account for approximately 85% of total traffic, while 191 remaining destinations share only ~15%
  • Distribution Skewness: High positive skew (0.97) and standard deviation (1,688.26) confirm highly unequal traffic distribution across destinations
  • Secondary Targets: The next four destinations (104.91.166.75, 23.33.29.79, 74.125.9.169, 173.194.133.202) collectively represent only 12.65% of traffic

Interpretation

The destination landscape exhibits extreme Pareto concentration, with a single IP address acting as the dominant hub. This pattern aligns with the overall traffic analysis showing 94.14% Pareto concentration at the source level, suggesting a hub-and-spoke network topology. The 68.47% concentration on one destination represents a critical single point of failure.

Visualization

Protocol x Source Heatmap

Cross-tabulation heatmap showing event type distribution across top sources

Interpretation

Purpose

This section examines whether different traffic sources exhibit distinct patterns in their event type usage. Understanding source-protocol associations is critical for identifying potential security anomalies, characterizing traffic behavior, and detecting sources that deviate from expected communication patterns.

Key Findings

  • Cramér's V = 0.729: Strong statistical association between sources and event types, indicating sources do NOT use protocols uniformly. This is well above the threshold for meaningful association (>0.5).
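  • TCP Dominance Across Sources: TCP appears in 5 of 9 event types in the cross-tabulation, with counts ranging from 392 to 967 events per source, showing that TCP is heavily used by many top sources even though, given the strong Cramér's V, the overall protocol mix still differs from source to source.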
  • Concentrated High-Volume Pairs: Source 104.91.166.75 generates 967 TCP events and 208 TLSv1.3 events, representing the brightest cells in the heatmap and indicating specialized traffic behavior.
  • Skewed Distribution: Mean count of 289 versus median of 52 reveals extreme concentration—a few source-protocol combinations dominate while most are sparse (min=1).
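Cramér's V is derived from the chi-square independence statistic as V = sqrt(χ² / (n · (min(r, c) - 1))). The reported values are consistent with that formula: sqrt(63748.96 / (8000 × 15)) ≈ 0.7289, which implies min(r, c) = 16, for example a table of all 259 sources by 16 event types (an inference; the table dimensions are not stated). The sketch below computes the Pearson statistic from a contingency table with no empty rows or columns; the function names `chi2_independence` and `cramers_v` are hypothetical. Separately, the reported independence p-value of 4.998e-04 equals 1/2001, the smallest value R's chisq.test can return with simulate.p.value = TRUE and its default B = 2000 replicates; if the test was run that way, the p-value is a lower bound rather than an exact figure.

```python
import math


def chi2_independence(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Assumes every row and column total is positive (no all-zero rows or columns).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


if __name__ == "__main__":
    # Toy table: rows are sources, columns are event types (TCP, TLSv1.3, DNS)
    table = [[90, 10, 0],
             [20, 70, 10],
             [5, 5, 40]]
    chi2, n = chi2_independence(table)
    print(f"chi2 = {chi2:.2f}, V = {cramers_v(chi2, n, 3, 3):.3f}")
    # Check against the reported statistic, assuming a 259 x 16 table
    print(f"reported V = {cramers_v(63748.96, 8000, 259, 16):.4f}")
```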

Interpretation

The strong Cramér's V value confirms that sources differ significantly in their event type usage patterns. Rather than all sources generating similar protocol mixes, certain sources show preference for specific protocols. This heter

Visualization

Anomaly Detection

Anomalous time periods or rare events detected via z-score analysis

Interpretation

Purpose

This section identifies statistically unusual traffic patterns that deviate significantly from normal behavior. Anomalies detected via z-score analysis can signal security threats (DDoS, port scans), system failures, or legitimate traffic spikes. Understanding these outliers is critical for network monitoring and incident response.

Key Findings

  • Anomaly Count: 1 period detected - A single time bin exceeded the z-score threshold of 2, indicating one statistically significant deviation from the mean event count
  • Z-Score Range: -1.23 to 2.52 - The maximum anomaly (z=2.52) represents approximately 2.5 standard deviations above the mean of 381 events per bin
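  • Temporal Distribution: Only 1 of the 21 time bins exceeds the threshold. With a mean of 380.95 and a standard deviation of 302.18 events per bin, a bin needs more than about 985 events (380.95 + 2 × 302.18) to be flagged, so moderate fluctuations are not counted as anomalies.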
  • Baseline Stability: Most bins cluster within ±1.2 standard deviations, indicating consistent traffic patterns overall
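The z-score rule can be sketched as below, assuming z = (count - mean) / SD over the per-bin counts, the population SD, and flagging bins whose |z| exceeds anomaly_threshold = 2; whether the tool uses the sample SD or also flags low-side bins is not stated. Applied to the reported summary statistics, the formula doubles as a consistency check. The trough matches: (8 - 380.95) / 302.18 ≈ -1.23, the reported minimum z-score. The peak does not: the 687-event bin scores (687 - 380.95) / 302.18 ≈ 1.01, while z = 2.52 corresponds to about 380.95 + 2.52 × 302.18 ≈ 1,142 events. Unless the z-scores were computed on a different series, the peak count and the anomaly z-score reported here appear inconsistent, and the flagged bin is worth checking against the raw data. The function names `zscore_anomalies` and `z_from_summary` are hypothetical.

```python
import statistics


def zscore_anomalies(counts, threshold=2.0):
    """Return (bin_index, z) for bins whose |z| exceeds the threshold.

    Assumptions: population standard deviation; both tails are flagged.
    """
    mean = statistics.mean(counts)
    sd = statistics.pstdev(counts)
    if sd == 0:
        return []  # constant series: nothing can be anomalous
    flagged = []
    for i, c in enumerate(counts):
        z = (c - mean) / sd
        if abs(z) > threshold:
            flagged.append((i, z))
    return flagged


def z_from_summary(count, mean, sd):
    """z-score of a single bin given published summary statistics."""
    return (count - mean) / sd


if __name__ == "__main__":
    counts = [10, 10, 10, 10, 10, 10, 10, 10, 10, 50]
    print(zscore_anomalies(counts))
    # Consistency checks using the report's mean (380.95) and SD (302.18)
    print(round(z_from_summary(687, 380.95, 302.18), 2))
    print(round(z_from_summary(8, 380.95, 302.18), 2))
    print(round(380.95 + 2.52 * 302.18))
```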

Interpretation

The single detected anomaly represents a rare event in the traffic stream: of the 21 time periods covering 8,000 events, only one stands out under the z-score criterion. The low anomaly count suggests the network mostly operates within expected parameters, though the flagged period warrants investigation to determine whether it reflects a legitimate activity surge or a potential security concern.

Visualization

Flow Analysis

Top source-Destination communication pairs by event frequency

Interpretation

Purpose

This section identifies the dominant source-destination communication pairs in the network, revealing which endpoints exchange the most traffic. Understanding these flows is critical for identifying critical service dependencies, detecting potential bottlenecks, and spotting unauthorized or anomalous communication patterns that deviate from expected network behavior.

Key Findings

  • Primary Hub: 192.167.7.162 is the dominant destination, receiving 80% of top 10 flows, indicating it serves as a central aggregation point or critical service endpoint
  • Top Flow Volume: 104.91.166.75 → 192.167.7.162 accounts for 1,175 events (14.69%), representing the single largest communication pair
  • Flow Concentration: The top 5 pairs represent 56.24% of traffic in this subset, showing significant concentration around a few key relationships
  • Bidirectional Activity: Limited reverse flows (only 2 of 10 pairs show 192.167.7.162 as source), suggesting asymmetric communication patterns typical of client-server architectures
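Ranking communication pairs is a frequency count over (source, destination) tuples. The sketch below assumes events are available as (source, destination) pairs and reports each pair's share of all events, which is how the 14.69% figure relates to the total (1,175 / 8,000 ≈ 14.69%); the 1,175 events are also the same total as the 967 TCP plus 208 TLSv1.3 events reported for 104.91.166.75 in the heatmap. The function names `top_flows` and `reverse_flow_count` and the demo addresses are hypothetical.

```python
from collections import Counter


def top_flows(events, n=10):
    """Return the n most frequent (source, destination) pairs with count and % of all events."""
    pairs = Counter(events)
    total = sum(pairs.values())
    return [(src, dst, c, 100.0 * c / total) for (src, dst), c in pairs.most_common(n)]


def reverse_flow_count(flows, host):
    """How many of the listed flows have `host` as the source."""
    return sum(1 for src, _, _, _ in flows if src == host)


if __name__ == "__main__":
    events = ([("10.0.0.5", "10.0.0.1")] * 6 + [("10.0.0.7", "10.0.0.1")] * 3
              + [("10.0.0.1", "10.0.0.5")])
    flows = top_flows(events, n=3)
    for src, dst, c, pct in flows:
        print(f"{src} -> {dst}: {c} ({pct:.2f}%)")
    print("reverse flows from hub:", reverse_flow_count(flows, "10.0.0.1"))
```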

Interpretation

The traffic pattern reveals a hub-and-spoke topology centered on 192.167.7.162, which receives inbound traffic from 9 distinct sources while initiating minimal outbound communication. This asymmetry aligns with the overall dataset's TCP dominance (77.35%).

Visualization

Packet Size Distribution

Distribution of packet/message sizes (numeric measure)

Interpretation

Purpose

This section characterizes the packet/message size distribution across the 8,000 events, revealing whether traffic consists of small control messages, full-sized payloads, or a mix. Size profiles directly impact network efficiency, protocol behavior, and anomaly detection—particularly important for TCP-dominated traffic (77.35%) where bimodal patterns (small ACKs vs. full MTU packets) are expected.

Key Findings

  • Mean vs. Median Divergence: A mean of 962.95 bytes versus a median of 1,462 bytes indicates a left-skewed distribution, with many small packets pulling the average down, typical of TCP acknowledgments and control frames.
  • Concentration in Two Bins: 98.48% of traffic falls in 0–2000 byte range (40.85% in 0–1000, 57.63% in 1000–2000), showing tight clustering around standard packet sizes.
  • P90 at 1514 bytes: 1,514 bytes is the largest standard Ethernet frame as usually captured (a 1,500-byte MTU payload plus the 14-byte Ethernet header). Together with the median of 1,462 bytes, this shows that at least half of all packets are 1,462 bytes or larger, close to the full frame size.
  • Extreme Outliers Rare: Only 0.04% of events exceed 5000 bytes, with negligible traffic above 11,000 bytes.
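The size profile can be summarized with the standard library. This sketch assumes sizes are frame lengths in bytes and uses linear-interpolation percentiles (statistics.quantiles with method="inclusive", the same convention as NumPy's default); the report does not say which percentile method it used, so P90 may differ slightly. The function name `size_profile` is hypothetical.

```python
import statistics
from collections import Counter


def size_profile(sizes, bin_width=1000):
    """Mean, median, P90 and % of packets per fixed-width size bin (bytes)."""
    mean = statistics.mean(sizes)
    median = statistics.median(sizes)
    # quantiles(n=10) returns the 9 deciles; the last one is P90
    p90 = statistics.quantiles(sizes, n=10, method="inclusive")[-1]
    bins = Counter(int(s // bin_width) for s in sizes)
    # Keys are bin start values in bytes, e.g. 0 for 0-999, 1000 for 1000-1999
    shares = {b * bin_width: 100.0 * c / len(sizes) for b, c in sorted(bins.items())}
    return mean, median, p90, shares


if __name__ == "__main__":
    # Toy mix: small control frames plus full-sized Ethernet frames (1514 bytes)
    sizes = [60, 66, 54, 1514, 1514, 1514, 1460, 1514, 1200, 1514]
    mean, median, p90, shares = size_profile(sizes)
    # Here mean < median because the small frames pull the mean down
    print(f"mean={mean:.1f} median={median} p90={p90:.1f}")
    print("percent by bin start (bytes):", shares)
```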

Data Table

Statistical Tests

Statistical hypothesis tests: chi-square, Shannon entropy, and Cramer's V

Test | Statistic | p-value | Effect size | Interpretation
Chi-Square Goodness of Fit | 72548.98 | 0.000e+00 | Entropy: 0.28 | Event types are NOT uniformly distributed (significant)
Chi-Square Independence | 63748.96 | 4.998e-04 | Cramér's V: 0.7289 | Source and protocol are associated (Cramér's V = 0.7289)
Kruskal-Wallis | 190.55 | 1.758e-32 | Groups: 16 | Packet sizes differ significantly across protocols
Interpretation

Purpose

This section validates whether observed traffic patterns are statistically significant or due to random chance. The tests confirm that TCP dominance, source-protocol associations, and packet size variations are genuine structural features of the network, not artifacts. This establishes confidence in the concentration patterns identified elsewhere in the analysis.

Key Findings

  • Chi-Square Goodness of Fit (p ≈ 0): Event types are definitively non-uniform; TCP's 77.35% dominance is statistically significant, not random variation
  • Cramer's V (0.729): Strong association between source IPs and event types indicates systematic traffic patterns rather than independent distributions
  • Shannon Entropy (1.12 bits): At 28% of maximum entropy, the dataset exhibits low diversity: traffic is concentrated in a few event types
  • Kruskal-Wallis (p ≈ 1.76e-32): Packet sizes differ significantly across protocols, confirming protocol-specific behavioral signatures
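The goodness-of-fit statistic compares observed event-type counts with a uniform expectation of n / k per type. With 8,000 events over 16 types the expected count is 500, so TCP alone (about 6,188 events at 77.35%) contributes (6188 - 500)² / 500 ≈ 64,707 of the reported 72,548.98. The sketch below computes the statistic and degrees of freedom in plain Python; for a p-value, scipy.stats.chisquare(f_obs) runs the same test against a uniform expectation by default. The function name `chi2_uniform` is hypothetical.

```python
def chi2_uniform(counts):
    """Pearson chi-square goodness-of-fit statistic against a uniform distribution.

    Returns (statistic, degrees_of_freedom); the expected count per category is n / k.
    """
    n = sum(counts)
    k = len(counts)
    expected = n / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, k - 1


if __name__ == "__main__":
    stat, df = chi2_uniform([30, 10, 10, 10])
    print(f"toy statistic = {stat:.1f} on {df} df")
    # TCP's contribution to the reported statistic (8,000 events, 16 types)
    tcp = round(0.7735 * 8000)
    print(tcp, round((tcp - 500) ** 2 / 500, 1))
```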

Interpretation

The statistical tests collectively demonstrate that network traffic is highly structured and non-random. The very small p-values across multiple tests make sampling error an implausible explanation for the observed concentration. The strong Cramer's V indicates that certain sources preferentially generate specific event types, suggesting either intentional traffic patterns or systematic network behavior. Low entropy confirms that traffic

Data Table

Traffic KPIs

Key performance indicators summarizing the traffic dataset

Total Events: 8000
Unique Sources: 259
Unique Destinations: 201
Event Types: 16
Dominant Event Type: TCP
Dominant Event %: 77.35%
Top Source %: 29.19%
Pareto Ratio (top 20%): 94.14%
Interpretation

Purpose

This section establishes the baseline scale and composition of the traffic dataset, answering fundamental questions about network scope and protocol distribution. Understanding these metrics is essential for assessing data quality, identifying concentration patterns, and contextualizing subsequent statistical findings about anomalies and associations.

Key Findings

  • Total Events: 8,000 observations provide a substantial sample for statistical analysis across multiple dimensions
  • Source-Destination Diversity: 259 sources and 201 destinations indicate moderate network complexity with potential for concentrated flows
  • Protocol Dominance: TCP represents 77.35% of all events, indicating a heavily skewed distribution toward a single protocol type
  • Event Type Range: 16 distinct event types suggest a multi-protocol environment (TCP, TLSv1.3, ARP, DNS, ICMP, etc.)
  • Pareto Concentration: Top 20% of sources account for 94.14% of events, revealing extreme traffic concentration

Interpretation

The dataset exhibits classic network traffic characteristics: high protocol concentration (TCP-heavy), moderate source-destination diversity, and extreme Pareto concentration. This skewness is typical of real-world networks where a small number of high-volume sources dominate traffic patterns. The 77.35% TCP dominance combined with 94.14% Pareto concentration suggests the network is driven by a few prolific sources.
