Analysis overview and configuration
| Parameter | Value |
|---|---|
| top_n | 10 |
| anomaly_threshold | 2 |
| significance_level | 0.05 |
| min_group_size | 5 |
| time_granularity | auto |
This analysis examines 8,000 network events to characterize traffic composition, identify dominant sources and protocols, and detect anomalous activity patterns. Understanding these patterns is essential for network operations to baseline normal behavior, allocate resources efficiently, and flag potential security concerns or operational issues.
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 8,000 |
| Final Rows | 8,000 |
| Rows Removed | 0 |
| Retention Rate | 100% |
This section evaluates the data preprocessing pipeline's effectiveness in preparing the network traffic dataset for analysis. A 100% retention rate indicates no data loss occurred during cleaning, which is critical for maintaining the integrity of statistical tests and pattern detection across the 8,000 events.
The perfect retention rate reflects a dataset that required minimal cleaning intervention. This is advantageous for maintaining statistical power in hypothesis testing and preserving the natural distribution of network traffic patterns. However, the absence of any filtering suggests either the source data was exceptionally clean or preprocessing thresholds were lenient. The complete dataset enabled reliable detection of the strong source-protocol association (Cramér's V = 0.729) and identification of temporal anomalies.
No train-test split information is provided, which limits visibility into the validation methodology.
| finding | detail |
|---|---|
| Finding 1 | Dataset contains 8000 events across 16 event types from 259 unique sources. |
| Finding 2 | TCP is the dominant event type at 77.35%. |
| Finding 3 | Top 20% of sources account for 94.14% of all events (Pareto concentration). |
| Finding 4 | Chi-square test confirms event types are NOT uniformly distributed (p < 0.05). |
| Finding 5 | Shannon entropy = 1.12 bits (28% of max) indicates low diversity. |
| Finding 6 | 1 anomalous time period detected (z-score threshold = 2). |
This analysis examines 8,000 network traffic events to understand traffic composition, concentration patterns, and anomalies. The findings reveal how traffic is distributed across protocols, sources, and time periods—critical for assessing network efficiency and identifying potential security or operational concerns.
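The entropy figure in Finding 5 can be reproduced from per-type event counts. The sketch below is a minimal illustration, assuming base-2 Shannon entropy normalized by log2 of the number of observed event types. The function names and the sample labels are hypothetical and are not taken from the analysis pipeline.

```python
import math
from collections import Counter


def shannon_entropy_bits(counts):
    """Shannon entropy in bits of a list of non-negative category counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def normalized_entropy(counts):
    """Entropy divided by its maximum, log2(k), for k observed categories."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 0.0
    return shannon_entropy_bits(counts) / math.log2(k)


# Hypothetical event labels, only to show the call pattern.
events = ["TCP"] * 6 + ["TLS"] * 2 + ["DNS", "ARP"]
counts = list(Counter(events).values())
print(round(shannon_entropy_bits(counts), 3), round(normalized_entropy(counts), 3))

# Consistency check on the reported figures: 16 event types give a maximum
# of log2(16) = 4 bits, and 1.12 / 4 = 0.28, i.e. 28% of the maximum.
reported_ratio = 1.12 / math.log2(16)
```

A normalized value near 1 would mean events are spread almost evenly across types. A value of 0.28 is consistent with one or two types carrying most of the traffic.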
The traffic profile exhibits extreme concentration both in protocol type and source distribution. TCP's overwhelming prevalence combined with 94% of traffic originating from just 20% of sources suggests either legitimate centralized operations or potential network bottlenecks. The low entropy and statistical significance indicate this pattern is not random but reflects systematic network behavior. The single detected anomaly war
Frequency distribution of event types (protocols, log levels, etc.)
This section identifies which event types (protocols) dominate the traffic dataset and measures the diversity of protocol distribution. Understanding protocol dominance is critical for assessing network composition, identifying potential bottlenecks, and detecting anomalous traffic patterns that deviate from expected protocol mixes.
The traffic profile exhibits extreme protocol concentration, with TCP and TLS accounting for nearly 95% of events. This low-entropy distribution suggests a homogeneous network environment dominated by standard web/application traffic. The remaining 14 protocol types (ARP, DNS, ICMP, HTTP, etc.) appear as minor contributors, indicating either specialized use
Top sources ranked by event frequency with Pareto cumulative analysis
This section identifies which sources dominate traffic activity across the network. Understanding source concentration is critical for traffic analysis because it reveals whether communication patterns are distributed evenly or heavily skewed toward a few high-volume talkers. This informs network monitoring priorities and helps detect anomalous behavior.
The traffic profile exhibits classic Pareto behavior: a small minority of sources drive the vast majority of network activity. The single dominant source (192.167.7.162) alone accounts for 29.19% of all events, while the top 10 sources collectively account for approximately 79% of traffic. This concentration suggests either a legitimate hub-and-spoke architecture or potential network bottlenecks that require investigation.
This analysis assumes all 259 sources are legitimate.
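A Pareto ratio such as the reported 94.14% can be computed from per-source event counts. The following is a sketch under stated assumptions: the report does not say how the "top 20%" group size is rounded, so rounding up is assumed here (259 sources would give 52), and `top_share` and the sample counts are hypothetical.

```python
def top_share(counts, top_percent=20):
    """Fraction of all events produced by the busiest top_percent of sources."""
    ordered = sorted(counts, reverse=True)
    # Assumption: round the group size up, using integer arithmetic to avoid
    # floating-point surprises. 259 sources at 20% gives 52 sources.
    k = max(1, (len(ordered) * top_percent + 99) // 100)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical per-source event counts: one source out of five carries
# 80 of 100 events.
per_source = [80, 5, 5, 5, 5]
share = top_share(per_source)
```

If the original pipeline rounded down instead, the group would hold 51 sources and the ratio would be slightly lower.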
Event frequency over time showing temporal patterns and traffic spikes
This section identifies when network traffic peaks across a 1200-minute observation window divided into 21 time bins. Understanding temporal patterns reveals whether traffic is concentrated in specific periods or distributed evenly, which is critical for capacity planning, anomaly detection, and identifying potential security incidents tied to specific timeframes.
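One way to obtain per-bin counts like those discussed here is equal-width binning of event timestamps. This is only a sketch: it assumes the `auto` time granularity resolved to 21 equal-width bins (about 57.1 minutes each over 1200 minutes), which the report does not confirm, and `bin_counts` is a hypothetical helper.

```python
def bin_counts(timestamps, n_bins=21):
    """Count events in n_bins equal-width bins spanning min..max of timestamps."""
    start, end = min(timestamps), max(timestamps)
    width = (end - start) / n_bins
    counts = [0] * n_bins
    for t in timestamps:
        if width == 0:
            idx = 0  # all events share one timestamp
        else:
            # Clamp so the latest event falls in the last bin, not past it.
            idx = min(int((t - start) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical timestamps in minutes since the start of the window.
sample = [0, 1, 2, 1200]
per_bin = bin_counts(sample)
```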
Traffic is not uniformly distributed across time. The sharp spike at the final bin (Bin_21) and the dramatic dip early (Bin_2) indicate non-random temporal clustering. This concentration pattern aligns with the overall dataset's low entropy (1.12 bits) and high Pareto concentration (94.14%), suggesting that both traffic sources and timing are heavily skewed. The variability (SD=302.18) is substantial relative
Top destinations ranked by event frequency
This section identifies which network destinations receive the highest volume of traffic events, revealing concentration patterns that indicate critical service endpoints or potential attack surfaces. Understanding destination concentration helps assess network dependencies and vulnerability exposure across the infrastructure.
The destination landscape exhibits extreme Pareto concentration, with a single IP address acting as the dominant hub. This pattern aligns with the overall traffic analysis showing 94.14% Pareto concentration at the source level, suggesting a hub-and-spoke network topology. The 68.47% concentration on one destination represents a critical single point of failure
Cross-tabulation heatmap showing event type distribution across top sources
This section examines whether different traffic sources exhibit distinct patterns in their event type usage. Understanding source-protocol associations is critical for identifying potential security anomalies, characterizing traffic behavior, and detecting sources that deviate from expected communication patterns.
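The strength of a source-protocol association is commonly summarized by Cramér's V, derived from the chi-square statistic of the cross-tabulation. The sketch below shows the calculation on a hypothetical table, then plugs in the statistic reported in the hypothesis-test table. It assumes the test covered all 16 event types and at least 16 sources, which the report does not state explicitly. Function names are illustrative.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and total count for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Hypothetical 2x2 table with perfect association.
table = [[10, 0], [0, 10]]
chi2, n = chi_square_independence(table)
v_perfect = cramers_v(chi2, n, 2, 2)

# Reported values: chi-square = 63748.96 over 8,000 events.
# Assumption: min(rows, cols) - 1 = 15 (16 event types, 259 sources).
v_reported = cramers_v(63748.96, 8000, 259, 16)
```

Under that assumption the result is approximately 0.7289, matching the reported effect size.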
The strong Cramér's V value confirms that sources differ significantly in their event type usage patterns. Rather than all sources generating similar protocol mixes, certain sources show preference for specific protocols. This heter
Anomalous time periods or rare events detected via z-score analysis
This section identifies statistically unusual traffic patterns that deviate significantly from normal behavior. Anomalies detected via z-score analysis can signal security threats (DDoS, port scans), system failures, or legitimate traffic spikes. Understanding these outliers is critical for network monitoring and incident response.
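Z-score flagging of time bins can be sketched as follows. The threshold of 2 comes from the `anomaly_threshold` parameter. The use of the sample standard deviation and a two-sided test are assumptions, and `flag_anomalous_bins` and the sample counts are hypothetical.

```python
import statistics


def flag_anomalous_bins(counts, threshold=2.0):
    """Return (index, z) for bins whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)  # sample SD; the report does not say which SD it uses
    if sd == 0:
        return []  # no variation, so nothing can be flagged
    scored = [(i, (c - mean) / sd) for i, c in enumerate(counts)]
    return [(i, z) for i, z in scored if abs(z) > threshold]


# Hypothetical per-bin event counts with one obvious spike in the last bin.
bin_totals = [10] * 9 + [100]
flagged = flag_anomalous_bins(bin_totals)
```

Because a large spike inflates both the mean and the standard deviation, a single extreme bin can mask smaller deviations elsewhere. A robust alternative such as a median-based score may be worth comparing.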
The single detected anomaly represents a rare event in an otherwise predictable traffic stream. With 8,000 events distributed across 21 time periods, the system exhibits strong temporal consistency. Only one period exceeds the z-score threshold of 2, which suggests the network largely operates within expected parameters, though the flagged period warrants investigation to determine whether it reflects a legitimate activity surge or a potential security concern
Top source-destination communication pairs by event frequency
This section identifies the dominant source-destination communication pairs in the network, revealing which endpoints exchange the most traffic. Understanding these flows is critical for identifying critical service dependencies, detecting potential bottlenecks, and spotting unauthorized or anomalous communication patterns that deviate from expected network behavior.
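Ranking communication pairs amounts to counting (source, destination) tuples. A minimal sketch, assuming each event is available as a source/destination pair. The addresses below are hypothetical documentation-range IPs, not values from the dataset, and `top_pairs` is an illustrative name.

```python
from collections import Counter


def top_pairs(events, top_n=10):
    """Most frequent (source, destination) pairs among event tuples."""
    return Counter((src, dst) for src, dst in events).most_common(top_n)


# Hypothetical events as (source, destination) tuples.
events = [
    ("192.0.2.1", "192.0.2.9"),
    ("192.0.2.1", "192.0.2.9"),
    ("192.0.2.2", "192.0.2.9"),
    ("192.0.2.1", "192.0.2.9"),
    ("192.0.2.3", "192.0.2.7"),
]
ranked = top_pairs(events, top_n=2)
```

Pairs are treated as directed here, so traffic from A to B is counted separately from B to A.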
The traffic pattern reveals a hub-and-spoke topology centered on 192.167.7.162, which receives inbound traffic from 9 distinct sources while initiating minimal outbound communication. This asymmetry aligns with the overall dataset's TCP dominance (77.35
Distribution of packet/message sizes (numeric measure)
This section characterizes the packet/message size distribution across the 8,000 events, revealing whether traffic consists of small control messages, full-sized payloads, or a mix. Size profiles directly impact network efficiency, protocol behavior, and anomaly detection—particularly important for TCP-dominated traffic (77.35%) where bimodal patterns (small ACKs vs. full MTU packets) are expected.
The
Statistical hypothesis tests: chi-square, Shannon entropy, and Cramér's V
| test_name | statistic | p_value | effect_size | interpretation |
|---|---|---|---|---|
| Chi-Square Goodness of Fit | 72548.98 | 0.000e+00 | Entropy: 0.28 | Event types are NOT uniformly distributed (significant) |
| Chi-Square Independence | 63748.96 | 4.998e-04 | Cramér's V: 0.7289 | Source and protocol are associated (Cramér's V = 0.7289) |
| Kruskal-Wallis | 190.55 | 1.758e-32 | Groups: 16 | Packet sizes differ significantly across protocols |
This section validates whether observed traffic patterns are statistically significant or due to random chance. The tests confirm that TCP dominance, source-protocol associations, and packet size variations are genuine structural features of the network, not artifacts. This establishes confidence in the concentration patterns identified elsewhere in the analysis.
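The goodness-of-fit test compares observed event-type counts with a uniform expectation. The sketch below computes only the statistic and degrees of freedom. A p-value would additionally need a chi-square distribution function, for example `scipy.stats.chisquare`. The function name and the sample counts are hypothetical.

```python
def chi_square_uniform(counts):
    """Chi-square statistic and degrees of freedom against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, len(counts) - 1


# Hypothetical counts for two categories: all 20 events fall in the first.
stat, dof = chi_square_uniform([20, 0])

# Sanity check on the reported figures: with 8,000 events over 16 types the
# uniform expectation is 500 per type. TCP at 77.35% is 6,188 events, so the
# TCP term alone is (6188 - 500)^2 / 500.
tcp_term = (6188 - 500) ** 2 / 500
```

The TCP term comes to roughly 64,707, so TCP dominance by itself explains most of the reported statistic of 72,548.98.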
The statistical tests collectively demonstrate that network traffic is highly structured and non-random. The near-zero p-values across multiple tests make sampling error an implausible explanation for the observed concentration. The strong Cramér's V indicates that certain sources preferentially generate specific event types, suggesting either intentional traffic patterns or systematic network behavior. Low entropy confirms that traffic
Key performance indicators summarizing the traffic dataset
| metric | value |
|---|---|
| Total Events | 8000 |
| Unique Sources | 259 |
| Unique Destinations | 201 |
| Event Types | 16 |
| Dominant Event Type | TCP |
| Dominant Event % | 77.35% |
| Top Source % | 29.19% |
| Pareto Ratio (top 20%) | 94.14% |
This section establishes the baseline scale and composition of the traffic dataset, answering fundamental questions about network scope and protocol distribution. Understanding these metrics is essential for assessing data quality, identifying concentration patterns, and contextualizing subsequent statistical findings about anomalies and associations.
The dataset exhibits classic network traffic characteristics: high protocol concentration (TCP-heavy), moderate source-destination diversity, and extreme Pareto concentration. This skewness is typical of real-world networks where a small number of high-volume sources dominate traffic patterns. The 77.35% TCP dominance combined with 94.14% Pareto concentration suggests the network is driven by a few prol