🧠 Anomaly Detection
InfraSight includes a machine learning–based anomaly detection module that automatically identifies previously unseen attacks and abnormal container behavior in real time. It learns online using the River Python library, so its models adapt continuously as new data arrives.
🎯 Overview & Purpose
The anomaly detection subsystem continuously monitors container activity to detect deviations from normal behavior. Two complementary models are used:
- Resource Usage Model: Detects unusual patterns in CPU, memory, and I/O usage.
- Syscall Frequency Model: Identifies abnormal syscall invocation rates.
Both models aim to detect zero-day or otherwise previously unseen attack behaviors by learning the baseline behavior of each container directly from live event streams.
⚙️ Architecture & Workflow
The anomaly detection service consumes data directly from Kafka, where InfraSight’s eBPF-based client and server publish telemetry events.
The workflow proceeds in three stages:
- Data Ingestion: Events are streamed from the Kafka topics that carry resource usage and syscall frequency data.
- Learning Phase: A dedicated model is created for each container. The model observes the container’s first events, typically for about 5–15 minutes depending on the container’s activity level, to establish its baseline behavior.
- Detection Phase: Once the warmup period ends, the model begins producing anomaly scores and emits alerts when unusual patterns are detected.
🧩 Each container maintains its own model instance, so detection remains context-aware and sensitive to that container’s behavior profile.
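The per-container design can be pictured as a small registry that creates a model the first time a container ID appears and reuses it afterwards. The following is a minimal sketch, not InfraSight's actual code: `ContainerModelRegistry` and `EventCounter` are hypothetical names, and `EventCounter` is only a stand-in for a real online model, exposing a `learn_one` method in the style of River estimators.

```python
from typing import Callable, Dict, Generic, TypeVar

M = TypeVar("M")


class ContainerModelRegistry(Generic[M]):
    """Lazily creates and caches one model instance per container ID."""

    def __init__(self, model_factory: Callable[[], M]) -> None:
        self._factory = model_factory
        self._models: Dict[str, M] = {}

    def get(self, container_id: str) -> M:
        # Build the model the first time this container is seen.
        if container_id not in self._models:
            self._models[container_id] = self._factory()
        return self._models[container_id]

    def __len__(self) -> int:
        return len(self._models)


class EventCounter:
    """Hypothetical stand-in for an online model; it only counts events."""

    def __init__(self) -> None:
        self.seen = 0

    def learn_one(self, x: dict) -> None:
        self.seen += 1


registry = ContainerModelRegistry(EventCounter)
events = [
    ("web-1", {"cpu": 0.20}),
    ("db-1", {"cpu": 0.75}),
    ("web-1", {"cpu": 0.22}),
]
for container_id, features in events:
    # Each event only updates the model that belongs to its container.
    registry.get(container_id).learn_one(features)
```

Passing a factory rather than a model instance keeps the registry independent of the concrete model type, so the same structure could serve both the resource usage and the syscall frequency streams.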
🧮 Model Details
InfraSight currently uses two online components for anomaly scoring:
| Model | Description | Parameters |
|---|---|---|
| One-Class SVM | Learns normal behavior and assigns a signed score to each event. | Default configuration |
| Quantile Filter | Flags events above the 99th percentile of previously seen scores. | q = 0.99 |
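To make the quantile rule concrete, here is a plain-Python sketch of flagging a score that exceeds the q-quantile of previously seen scores. It is an illustration of the idea only and is not River's implementation: the class name `RollingQuantileFlag`, the nearest-rank quantile, and the unbounded score history are all assumptions made for brevity.

```python
import bisect
import math
from typing import List


class RollingQuantileFlag:
    """Flags a score that exceeds the q-quantile of all earlier scores."""

    def __init__(self, q: float = 0.99) -> None:
        if not 0.0 < q < 1.0:
            raise ValueError("q must be in (0, 1)")
        self.q = q
        self._sorted: List[float] = []  # score history, kept in sorted order

    def threshold(self) -> float:
        # Nearest-rank quantile; +inf until at least one score is known.
        if not self._sorted:
            return math.inf
        rank = math.ceil(self.q * len(self._sorted))
        return self._sorted[rank - 1]

    def check_and_update(self, score: float) -> bool:
        flagged = score > self.threshold()  # compare with history first
        bisect.insort(self._sorted, score)  # then remember this score
        return flagged


flt = RollingQuantileFlag(q=0.99)
# Build a history of 100 scores (0..99 in scrambled order); results ignored.
for i in range(100):
    flt.check_and_update(float((i * 37) % 100))

typical = flt.check_and_update(50.0)    # well inside the seen range
extreme = flt.check_and_update(1000.0)  # far above every earlier score
```

Because the threshold is derived from the scores themselves, it adapts to each container's own score distribution instead of relying on a fixed cutoff.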
Feature Extraction
The features used by each model are defined in the database schema:
- See the `resource_events` and `syscall_freq_events` tables in Database Schema.
Warmup Phase
- The first 50 events per container are used exclusively for model initialization and are not evaluated for anomalies.
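The warmup rule amounts to a per-container event counter that suppresses scoring until the threshold is passed. A minimal sketch follows, assuming a hypothetical `WarmupGate` helper; only the figure of 50 events comes from the text above.

```python
from collections import Counter

WARMUP_EVENTS = 50  # figure taken from the text above


class WarmupGate:
    """Tracks how many events each container has produced so far."""

    def __init__(self, warmup_events: int = WARMUP_EVENTS) -> None:
        self.warmup_events = warmup_events
        self._seen: Counter = Counter()

    def should_score(self, container_id: str) -> bool:
        """Record one event; return True only once warmup is over."""
        self._seen[container_id] += 1
        return self._seen[container_id] > self.warmup_events


gate = WarmupGate()
# Events 1..50 are learn-only; scoring starts with event 51.
decisions = [gate.should_score("web-1") for _ in range(52)]
```

Counting events rather than elapsed time means a busy container leaves warmup sooner than a quiet one, which fits a learning phase whose duration depends on the container's activity level.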
Scoring and Alerting
An event is classified as anomalous when:
- The Quantile Filter detects that the score exceeds the 99th percentile of previous scores, or
- The One-Class SVM score is below 0.
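The two conditions combine with a logical OR, which can be written as a small pure function. This is a sketch under assumptions: the function names and the reason labels `"quantile_filter"` and `"one_class_svm"` are hypothetical, and `above_quantile` stands for whatever the Quantile Filter reported for the event.

```python
from typing import List


def anomaly_reasons(svm_score: float, above_quantile: bool) -> List[str]:
    """Return the rules that fired; an empty list means 'not anomalous'."""
    reasons = []
    if above_quantile:
        reasons.append("quantile_filter")  # score above the 99th percentile
    if svm_score < 0:
        reasons.append("one_class_svm")    # signed score below zero
    return reasons


def is_anomalous(svm_score: float, above_quantile: bool) -> bool:
    # Either rule alone is enough to classify the event as anomalous.
    return bool(anomaly_reasons(svm_score, above_quantile))
```

Returning the list of triggered rules, rather than only a boolean, makes it easy to record which condition caused a given alert.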
🚨 Alerting & Output
When an anomaly is detected:
- An alert is logged to standard output with details such as container ID, timestamp, and anomaly type.
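As an illustration, an alert of this kind could be written to standard output as one JSON object per line. The JSON layout and the field names `container_id`, `timestamp`, and `anomaly_type` are assumptions chosen to mirror the details listed above; the actual log format used by InfraSight may differ.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def emit_alert(container_id: str, anomaly_type: str,
               timestamp: Optional[datetime] = None) -> str:
    """Print one alert to stdout as a JSON line and return that line."""
    ts = timestamp or datetime.now(timezone.utc)
    line = json.dumps({
        "container_id": container_id,   # hypothetical field name
        "timestamp": ts.isoformat(),    # ISO 8601, UTC by default
        "anomaly_type": anomaly_type,   # e.g. which model raised the alert
    })
    print(line, flush=True)
    return line


emit_alert("web-1", "resource_usage")
```

One JSON object per line keeps the output easy to collect and parse with standard log tooling.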