§3

Mathematical Model

The Mahalanobis Distance as an anomaly detector, Welford incremental updates, EWMA smoothing, temporal derivatives, robustness analysis, and the bipolar regime taxonomy.

Based on Whitepaper v2.1 — Sections 4 & 6 · ~25 min read

1. State Representation

HOSA models the instantaneous state of a node as a vector x(t) ∈ ℝⁿ, where each component represents a system resource variable:

x(t) = [ x₁(t), x₂(t), …, xₙ(t) ]ᵀ

In the reference implementation, the variables include (but are not limited to):

  • CPU utilization (aggregate and per-core)
  • Memory pressure (utilization, swap, PSI some/full)
  • I/O throughput and latency
  • Network packet rate (ingress/egress)
  • Scheduler run queue depth
  • Page fault rate
  • Context switch counters

The exact composition of x(t) is determined during the warm-up phase (see §2 — Architecture, §3) based on the hardware topology discovered via proprioception. A typical server produces a vector with n = 8 to 12 dimensions.

2. The Mahalanobis Distance

2.1. Why Mahalanobis over Static Thresholds

Anomaly detection based on unidimensional static thresholds (e.g., "CPU > 90%") suffers from a fundamental limitation: it ignores the correlation structure between variables.

Scenario A — Legitimate

  • CPU: 85%
  • Memory: stable
  • I/O: low
  • Network: stable
  • → Intensive computation (video rendering)

Scenario B — Imminent Collapse

  • CPU: 85%
  • Memory pressure: rising
  • I/O: stalling
  • Network latency: spiking
  • → Cascading failure in progress

A static threshold on CPU produces the same alert for both scenarios. The Mahalanobis Distance distinguishes them because it measures the deviation from baseline in all dimensions simultaneously, weighted by their covariance. In Scenario A, CPU is high but all other variables are within their expected correlation with CPU — low DM. In Scenario B, the correlation structure itself is deformed — high DM.

2.2. Formal Definition

The Mahalanobis Distance [1] measures the distance of an observation x from the multivariate distribution defined by the mean vector μ and the Covariance Matrix Σ:

DM(x) = √[ (x − μ)ᵀ · Σ⁻¹ · (x − μ) ]
Equation 1 — Mahalanobis Distance

Where:

  • x — the current state vector x(t) ∈ ℝⁿ
  • μ — the baseline mean vector, accumulated during warm-up and updated incrementally
  • Σ — the n×n Covariance Matrix, capturing the correlations between all resource variables
  • Σ⁻¹ — the inverse (precision matrix), which weights dimensions by their variance and interdependence

The Covariance Matrix Σ is the critical element. It captures how variables co-vary under normal operation. Its inverse Σ⁻¹ ensures that a deviation along a high-variance dimension (e.g., CPU fluctuations during normal operation) contributes less to DM than the same magnitude deviation along a low-variance dimension (e.g., sudden memory pressure in a normally stable system).

Geometric Interpretation

DM can be understood as the Euclidean distance in a whitened coordinate system where the covariance matrix is transformed to the identity. In this transformed space, a unit deviation in every dimension has equal significance. The Mahalanobis Distance is therefore a measure of "how many standard deviations away from normal the system is, accounting for all correlations."
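
To make Equation 1 concrete, the two scenarios from §2.1 can be reproduced in a minimal numpy sketch. The two-variable state vector and all baseline numbers are illustrative, not values from the reference implementation:

```python
import numpy as np

# Toy 2-variable baseline (CPU %, memory pressure %); the covariance
# encodes that memory normally rises together with CPU.
mu = np.array([40.0, 30.0])
sigma = np.array([[400.0, 180.0],
                  [180.0, 100.0]])

def mahalanobis(x, mu, sigma):
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))   # Equation 1

# Scenario A: CPU at 85%, memory tracking its expected correlation with CPU
dm_legit = mahalanobis(np.array([85.0, 50.0]), mu, sigma)
# Scenario B: same CPU, but memory far above what the correlation predicts
dm_collapse = mahalanobis(np.array([85.0, 75.0]), mu, sigma)
```

Both observations have identical CPU, yet the second deviates from the learned CPU-to-memory correlation and yields a far larger DM.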

2.3. Dimensional Contribution Decomposition

To diagnose which resources are driving the deviation, HOSA decomposes DM² into per-dimension contributions. Given the deviation vector d = x(t) − μ:

DM² = dᵀ · Σ⁻¹ · d = Σⱼ cⱼ

where cⱼ = dⱼ · (Σ⁻¹ · d)ⱼ
Equation 2 — Dimensional Contribution

The dimensions with the largest cj values are the dominant contributors to the anomaly. This decomposition enables HOSA to:

  1. Direct throttling actions to processes consuming the dominant resource
  2. Log the mathematical reason for every decision (auditability)
  3. Provide dimensional context in webhook payloads for operator diagnosis

In the walkthrough scenario from the whitepaper (§5.7), when the memory leak occurs, the decomposition shows mem_used contributing 68% of DM², mem_pressure contributing 19%, and io_latency contributing 8% — giving the operator an immediate understanding of the anomaly's nature.
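
The decomposition in Equation 2 is a few lines of linear algebra. The following sketch uses a toy 3-variable baseline with a diagonal covariance for clarity; the variable roles are illustrative:

```python
import numpy as np

mu = np.array([40.0, 30.0, 5.0])               # baseline means
sigma = np.diag([100.0, 25.0, 4.0])            # diagonal Σ for clarity
x = np.array([45.0, 60.0, 6.0])                # memory-like dimension deviates

d = x - mu
w = np.linalg.solve(sigma, d)                  # Σ⁻¹ · d
c = d * w                                      # cⱼ = dⱼ · (Σ⁻¹·d)ⱼ  (Equation 2)
dm_sq = float(d @ w)                           # DM² = Σⱼ cⱼ
shares = c / dm_sq                             # fraction of DM² per dimension
dominant = int(np.argmax(c))                   # index of the dominant resource
```

The `shares` vector is what a webhook payload would carry: it tells the operator which dimension dominates the anomaly.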

3. Incremental Covariance Updates (Welford)

Batch computation of the Covariance Matrix Σ over accumulated data windows is computationally expensive — O(n² · k) for n variables and k samples — and requires memory allocation proportional to the window size. This violates Principle 3 (Predictable Footprint).

HOSA uses the generalized Welford algorithm [2] for online incremental updates of both μ and Σ. Each new sample x(t) updates the statistics in O(n²) with O(1) memory allocation, regardless of the number of accumulated samples:

For each new sample x(t), with count N:

δ = x(t) − μold
μnew = μold + δ / N
δ' = x(t) − μnew
M₂new = M₂old + δ · δ'ᵀ

Σ = M₂ / (N − 1)
Equation 3 — Welford Incremental Update (generalized to matrices)

This eliminates the need to store data windows and guarantees a predictable memory footprint. For n = 10 variables, the entire statistical state (μ as 10-vector, M₂ as 10×10 matrix, plus count N) occupies approximately 900 bytes.

Numerical Stability

The Welford algorithm is specifically designed for numerical stability in floating-point arithmetic. Unlike the naïve formula Var(X) = E[X²] − E[X]², which suffers from catastrophic cancellation when the mean is large relative to the variance, the Welford formulation accumulates the centered sum of squares directly, avoiding precision loss. This is critical for HOSA because resource metrics can span several orders of magnitude (e.g., bytes vs. percentages).
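
A minimal sketch of the generalized Welford update (Equation 3), assuming numpy for the matrix arithmetic; this is illustrative, not the reference implementation:

```python
import numpy as np

class WelfordCov:
    """Online mean and covariance (Equation 3): O(n²) time, O(1) memory per sample."""

    def __init__(self, n):
        self.count = 0
        self.mu = np.zeros(n)          # running mean μ
        self.m2 = np.zeros((n, n))     # centered sum of outer products M₂

    def update(self, x):
        self.count += 1
        delta = x - self.mu                    # δ  = x(t) − μ_old
        self.mu += delta / self.count          # μ_new = μ_old + δ/N
        delta2 = x - self.mu                   # δ′ = x(t) − μ_new
        self.m2 += np.outer(delta, delta2)     # M₂ += δ·δ′ᵀ

    def cov(self):
        return self.m2 / (self.count - 1)      # Σ = M₂/(N − 1)

# Usage: feed samples one at a time; no window is ever stored.
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3))
est = WelfordCov(3)
for s in samples:
    est.update(s)
```

The result matches the batch estimate of mean and covariance while holding only μ, M₂, and a counter in memory.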

4. Covariance Matrix Inversion

The Mahalanobis Distance requires Σ⁻¹. The choice of inversion method involves trade-offs between computational cost, numerical stability, and handling of degenerate cases.

4.1. Cholesky Decomposition

For the expected dimensionality (n ≤ 15), HOSA computes Σ⁻¹ via Cholesky decomposition. Since Σ is symmetric positive semi-definite by construction, it admits the decomposition Σ = LLᵀ where L is lower triangular. The inverse is then computed via forward and backward substitution.

The computational cost is O(n³/3) — for n = 10, this is approximately 333 floating-point operations, completing in sub-microsecond on modern hardware. The Cholesky approach also provides a natural check for positive definiteness: if the decomposition fails (a diagonal element of L becomes non-positive), the matrix is degenerate and requires regularization.

4.2. Tikhonov Regularization

In systems with highly collinear variables (e.g., cpu_user and cpu_total), Σ can become singular or ill-conditioned. HOSA applies Tikhonov regularization:

Σreg = Σ + λI
Equation 4 — Tikhonov Regularization

where λ is a small positive constant and I is the identity matrix. This adds a small positive value to the diagonal, guaranteeing invertibility. The parameter λ controls the trade-off between fidelity to the observed covariance and numerical stability. In practice, λ = 10⁻⁶ is sufficient for metric data without noticeably distorting DM.
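
The inversion path of §4.1 and §4.2 can be sketched together: attempt the Cholesky decomposition, and fall back to Tikhonov regularization when it fails. The function name and the λ default are illustrative:

```python
import numpy as np

def precision_matrix(sigma, lam=1e-6):
    """Invert Σ via Cholesky; apply Σ + λI (Equation 4) if decomposition fails."""
    n = sigma.shape[0]
    try:
        chol = np.linalg.cholesky(sigma)              # Σ = L·Lᵀ, requires PD
    except np.linalg.LinAlgError:
        chol = np.linalg.cholesky(sigma + lam * np.eye(n))
    # L⁻¹ via a triangular system (a dedicated triangular solver would
    # exploit the structure; np.linalg.solve works but does not).
    inv_l = np.linalg.solve(chol, np.eye(n))
    return inv_l.T @ inv_l                            # Σ⁻¹ = L⁻ᵀ·L⁻¹

well_conditioned = np.array([[4.0, 1.0], [1.0, 3.0]])
singular = np.ones((2, 2))                            # perfectly collinear variables

prec = precision_matrix(well_conditioned)
prec_reg = precision_matrix(singular)                 # succeeds only via λI
```

The failed decomposition doubles as the positive-definiteness check described in §4.1.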

Sherman-Morrison-Woodbury for Higher Dimensions

For dimensionality greater than n = 15, HOSA can optionally use the Sherman-Morrison-Woodbury formula for incremental updates of Σ⁻¹ directly, avoiding full re-inversion at each sample. This reduces the per-sample cost of maintaining the precision matrix from O(n³) to O(n²). In the current reference implementation (n ≤ 12), full Cholesky re-inversion at each sample is computationally trivial.

5. Temporal Derivatives and EWMA Smoothing

HOSA does not act on the instantaneous value of DM, but on its temporal rate of change — the velocity and acceleration with which the system departs from homeostasis. This is a critical distinction: it allows HOSA to detect that a system is heading toward collapse, not merely that it has arrived at a critical state.

5.1. EWMA Smoothing

Before computing derivatives, the raw DM signal is smoothed via an Exponentially Weighted Moving Average:

D̄M(t) = α · DM(t) + (1 − α) · D̄M(t−1)
Equation 5 — EWMA Smoothing

The factor α controls the fundamental trade-off between responsiveness (high α preserves rapid variations but retains noise) and stability (low α smooths the signal but introduces detection latency).

The calibration of α is performed during the warm-up phase and constitutes one of the critical parameters of the architecture. The documentation of α sensitivity analysis against synthetic and real collapse datasets is a planned deliverable of the experimental phase.
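
Equation 5 in code is a one-line recurrence; the α value below is illustrative, since HOSA calibrates it during warm-up:

```python
def ewma(raw, alpha=0.2):
    """D̄M(t) = α·DM(t) + (1 − α)·D̄M(t−1), seeded with the first sample."""
    smoothed = [raw[0]]
    for value in raw[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# A single-sample spike in raw DM is damped rather than passed through:
signal = [1.0] * 5 + [10.0] + [1.0] * 5
smoothed = ewma(signal)
```

With α = 0.2, the spike of 10 reaches the smoothed series as only 2.8, illustrating the noise suppression that makes the derivatives of §5.2 and §5.3 usable.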

5.2. First Derivative — Velocity of Deviation

dD̄M/dt ≈ [ D̄M(t) − D̄M(t − Δt) ] / Δt
Equation 6 — First Derivative (finite difference)

The first derivative indicates the speed of departure from homeostasis. A sustained positive dD̄M/dt indicates that the system is moving away from its baseline — the anomaly is growing. A negative value indicates recovery.

5.3. Second Derivative — Acceleration

d²D̄M/dt² ≈ [ dD̄M/dt |t − dD̄M/dt |t−Δt ] / Δt
Equation 7 — Second Derivative (finite difference)

The second derivative is the most critical signal for response-level determination:

  • d²D̄M/dt² > 0 — the system is accelerating toward collapse. This is the activation condition for Level 3 (Active Containment).
  • d²D̄M/dt² < 0 — the system is decelerating. If containment is active, this indicates the mitigation is effective.
  • d²D̄M/dt² ≈ 0 — the system is in steady state (stable or stable-at-elevated). This is the condition for habituation evaluation.

The Ill-Posed Nature of Numerical Differentiation

Numerical differentiation is an ill-posed problem in the sense of Hadamard: small perturbations in input data produce large variations in the computed derivative. The second derivative amplifies this effect quadratically. Without treatment, the second derivative of noisy kernel time series oscillates violently, generating false positives. This is why EWMA smoothing before differentiation is mandatory, not optional. The smoothing factor α directly controls the stability-responsiveness trade-off of the derivative estimates.
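
Equations 6 and 7 reduce to first differences of the smoothed series. A minimal sketch, with Δt and the input values chosen for illustration:

```python
def derivatives(dm_bar, dt=1.0):
    """Finite-difference velocity (Eq. 6) and acceleration (Eq. 7) of D̄M."""
    velocity = [(b - a) / dt for a, b in zip(dm_bar, dm_bar[1:])]
    acceleration = [(b - a) / dt for a, b in zip(velocity, velocity[1:])]
    return velocity, acceleration

# A quadratically growing deviation has constant positive acceleration —
# the Level 3 activation signature:
vel, acc = derivatives([0.0, 1.0, 4.0, 9.0, 16.0, 25.0])
```

On raw (unsmoothed) input the same differences amplify noise quadratically, which is exactly why the EWMA stage precedes this one.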

5.4. Alternative Under Investigation: Kalman Filter

The unidimensional Kalman Filter offers optimal state estimation under Gaussian noise, with the advantage of dynamically adapting to the observed variance. It provides a more principled approach than EWMA because it explicitly models the process noise and measurement noise as separate parameters.

The comparative analysis of EWMA vs. Kalman will be presented in the experimental phase of the dissertation, measuring: detection latency, false positive rate, and computational overhead under identical collapse scenarios.

6. Robustness Under Non-Normality

The Mahalanobis Distance, as formulated, implicitly assumes that the baseline profile follows an approximately ellipsoidal (multivariate normal) distribution. This assumption deserves explicit analysis, as kernel metrics in real systems frequently exhibit characteristics that violate it.

6.1. Expected Violations in Kernel Metrics

  • Heavy tails. Example: I/O latency, where most operations complete in microseconds but outliers of hundreds of milliseconds occur more frequently than the normal distribution predicts. Impact on DM: DM underestimates the frequency of legitimate extreme values, potentially generating false positives on tail events.
  • Skewness. Example: CPU utilization, whose distribution is frequently concentrated near 0% (idle) or 100% (saturated), depending on operational regime. Impact on DM: μ and Σ may not adequately represent the center and dispersion of the actual distribution, displacing the detector.
  • Multimodality. Example: systems that alternate between two distinct operational regimes (e.g., a batch server processing jobs hourly, alternating between total idleness and full load). Impact on DM: μ computed as an arithmetic mean falls between the two modes, where few real samples exist, so DM classifies normal behavior of both modes as anomalous.

6.2. Evidence of Robustness in Literature

The robustness of DM as an outlier detector under moderate normality violations is documented in the literature:

  • Gnanadesikan & Kettenring (1972) [3] demonstrated that covariance-based estimators maintain discriminative capacity under elliptical non-normal distributions, losing exact probabilistic interpretation but preserving relative ordering — more anomalous observations continue producing higher DM.
  • Penny (1996) [4] analyzed DM performance under various non-Gaussian distributions, confirming graceful degradation: error rate increases under severe violations but the detector does not collapse abruptly.
  • Hubert, Debruyne & Rousseeuw (2018) [5] demonstrated that combining DM with robust location and dispersion estimators preserves detection efficacy even under up to 25% sample contamination by outliers.

A critical observation: HOSA operates primarily on the rate of change of DM (derivatives), not its absolute value. Even if the absolute value loses exact probabilistic interpretation under non-normality, the derivatives dDM/dt and d²DM/dt² remain valid indicators of acceleration toward collapse, because they reflect temporal dynamics, not probabilistic magnitude.

6.3. Mitigation Strategy: Robust Estimation

HOSA implements a two-level strategy for addressing severe normality violations:

Level 1 — Regularization (default). The Tikhonov regularization already applied (Σreg = Σ + λI) partially mitigates sensitivity to outliers by stabilizing the matrix inversion, functioning as a form of shrinkage.

Level 2 — Robust estimation (conditional activation). When HOSA detects that the baseline distribution severely violates normality — operationalized via continuous monitoring of Mardia's multivariate kurtosis [6]:

κM = (1/N) · Σᵢ [ (xᵢ − μ)ᵀ · Σ⁻¹ · (xᵢ − μ) ]²
Equation 8 — Mardia's Multivariate Kurtosis

— compared with the expected value under normality κexpected = n(n+2), the agent can substitute the estimators of μ and Σ with the Minimum Covariance Determinant (MCD) [7].

The MCD estimates location and dispersion using the subset of h observations (of N total, with h ≈ ⌈N/2⌉) whose covariance matrix has the smallest determinant, effectively discarding the influence of the N−h most extreme outliers from the baseline estimation. The incremental implementation via FAST-MCD [8] is computationally viable for the expected dimensionality (n ≤ 15).

Footprint Trade-Off

The MCD requires storing a window of recent samples (typically 100–500) to recalculate the optimal subset, partially violating the O(1) memory principle of Welford. The trade-off is explicit: statistical robustness versus predictable footprint. MCD activation is conditional — it only occurs when κM diverges significantly from κexpected. In practice, 500 samples with n = 10 variables occupy ~40KB — negligible.
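
The conditional trigger of §6.3 can be sketched as a Mardia-kurtosis check against κexpected = n(n+2). The tolerance and sample counts below are illustrative, not HOSA's calibrated values:

```python
import numpy as np

def mardia_kurtosis(samples, mu, sigma):
    """Equation 8: κM = mean of [(xᵢ−μ)ᵀ·Σ⁻¹·(xᵢ−μ)]² over the samples."""
    d = samples - mu
    prec = np.linalg.inv(sigma)
    dm_sq = np.einsum('ij,jk,ik->i', d, prec, d)
    return float(np.mean(dm_sq ** 2))

def severely_non_normal(samples, mu, sigma, tol=0.5):
    """Illustrative trigger: κM diverges from n(n+2) by more than tol·n(n+2)."""
    n = samples.shape[1]
    expected = n * (n + 2)
    return abs(mardia_kurtosis(samples, mu, sigma) - expected) > tol * expected

rng = np.random.default_rng(42)
gaussian = rng.normal(size=(4000, 2))
kappa = mardia_kurtosis(gaussian, gaussian.mean(axis=0), np.cov(gaussian.T))
```

For n = 2 Gaussian data, κM hovers near n(n+2) = 8 and the trigger stays off; heavy-tailed baselines inflate κM and would switch the agent to MCD estimation.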

7. Load Direction Index (φ)

The Mahalanobis Distance is inherently non-directional — it measures how far the state has departed from baseline, but not in which direction (overload vs. idleness). To position the system state on the bipolar regime spectrum, HOSA requires a directionality indicator.

φ(t) = (1/n) · Σⱼ sⱼ · dⱼ(t) / σⱼ
Equation 9 — Load Direction Index

Where:

  • dj(t) = xj(t) − μj — deviation of the j-th variable from its baseline mean
  • σj = √Σjj — baseline standard deviation of the j-th variable
  • sj ∈ {+1, −1} — the load sign of the variable: +1 if an increase indicates higher load (CPU utilization, memory usage), −1 if an increase indicates lower load (CPU idle, free memory)

Value of φ(t) | Meaning | Spectrum Position
φ ≈ 0 | System near baseline | Regime 0 (Homeostasis)
φ > 0 | Deviation toward overload | Positive semi-axis (+1 to +5)
φ < 0 | Deviation toward idleness | Negative semi-axis (−1 to −3)

Together, DM measures the magnitude of deviation and φ indicates the direction. They position the system state in a two-dimensional space (magnitude × direction) that maps directly to the regime spectrum.
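
Equation 9 as a sketch, assuming an illustrative 3-variable vector (cpu_util, mem_used, cpu_idle) with hand-picked baseline statistics and load signs:

```python
import numpy as np

def phi(x, mu, var, signs):
    """φ = (1/n)·Σⱼ sⱼ·dⱼ/σⱼ  (Equation 9)."""
    std = np.sqrt(var)                         # σⱼ = √Σⱼⱼ
    return float(np.mean(signs * (x - mu) / std))

mu = np.array([40.0, 30.0, 60.0])              # baseline means
var = np.array([100.0, 25.0, 100.0])           # baseline variances Σⱼⱼ
signs = np.array([+1.0, +1.0, -1.0])           # rising cpu_idle means less load

overload = phi(np.array([60.0, 40.0, 40.0]), mu, var, signs)
idle = phi(np.array([20.0, 20.0, 80.0]), mu, var, signs)
```

The two test points deviate from baseline by the same magnitude but in opposite directions, which φ captures and DM alone cannot.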

8. Regime Taxonomy — The Bipolar Spectrum

HOSA models operational regimes as a continuous bipolar spectrum centered on homeostasis, where the sign indicates direction and the magnitude indicates severity:

Figure 1 — The bipolar operational regime spectrum. Sub-demand regimes occupy the negative semi-axis (−3 Anomalous Silence, −2 Structural Idleness, −1 Legitimate Idleness), Homeostasis sits at 0 (the baseline), and over-demand regimes occupy the positive semi-axis (+1 Plateau Shift, +2 Seasonality, +3 Adversarial, +4 Local Failure, +5 Viral Propagation).

8.1. Sub-Demand Regimes (−1, −2, −3)

The inclusion of the negative semi-axis addresses a systematic gap in the anomaly detection literature, which focuses almost exclusively on anomalies of excess. A server processing zero requests when it should be handling thousands is not in homeostasis — it is in anomalous silence.

  • Regime −1 (Legitimate Idleness). Definition: demand reduction consistent with temporal context (e.g., overnight for a business application); correlations preserved. HOSA behavior: Level 0; FinOps reporting; energy optimization (CPU frequency reduction, sampling interval increase). Habituation: incorporated into seasonal profiles.
  • Regime −2 (Structural Idleness). Definition: node permanently over-provisioned; no temporal window shows full utilization; IPE ≈ 1. HOSA behavior: Level 0; critical FinOps signaling with right-sizing recommendations; exposes the hosa.io/structurally-idle annotation. Habituation: permitted (with a persistent FinOps flag).
  • Regime −3 (Anomalous Silence). Definition: abrupt activity drop incoherent with temporal context; the node should be active and is not. Possible causes: DNS hijack, upstream failure, service crash without restart. HOSA behavior: Level 1–3; active investigation (process checks, network verification, upstream health checks); urgent webhook. Habituation: blocked.

The critical discriminant between Regime −1 and −3 is temporal coherence. A drop in activity at 03:00 AM is coherent with an overnight profile (Regime −1). A drop at 10:00 AM on a Tuesday, when the profile predicts peak load, is incoherent (Regime −3).

8.2. Regime 0 — Homeostasis

The steady-state normal operation. DM is low and stable, φ oscillates around zero, derivatives are near zero, and the covariance structure is consistent. HOSA operates at Level 0 with the Thalamic Filter active — suppressing redundant telemetry to minimize data ingestion costs, emitting only a periodic heartbeat.

8.3. Over-Demand Regimes (+1 to +5)

  • Regime +1 (Plateau Shift). Definition: persistent, non-reverting elevation from a legitimate workload change (new deploy, organic growth). Key discriminant: derivative → 0 while DM remains elevated; covariance preserved (ρ low). Habituation: permitted if pre-conditions met.
  • Regime +2 (Seasonality). Definition: periodic demand variations following temporal patterns (daily, weekly, monthly). Key discriminant: periodic oscillation in DM; significant autocorrelation at seasonal lags. Habituation: intra-segment (each temporal window has its own profile).
  • Regime +3 (Adversarial). Definition: consumption from malicious activity deliberately mimicking legitimate patterns (Layer 7 DDoS, cryptomining, data exfiltration). Key discriminant: covariance deformation (ρ high); altered syscall entropy ΔH; reduced work efficiency WEI. Habituation: blocked.
  • Regime +4 (Local Failure). Definition: resource deterioration confined to the local node (memory leak, disk degradation, fork bomb). Key discriminant: sustained positive derivative; one or two dominant dimensions in the decomposition; low ICP. Habituation: blocked while the derivative is positive.
  • Regime +5 (Viral Propagation). Definition: malicious activity or cascading failure with a propagation component (worms, lateral movement, cascade backpressure). Key discriminant: high ICP (outbound connection explosion, destination entropy, anomalous forks, correlation DM↔net_out). Habituation: categorically blocked.

8.4. Integrated Classification Matrix

Table 2 — Discriminant indicators for each regime along the bipolar spectrum.

Regime | DM | dDM/dt | d²DM/dt² | φ | ρ | ΔH | ICP
−3 | High (abrupt) | Spike | Variable | Strongly − | Variable | Variable | Variable
−2 | Chronic low | ≈ 0 | ≈ 0 | Persistent − | Low | Low | Low
−1 | Low vs. temporal profile | ≈ 0 | ≈ 0 | Negative | Low | Low | Low
0 | Low | ≈ 0 | ≈ 0 | ≈ 0 | Low | Low | Low
+1 | High, stable | ≈ 0 (post-transient) | ≈ 0 | Positive | Low | Low | Low
+2 | Oscillates | Oscillates | Oscillates | Oscillates | Low | Low | Low
+3 | Any | Any | Any | Positive | High | High | Variable
+4 | Growing | Sustained + | Variable | Positive | Variable | Low | Low
+5 | Variable | Variable | Variable | Variable | Variable | Variable | High

9. Supplementary Metrics

Beyond DM and its derivatives, the regime taxonomy requires several additional metrics for complete classification:

  • Load Direction Index φ(t): normalized weighted projection of the deviation onto the load axis; determines the semi-axis (§7).
  • Excess Provisioning Index IPE: ratio between provisioned capacity and maximum historical utilization; quantifies over-provisioning (WP §6.4.3).
  • Covariance Deformation Ratio ρ(t): Frobenius norm of the difference between recent and baseline covariance, normalized: ρ = ‖Σrecent − Σbaseline‖F / ‖Σbaseline‖F (WP §6.5).
  • Syscall Profile Entropy H(S,t), ΔH: Shannon entropy of the syscall distribution, H = −Σ pᵢ log₂ pᵢ; change from baseline ΔH = |H(t) − Hbaseline| (WP §6.7).
  • Work Efficiency Index WEI(t): application throughput divided by computational resource consumption; drops under parasitic activity (cryptomining, etc.) (WP §6.7).
  • Kernel/User Ratio Rku(t): CPU time in kernel mode divided by CPU time in user mode; rises under network attacks and malware (WP §6.7).
  • Propagation Behavior Index ICP(t): weighted combination ICP = w₁·Ĉout + w₂·Ĥdest + w₃·F̂anom + w₄·ρ̂DM↔net_out (WP §6.9).
  • Dimensional Contribution cⱼ: per-dimension decomposition of DM² (§2.3).

All supplementary metrics are calculated in user space by the mathematical engine. Metrics that depend on kernel counters (syscalls, context switches, network connections) are collected via existing eBPF probes. No external tools or libraries are required.
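
As one example of a supplementary metric, the Syscall Profile Entropy reduces to Shannon entropy over observed syscall counts. The syscall mixes below are invented for illustration:

```python
import math
from collections import Counter

def syscall_entropy(counts):
    """H = −Σ pᵢ·log₂ pᵢ over the observed syscall distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

# Invented profiles: a balanced I/O workload vs. a narrow parasitic one.
baseline = Counter({"read": 400, "write": 300, "futex": 200, "epoll_wait": 100})
parasitic = Counter({"sched_yield": 950, "futex": 50})

delta_h = abs(syscall_entropy(parasitic) - syscall_entropy(baseline))
```

A workload that collapses onto a narrow syscall profile (the cryptomining signature of Regime +3) produces a large ΔH even when aggregate CPU looks plausible.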

10. Habituation — Formal Pre-Conditions

The interaction between regimes and habituation is critical enough to warrant explicit formalization. Baseline recalibration is permitted if and only if all of the following conditions are satisfied simultaneously:

Habituation ⟺

  |dD̄M/dt| < εd     (stabilization)
∧ ρ(t) < ρthreshold     (covariance preserved)
∧ ΔH(t) < ΔHthreshold     (syscalls stable)
∧ ICP(t) < ICPthreshold     (no propagation)
∧ DM(t) < DM,safety     (safe plateau)
∧ tstable > Tmin     (sustained stabilization, default: 30 min)
∧ temporal coherence of φ(t)     (if φ < 0, coherent with seasonal profile)
Equation 10 — Habituation Pre-Conditions

Table 3 — Habituation policy across the regime spectrum.

Regime | Habituation
−3 — Anomalous Silence | Blocked
−2 — Structural Idleness | Permitted (with persistent FinOps signaling)
−1 — Legitimate Idleness | Incorporated into seasonal profiles
0 — Homeostasis | N/A (it is the baseline)
+1 — Plateau Shift | Permitted if pre-conditions satisfied
+2 — Seasonality | Intra-segment
+3 — Adversarial | Blocked
+4 — Local Failure | Blocked while derivative positive
+5 — Viral/Propagation | Categorically blocked

The pattern is symmetric: habituation is permitted in the central regimes (−2 to +2), where deviations are legitimate or structural; it is blocked at the extremes (−3, +3 to +5), where deviations are pathological or adversarial. HOSA adapts to legitimate variation but refuses to normalize pathology.
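
Equation 10 translates directly into a conjunction of threshold checks. All thresholds and field names below are illustrative defaults, not HOSA's calibrated parameters:

```python
def habituation_permitted(state,
                          eps_d=0.01, rho_max=0.2, dh_max=0.5,
                          icp_max=0.3, dm_safety=3.0, t_min=1800):
    """Equation 10 as a predicate; every condition must hold simultaneously."""
    return (abs(state['ddm_dt']) < eps_d          # stabilization
            and state['rho'] < rho_max            # covariance preserved
            and state['delta_h'] < dh_max         # syscalls stable
            and state['icp'] < icp_max            # no propagation
            and state['dm'] < dm_safety           # safe plateau
            and state['t_stable'] > t_min         # sustained (default 30 min)
            and state['phi_coherent'])            # temporal coherence of φ

# A legitimate plateau shift (Regime +1) passes; the same state with a
# deformed covariance and altered syscall entropy (Regime +3) is refused.
plateau = dict(ddm_dt=0.001, rho=0.05, delta_h=0.1, icp=0.0,
               dm=2.1, t_stable=2400, phi_coherent=True)
adversarial = dict(plateau, rho=0.6, delta_h=0.9)
```

Encoding the policy as a single conjunction keeps the refusal to normalize pathology auditable: the first failing condition names the reason habituation was denied.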

11. Known Limitations

  1. Distribution assumption. DM assumes an approximately ellipsoidal baseline. Workloads with multimodal distributions (e.g., systems alternating between distinct regimes without temporal pattern) may violate this assumption. The robust estimation strategy (§6.3) mitigates but does not eliminate this limitation. Mixture of Gaussians extension is documented as future research.
  2. Cold start. During warm-up (first minutes after initialization), the agent does not have a sufficient baseline. Detection confidence is low. HOSA operates in conservative mode (logging only, no mitigation).
  3. Adversarial evasion. An attacker with knowledge of HOSA's architecture could theoretically execute a "low-and-slow" attack that preserves the covariance structure and syscall distribution, keeping DM, ρ, and ΔH below detection thresholds. Formal adversarial resistance analysis is a future research topic.
  4. EWMA α calibration. The smoothing factor α controls the responsiveness-stability trade-off. Sub-optimal α can cause either false positives (too responsive) or delayed detection (too stable). The experimental phase will quantify sensitivity across workload types.
  5. Non-stationary multimodality. When multimodality is not temporally segregable (system alternates between modes randomly), the seasonal profiles mechanism does not apply. Extension to Mixture of Gaussians with streaming EM requires O(k·n²) per sample and introduces model selection complexity (determining k).

12. References

  1. Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India, 2(1), 49–55.
  2. Welford, B. P. (1962). Note on a Method for Calculating Corrected Sums of Squares and Products. Technometrics, 4(3), 419–420.
  3. Gnanadesikan, R., & Kettenring, J. R. (1972). Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data. Biometrics, 28(1), 81–124.
  4. Penny, K. I. (1996). Appropriate Critical Values When Testing for a Single Multivariate Outlier by Using the Mahalanobis Distance. Journal of the Royal Statistical Society: Series C, 45(1), 73–81.
  5. Hubert, M., Debruyne, M., & Rousseeuw, P. J. (2018). Minimum Covariance Determinant and Extensions. WIREs Computational Statistics, 10(3), e1421.
  6. Mardia, K. V. (1970). Measures of Multivariate Skewness and Kurtosis with Applications. Biometrika, 57(3), 519–530.
  7. Rousseeuw, P. J. (1984). Least Median of Squares Regression. Journal of the American Statistical Association, 79(388), 871–880.
  8. Rousseeuw, P. J., & Van Driessen, K. (1999). A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics, 41(3), 212–223.
  9. Henze, N., & Zirkler, B. (1990). A Class of Invariant Consistent Tests for Multivariate Normality. Communications in Statistics — Theory and Methods, 19(10), 3595–3617.
  10. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly Detection: A Survey. ACM Computing Surveys, 41(3), Article 15.
  11. Aggarwal, C. C. (2017). Outlier Analysis (2nd ed.). Springer.