§3

Mathematical Model

The Mahalanobis Distance as an anomaly detector, Welford incremental updates, EWMA smoothing, temporal derivatives, robustness analysis, and the bipolar regime taxonomy.

Based on Whitepaper v2.1 — Sections 4 & 6 · ~25 min read

1. State Representation

HOSA models the instantaneous state of a node as a vector x(t) ∈ ℝⁿ, where each component represents a system resource variable:

x(t) = [ x₁(t), x₂(t), …, xₙ(t) ]ᵀ

In the reference implementation, the variables include (but are not limited to):

  • CPU utilization (aggregate and per-core)
  • Memory pressure (utilization, swap, PSI some/full)
  • I/O throughput and latency
  • Network packet rate (ingress/egress)
  • Scheduler run queue depth
  • Page fault rate
  • Context switch counters

The exact composition of x(t) is determined during the warm-up phase (see §2 — Architecture, §3) based on the hardware topology discovered via proprioception. A typical server produces a vector with n = 8 to 12 dimensions.

2. The Mahalanobis Distance

2.1. Why Mahalanobis over Static Thresholds

Anomaly detection based on unidimensional static thresholds (e.g., "CPU > 90%") suffers from a fundamental limitation: it ignores the correlation structure between variables.

Scenario A — Legitimate

  • CPU: 85%
  • Memory: stable
  • I/O: low
  • Network: stable
  • → Intensive computation (video rendering)

Scenario B — Imminent Collapse

  • CPU: 85%
  • Memory pressure: rising
  • I/O: stalling
  • Network latency: spiking
  • → Cascading failure in progress

A static threshold on CPU produces the same alert for both scenarios. The Mahalanobis Distance distinguishes them because it measures the deviation from baseline in all dimensions simultaneously, weighted by their covariance. In Scenario A, CPU is high but all other variables are within their expected correlation with CPU — low DM. In Scenario B, the correlation structure itself is deformed — high DM.

2.2. Formal Definition

The Mahalanobis Distance [1] measures the distance of an observation x from the multivariate distribution defined by the mean vector μ and the Covariance Matrix Σ:

DM(x) = √[ (x − μ)ᵀ · Σ⁻¹ · (x − μ) ]
Equation 1 — Mahalanobis Distance

Where:

  • x — the current state vector x(t) ∈ ℝⁿ
  • μ — the baseline mean vector, accumulated during warm-up and updated incrementally
  • Σ — the n×n Covariance Matrix, capturing the correlations between all resource variables
  • Σ⁻¹ — the inverse (precision matrix), which weights dimensions by their variance and interdependence

The Covariance Matrix Σ is the critical element. It captures how variables co-vary under normal operation. Its inverse Σ⁻¹ ensures that a deviation along a high-variance dimension (e.g., CPU fluctuations during normal operation) contributes less to DM than the same magnitude deviation along a low-variance dimension (e.g., sudden memory pressure in a normally stable system).

Geometric Interpretation

DM can be understood as the Euclidean distance in a whitened coordinate system where the covariance matrix is transformed to the identity. In this transformed space, a unit deviation in every dimension has equal significance. The Mahalanobis Distance is therefore a measure of "how many standard deviations away from normal the system is, accounting for all correlations."
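
To make Equation 1 concrete, the two scenarios from §2.1 can be reproduced in a minimal numpy sketch. The two-variable state vector and all baseline numbers are illustrative, not values from the reference implementation:

```python
import numpy as np

# Toy 2-variable baseline (CPU %, memory pressure %); the covariance
# encodes that memory normally rises together with CPU.
mu = np.array([40.0, 30.0])
sigma = np.array([[400.0, 180.0],
                  [180.0, 100.0]])

def mahalanobis(x, mu, sigma):
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))   # Equation 1

# Scenario A: CPU at 85%, memory tracking its expected correlation with CPU
dm_legit = mahalanobis(np.array([85.0, 50.0]), mu, sigma)
# Scenario B: same CPU, but memory far above what the correlation predicts
dm_collapse = mahalanobis(np.array([85.0, 75.0]), mu, sigma)
```

Both observations have identical CPU, yet the second deviates from the learned CPU-to-memory correlation and yields a far larger DM.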

2.3. Dimensional Contribution Decomposition

To diagnose which resources are driving the deviation, HOSA decomposes DM² into per-dimension contributions. Given the deviation vector d = x(t) − μ:

DM² = dᵀ · Σ⁻¹ · d = Σⱼ cⱼ

where cⱼ = dⱼ · (Σ⁻¹ · d)ⱼ
Equation 2 — Dimensional Contribution

The dimensions with the largest cj values are the dominant contributors to the anomaly. This decomposition enables HOSA to:

  1. Direct throttling actions to processes consuming the dominant resource
  2. Log the mathematical reason for every decision (auditability)
  3. Provide dimensional context in webhook payloads for operator diagnosis

In the walkthrough scenario from the whitepaper (§5.7), when the memory leak occurs, the decomposition shows mem_used contributing 68% of DM², mem_pressure contributing 19%, and io_latency contributing 8% — giving the operator an immediate understanding of the anomaly's nature.
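
The decomposition in Equation 2 is a few lines of linear algebra. The following sketch uses a toy 3-variable baseline with a diagonal covariance for clarity; the variable roles are illustrative:

```python
import numpy as np

mu = np.array([40.0, 30.0, 5.0])               # baseline means
sigma = np.diag([100.0, 25.0, 4.0])            # diagonal Σ for clarity
x = np.array([45.0, 60.0, 6.0])                # memory-like dimension deviates

d = x - mu
w = np.linalg.solve(sigma, d)                  # Σ⁻¹ · d
c = d * w                                      # cⱼ = dⱼ · (Σ⁻¹·d)ⱼ  (Equation 2)
dm_sq = float(d @ w)                           # DM² = Σⱼ cⱼ
shares = c / dm_sq                             # fraction of DM² per dimension
dominant = int(np.argmax(c))                   # index of the dominant resource
```

The `shares` vector is what a webhook payload would carry: it tells the operator which dimension dominates the anomaly.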

3. Incremental Covariance Updates (Welford)

Batch computation of the Covariance Matrix Σ over accumulated data windows is computationally expensive — O(n² · k) for n variables and k samples — and requires memory allocation proportional to the window size. This violates Principle 3 (Predictable Footprint).

HOSA uses the generalized Welford algorithm [2] for online incremental updates of both μ and Σ. Each new sample x(t) updates the statistics in O(n²) with O(1) memory allocation, regardless of the number of accumulated samples:

For each new sample x(t), with count N:

δ = x(t) − μold
μnew = μold + δ / N
δ' = x(t) − μnew
M₂new = M₂old + δ · δ'ᵀ

Σ = M₂ / (N − 1)
Equation 3 — Welford Incremental Update (generalized to matrices)

This eliminates the need to store data windows and guarantees a predictable memory footprint. For n = 10 variables, the entire statistical state (μ as 10-vector, M₂ as 10×10 matrix, plus count N) occupies approximately 900 bytes.

Numerical Stability

The Welford algorithm is specifically designed for numerical stability in floating-point arithmetic. Unlike the naïve formula Var(X) = E[X²] − E[X]², which suffers from catastrophic cancellation when the mean is large relative to the variance, the Welford formulation accumulates the centered sum of squares directly, avoiding precision loss. This is critical for HOSA because resource metrics can span several orders of magnitude (e.g., bytes vs. percentages).
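
A minimal sketch of the generalized Welford update (Equation 3), assuming numpy for the matrix arithmetic; this is illustrative, not the reference implementation:

```python
import numpy as np

class WelfordCov:
    """Online mean and covariance (Equation 3): O(n²) time, O(1) memory per sample."""

    def __init__(self, n):
        self.count = 0
        self.mu = np.zeros(n)          # running mean μ
        self.m2 = np.zeros((n, n))     # centered sum of outer products M₂

    def update(self, x):
        self.count += 1
        delta = x - self.mu                    # δ  = x(t) − μ_old
        self.mu += delta / self.count          # μ_new = μ_old + δ/N
        delta2 = x - self.mu                   # δ′ = x(t) − μ_new
        self.m2 += np.outer(delta, delta2)     # M₂ += δ·δ′ᵀ

    def cov(self):
        return self.m2 / (self.count - 1)      # Σ = M₂/(N − 1)

# Usage: feed samples one at a time; no window is ever stored.
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3))
est = WelfordCov(3)
for s in samples:
    est.update(s)
```

The result matches the batch estimate of mean and covariance while holding only μ, M₂, and a counter in memory.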

4. Covariance Matrix Inversion

The Mahalanobis Distance requires Σ⁻¹. The choice of inversion method involves trade-offs between computational cost, numerical stability, and handling of degenerate cases.

4.1. Cholesky Decomposition

For the expected dimensionality (n ≤ 15), HOSA computes Σ⁻¹ via Cholesky decomposition. Since Σ is symmetric positive semi-definite by construction, it admits the decomposition Σ = LLᵀ where L is lower triangular. The inverse is then computed via forward and backward substitution.

The computational cost is O(n³/3) — for n = 10, this is approximately 333 floating-point operations, completing in sub-microsecond on modern hardware. The Cholesky approach also provides a natural check for positive definiteness: if the decomposition fails (a diagonal element of L becomes non-positive), the matrix is degenerate and requires regularization.

4.2. Tikhonov Regularization

In systems with highly collinear variables (e.g., cpu_user and cpu_total), Σ can become singular or ill-conditioned. HOSA applies Tikhonov regularization:

Σreg = Σ + λI
Equation 4 — Tikhonov Regularization

where λ is a small positive constant and I is the identity matrix. This adds a small positive value to the diagonal, guaranteeing invertibility. The parameter λ controls the trade-off between fidelity to the observed covariance and numerical stability. In practice, λ = 10⁻⁶ is sufficient for metric data without noticeably distorting DM.
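
The inversion path of §4.1 and §4.2 can be sketched together: attempt the Cholesky decomposition, and fall back to Tikhonov regularization when it fails. The function name and the λ default are illustrative:

```python
import numpy as np

def precision_matrix(sigma, lam=1e-6):
    """Invert Σ via Cholesky; apply Σ + λI (Equation 4) if decomposition fails."""
    n = sigma.shape[0]
    try:
        chol = np.linalg.cholesky(sigma)              # Σ = L·Lᵀ, requires PD
    except np.linalg.LinAlgError:
        chol = np.linalg.cholesky(sigma + lam * np.eye(n))
    # L⁻¹ via a triangular system (a dedicated triangular solver would
    # exploit the structure; np.linalg.solve works but does not).
    inv_l = np.linalg.solve(chol, np.eye(n))
    return inv_l.T @ inv_l                            # Σ⁻¹ = L⁻ᵀ·L⁻¹

well_conditioned = np.array([[4.0, 1.0], [1.0, 3.0]])
singular = np.ones((2, 2))                            # perfectly collinear variables

prec = precision_matrix(well_conditioned)
prec_reg = precision_matrix(singular)                 # succeeds only via λI
```

The failed decomposition doubles as the positive-definiteness check described in §4.1.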

Sherman-Morrison-Woodbury for Higher Dimensions

For dimensionality greater than n = 15, HOSA can optionally use the Sherman-Morrison-Woodbury formula for incremental updates of Σ⁻¹ directly, avoiding full re-inversion at each sample. This reduces the per-sample cost of maintaining the precision matrix from O(n³) to O(n²). In the current reference implementation (n ≤ 12), full Cholesky re-inversion at each sample is computationally trivial.

5. Temporal Derivatives and EWMA Smoothing

HOSA does not act on the instantaneous value of DM, but on its temporal rate of change — the velocity and acceleration with which the system departs from homeostasis. This is a critical distinction: it allows HOSA to detect that a system is heading toward collapse, not merely that it has arrived at a critical state.

5.1. EWMA Smoothing

Before computing derivatives, the raw DM signal is smoothed via an Exponentially Weighted Moving Average:

D̄M(t) = α · DM(t) + (1 − α) · D̄M(t−1)
Equation 5 — EWMA Smoothing

The factor α controls the fundamental trade-off between responsiveness (high α preserves rapid variations but retains noise) and stability (low α smooths the signal but introduces detection latency).

The calibration of α is performed during the warm-up phase and constitutes one of the critical parameters of the architecture. The documentation of α sensitivity analysis against synthetic and real collapse datasets is a planned deliverable of the experimental phase.
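
Equation 5 in code is a one-line recurrence; the α value below is illustrative, since HOSA calibrates it during warm-up:

```python
def ewma(raw, alpha=0.2):
    """D̄M(t) = α·DM(t) + (1 − α)·D̄M(t−1), seeded with the first sample."""
    smoothed = [raw[0]]
    for value in raw[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# A single-sample spike in raw DM is damped rather than passed through:
signal = [1.0] * 5 + [10.0] + [1.0] * 5
smoothed = ewma(signal)
```

With α = 0.2, the spike of 10 reaches the smoothed series as only 2.8, illustrating the noise suppression that makes the derivatives of §5.2 and §5.3 usable.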

5.2. First Derivative — Velocity of Deviation

dD̄M/dt ≈ [ D̄M(t) − D̄M(t − Δt) ] / Δt
Equation 6 — First Derivative (finite difference)

The first derivative indicates the speed of departure from homeostasis. A sustained positive dD̄M/dt indicates that the system is moving away from its baseline — the anomaly is growing. A negative value indicates recovery.

5.3. Second Derivative — Acceleration

d²D̄M/dt² ≈ [ dD̄M/dt |t − dD̄M/dt |t−Δt ] / Δt
Equation 7 — Second Derivative (finite difference)

The second derivative is the most critical signal for response-level determination:

  • d²D̄M/dt² > 0 — the system is accelerating toward collapse. This is the activation condition for Level 3 (Active Containment).
  • d²D̄M/dt² < 0 — the system is decelerating. If containment is active, this indicates the mitigation is effective.
  • d²D̄M/dt² ≈ 0 — the system is in steady state (stable or stable-at-elevated). This is the condition for habituation evaluation.

The Ill-Posed Nature of Numerical Differentiation

Numerical differentiation is an ill-posed problem in the sense of Hadamard: small perturbations in input data produce large variations in the computed derivative. The second derivative amplifies this effect quadratically. Without treatment, the second derivative of noisy kernel time series oscillates violently, generating false positives. This is why EWMA smoothing before differentiation is mandatory, not optional. The smoothing factor α directly controls the stability-responsiveness trade-off of the derivative estimates.
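
Equations 6 and 7 reduce to first differences of the smoothed series. A minimal sketch, with Δt and the input values chosen for illustration:

```python
def derivatives(dm_bar, dt=1.0):
    """Finite-difference velocity (Eq. 6) and acceleration (Eq. 7) of D̄M."""
    velocity = [(b - a) / dt for a, b in zip(dm_bar, dm_bar[1:])]
    acceleration = [(b - a) / dt for a, b in zip(velocity, velocity[1:])]
    return velocity, acceleration

# A quadratically growing deviation has constant positive acceleration —
# the Level 3 activation signature:
vel, acc = derivatives([0.0, 1.0, 4.0, 9.0, 16.0, 25.0])
```

On raw (unsmoothed) input the same differences amplify noise quadratically, which is exactly why the EWMA stage precedes this one.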

5.4. Alternative Under Investigation: Kalman Filter

The unidimensional Kalman Filter offers optimal state estimation under Gaussian noise, with the advantage of dynamically adapting to the observed variance. It provides a more principled approach than EWMA because it explicitly models the process noise and measurement noise as separate parameters.

The comparative analysis of EWMA vs. Kalman will be presented in the experimental phase of the dissertation, measuring: detection latency, false positive rate, and computational overhead under identical collapse scenarios.

6. Robustness Under Non-Normality

The Mahalanobis Distance, as formulated, implicitly assumes that the baseline profile follows an approximately ellipsoidal (multivariate normal) distribution. This assumption deserves explicit analysis, as kernel metrics in real systems frequently exhibit characteristics that violate it.

6.1. Expected Violations in Kernel Metrics

  • Heavy tails. Example: I/O latency, where most operations complete in microseconds but outliers of hundreds of milliseconds occur more frequently than the normal distribution predicts. Impact on DM: DM underestimates the frequency of legitimate extreme values, potentially generating false positives on tail events.
  • Skewness. Example: CPU utilization, whose distribution is frequently concentrated near 0% (idle) or 100% (saturated), depending on operational regime. Impact on DM: μ and Σ may not adequately represent the center and dispersion of the actual distribution, displacing the detector.
  • Multimodality. Example: systems that alternate between two distinct operational regimes (e.g., a batch server processing jobs hourly, alternating between total idleness and full load). Impact on DM: μ computed as an arithmetic mean falls between the two modes, where few real samples exist, so DM classifies normal behavior of both modes as anomalous.

6.2. Evidence of Robustness in Literature

The robustness of DM as an outlier detector under moderate normality violations is documented in the literature:

  • Gnanadesikan & Kettenring (1972) [3] demonstrated that covariance-based estimators maintain discriminative capacity under elliptical non-normal distributions, losing exact probabilistic interpretation but preserving relative ordering — more anomalous observations continue producing higher DM.
  • Penny (1996) [4] analyzed DM performance under various non-Gaussian distributions, confirming graceful degradation: error rate increases under severe violations but the detector does not collapse abruptly.
  • Hubert, Debruyne & Rousseeuw (2018) [5] demonstrated that combining DM with robust location and dispersion estimators preserves detection efficacy even under up to 25% sample contamination by outliers.

A critical observation: HOSA operates primarily on the rate of change of DM (derivatives), not its absolute value. Even if the absolute value loses exact probabilistic interpretation under non-normality, the derivatives dDM/dt and d²DM/dt² remain valid indicators of acceleration toward collapse, because they reflect temporal dynamics, not probabilistic magnitude.

6.3. Mitigation Strategy: Robust Estimation

HOSA implements a two-level strategy for addressing severe normality violations:

Level 1 — Regularization (default). The Tikhonov regularization already applied (Σreg = Σ + λI) partially mitigates sensitivity to outliers by stabilizing the matrix inversion, functioning as a form of shrinkage.

Level 2 — Robust estimation (conditional activation). When HOSA detects that the baseline distribution severely violates normality — operationalized via continuous monitoring of Mardia's multivariate kurtosis [6]:

κM = (1/N) · Σᵢ [ (xᵢ − μ)ᵀ · Σ⁻¹ · (xᵢ − μ) ]²
Equation 8 — Mardia's Multivariate Kurtosis

— compared with the expected value under normality κexpected = n(n+2), the agent can substitute the estimators of μ and Σ with the Minimum Covariance Determinant (MCD) [7].

The MCD estimates location and dispersion using the subset of h observations (of N total, with h ≈ ⌈N/2⌉) whose covariance matrix has the smallest determinant, effectively discarding the influence of the N−h most extreme outliers from the baseline estimation. The incremental implementation via FAST-MCD [8] is computationally viable for the expected dimensionality (n ≤ 15).

Footprint Trade-Off

The MCD requires storing a window of recent samples (typically 100–500) to recalculate the optimal subset, partially violating the O(1) memory principle of Welford. The trade-off is explicit: statistical robustness versus predictable footprint. MCD activation is conditional — it only occurs when κM diverges significantly from κexpected. In practice, 500 samples with n = 10 variables occupy ~40KB — negligible.
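
The conditional trigger of §6.3 can be sketched as a Mardia-kurtosis check against κexpected = n(n+2). The tolerance and sample counts below are illustrative, not HOSA's calibrated values:

```python
import numpy as np

def mardia_kurtosis(samples, mu, sigma):
    """Equation 8: κM = mean of [(xᵢ−μ)ᵀ·Σ⁻¹·(xᵢ−μ)]² over the samples."""
    d = samples - mu
    prec = np.linalg.inv(sigma)
    dm_sq = np.einsum('ij,jk,ik->i', d, prec, d)
    return float(np.mean(dm_sq ** 2))

def severely_non_normal(samples, mu, sigma, tol=0.5):
    """Illustrative trigger: κM diverges from n(n+2) by more than tol·n(n+2)."""
    n = samples.shape[1]
    expected = n * (n + 2)
    return abs(mardia_kurtosis(samples, mu, sigma) - expected) > tol * expected

rng = np.random.default_rng(42)
gaussian = rng.normal(size=(4000, 2))
kappa = mardia_kurtosis(gaussian, gaussian.mean(axis=0), np.cov(gaussian.T))
```

For n = 2 Gaussian data, κM hovers near n(n+2) = 8 and the trigger stays off; heavy-tailed baselines inflate κM and would switch the agent to MCD estimation.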

7. Load Direction Index (φ)

The Mahalanobis Distance is inherently non-directional — it measures how far the state has departed from baseline, but not in which direction (overload vs. idleness). To position the system state on the bipolar regime spectrum, HOSA requires a directionality indicator.

φ(t) = (1/n) · Σⱼ sⱼ · dⱼ(t) / σⱼ
Equation 9 — Load Direction Index

Where:

  • dj(t) = xj(t) − μj — deviation of the j-th variable from its baseline mean
  • σj = √Σjj — baseline standard deviation of the j-th variable
  • sj ∈ {+1, −1} — the load sign of the variable: +1 if an increase indicates higher load (CPU utilization, memory usage), −1 if an increase indicates lower load (CPU idle, free memory)

Value of φ(t) | Meaning | Spectrum Position
φ ≈ 0 | System near baseline | Regime 0 (Homeostasis)
φ > 0 | Deviation toward overload | Positive semi-axis (+1 to +5)
φ < 0 | Deviation toward idleness | Negative semi-axis (−1 to −3)

Together, DM measures the magnitude of deviation and φ indicates the direction. They position the system state in a two-dimensional space (magnitude × direction) that maps directly to the regime spectrum.
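
Equation 9 as a sketch, assuming an illustrative 3-variable vector (cpu_util, mem_used, cpu_idle) with hand-picked baseline statistics and load signs:

```python
import numpy as np

def phi(x, mu, var, signs):
    """φ = (1/n)·Σⱼ sⱼ·dⱼ/σⱼ  (Equation 9)."""
    std = np.sqrt(var)                         # σⱼ = √Σⱼⱼ
    return float(np.mean(signs * (x - mu) / std))

mu = np.array([40.0, 30.0, 60.0])              # baseline means
var = np.array([100.0, 25.0, 100.0])           # baseline variances Σⱼⱼ
signs = np.array([+1.0, +1.0, -1.0])           # rising cpu_idle means less load

overload = phi(np.array([60.0, 40.0, 40.0]), mu, var, signs)
idle = phi(np.array([20.0, 20.0, 80.0]), mu, var, signs)
```

The two test points deviate from baseline by the same magnitude but in opposite directions, which φ captures and DM alone cannot.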

8. Regime Taxonomy — The Bipolar Spectrum

HOSA models operational regimes as a continuous bipolar spectrum centered on homeostasis, where the sign indicates direction and the magnitude indicates severity:

Figure 1 — The bipolar operational regime spectrum. Sub-demand regimes occupy the negative semi-axis (−3 Anomalous Silence, −2 Structural Idleness, −1 Legitimate Idleness), Homeostasis sits at 0 (the baseline), and over-demand regimes occupy the positive semi-axis (+1 Plateau Shift, +2 Seasonality, +3 Adversarial, +4 Local Failure, +5 Viral Propagation).

8.1. Sub-Demand Regimes (−1, −2, −3)

The inclusion of the negative semi-axis addresses a systematic gap in the anomaly detection literature, which focuses almost exclusively on anomalies of excess. A server processing zero requests when it should be handling thousands is not in homeostasis — it is in anomalous silence.

  • Regime −1 (Legitimate Idleness). Definition: demand reduction consistent with temporal context (e.g., overnight for a business application); correlations preserved. HOSA behavior: Level 0; FinOps reporting; energy optimization (CPU frequency reduction, sampling interval increase). Habituation: incorporated into seasonal profiles.
  • Regime −2 (Structural Idleness). Definition: node permanently over-provisioned; no temporal window shows full utilization; IPE ≈ 1. HOSA behavior: Level 0; critical FinOps signaling with right-sizing recommendations; exposes the hosa.io/structurally-idle annotation. Habituation: permitted (with a persistent FinOps flag).
  • Regime −3 (Anomalous Silence). Definition: abrupt activity drop incoherent with temporal context; the node should be active and is not. Possible causes: DNS hijack, upstream failure, service crash without restart. HOSA behavior: Level 1–3; active investigation (process checks, network verification, upstream health checks); urgent webhook. Habituation: blocked.

The critical discriminant between Regime −1 and −3 is temporal coherence. A drop in activity at 03:00 AM is coherent with an overnight profile (Regime −1). A drop at 10:00 AM on a Tuesday, when the profile predicts peak load, is incoherent (Regime −3).

8.2. Regime 0 — Homeostasis

The steady-state normal operation. DM is low and stable, φ oscillates around zero, derivatives are near zero, and the covariance structure is consistent. HOSA operates at Level 0 with the Thalamic Filter active — suppressing redundant telemetry to minimize data ingestion costs, emitting only a periodic heartbeat.

8.3. Over-Demand Regimes (+1 to +5)

  • Regime +1 (Plateau Shift). Definition: persistent, non-reverting elevation from a legitimate workload change (new deploy, organic growth). Key discriminant: derivative → 0 while DM remains elevated; covariance preserved (ρ low). Habituation: permitted if pre-conditions met.
  • Regime +2 (Seasonality). Definition: periodic demand variations following temporal patterns (daily, weekly, monthly). Key discriminant: periodic oscillation in DM; significant autocorrelation at seasonal lags. Habituation: intra-segment (each temporal window has its own profile).
  • Regime +3 (Adversarial). Definition: consumption from malicious activity deliberately mimicking legitimate patterns (Layer 7 DDoS, cryptomining, data exfiltration). Key discriminant: covariance deformation (ρ high); altered syscall entropy ΔH; reduced work efficiency WEI. Habituation: blocked.
  • Regime +4 (Local Failure). Definition: resource deterioration confined to the local node (memory leak, disk degradation, fork bomb). Key discriminant: sustained positive derivative; one or two dominant dimensions in the decomposition; low ICP. Habituation: blocked while the derivative is positive.
  • Regime +5 (Viral Propagation). Definition: malicious activity or cascading failure with a propagation component (worms, lateral movement, cascade backpressure). Key discriminant: high ICP (outbound connection explosion, destination entropy, anomalous forks, correlation DM↔net_out). Habituation: categorically blocked.

8.4. Integrated Classification Matrix

Table 2 — Discriminant indicators for each regime along the bipolar spectrum.

Regime | DM | dDM/dt | d²DM/dt² | φ | ρ | ΔH | ICP
−3 | High (abrupt) | Spike | Variable | Strongly − | Variable | Variable | Variable
−2 | Chronic low | ≈ 0 | ≈ 0 | Persistent − | Low | Low | Low
−1 | Low vs. temporal profile | ≈ 0 | ≈ 0 | Negative | Low | Low | Low
0 | Low | ≈ 0 | ≈ 0 | ≈ 0 | Low | Low | Low
+1 | High, stable | ≈ 0 (post-transient) | ≈ 0 | Positive | Low | Low | Low
+2 | Oscillates | Oscillates | Oscillates | Oscillates | Low | Low | Low
+3 | Any | Any | Any | Positive | High | High | Variable
+4 | Growing | Sustained + | Variable | Positive | Variable | Low | Low
+5 | Variable | Variable | Variable | Variable | Variable | Variable | High

9. Supplementary Metrics

Beyond DM and its derivatives, the regime taxonomy requires several additional metrics for complete classification:

  • Load Direction Index φ(t): normalized weighted projection of the deviation onto the load axis; determines the semi-axis (§7).
  • Excess Provisioning Index IPE: ratio between provisioned capacity and maximum historical utilization; quantifies over-provisioning (WP §6.4.3).
  • Covariance Deformation Ratio ρ(t): Frobenius norm of the difference between recent and baseline covariance, normalized: ρ = ‖Σrecent − Σbaseline‖F / ‖Σbaseline‖F (WP §6.5).
  • Syscall Profile Entropy H(S,t), ΔH: Shannon entropy of the syscall distribution, H = −Σ pᵢ log₂ pᵢ; change from baseline ΔH = |H(t) − Hbaseline| (WP §6.7).
  • Work Efficiency Index WEI(t): application throughput divided by computational resource consumption; drops under parasitic activity (cryptomining, etc.) (WP §6.7).
  • Kernel/User Ratio Rku(t): CPU time in kernel mode divided by CPU time in user mode; rises under network attacks and malware (WP §6.7).
  • Propagation Behavior Index ICP(t): weighted combination ICP = w₁·Ĉout + w₂·Ĥdest + w₃·F̂anom + w₄·ρ̂DM↔net_out (WP §6.9).
  • Dimensional Contribution cⱼ: per-dimension decomposition of DM² (§2.3).

All supplementary metrics are calculated in user space by the mathematical engine. Metrics that depend on kernel counters (syscalls, context switches, network connections) are collected via existing eBPF probes. No external tools or libraries are required.
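
As one example of a supplementary metric, the Syscall Profile Entropy reduces to Shannon entropy over observed syscall counts. The syscall mixes below are invented for illustration:

```python
import math
from collections import Counter

def syscall_entropy(counts):
    """H = −Σ pᵢ·log₂ pᵢ over the observed syscall distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

# Invented profiles: a balanced I/O workload vs. a narrow parasitic one.
baseline = Counter({"read": 400, "write": 300, "futex": 200, "epoll_wait": 100})
parasitic = Counter({"sched_yield": 950, "futex": 50})

delta_h = abs(syscall_entropy(parasitic) - syscall_entropy(baseline))
```

A workload that collapses onto a narrow syscall profile (the cryptomining signature of Regime +3) produces a large ΔH even when aggregate CPU looks plausible.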

10. Habituation — Formal Pre-Conditions

The interaction between regimes and habituation is critical enough to warrant explicit formalization. Baseline recalibration is permitted if and only if all of the following conditions are satisfied simultaneously:

Habituation ⟺

  |dD̄M/dt| < εd     (stabilization)
∧ ρ(t) < ρthreshold     (covariance preserved)
∧ ΔH(t) < ΔHthreshold     (syscalls stable)
∧ ICP(t) < ICPthreshold     (no propagation)
∧ DM(t) < DM,safety     (safe plateau)
∧ tstable > Tmin     (sustained stabilization, default: 30 min)
∧ temporal coherence of φ(t)     (if φ < 0, coherent with seasonal profile)
Equation 10 — Habituation Pre-Conditions

Table 3 — Habituation policy across the regime spectrum.

Regime | Habituation
−3 — Anomalous Silence | Blocked
−2 — Structural Idleness | Permitted (with persistent FinOps signaling)
−1 — Legitimate Idleness | Incorporated into seasonal profiles
0 — Homeostasis | N/A (it is the baseline)
+1 — Plateau Shift | Permitted if pre-conditions satisfied
+2 — Seasonality | Intra-segment
+3 — Adversarial | Blocked
+4 — Local Failure | Blocked while derivative positive
+5 — Viral/Propagation | Categorically blocked

The pattern is symmetric: habituation is permitted in the central regimes (−2 to +2), where deviations are legitimate or structural; it is blocked at the extremes (−3, +3 to +5), where deviations are pathological or adversarial. HOSA adapts to legitimate variation but refuses to normalize pathology.
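
Equation 10 translates directly into a conjunction of threshold checks. All thresholds and field names below are illustrative defaults, not HOSA's calibrated parameters:

```python
def habituation_permitted(state,
                          eps_d=0.01, rho_max=0.2, dh_max=0.5,
                          icp_max=0.3, dm_safety=3.0, t_min=1800):
    """Equation 10 as a predicate; every condition must hold simultaneously."""
    return (abs(state['ddm_dt']) < eps_d          # stabilization
            and state['rho'] < rho_max            # covariance preserved
            and state['delta_h'] < dh_max         # syscalls stable
            and state['icp'] < icp_max            # no propagation
            and state['dm'] < dm_safety           # safe plateau
            and state['t_stable'] > t_min         # sustained (default 30 min)
            and state['phi_coherent'])            # temporal coherence of φ

# A legitimate plateau shift (Regime +1) passes; the same state with a
# deformed covariance and altered syscall entropy (Regime +3) is refused.
plateau = dict(ddm_dt=0.001, rho=0.05, delta_h=0.1, icp=0.0,
               dm=2.1, t_stable=2400, phi_coherent=True)
adversarial = dict(plateau, rho=0.6, delta_h=0.9)
```

Encoding the policy as a single conjunction keeps the refusal to normalize pathology auditable: the first failing condition names the reason habituation was denied.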

11. Known Limitations

  1. Distribution assumption. DM assumes an approximately ellipsoidal baseline. Workloads with multimodal distributions (e.g., systems alternating between distinct regimes without temporal pattern) may violate this assumption. The robust estimation strategy (§6.3) mitigates but does not eliminate this limitation. Mixture of Gaussians extension is documented as future research.
  2. Cold start. During warm-up (first minutes after initialization), the agent does not have a sufficient baseline. Detection confidence is low. HOSA operates in conservative mode (logging only, no mitigation).
  3. Adversarial evasion. An attacker with knowledge of HOSA's architecture could theoretically execute a "low-and-slow" attack that preserves the covariance structure and syscall distribution, keeping DM, ρ, and ΔH below detection thresholds. Formal adversarial resistance analysis is a future research topic.
  4. EWMA α calibration. The smoothing factor α controls the responsiveness-stability trade-off. Sub-optimal α can cause either false positives (too responsive) or delayed detection (too stable). The experimental phase will quantify sensitivity across workload types.
  5. Non-stationary multimodality. When multimodality is not temporally segregable (system alternates between modes randomly), the seasonal profiles mechanism does not apply. Extension to Mixture of Gaussians with streaming EM requires O(k·n²) per sample and introduces model selection complexity (determining k).

12. References

  1. Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India, 2(1), 49–55.
  2. Welford, B. P. (1962). Note on a Method for Calculating Corrected Sums of Squares and Products. Technometrics, 4(3), 419–420.
  3. Gnanadesikan, R., & Kettenring, J. R. (1972). Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data. Biometrics, 28(1), 81–124.
  4. Penny, K. I. (1996). Appropriate Critical Values When Testing for a Single Multivariate Outlier by Using the Mahalanobis Distance. Journal of the Royal Statistical Society: Series C, 45(1), 73–81.
  5. Hubert, M., Debruyne, M., & Rousseeuw, P. J. (2018). Minimum Covariance Determinant and Extensions. WIREs Computational Statistics, 10(3), e1421.
  6. Mardia, K. V. (1970). Measures of Multivariate Skewness and Kurtosis with Applications. Biometrika, 57(3), 519–530.
  7. Rousseeuw, P. J. (1984). Least Median of Squares Regression. Journal of the American Statistical Association, 79(388), 871–880.
  8. Rousseeuw, P. J., & Van Driessen, K. (1999). A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics, 41(3), 212–223.
  9. Henze, N., & Zirkler, B. (1990). A Class of Invariant Consistent Tests for Multivariate Normality. Communications in Statistics — Theory and Methods, 19(10), 3595–3617.
  10. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly Detection: A Survey. ACM Computing Surveys, 41(3), Article 15.
  11. Aggarwal, C. C. (2017). Outlier Analysis (2nd ed.). Springer.