§2

Architecture

The perceptive-motor cycle, warm-up calibration, graduated response system, quarantine modes, habituation mechanisms, and key engineering design decisions.

Based on Whitepaper v2.1 — Section 5 · ~20 min read

1. Architectural Principles

The design of HOSA is governed by five non-negotiable principles. These are not aspirational guidelines — they are hard constraints that every design decision must satisfy.

# | Principle | Description
1 | Local Autonomy | HOSA must execute its complete detection-and-mitigation cycle without dependency on network, external APIs, or human intervention for its primary function.
2 | Zero External Runtime Dependencies | The agent does not depend on external services (TSDB, message brokers, cloud APIs) to operate. All dependencies are internal to the binary or the host kernel. Communication with external systems is opportunistic: performed when available, never required.
3 | Predictable Computational Footprint | CPU and memory consumption must be constant and predictable — O(1) in memory, configurable and bounded CPU percentage. The agent must not become the cause of the problem it aims to solve.
4 | Graduated Response | Mitigation is not binary. HOSA implements a spectrum of responses proportional to the severity and rate of change of the anomaly, from light priority adjustment to complete network isolation.
5 | Decision Observability | Every autonomous action is logged locally with its mathematical justification — DM value, derivative, threshold crossed, action executed. The agent is fully auditable.
On Principle 3 — Self-Containment

HOSA itself operates within a dedicated cgroup v2 with hard limits on memory.max and cpu.max. If the agent exceeds its own resource limits, the kernel contains it before it affects the system. HOSA practices what it preaches.

2. The Perceptive-Motor Cycle

HOSA operates in a continuous cycle with three functional layers, inspired by the biological separation between sensory system, nervous system, and motor system. These layers map directly to the reflex arc pattern described in §1 — Core Concepts.

Figure 1 The three-layer perceptive-motor cycle. In kernel space (eBPF): sensory probes (tracepoints, kprobes, PSI) and actuators (XDP, cgroup controllers). Events flow up to user space through the ring buffer; actuation commands flow back down through BPF maps. In user space (Go): the predictive cortex (Welford → DM → EWMA → derivatives → decision) and opportunistic communications (webhooks, metrics, audit log).

2.1. Sensory Layer — eBPF in Kernel Space

The sensory layer collects system state via eBPF probes attached directly to kernel tracepoints and kprobes. This is fundamentally different from the polling model used by traditional monitoring agents:

Traditional Agent

  • Reads /proc files periodically
  • Parses text output, converts to numbers
  • Interval: 10–60 seconds
  • Misses transient events between polls
  • Each read involves syscalls and context switches

HOSA eBPF Probes

  • Attached to kernel tracepoints at load time
  • Receives structured data as events fire
  • Continuous — every relevant kernel event captured
  • No transient events missed
  • Data flows via ring buffer with μs latency

The probes collect data across five resource dimensions:

Dimension | Source | Variables
CPU | Tracepoints: sched_switch, sched_process_fork | Utilization (aggregate + per-core), context switches, run queue depth
Memory | Tracepoints: mm_page_alloc, mm_page_free; PSI hooks | Usage, pressure (PSI some/full), swap activity, page faults
I/O | Tracepoints: block_rq_issue, block_rq_complete | Throughput (IOPS), latency, queue depth
Network | Tracepoints: net_dev_xmit, netif_receive_skb | Packet rate (rx/tx), byte rate, connection count
Scheduler | Tracepoints: sched_wakeup, sched_stat_runtime | Run queue depth, scheduling latency

These variables compose the state vector x(t) ∈ ℝⁿ that feeds the mathematical engine. The dimensionality n is determined automatically during the warm-up phase based on hardware topology (see §3).

2.2. Cortex Layer — Mathematical Engine

The cortex is the decision-making core of HOSA. It executes a nine-step pipeline on every sample received from the sensory layer:

  1. Receive events from the eBPF ring buffer
  2. Update state vector x(t) with current values
  3. Update μ and Σ incrementally via the Welford algorithm [1] — O(n²) per sample with O(1) memory allocation
  4. Calculate DM(x(t)) — the Mahalanobis Distance from the baseline profile (see §3 — Math Model for full derivation)
  5. Apply EWMA smoothing → D̄M(t) to suppress noise before differentiation
  6. Calculate derivatives — dD̄M/dt (velocity of deviation) and d²D̄M/dt² (acceleration of deviation)
  7. Evaluate against adaptive thresholds — θ₁, θ₂, θ₃, θ₄ calibrated during warm-up as multiples of the baseline standard deviation
  8. Determine response level (0–5) based on the combination of DM, its derivatives, and the Load Direction Index φ(t)
  9. Send actuation command via BPF maps back to kernel space

The entire pipeline executes in user space with zero heap allocation on the hot path. All matrix operations use pre-allocated slices; the Welford algorithm updates in-place; and the EWMA filter operates on scalar registers. This ensures that the mathematical engine does not trigger garbage collection during critical decision windows.
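Steps 3 and 4 of the pipeline can be sketched in Go. For brevity this sketch keeps only the diagonal of Σ (per-dimension variance), which reduces the Mahalanobis distance to a normalized Euclidean distance; the real engine maintains the full covariance matrix and its inverse. All names are illustrative, not from the HOSA codebase:

```go
package main

import "math"

// welfordState holds incremental per-dimension mean and variance (Welford [1]).
type welfordState struct {
	n    float64
	mean []float64
	m2   []float64 // running sum of squared deviations per dimension
}

func newWelfordState(dims int) *welfordState {
	return &welfordState{mean: make([]float64, dims), m2: make([]float64, dims)}
}

// update folds one sample x(t) into the running statistics in place:
// no heap allocation on the hot path.
func (w *welfordState) update(x []float64) {
	w.n++
	for j := range x {
		delta := x[j] - w.mean[j]
		w.mean[j] += delta / w.n
		w.m2[j] += delta * (x[j] - w.mean[j])
	}
}

// mahalanobis returns D_M(x) under the diagonal-covariance simplification:
// sqrt of the sum over j of (x_j - mean_j)^2 / variance_j.
func (w *welfordState) mahalanobis(x []float64) float64 {
	if w.n < 2 {
		return 0
	}
	var sum float64
	for j := range x {
		variance := w.m2[j] / (w.n - 1)
		if variance < 1e-12 {
			continue // degenerate dimension: no baseline spread yet
		}
		d := x[j] - w.mean[j]
		sum += d * d / variance
	}
	return math.Sqrt(sum)
}
```

A point near the baseline mean scores near zero; a point several baseline standard deviations away scores high, regardless of the raw units of each dimension.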

Latency Budget

The target latency for the complete cortex pipeline (steps 1–9) is <500μs for a state vector of n=10 dimensions on commodity hardware. The dominant cost is the matrix-vector multiplication in step 4 (O(n²)), which for n=10 involves 100 floating-point operations — trivial on modern CPUs. The ring buffer read (step 1) and BPF map write (step 9) contribute ~1–10μs each.
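The smoothing-before-differentiation discipline (steps 5 and 6) can be sketched as a small EWMA filter with finite-difference derivatives. The alpha and dt values here are placeholders for the per-resource values HOSA calibrates during warm-up:

```go
package main

// ewmaDiff smooths a raw D_M stream and estimates first and second derivatives
// by finite differences over the smoothed series.
type ewmaDiff struct {
	alpha, dt   float64
	smoothed    float64
	prevVel     float64
	initialized bool
}

// step consumes one raw D_M sample and returns the smoothed value plus the
// estimated velocity and acceleration of deviation.
func (e *ewmaDiff) step(dm float64) (smoothed, vel, acc float64) {
	if !e.initialized {
		e.smoothed, e.initialized = dm, true
		return dm, 0, 0
	}
	prev := e.smoothed
	e.smoothed = e.alpha*dm + (1-e.alpha)*e.smoothed
	vel = (e.smoothed - prev) / e.dt
	acc = (vel - e.prevVel) / e.dt
	e.prevVel = vel
	return e.smoothed, vel, acc
}
```

Differentiating the smoothed series rather than the raw one is what keeps the second derivative usable as an escalation signal.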

2.3. Motor Layer — Actuation

The motor layer translates response-level decisions into concrete kernel actions via two primary mechanisms:

Mechanism | Interface | Actions | Latency
cgroups v2 | Direct file writes to /sys/fs/cgroup/ | memory.high (apply memory backpressure); cpu.max (throttle CPU bandwidth); cgroup.freeze (freeze process group) | ~10–100μs
XDP | eBPF programs at network driver level | XDP_DROP (drop packets before stack processing); selective filtering (new connections dropped, existing preserved); healthcheck traffic always allowed | ~1–5μs per packet

A critical design distinction: HOSA's primary containment mechanism is throttling, not killing. Setting memory.high instructs the kernel to apply reclaim pressure on the target cgroup, slowing memory allocation without terminating the process. This preserves in-flight transactions — the process is degraded but alive.

2.4. On Kernel↔User Space Transition

The HOSA execution model involves transition between kernel space (eBPF collection and actuation) and user space (mathematical computation). This transition uses the eBPF ring buffer mechanism and BPF maps, with typical latency on the order of 1–10μs on modern hardware.

Terminology Clarification

The correct characterization of this model is "zero external runtime dependencies" — HOSA does not depend on processes, services, or infrastructure external to the agent binary and the host kernel. The kernel↔user transition is internal to the agent. An earlier version of the whitepaper incorrectly described this as "zero context switch," which has been corrected.

3. Warm-Up and Proprioceptive Calibration

Upon starting, HOSA executes a calibration phase termed Hardware Proprioception — a term borrowed from the biological sense by which an organism perceives its own body configuration. During this phase, HOSA learns both the hardware topology and the behavioral baseline of the node.

  1. Topological discovery. By reading /sys/devices/system/node/ and /sys/devices/system/cpu/, the agent identifies NUMA topology, physical and logical core counts, L1/L2/L3 cache sizes, and memory configuration.
  2. State vector definition. Based on topology, HOSA determines which variables to include in x(t) and their respective eBPF sources. A single-socket server with NVMe storage produces a different vector than a dual-socket machine with spinning disks.
  3. Baseline accumulation. During a configurable period (default: 5 minutes), the agent collects samples without executing mitigation, accumulating the initial μ₀ and Σ₀ via Welford incremental updates. This is the node's baseline profile.
  4. EWMA calibration. The smoothing factor α is calibrated for each resource based on the variance observed during warm-up. Higher-variance signals receive lower α (more smoothing) to prevent false derivatives.
  5. Adaptive threshold definition. The thresholds θ₁ through θ₄ for each response level are calculated as multiples of the standard deviation observed in the baseline regime (e.g., Level 1 = 2σ, Level 3 = 4σ).

After warm-up, μ and Σ continue to be updated incrementally, allowing the baseline profile to evolve with legitimate workload changes (see §5 — Habituation).

Cold Start Vulnerability

During the warm-up period, the agent does not have a sufficient baseline profile for reliable detection. In this interval, HOSA operates in conservative mode — logging only, no mitigation. This constitutes a known vulnerability window and is documented as a limitation in the whitepaper (§9.2). The duration is configurable and can be reduced if the node has a pre-computed baseline from a previous execution.

4. Graduated Response System

The graduated response is one of HOSA's most critical architectural decisions. Rather than implementing a binary switch (healthy → kill), the system defines a spectrum of six proportional response levels, each with specific activation conditions, actions, and reversibility guarantees.

4.1. Response Levels 0–5

Table 1 Complete specification of graduated response levels with activation conditions.
Level | Name | Activation Condition | Action | Reversibility
0 | Homeostasis | DM < θ₁ and dDM/dt ≤ 0 | None. Suppress redundant telemetry (Thalamic Filter). Heartbeat only. | N/A
1 | Vigilance | DM > θ₁ or sustained dDM/dt > 0 | Increase sampling rate (100ms → 10ms). Local logging. No system intervention. | Automatic — returns to L0 when condition ceases
2 | Soft Containment | DM > θ₂ and dDM/dt > 0 | renice non-essential processes via cgroups. Webhook notification (opportunistic). | Automatic — gradual renice relaxation
3 | Active Containment | DM > θ₃ and d²DM/dt² > 0 (positive acceleration) | CPU/memory throttling via cgroups on identified contributors. Partial load shedding via XDP (drop new connections, preserve existing). Urgent webhook. | Automatic with hysteresis — relaxation when DM < θ₂ for a sustained period
4 | Severe Containment | DM > θ₄, or convergence velocity indicates exhaustion within T seconds | Aggressive throttling. XDP blocks all inbound traffic except orchestrator healthchecks. Freeze non-critical cgroups. | Requires sustained DM reduction below θ₃ for an extended period
5 | Quarantine | Containment failure at prior levels; DM in uncontrolled ascent despite active mitigations | Network isolation. Non-essential processes frozen (SIGSTOP). Detailed log to persistent storage. Final webhook with quarantine state. | Manual — requires administrative intervention to restore

The key insight in the graduated response design is the use of both the value and the derivatives of DM for level determination. Level 3 requires not just a high DM, but positive acceleration — the system is not merely stressed but accelerating toward collapse. This prevents aggressive response during stable-but-elevated workloads (which are handled by the habituation mechanism instead).
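The decision logic of Table 1 can be approximated by a cascade of threshold-and-derivative checks. This sketch omits the Load Direction Index φ(t) and the sustained-condition timers, and models the Level 5 containment-failure path as an explicit flag; it is an illustration, not the production decision function:

```go
package main

// determineLevel maps the smoothed D_M, its estimated velocity and
// acceleration, and the calibrated thresholds theta[0..3] (theta_1..theta_4)
// to a response level 0-5, approximating Table 1.
func determineLevel(dm, vel, acc float64, theta [4]float64, containmentFailed bool) int {
	switch {
	case containmentFailed:
		return 5 // quarantine: mitigation at lower levels is not working
	case dm > theta[3]:
		return 4 // severe containment
	case dm > theta[2] && acc > 0:
		return 3 // active containment: elevated and accelerating
	case dm > theta[1] && vel > 0:
		return 2 // soft containment: elevated and rising
	case dm > theta[0] || vel > 0:
		return 1 // vigilance
	default:
		return 0 // homeostasis
	}
}
```

Note how Level 3 fires only when acceleration is positive: a high but stable DM falls through to the milder branches, matching the stable-but-elevated case handled by habituation.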

4.2. Quarantine Modes by Environment Class

The autonomous quarantine (Level 5) involves network isolation of the compromised node. The feasibility and strategy of this isolation vary fundamentally by infrastructure class. HOSA implements differentiated quarantine modes, selected automatically during Hardware Proprioception or configured explicitly by the operator.

Environment | Detection | Quarantine Strategy | Recovery
Bare metal with IPMI | IPMI interface detection via /sys/class/net/ and ipmi_* kernel modules | Disable all network interfaces except the out-of-band management interface (IPMI/iLO/iDRAC). Node remains accessible via management console. | Manual via IPMI console
Cloud VM (AWS, GCP, Azure) | DMI/SMBIOS, metadata service (169.254.169.254), hypervisor detection | Does not disable interfaces. Instead: (1) XDP drops all traffic except metadata service, DHCP, and orchestrator endpoint; (2) signals quarantine via cloud-native mechanism (instance tag, SNS, healthcheck → HTTP 503); (3) orchestrator decides terminate/replace. | Orchestrator terminates and replaces instance. Optional self-termination via cloud API if the orchestrator doesn't act within 5 min (disabled by default).
Kubernetes | Container detection via /proc/1/cgroup, KUBERNETES_SERVICE_HOST | Does not isolate the host node (no permission). Instead: (1) maximum cgroup containment on offending pods; (2) applies taint hosa.io/quarantine=true:NoExecute and condition HOSAQuarantine=True via the K8s API, causing pod evacuation; (3) emits a Warning Event. | Operator removes the taint after investigation. Node returns to the scheduling pool.
Edge/IoT with physical access | Explicit operator configuration | Complete network interface deactivation. Device operates in isolated mode. Logs preserved on local flash/eMMC. LED or display visual signaling if available. | Manual — field technician accesses the device, collects logs, restores.
Edge/IoT without physical access | Explicit operator configuration | Network deactivation + hardware watchdog timer (default: 30 min). If no intervention occurs, the watchdog reboots the device with the quarantine_recovery=true flag. The agent enters conservative mode post-reboot (logging only) to allow remote diagnosis. | Automatic via watchdog reboot with an observation period
Air-gapped (SCADA/ICS) | Explicit operator configuration | Identical to bare metal, with all opportunistic communication permanently disabled. Logs written exclusively to encrypted local storage, collected periodically by authorized personnel. | Manual via authorized physical access
Design Principle — Automatic Detection with Manual Override

HOSA attempts to automatically detect the environment class and select the appropriate quarantine mode. The operator can override this detection via explicit configuration. In case of ambiguity (e.g., a private cloud VM that doesn't respond to the standard metadata service), HOSA assumes the most conservative mode (cloud VM — does not disable interfaces), prioritizing recoverability over isolation.

4.3. Escalation and Hysteresis

Transitions between response levels are governed by two rules that prevent oscillation (flapping):

  • Escalation requires sustained condition. The activation condition for a higher level must be met for a minimum sustained period (configurable, default varies by level) before escalation occurs. HOSA does not jump from Level 0 to Level 4 in a single cycle.
  • De-escalation requires hysteresis. Returning to a lower level requires the condition for the lower level (not just the absence of the higher-level condition) to be sustained. For example, dropping from Level 3 to Level 2 requires DM < θ₂ (not just DM < θ₃) for a sustained period. This prevents rapid oscillation at threshold boundaries.

The combination of derivative-based escalation and hysteresis-based de-escalation produces a system that is fast to escalate (responds to acceleration, not just magnitude) but slow to de-escalate (requires confirmed recovery, not just momentary improvement).

5. Habituation — Adapting to the New Baseline

A recurring problem in anomaly detection systems is chronic false positives: when the legitimate workload changes permanently (e.g., deployment of a new application version that consumes more memory), the detector continues signaling anomaly indefinitely.

HOSA implements a habituation mechanism inspired by neuroplasticity:

  1. If DM remains elevated but stable (derivative near zero) for a configurable period without any real failure (no OOM, no timeout, no process crash);
  2. And the covariance structure is preserved (the deformation ratio ρ(t) is below threshold — resources still correlate in the same proportions, just at higher magnitude);
  3. And no indicators of compromise are present (syscall entropy ΔH and propagation index ICP below thresholds);
  4. Then HOSA recalibrates μ and Σ with increasing weight on recent samples, effectively shifting the baseline profile to the new operational regime.

This is implemented via exponential decay of weights in the Welford algorithm, assigning lower influence to older samples and allowing Σ to reflect the contemporary covariance of the system.
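The decay-weighted update can be sketched per dimension (diagonal-only, as in the earlier sketches). λ ∈ (0,1) controls how quickly old samples lose influence; the names and formulation are illustrative, following the standard exponentially weighted mean/variance recurrence rather than HOSA's exact implementation:

```go
package main

// ewWelford keeps an exponentially weighted per-dimension mean and variance:
// older samples decay by factor lambda, so the baseline drifts toward the
// contemporary regime once habituation is permitted.
type ewWelford struct {
	lambda float64
	w      float64 // total decayed weight, converges to 1/(1-lambda)
	mean   []float64
	s      []float64 // decayed sum of squared deviations
}

func newEWWelford(lambda float64, dims int) *ewWelford {
	return &ewWelford{lambda: lambda, mean: make([]float64, dims), s: make([]float64, dims)}
}

// update folds one sample in, giving recent samples exponentially more weight.
func (e *ewWelford) update(x []float64) {
	e.w = e.lambda*e.w + 1
	for j := range x {
		delta := x[j] - e.mean[j]
		e.mean[j] += delta / e.w
		e.s[j] = e.lambda*e.s[j] + delta*(x[j]-e.mean[j])
	}
}

// variance returns the decay-weighted variance estimate for dimension j.
func (e *ewWelford) variance(j int) float64 {
	if e.w <= 1 {
		return 0
	}
	return e.s[j] / e.w
}
```

With λ near 1 the effective memory is roughly 1/(1-λ) samples, so the operator can tune how fast a new regime is absorbed.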

Safeguards Against Premature Habituation

Habituation is blocked when: (a) stabilization occurs near the physical safety limit of a resource (e.g., memory > 90% — stabilizing at 92% is not a safe "new normal"); (b) the covariance deformation ratio ρ(t) exceeds threshold (structural change, not just magnitude change — potentially adversarial); (c) the propagation index ICP is elevated (viral behavior); (d) the derivative remains positive sustained (progressive failure, not stable plateau). The formal pre-condition is documented in the whitepaper §6.12.

6. Selectivity Policy — The Throttling Problem

While throttling via cgroups is an effective mitigation against resource exhaustion, it introduces secondary risks that must be explicitly addressed:

  • Cascading timeouts. A throttled HTTP backend can cause connection accumulation upstream, propagating degradation to healthy services.
  • Transaction deadlocks. A process throttled during a database transaction may hold locks indefinitely, blocking other processes.
  • Critical component starvation. If the Kubernetes kubelet is throttled, the node is marked NotReady and all pods are evacuated — potentially causing more damage than the original problem.

HOSA addresses these risks through a safelist — a protected list of processes and cgroups that are never targets of throttling:

  • Kernel processes (kthreadd, ksoftirqd, etc.)
  • The HOSA agent itself
  • Orchestration agents (kubelet, containerd, dockerd) when detected
  • Processes explicitly marked by the operator via configuration or cgroup label

Throttling is applied preferentially to the processes identified as greatest contributors to the anomaly, determined by the decomposition of x(t) — the processes whose resource consumption most contributes to the dimensions where DM diverges from baseline. This dimensional contribution analysis (cⱼ decomposition) is documented in §3 — Math Model.
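Under a diagonal covariance the contribution decomposition has a particularly simple form: each dimension's share of DM² is its squared, variance-normalized deviation. The full cⱼ decomposition in §3 accounts for cross-correlations; this independent-dimensions sketch is for illustration only:

```go
package main

// contributions returns each dimension's share of the squared Mahalanobis
// distance under a diagonal covariance: c_j = (x_j - mean_j)^2 / variance_j.
// The dimension with the largest share points at the resource — and, via
// per-process accounting, the processes — driving the anomaly.
func contributions(x, mean, variance []float64) []float64 {
	c := make([]float64, len(x))
	for j := range x {
		if variance[j] < 1e-12 {
			continue // degenerate dimension: no baseline spread, skip
		}
		d := x[j] - mean[j]
		c[j] = d * d / variance[j]
	}
	return c
}
```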

7. Project Structure

The codebase follows a layered organization that mirrors the biological metaphor:

hosa/
├── cmd/hosa/
│   └── main.go                # Entry point — agent initialization
├── internal/
│   ├── sysbpf/
│   │   └── syscall.go         # Custom eBPF loader via native syscalls
│   ├── linalg/
│   │   └── matrix.go          # Linear algebra (matrices, inversion, covariance)
│   ├── syscgroup/
│   │   └── file_edit.go       # Direct cgroup file manipulation via VFS
│   ├── bpf/
│   │   ├── sensors.c          # eBPF C code injected into the kernel
│   │   └── bpf_bpfeb.go       # Auto-generated Go↔C bridge (cilium/ebpf)
│   ├── sensor/                # ── The Sensory System
│   │   └── collector.go       # Reads eBPF maps → state vector x(t)
│   ├── brain/                 # ── The Predictive Cortex
│   │   ├── matrix.go          # Covariance matrix management (Welford)
│   │   ├── mahalanobis.go     # D_M calculation
│   │   └── predictor.go       # EWMA, derivatives, level determination
│   ├── motor/                 # ── The Reflex Arc (Actuators)
│   │   ├── cgroups.go         # Process throttling via cgroups v2
│   │   └── signals.go         # Process signaling (SIGSTOP/SIGCONT)
│   └── state/                 # ── The Limbic System
│       └── memory.go          # Short-term ring buffer for baseline
├── docs/
│   ├── whitepaper.pdf         # Full academic whitepaper v2.1
│   └── *.html                 # Documentation pages
└── Makefile                   # make build → compiles eBPF C + Go

The naming convention deliberately uses biological terms (brain/, sensor/, motor/, state/) to maintain the conceptual mapping between architecture and metaphor throughout the codebase.

8. Key Design Decisions

Decision | Rationale
Mahalanobis over ML/DL | O(n²) compute, constant memory, no GPU, no training pipeline; runs on a Raspberry Pi with 512MB RAM. Produces interpretable results (dimensional contributions cⱼ). Full rationale in §3 — Math Model.
Welford incremental updates | O(n²) per sample with O(1) memory allocation. No data windows stored. Predictable footprint regardless of uptime. For n=10 variables, the entire statistical state occupies <2KB.
EWMA before derivatives | Numerical differentiation of discrete, noisy data is an ill-posed problem in the sense of Hadamard [2]. The second derivative amplifies noise quadratically. EWMA smoothing before differentiation is mandatory for stable derivative estimates.
Go for user space | Pragmatic choice for research velocity. Go 1.22+ GC pauses are sub-millisecond. The hot path uses zero-allocation patterns (sync.Pool, pre-allocated slices, GOGC=off during critical cycles). The cilium/ebpf library provides a mature eBPF ecosystem. If GC pauses prove problematic in benchmarks, the hot path can be migrated to C via CGo.
Throttle, not kill | The OOM-Killer already exists and is destructive. HOSA's value proposition is preventing the need for kills by applying graduated pressure early. memory.high backpressure preserves in-flight transactions.
Complement, not replace | HOSA is the reflex arc; Prometheus/Datadog/Kubernetes are the cerebral cortex. Different temporal scales, different decision scopes. HOSA keeps the node alive during the Lethal Interval; the orchestrator handles strategic decisions afterward.

9. Self-Protection Mechanisms

A legitimate concern for any autonomous agent running with kernel privileges is: can the agent itself become the cause of the problem? HOSA addresses this through multiple layers of self-protection:

  1. Self-contained footprint. HOSA operates within its own cgroup v2 with hard limits on memory.max and cpu.max. If the agent exceeds its own limits, the kernel constrains it before it affects the system.
  2. Safelist self-inclusion. HOSA is the first entry in its own safelist — it never throttles itself. Kernel processes and orchestration agents are also protected by default.
  3. Reversible mitigation. Levels 0–4 are automatically reversible. No destructive action (process kill, interface deactivation) is executed below Level 5.
  4. Escalation hysteresis. Level transitions require sustained conditions, preventing oscillation. The agent cannot jump from Level 0 to Level 4 in a single cycle.
  5. Dry-run mode. The agent can be executed in observation mode (logging and decision calculation without action execution), allowing validation of decision quality before enabling actuation.
  6. Deterministic compilation. The binary is compiled statically with no dynamic dependencies. No risk of failure due to absent or incompatible shared libraries.
  7. eBPF verifier as safety net. All eBPF programs are validated by the kernel's eBPF verifier before loading. A bug in the eBPF C code causes the program to be rejected at load time (fail-safe), not at runtime.
On the Impossibility of Total Risk Elimination

The total elimination of risk is impossible for any software that executes with kernel-space privileges. Bugs in user space can cause incorrect decisions. The mitigation is: extensive testing, dry-run mode, and the recognition that an agent that makes an incorrect throttling decision (effect: temporary latency increase) is categorically less destructive than the complete absence of mitigation (effect: OOM-Kill, crash, data loss).

10. References

  1. Welford, B. P. (1962). Note on a Method for Calculating Corrected Sums of Squares and Products. Technometrics, 4(3), 419–420.
  2. Hadamard, J. (1902). Sur les problèmes aux dérivées partielles et leur signification physique. Princeton University Bulletin, 13, 49–52.
  3. Heo, T. (2015). Control Group v2. Linux Kernel Documentation. kernel.org
  4. Gregg, B. (2019). BPF Performance Tools: Linux System and Application Observability. Addison-Wesley Professional.
  5. Vieira, M. A., et al. (2020). Fast Packet Processing with eBPF and XDP: Concepts, Code, Challenges, and Applications. ACM Computing Surveys, 53(1), Article 16.
  6. Horn, P. (2001). Autonomic Computing: IBM's Perspective on the State of Information Technology. IBM Corporation.
  7. Hellerstein, J. L., Diao, Y., Parekh, S., & Tilbury, D. M. (2004). Feedback Control of Computing Systems. John Wiley & Sons.