1. Configuration Philosophy
HOSA is designed to operate with zero mandatory configuration. Every parameter has a carefully chosen default that works for general-purpose Linux servers. The agent discovers the hardware topology, calibrates its own thresholds during warm-up, and adapts its baselines through habituation — all automatically.
Configuration exists for three purposes:
- Constraining autonomy. Selecting the deployment mode (dry-run, partial, full) to control what the agent is allowed to do.
- Environment context. Providing information the agent cannot discover on its own — webhook URLs, safelist entries, event calendars.
- Expert tuning. Overriding auto-calibrated parameters for operators who understand their workload characteristics and want finer control.
Every parameter documented in this section has a default value that is calibrated during the warm-up phase or set to a conservative constant. Overriding a parameter is always optional and should be done with understanding of the trade-offs involved. When in doubt, use the defaults.
2. File Format and Location
HOSA reads configuration from a single YAML file. The default search order is:
1. `./hosa.yaml` — current working directory
2. `/etc/hosa/hosa.yaml` — system-wide configuration
3. `$HOME/.config/hosa/hosa.yaml` — user-level configuration
The path can be overridden via the `--config` CLI flag. Configuration sources are resolved in the following precedence order, highest first:

CLI flags → Environment variables → Config file → Auto-calibrated defaults

Environment variables follow the pattern `HOSA_<SECTION>_<KEY>` in uppercase with underscores. For example, `HOSA_WARMUP_DURATION=300s` overrides `warmup.duration` in the YAML file.
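The precedence chain can be illustrated with a small sketch. The `resolve` helper below is hypothetical, not the agent's actual resolver; it only mirrors the documented order (CLI → environment → file → default) and the `HOSA_<SECTION>_<KEY>` naming rule.

```python
import os

def resolve(key: str, cli: dict, file_cfg: dict, defaults: dict) -> str:
    """Resolve a dotted config key using the documented precedence:
    CLI flags -> environment variables -> config file -> defaults."""
    if key in cli:
        return cli[key]
    # warmup.duration -> HOSA_WARMUP_DURATION
    env_name = "HOSA_" + key.upper().replace(".", "_")
    if env_name in os.environ:
        return os.environ[env_name]
    if key in file_cfg:
        return file_cfg[key]
    return defaults[key]

os.environ["HOSA_WARMUP_DURATION"] = "300s"
# The environment variable wins over the file value:
print(resolve("warmup.duration", cli={},
              file_cfg={"warmup.duration": "5m"},
              defaults={"warmup.duration": "5m"}))  # → 300s
```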
3. Deployment Modes
The deployment mode controls the maximum level of autonomy granted to the agent. This is the single most important configuration decision.
3.1. Dry-Run (Observe Only)
The agent performs the full detection pipeline — eBPF collection, state vector construction, Welford updates, DM calculation, derivative estimation, regime classification — but never executes any actuation. Every decision that would have been made is logged with full mathematical context.
Recommended for: Initial evaluation, validation of detection quality, building confidence before enabling actuation.
| Capability | Status |
|---|---|
| eBPF metric collection | ✓ Active |
| Mahalanobis Distance calculation | ✓ Active |
| Derivative estimation | ✓ Active |
| Regime classification | ✓ Active |
| Decision logging | ✓ Active (logs what would happen) |
| Webhooks | ✓ Active (notifications only) |
| cgroup throttling | ✗ Disabled |
| XDP load shedding | ✗ Disabled |
| Process signaling | ✗ Disabled |
| Network isolation | ✗ Disabled |
Required capabilities: `CAP_BPF` only.
3.2. Partial Actuation
```yaml
max_level: 3   # ceiling for autonomous action
```
The agent is permitted to actuate up to a configurable maximum response level. Levels above the ceiling are logged as recommendations but not executed.
Recommended for: Production environments where the operator wants automated soft containment (Levels 0–3) but reserves severe containment and quarantine (Levels 4–5) for human decision.
Required capabilities: `CAP_BPF`, `CAP_SYS_ADMIN`, and `CAP_NET_ADMIN` if `max_level ≥ 3`.
3.3. Full Actuation
The agent has full autonomy across all six response levels (0–5), including autonomous quarantine. This is the mode that fully implements the reflex arc architecture.
Recommended for: Environments where autonomous survival is critical — edge/IoT with intermittent connectivity, air-gapped networks, or nodes that must survive the Lethal Interval without human intervention.
Even in `mode: full`, the Level 5 quarantine (network isolation) requires an explicit `quarantine.enabled: true` flag. This is a deliberate safety measure: network isolation is the only irreversible autonomous action, and should never be activated by accident.
4. Core Parameters
4.1. Warm-Up and Proprioception
| Parameter | YAML Path | Default | Description |
|---|---|---|---|
| Duration | `warmup.duration` | `5m` | Time spent collecting baseline samples before detection activates. Longer warm-up → more accurate baseline; shorter warm-up → faster time-to-protection. |
| Min samples | `warmup.min_samples` | `500` | Minimum number of samples before the covariance matrix is considered reliable for inversion. |
| Conservative mode | `warmup.conservative` | `true` | During warm-up, log decisions but do not actuate (equivalent to dry-run for the warm-up period). |
During warm-up, the agent does not have a reliable baseline and operates in conservative mode. This is an acknowledged vulnerability window. For nodes where even the warm-up period is critical (high-value targets), the operator can pre-seed the baseline by providing a previously computed (μ, Σ) pair via `warmup.seed_file` — exported from a prior run or from a similar node.
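As a minimal sketch of the baseline statistics, assuming they are built with Welford-style streaming updates as described earlier in the detection pipeline. Shown here for a single scalar metric; the seed-file format at the end is purely illustrative (the real format is whatever `--dump-baseline` emits).

```python
import json

class Welford:
    """Streaming mean/variance (Welford's algorithm) -- the family of
    update used to build a (mu, sigma) baseline during warm-up."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two samples.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

w = Welford()
for sample in [10.0, 12.0, 11.0, 13.0, 9.0]:
    w.update(sample)

# A seed file in the spirit of warmup.seed_file (schema is hypothetical):
seed = json.dumps({"mean": w.mean, "variance": w.variance})
```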
4.2. EWMA Smoothing (α)
| Parameter | YAML Path | Default | Description |
|---|---|---|---|
| Alpha | `ewma.alpha` | `auto` | EWMA smoothing factor. `auto` calibrates per-resource during warm-up based on observed variance. Manual override: 0.0–1.0 (higher = more responsive, noisier; lower = smoother, slower detection). |
| Alpha range | `ewma.alpha_min` / `ewma.alpha_max` | `0.05` / `0.3` | Bounds for auto-calibrated α. Prevents the algorithm from selecting extreme values. |
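The responsiveness trade-off is easy to see numerically. The sketch below (an illustrative EWMA, not the agent's implementation) feeds a step change through the two bounds of the default auto range: the high-α trace tracks the spike quickly, the low-α trace lags well behind.

```python
def ewma(samples, alpha):
    """Exponentially weighted moving average: higher alpha tracks
    changes faster (noisier); lower alpha smooths more (slower)."""
    s = samples[0]
    out = [s]
    for x in samples[1:]:
        s = alpha * x + (1 - alpha) * s
        out.append(s)
    return out

spike = [0.0] * 5 + [100.0] * 5   # step change in a metric
fast = ewma(spike, 0.3)           # alpha_max default
slow = ewma(spike, 0.05)          # alpha_min default
# After 5 steps at the new level, fast ≈ 83.2 while slow ≈ 22.6.
```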
4.3. Adaptive Thresholds (θ₁–θ₄)
| Threshold | YAML Path | Default (σ multiplier) | Triggers |
|---|---|---|---|
| θ₁ | `thresholds.theta1_sigma` | `2.0` | Level 0 → Level 1 (Vigilance) |
| θ₂ | `thresholds.theta2_sigma` | `3.0` | Level 1 → Level 2 (Soft Containment) |
| θ₃ | `thresholds.theta3_sigma` | `4.0` | Level 2 → Level 3 (Active Containment) |
| θ₄ | `thresholds.theta4_sigma` | `5.0` | Level 3 → Level 4 (Severe Containment) |
Thresholds are expressed as multiples of σ (the baseline standard deviation of DM), computed during warm-up. The absolute threshold values are derived as θₙ = multiplier × σ_DM. This ensures that thresholds adapt automatically to the node's behavioral characteristics.
For operators who prefer absolute thresholds, the auto-calibration can be overridden:

```yaml
thresholds:
  mode: absolute   # "sigma" (default) or "absolute"
  theta1: 3.0
  theta2: 5.0
  theta3: 7.0
  theta4: 9.0
```
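The σ-mode derivation is a one-liner; this sketch just makes the adaptation concrete. A node with a quiet DM baseline gets tighter absolute thresholds than a noisy one, from the same multipliers.

```python
def derive_thresholds(sigma_dm, multipliers=(2.0, 3.0, 4.0, 5.0)):
    """theta_n = multiplier * sigma_DM, as in thresholds.mode: sigma."""
    return [m * sigma_dm for m in multipliers]

quiet = derive_thresholds(0.8)   # ≈ [1.6, 2.4, 3.2, 4.0]
noisy = derive_thresholds(2.5)   # [5.0, 7.5, 10.0, 12.5]
```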
4.4. Tikhonov Regularization (λ)
| Parameter | YAML Path | Default | Description |
|---|---|---|---|
| Lambda | `covariance.tikhonov_lambda` | `1e-6` | Regularization constant added to the diagonal of Σ before inversion. Prevents singularity in systems with collinear variables. Increase if Cholesky decomposition fails. |
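Why this matters: two perfectly collinear metrics make Σ singular (determinant zero), so Σ⁻¹ does not exist and the DM calculation would fail. Adding λ to the diagonal restores invertibility. A pure-Python 2×2 illustration (not the agent's linear-algebra code):

```python
def regularize(sigma, lam=1e-6):
    """Add the Tikhonov term lam to the diagonal of a covariance
    matrix, as covariance.tikhonov_lambda does before inversion."""
    n = len(sigma)
    return [[sigma[i][j] + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Two perfectly collinear variables make sigma singular (det = 0)...
singular = [[1.0, 1.0], [1.0, 1.0]]
assert det2(singular) == 0.0
# ...but the regularized matrix is safely invertible (det > 0).
assert det2(regularize(singular)) > 0.0
```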
5. Response Level Tuning
5.1. Hysteresis Hold Times
Hold times control how long DM must remain below the de-escalation threshold before the response level decreases. Longer hold times prevent oscillation; shorter hold times allow faster recovery.
| Parameter | YAML Path | Default |
|---|---|---|
| Level 1→0 | `response.hold_time_1_to_0` | `10s` |
| Level 2→1 | `response.hold_time_2_to_1` | `30s` |
| Level 3→2 | `response.hold_time_3_to_2` | `60s` |
| Level 4→3 | `response.hold_time_4_to_3` | `5m` |
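The hold-time rule can be sketched as a small step function (hypothetical helper; hold times taken from the table above, with `5m` = 300 s): the level only drops by one once DM has stayed below the de-escalation threshold for the full hold time.

```python
# Defaults from the table above, in seconds.
HOLD_TIMES_S = {1: 10, 2: 30, 3: 60, 4: 300}

def deescalate(level: int, seconds_below: float) -> int:
    """Step down one response level only after DM has remained below
    the de-escalation threshold for the configured hold time."""
    if level > 0 and seconds_below >= HOLD_TIMES_S[level]:
        return level - 1
    return level

assert deescalate(2, 29) == 2    # 29 s below threshold: hold (needs 30 s)
assert deescalate(2, 30) == 1    # hold time met: step down one level
assert deescalate(0, 999) == 0   # already at baseline
```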
5.2. XDP Load Shedding
| Parameter | YAML Path | Default | Description |
|---|---|---|---|
| Enabled | `xdp.enabled` | `true` | Enable XDP-based network load shedding at Level 3+. Disable if the NIC driver does not support XDP. |
| Mode | `xdp.mode` | `native` | `native` (driver-level, fastest) or `generic` (SKB-based, universal fallback). |
| Healthcheck sources | `xdp.healthcheck_cidrs` | `[]` | CIDR blocks that are never dropped, even during full inbound block (Level 4). Typically: load balancer IPs, Kubernetes API server. |
| Interface | `xdp.interface` | `auto` | Network interface for XDP attachment. `auto` selects the default route interface. |
6. Safelist Configuration
The safelist defines processes and cgroups that are never targeted for throttling (see §4 — Safelist). Kernel processes and the HOSA agent itself are always protected regardless of configuration.
```yaml
safelist:
  auto_detect: true                  # auto-detect kubelet, containerd, dockerd
  processes:
    - name: "postgres"
    - name: "etcd"
    - pid_file: "/var/run/nginx.pid"
  cgroups:
    - "/system.slice/sshd.service"
    - "/kubepods/burstable/pod-kube-proxy-*"
  labels:
    - "hosa.io/protected=true"       # cgroup label match
```
Safelist entries support four matching modes:
- Process name — matched against `/proc/[pid]/comm`
- PID file — reads the PID from the specified file
- cgroup path — glob pattern matched against the cgroup hierarchy (supports `*` wildcards)
- cgroup label — matched against labels/annotations on the cgroup (Kubernetes pod labels are propagated to cgroup labels)
The following are protected regardless of configuration and cannot be removed from the safelist: all kernel threads (`kthreadd` descendants), the HOSA agent process itself, and the init process (PID 1).
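The name and glob matching modes can be sketched with the standard library's `fnmatch`. This is a hypothetical matcher mirroring the documented semantics only (exact comm-name match plus glob match on the cgroup path), not the agent's implementation.

```python
import fnmatch

# Entries in the spirit of the safelist example above.
SAFE_NAMES = {"postgres", "etcd"}
SAFE_CGROUPS = [
    "/system.slice/sshd.service",
    "/kubepods/burstable/pod-kube-proxy-*",   # glob with * wildcard
]

def is_safelisted(comm: str, cgroup: str) -> bool:
    """True if the process comm name or its cgroup path matches."""
    if comm in SAFE_NAMES:
        return True
    return any(fnmatch.fnmatch(cgroup, pat) for pat in SAFE_CGROUPS)

assert is_safelisted("postgres", "/user.slice")
assert is_safelisted("someproc", "/kubepods/burstable/pod-kube-proxy-abc123")
assert not is_safelisted("cryptominer", "/user.slice/session-1.scope")
```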
7. Habituation Parameters
| Parameter | YAML Path | Default | Description |
|---|---|---|---|
| Enabled | `habituation.enabled` | `true` | Enable automatic baseline recalibration. Disable for environments where the baseline should never change after warm-up. |
| Min stabilization | `habituation.min_stable_time` | `30m` | T_min — minimum continuous stabilization before habituation activates. |
| Decay rate (λ) | `habituation.decay_rate` | `0.001` | Exponential decay rate for weighted Welford. Higher = faster adaptation. |
| Safety ceiling | `habituation.dm_safety_max` | `auto` | D_M,safety — maximum DM that permits habituation. `auto` sets it to θ₃ × 0.8. |
| ρ threshold | `habituation.rho_threshold` | `0.25` | Maximum covariance deformation ratio that permits habituation. |
| ΔH threshold | `habituation.delta_h_threshold` | `0.5` | Maximum syscall entropy change that permits habituation. |
| ICP threshold | `habituation.icp_threshold` | `0.3` | Maximum propagation index that permits habituation. |
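All of these gates are conjunctive: habituation only proceeds when every condition holds at once. A sketch under that assumption (hypothetical function; defaults taken from the table, with `30m` = 1800 s and the `auto` safety ceiling expanded to θ₃ × 0.8):

```python
def habituation_allowed(dm, stable_s, rho, delta_h, icp, dm_max,
                        min_stable_s=1800,
                        rho_max=0.25, delta_h_max=0.5, icp_max=0.3):
    """All documented gates must pass before the baseline may adapt."""
    return (stable_s >= min_stable_s and
            dm < dm_max and
            rho < rho_max and
            delta_h < delta_h_max and
            icp < icp_max)

theta3 = 4.0            # sigma multiplier from the thresholds table
dm_max = theta3 * 0.8   # the documented 'auto' ceiling: theta3 * 0.8

assert habituation_allowed(1.0, 3600, 0.1, 0.2, 0.1, dm_max)
assert not habituation_allowed(1.0, 600, 0.1, 0.2, 0.1, dm_max)   # not stable long enough
assert not habituation_allowed(3.5, 3600, 0.1, 0.2, 0.1, dm_max)  # DM above safety ceiling
```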
8. Webhook Configuration
Webhooks are opportunistic — dispatched when network connectivity is available, but never required for the agent's primary function. If the webhook endpoint is unreachable, the event is logged locally and the agent continues operating.
```yaml
webhooks:
  enabled: true
  endpoints:
    - url: "https://hooks.slack.com/services/T00/B00/xxx"
      min_level: 2              # only send Level 2+ events
      format: slack
    - url: "https://api.pagerduty.com/v2/enqueue"
      min_level: 3              # only send Level 3+ events
      format: pagerduty
      auth_token_env: "PAGERDUTY_TOKEN"
    - url: "http://localhost:9093/api/v1/alerts"
      min_level: 1
      format: alertmanager
  timeout: 5s
  retry_count: 2
  retry_delay: 1s
```
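The opportunistic-delivery contract can be sketched as follows. This is a hypothetical dispatcher with an injected `send` function (URLs and field names illustrative, not the agent's code): events below an endpoint's `min_level` are filtered, and delivery failures degrade to local logging rather than blocking the control loop.

```python
import json

def dispatch(event: dict, endpoints, send, log):
    """Opportunistic delivery: filter by min_level, try each endpoint,
    and fall back to local logging -- never block the agent."""
    for ep in endpoints:
        if event["level"] < ep["min_level"]:
            continue
        try:
            send(ep["url"], json.dumps(event))
        except OSError:
            log(f"webhook {ep['url']} unreachable; event logged locally")

sent, logged = [], []
endpoints = [
    {"url": "https://hooks.example/a", "min_level": 2},
    {"url": "https://hooks.example/b", "min_level": 4},
]
dispatch({"level": 3, "dm": 6.1}, endpoints,
         send=lambda url, body: sent.append(url),
         log=logged.append)
# Only the min_level: 2 endpoint receives this Level 3 event.
```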
Supported formats:
| Format | Description |
|---|---|
| `json` | Generic JSON payload with full state vector (default) |
| `alertmanager` | Prometheus Alertmanager-compatible alert format |
| `slack` | Slack incoming webhook format with formatted message |
| `pagerduty` | PagerDuty Events API v2 format |
| `opsgenie` | Opsgenie Alert API format |
9. Thalamic Filter
| Parameter | YAML Path | Default | Description |
|---|---|---|---|
| Enabled | `thalamic_filter.enabled` | `true` | Suppress redundant telemetry during homeostasis. Disable if external monitoring requires continuous metric flow. |
| Heartbeat interval | `thalamic_filter.heartbeat_interval` | `60s` | Interval between minimal heartbeat emissions during homeostasis. |
| Metrics endpoint | `thalamic_filter.metrics_port` | `9100` | Port for the Prometheus-compatible `/metrics` endpoint. Set `0` to disable. |
10. Environment Detection and Override
HOSA auto-detects the environment class during proprioception to select the appropriate quarantine strategy (see §4 — Quarantine Modes). The auto-detection can be overridden:
```yaml
environment:
  type: auto              # auto | bare-metal | cloud | kubernetes | edge-physical | edge-remote | airgap
  cloud_provider: auto    # auto | aws | gcp | azure | none
  self_termination: false # allow cloud instance self-termination (Level 5)
  self_termination_timeout: 5m
```
| Environment | Auto-Detection Method | Quarantine Strategy |
|---|---|---|
| `bare-metal` | IPMI interface in `/sys/class/net/` | Deactivate all interfaces except IPMI |
| `cloud` | Metadata service at `169.254.169.254` | XDP drop + cloud-native signaling |
| `kubernetes` | `KUBERNETES_SERVICE_HOST` env var | cgroup containment + taint + K8s Event |
| `edge-physical` | Explicit config only | Full network deactivation |
| `edge-remote` | Explicit config only | Network deactivation + watchdog timer |
| `airgap` | Explicit config only | Full isolation, no external communication |
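A sketch of the detection logic implied by the table. The probe order shown is an assumption, not a documented ordering, and the probe results are injected rather than performed; the edge and air-gap classes are never auto-detected, per the table.

```python
def detect_environment(env_vars: dict,
                       has_metadata_service: bool,
                       has_ipmi: bool) -> str:
    """Mirror the auto-detection table; probe order is an assumption."""
    if "KUBERNETES_SERVICE_HOST" in env_vars:
        return "kubernetes"
    if has_metadata_service:   # 169.254.169.254 reachable
        return "cloud"
    if has_ipmi:               # IPMI interface present
        return "bare-metal"
    return "unknown"           # edge/airgap require explicit config

assert detect_environment({"KUBERNETES_SERVICE_HOST": "10.0.0.1"},
                          True, True) == "kubernetes"
assert detect_environment({}, True, False) == "cloud"
assert detect_environment({}, False, False) == "unknown"
```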
11. Self-Containment (Agent Resource Limits)
HOSA practices what it preaches: the agent itself operates within a dedicated cgroup v2 with hard resource limits. If the agent exceeds its own limits, the kernel contains it before it can affect the system.
```yaml
self_containment:
  memory_max: 128M           # hard memory ceiling for the agent
  cpu_max: "50000 100000"    # 50% of one CPU core (50ms per 100ms period)
  cgroup_path: /sys/fs/cgroup/hosa.service
```
This is a deliberate architectural decision, not just good practice. The most common objection to autonomous agents is: "What if the agent itself becomes the problem?" By operating within hard kernel-enforced limits, HOSA cannot consume more than 128MB of memory or 50% of a CPU core, regardless of bugs or unexpected conditions. The kernel enforces these limits — not the agent itself.
12. Logging Configuration
| Parameter | YAML Path | Default | Description |
|---|---|---|---|
| Directory | `logging.directory` | `/var/log/hosa` | Directory for decision logs and audit trail. |
| Decision log | `logging.decision_file` | `decisions.log` | File for the structured decision log (JSON Lines format). |
| Max file size | `logging.max_size` | `100M` | Maximum size before log rotation. |
| Retention | `logging.max_files` | `5` | Number of rotated files to retain. |
| Level | `logging.level` | `info` | Agent operational log level: `debug`, `info`, `warn`, `error`. |
| Include state vector | `logging.include_state_vector` | `true` | Include the full x(t) vector in every decision log entry. Disable to reduce log volume. |
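JSON Lines means one self-contained JSON object per line, which makes the decision log greppable and streamable. The entry below is hypothetical: the field names are illustrative, not the agent's actual schema.

```python
import json

# Hypothetical decision-log entry in JSON Lines form.
entry = {
    "ts": "2025-06-01T12:00:00Z",
    "dm": 6.42,
    "level_from": 1,
    "level_to": 2,
    "action": "cgroup_throttle",
    # Present only when logging.include_state_vector is true:
    "state_vector": [0.71, 0.12, 0.05, 0.33],
}
line = json.dumps(entry)     # exactly one line per decision
parsed = json.loads(line)    # round-trips cleanly for audit tooling
```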
13. Kubernetes Deployment
13.1. DaemonSet Manifest
HOSA is deployed as a DaemonSet — one instance per node, with access to the host's cgroup filesystem and network namespace.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hosa
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: hosa
  template:
    metadata:
      labels:
        app: hosa
    spec:
      hostPID: true
      hostNetwork: true
      tolerations:
        - operator: Exists
      containers:
        - name: hosa
          image: ghcr.io/bricio-sr/hosa:latest
          securityContext:
            privileged: false
            capabilities:
              add: [BPF, SYS_ADMIN, NET_ADMIN]
          volumeMounts:
            - name: sys
              mountPath: /sys
            - name: cgroup
              mountPath: /sys/fs/cgroup
          resources:
            limits:
              memory: 128Mi
              cpu: 500m
      volumes:
        - name: sys
          hostPath:
            path: /sys
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
```
13.2. Capabilities and Security Context
Following the principle of least privilege, deploy with only the capabilities required for the configured deployment mode:
| Mode | Required Capabilities | Host Access |
|---|---|---|
| `dry-run` | `CAP_BPF` | `hostPID: true`, read-only `/sys` |
| `partial` (`max_level: 2`) | `CAP_BPF`, `CAP_SYS_ADMIN` | `hostPID: true`, read-write `/sys/fs/cgroup` |
| `partial` (`max_level: 3–4`) | `CAP_BPF`, `CAP_SYS_ADMIN`, `CAP_NET_ADMIN` | `hostPID: true`, `hostNetwork: true`, read-write `/sys/fs/cgroup` |
| `full` | `CAP_BPF`, `CAP_SYS_ADMIN`, `CAP_NET_ADMIN` | `hostPID: true`, `hostNetwork: true`, read-write `/sys/fs/cgroup`, K8s API access for taints |
The recommended deployment path: start with `mode: dry-run` and `CAP_BPF` only. Review decision logs for 1–2 weeks to validate detection quality. Then escalate to `mode: partial`, `max_level: 2` for soft containment. Only enable `mode: full` after building confidence in the agent's behavior on your specific workload.
14. CLI Reference
| Flag | Short | Default | Description |
|---|---|---|---|
| `--config` | `-c` | `auto` | Path to configuration file. |
| `--mode` | `-m` | `dry-run` | Deployment mode: `dry-run`, `partial`, `full`. |
| `--max-level` | | `5` | Maximum response level (0–5). Only used with `--mode=partial`. |
| `--warmup` | `-w` | `5m` | Warm-up duration (e.g., `300s`, `5m`, `10m`). |
| `--environment` | `-e` | `auto` | Environment override. |
| `--log-level` | | `info` | Operational log level. |
| `--log-dir` | | `/var/log/hosa` | Log output directory. |
| `--metrics-port` | | `9100` | Prometheus-compatible metrics endpoint port. `0` to disable. |
| `--version` | `-v` | | Print version and exit. |
| `--validate` | | | Validate configuration file and exit without starting the agent. |
| `--dump-baseline` | | | Run warm-up, dump the computed (μ, Σ) baseline to file, and exit. Useful for seeding new nodes. |
Examples:

```bash
# Observe-only dry-run (default mode)
sudo ./hosa --mode=dry-run

# Partial actuation up to Level 3, custom warm-up
sudo ./hosa --mode=partial --max-level=3 --warmup=10m

# Full actuation with explicit config file
sudo ./hosa --mode=full -c /etc/hosa/production.yaml

# Validate configuration without starting
./hosa --validate -c /etc/hosa/hosa.yaml

# Export baseline for seeding other nodes
sudo ./hosa --dump-baseline --warmup=30m -c /etc/hosa/hosa.yaml
```
15. Full Configuration Reference
Below is the complete annotated configuration file with all parameters and their default values. This serves as both documentation and a starting template — copy it, uncomment the sections you need, and adjust.
```yaml
# HOSA Configuration — Full Reference (all defaults shown)
# ============================================================

# --- Deployment Mode ---
mode: dry-run              # dry-run | partial | full
max_level: 5               # max response level (partial mode)

# --- Warm-Up ---
warmup:
  duration: 5m
  min_samples: 500
  conservative: true
  seed_file: ""            # path to pre-computed baseline

# --- EWMA ---
ewma:
  alpha: auto              # auto | 0.0–1.0
  alpha_min: 0.05
  alpha_max: 0.3

# --- Thresholds ---
thresholds:
  mode: sigma              # sigma | absolute
  theta1_sigma: 2.0
  theta2_sigma: 3.0
  theta3_sigma: 4.0
  theta4_sigma: 5.0

# --- Covariance ---
covariance:
  tikhonov_lambda: 1e-6

# --- Response ---
response:
  hold_time_1_to_0: 10s
  hold_time_2_to_1: 30s
  hold_time_3_to_2: 60s
  hold_time_4_to_3: 5m

# --- XDP ---
xdp:
  enabled: true
  mode: native             # native | generic
  interface: auto
  healthcheck_cidrs: []

# --- Safelist ---
safelist:
  auto_detect: true
  processes: []
  cgroups: []
  labels: []

# --- Habituation ---
habituation:
  enabled: true
  min_stable_time: 30m
  decay_rate: 0.001
  dm_safety_max: auto
  rho_threshold: 0.25
  delta_h_threshold: 0.5
  icp_threshold: 0.3

# --- Webhooks ---
webhooks:
  enabled: false
  endpoints: []
  timeout: 5s
  retry_count: 2
  retry_delay: 1s

# --- Thalamic Filter ---
thalamic_filter:
  enabled: true
  heartbeat_interval: 60s
  metrics_port: 9100

# --- Environment ---
environment:
  type: auto
  cloud_provider: auto
  self_termination: false
  self_termination_timeout: 5m

# --- Quarantine ---
quarantine:
  enabled: false           # Level 5 requires explicit opt-in
  watchdog_timeout: 30m    # edge-remote only

# --- Self-Containment ---
self_containment:
  memory_max: 128M
  cpu_max: "50000 100000"

# --- Logging ---
logging:
  directory: /var/log/hosa
  decision_file: decisions.log
  max_size: 100M
  max_files: 5
  level: info
  include_state_vector: true

# --- Seasonal Profiles ---
seasonal:
  enabled: true
  min_observation_days: 7
  autocorrelation_threshold: 0.3
```