Configuration — HOSA Documentation

1. Configuration Philosophy

HOSA is designed to operate with zero mandatory configuration. Every parameter has a carefully chosen default that works for general-purpose Linux servers. The agent discovers the hardware topology, calibrates its own thresholds during warm-up, and adapts its baselines through habituation — all automatically.

Configuration exists for three purposes:

Constraining autonomy. Selecting the deployment mode (dry-run, partial, full) to control what the agent is allowed to do.
Environment context. Providing information the agent cannot discover on its own — webhook URLs, safelist entries, event calendars.
Expert tuning. Overriding auto-calibrated parameters for operators who understand their workload characteristics and want finer control.

Principle: Sensible Defaults, Explicit Overrides

Every parameter documented in this section has a default value that is either calibrated during the warm-up phase or set to a conservative constant. Overriding a parameter is always optional and should be done with an understanding of the trade-offs involved. When in doubt, use the defaults.

2. File Format and Location

HOSA reads configuration from a single YAML file. The default search order is:

./hosa.yaml — current working directory
/etc/hosa/hosa.yaml — system-wide configuration
$HOME/.config/hosa/hosa.yaml — user-level configuration

The path can be overridden via the --config CLI flag. All CLI flags take precedence over file-based configuration.
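
The search order above can be read as a first-match lookup. The following is a minimal sketch, assuming that an explicit --config path bypasses the search entirely and that the first existing file wins. The function name `find_config` and the injectable `exists` check are hypothetical and not part of HOSA.

```python
import os
from typing import Callable, Optional

# Default search order, as listed in this section.
SEARCH_ORDER = [
    "./hosa.yaml",
    "/etc/hosa/hosa.yaml",
    "$HOME/.config/hosa/hosa.yaml",
]

def find_config(explicit: Optional[str] = None,
                exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Hypothetical helper: return the config path to load, or None."""
    if explicit:                      # assumed: --config skips the search
        return explicit
    for candidate in SEARCH_ORDER:
        path = os.path.expandvars(candidate)   # expands $HOME when it is set
        if exists(path):
            return path
    return None                       # no file found: defaults only
```

Returning `None` rather than raising mirrors the zero-mandatory-configuration principle: a missing file is not an error.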

Precedence (highest → lowest):

CLI flags → Environment variables → Config file → Auto-calibrated defaults

Configuration Precedence

Environment variables follow the pattern HOSA_<SECTION>_<KEY> in uppercase with underscores. For example, HOSA_WARMUP_DURATION=300s overrides warmup.duration in the YAML file.
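
To make the precedence chain and the naming pattern concrete, here is a minimal sketch. It assumes each layer is a flat mapping keyed by dotted YAML path (environment variables excepted); the function names `env_name` and `resolve` are hypothetical and do not describe HOSA's internal loader.

```python
from typing import Any, Mapping

def env_name(yaml_path: str) -> str:
    """Map a dotted YAML path to HOSA_<SECTION>_<KEY>."""
    return "HOSA_" + yaml_path.replace(".", "_").upper()

def resolve(yaml_path: str,
            cli: Mapping[str, Any],
            env: Mapping[str, str],
            file_cfg: Mapping[str, Any],
            defaults: Mapping[str, Any]) -> Any:
    """Return the value from the highest-precedence layer that sets it."""
    if yaml_path in cli:                  # 1. CLI flags
        return cli[yaml_path]
    if env_name(yaml_path) in env:        # 2. environment variables
        return env[env_name(yaml_path)]
    if yaml_path in file_cfg:             # 3. config file
        return file_cfg[yaml_path]
    return defaults[yaml_path]            # 4. auto-calibrated defaults
```

For instance, `env_name("warmup.duration")` yields `HOSA_WARMUP_DURATION`, matching the example above.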

3. Deployment Modes

The deployment mode controls the maximum level of autonomy granted to the agent. This is the single most important configuration decision.

3.1. Dry-Run (Observe Only)

mode: dry-run

The agent performs the full detection pipeline — eBPF collection, state vector construction, Welford updates, D_M calculation, derivative estimation, regime classification — but never executes any actuation. Every decision that would have been made is logged with full mathematical context.

Recommended for: Initial evaluation, validation of detection quality, building confidence before enabling actuation.

Capability	Status
eBPF metric collection	✓ Active
Mahalanobis Distance calculation	✓ Active
Derivative estimation	✓ Active
Regime classification	✓ Active
Decision logging	✓ Active (logs what would happen)
Webhooks	✓ Active (notifications only)
cgroup throttling	✗ Disabled
XDP load shedding	✗ Disabled
Process signaling	✗ Disabled
Network isolation	✗ Disabled

Required capabilities: CAP_BPF only.

3.2. Partial Actuation

mode: partial
max_level: 3 # ceiling for autonomous action

The agent is permitted to actuate up to a configurable maximum response level. Levels above the ceiling are logged as recommendations but not executed.

Recommended for: Production environments where the operator wants automated containment up to Level 3 (soft and active containment) but reserves severe containment and quarantine (Levels 4–5) for human decision.

Required capabilities: CAP_BPF and CAP_SYS_ADMIN, plus CAP_NET_ADMIN when max_level ≥ 3.

3.3. Full Actuation

mode: full

The agent has full autonomy across all six response levels (0–5), including autonomous quarantine. This is the mode that fully implements the reflex arc architecture.

Recommended for: Environments where autonomous survival is critical — edge/IoT with intermittent connectivity, air-gapped networks, or nodes that must survive the Lethal Interval without human intervention.

Level 5 Requires Explicit Opt-In

Even in mode: full, the Level 5 quarantine (network isolation) requires an explicit quarantine.enabled: true flag. This is a deliberate safety measure: network isolation is the only irreversible autonomous action, and should never be activated by accident.
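
The interaction between the three modes, the max_level ceiling, and the Level 5 opt-in can be summarized in a small sketch. It rests on two assumptions that the text does not confirm: that Level 0 means no actuation, and that a Level 5 request without quarantine.enabled falls back to Level 4 instead of being dropped. The function name `executed_level` is hypothetical.

```python
def executed_level(mode: str, max_level: int,
                   quarantine_enabled: bool, requested: int) -> int:
    """Hypothetical helper: highest level actually executed for a request."""
    if mode == "dry-run":
        return 0                       # observe and log only (Level 0 assumed inert)
    if mode not in ("partial", "full"):
        raise ValueError(f"unknown mode: {mode!r}")
    ceiling = 5 if mode == "full" else max_level
    if not quarantine_enabled:
        ceiling = min(ceiling, 4)      # Level 5 needs quarantine.enabled: true
    # Levels above the ceiling are logged as recommendations, not executed.
    return min(requested, ceiling)
```

With mode: partial and max_level: 3, a request for Level 4 is executed as Level 3; the remaining step would appear in the decision log as a recommendation.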

4. Core Parameters

4.1. Warm-Up and Proprioception

Parameter	YAML Path	Default	Description
Duration	`warmup.duration`	`5m`	Time spent collecting baseline samples before detection activates. Longer warm-up → more accurate baseline. Shorter warm-up → faster time-to-protection.
Min samples	`warmup.min_samples`	`500`	Minimum number of samples before the covariance matrix is considered reliable for inversion.
Conservative mode	`warmup.conservative`	`true`	During warm-up, log decisions but do not actuate (equivalent to dry-run for the warm-up period).

Cold Start Trade-Off

During warm-up, the agent does not have a reliable baseline and operates in conservative mode. This is an acknowledged vulnerability window. For nodes where even the warm-up period is critical (high-value targets), the operator can pre-seed the baseline by providing a previously computed (μ, Σ) pair via warmup.seed_file — exported from a prior run or from a similar node.

4.2. EWMA Smoothing (α)

Parameter	YAML Path	Default	Description
Alpha	`ewma.alpha`	`auto`	EWMA smoothing factor. `auto` calibrates per-resource during warm-up based on observed variance. Manual override: 0.0–1.0 (higher = more responsive, noisier; lower = smoother, slower detection).
Alpha range	`ewma.alpha_min` `ewma.alpha_max`	`0.05` `0.3`	Bounds for auto-calibrated α. Prevents the algorithm from selecting extreme values.
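
For readers unfamiliar with EWMA, the sketch below shows the standard recurrence and how the alpha_min/alpha_max bounds would clamp a candidate α. How HOSA derives the candidate from observed variance is not described here, so that step is left out; the function names are hypothetical.

```python
def clamp_alpha(alpha: float, alpha_min: float = 0.05,
                alpha_max: float = 0.3) -> float:
    """Keep a candidate smoothing factor inside the configured bounds."""
    return max(alpha_min, min(alpha, alpha_max))

def ewma(samples, alpha):
    """Standard EWMA: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = []
    s = None
    for x in samples:
        # The first sample seeds the average (a common convention, assumed here).
        s = x if s is None else alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed
```

A higher α moves the smoothed value toward each new sample faster, which is the responsiveness versus noise trade-off noted in the table.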

4.3. Adaptive Thresholds (θ₁–θ₄)

Threshold	YAML Path	Default (σ multiplier)	Triggers
θ₁	`thresholds.theta1_sigma`	`2.0`	Level 0 → Level 1 (Vigilance)
θ₂	`thresholds.theta2_sigma`	`3.0`	Level 1 → Level 2 (Soft Containment)
θ₃	`thresholds.theta3_sigma`	`4.0`	Level 2 → Level 3 (Active Containment)
θ₄	`thresholds.theta4_sigma`	`5.0`	Level 3 → Level 4 (Severe Containment)

Thresholds are expressed as multiples of σ, the baseline standard deviation of D_M computed during warm-up. The absolute threshold values are derived as θ_n = multiplier_n × σ_{D_M}, so thresholds adapt automatically to the node's behavioral characteristics.
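
As a worked example of the σ-multiplier scheme, the sketch below derives absolute thresholds from a baseline σ and maps a D_M value to a level. It is deliberately simplified: it assumes a threshold counts as crossed when D_M ≥ θ_n, and it ignores derivatives, regime classification, and hysteresis, all of which the real agent takes into account. Function names are hypothetical.

```python
SIGMA_MULTIPLIERS = [2.0, 3.0, 4.0, 5.0]   # theta1..theta4 defaults from the table

def absolute_thresholds(sigma_dm, multipliers=SIGMA_MULTIPLIERS):
    """theta_n = multiplier_n * sigma_{D_M}."""
    return [m * sigma_dm for m in multipliers]

def level_for(d_m, thresholds):
    """Count how many thresholds D_M has reached (0 to 4); simplified."""
    return sum(1 for theta in thresholds if d_m >= theta)

# Example: a node whose baseline D_M has sigma = 1.5
thetas = absolute_thresholds(1.5)   # [3.0, 4.5, 6.0, 7.5]
```

With these values, a D_M of 5.0 has passed θ₁ and θ₂ but not θ₃, which corresponds to Level 2 in this simplified mapping.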

For operators who prefer absolute thresholds, the auto-calibration can be overridden:

thresholds:
  mode: absolute     # "sigma" (default) or "absolute"
  theta1: 3.0
  theta2: 5.0
  theta3: 7.0
  theta4: 9.0

4.4. Tikhonov Regularization (λ)

Parameter	YAML Path	Default	Description
Lambda	`covariance.tikhonov_lambda`	`1e-6`	Regularization constant added to the diagonal of Σ before inversion. Prevents singularity in systems with collinear variables. Increase if Cholesky decomposition fails.
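
The effect of λ is easiest to see on a perfectly collinear covariance matrix. The sketch below is a small pure-Python illustration of Tikhonov regularization followed by Cholesky decomposition; it is not HOSA's implementation, and the function names are hypothetical.

```python
import math

def regularize(sigma, lam=1e-6):
    """Return sigma + lam * I."""
    n = len(sigma)
    return [[sigma[i][j] + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def cholesky(a):
    """Lower-triangular L with L * L^T = a; raises if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

collinear = [[1.0, 1.0], [1.0, 1.0]]      # two perfectly correlated variables
factor = cholesky(regularize(collinear))  # succeeds once lambda is added
```

Calling `cholesky(collinear)` directly raises `ValueError`, which is the failure mode the description refers to; adding λ to the diagonal makes the matrix strictly positive definite.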

5. Response Level Tuning

5.1. Hysteresis Hold Times

Hold times control how long D_M must remain below the de-escalation threshold before the response level decreases. Longer hold times prevent oscillation; shorter hold times allow faster recovery.

Parameter	YAML Path	Default
Level 1→0	`response.hold_time_1_to_0`	`10s`
Level 2→1	`response.hold_time_2_to_1`	`30s`
Level 3→2	`response.hold_time_3_to_2`	`60s`
Level 4→3	`response.hold_time_4_to_3`	`5m`
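
A minimal sketch of how hold times suppress oscillation follows. It assumes that any excursion above the de-escalation threshold resets the timer and that the level drops one step at a time; the class name `Deescalator` is hypothetical. Level 5 has no hold time in the table, so the sketch leaves it untouched.

```python
HOLD_TIMES_S = {1: 10.0, 2: 30.0, 3: 60.0, 4: 300.0}  # seconds, keyed by current level

class Deescalator:
    """Hypothetical helper: steps the response level down one at a time."""

    def __init__(self, level, hold_times=HOLD_TIMES_S):
        self.level = level
        self.hold_times = hold_times
        self.below_since = None      # when D_M first dropped below the threshold

    def update(self, now, below_threshold):
        """Feed one observation at time `now` (seconds); return the level."""
        hold = self.hold_times.get(self.level)
        if hold is None or not below_threshold:
            self.below_since = None  # assumed: any excursion resets the timer
            return self.level
        if self.below_since is None:
            self.below_since = now
        if now - self.below_since >= hold:
            self.level -= 1
            self.below_since = now   # restart the timer for the next step down
        return self.level
```

Starting at Level 2, D_M must stay below the threshold for a continuous 30 s before the level drops to 1, and then for a further 10 s before it returns to 0.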

5.2. XDP Load Shedding

Parameter	YAML Path	Default	Description
Enabled	`xdp.enabled`	`true`	Enable XDP-based network load shedding at Level 3+. Disable if the NIC driver does not support XDP.
Mode	`xdp.mode`	`native`	`native` (driver-level, fastest) or `generic` (SKB-based, universal fallback).
Healthcheck sources	`xdp.healthcheck_cidrs`	`[]`	CIDR blocks that are never dropped, even during full inbound block (Level 4). Typically: load balancer IPs, Kubernetes API server.
Interface	`xdp.interface`	`auto`	Network interface for XDP attachment. `auto` selects the default route interface.

6. Safelist Configuration

The safelist defines processes and cgroups that are never targeted for throttling (see §4 — Safelist). Kernel processes and the HOSA agent itself are always protected regardless of configuration.

safelist:
  auto_detect: true     # auto-detect kubelet, containerd, dockerd
  processes:
    - name: "postgres"
    - name: "etcd"
    - pid_file: "/var/run/nginx.pid"
  cgroups:
    - "/system.slice/sshd.service"
    - "/kubepods/burstable/pod-kube-proxy-*"
  labels:
    - "hosa.io/protected=true"     # cgroup label match

Safelist entries support four matching modes:

Process name — matched against /proc/[pid]/comm
PID file — reads the PID from the specified file
cgroup path — glob pattern matched against the cgroup hierarchy (supports * wildcards)
cgroup label — matched against labels/annotations on the cgroup (Kubernetes pod labels are propagated to cgroup labels)
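
The matching modes above can be approximated in a few lines. This sketch assumes PID files have already been resolved to a set of PIDs and uses Python's `fnmatch` for the glob, whose `*` also matches `/`; HOSA's own glob semantics may differ. The function name and the dictionary layout are hypothetical.

```python
from fnmatch import fnmatchcase

def is_safelisted(pid, comm, cgroup_path, labels, safelist):
    """Hypothetical helper: True if a process matches any safelist entry."""
    if comm in safelist.get("process_names", ()):      # /proc/[pid]/comm
        return True
    if pid in safelist.get("pids", ()):                # resolved from PID files
        return True
    if any(fnmatchcase(cgroup_path, pattern)           # cgroup path glob
           for pattern in safelist.get("cgroups", ())):
        return True
    return any(label in labels                         # cgroup label match
               for label in safelist.get("labels", ()))

example = {
    "process_names": {"postgres", "etcd"},
    "pids": {4242},   # illustrative PID, as if read from /var/run/nginx.pid
    "cgroups": ["/system.slice/sshd.service",
                "/kubepods/burstable/pod-kube-proxy-*"],
    "labels": {"hosa.io/protected=true"},
}
```

Note that /proc/[pid]/comm holds at most 15 characters, so a longer process name has to be compared in its truncated form.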

Always Protected (Implicit Safelist)

The following are protected regardless of configuration and cannot be removed from the safelist: all kernel threads (kthreadd descendants), the HOSA agent process itself, and the init process (PID 1).

7. Habituation Parameters

Parameter	YAML Path	Default	Description
Enabled	`habituation.enabled`	`true`	Enable automatic baseline recalibration. Disable for environments where the baseline should never change after warm-up.
Min stabilization	`habituation.min_stable_time`	`30m`	T_min — minimum continuous stabilization before habituation activates.
Decay rate (λ)	`habituation.decay_rate`	`0.001`	Exponential decay rate for weighted Welford. Higher = faster adaptation.
Safety ceiling	`habituation.dm_safety_max`	`auto`	D_M,safety — maximum D_M that permits habituation. `auto` sets it to θ₃ × 0.8.
ρ threshold	`habituation.rho_threshold`	`0.25`	Maximum covariance deformation ratio that permits habituation.
ΔH threshold	`habituation.delta_h_threshold`	`0.5`	Maximum syscall entropy change that permits habituation.
ICP threshold	`habituation.icp_threshold`	`0.3`	Maximum propagation index that permits habituation.
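
Read together, the parameters in this table act as gates that must all pass before the baseline is allowed to drift. The sketch below shows one plausible reading, assuming each "maximum" is an inclusive upper bound and that the entropy change is compared by absolute value; neither detail is confirmed by this page, and the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HabituationLimits:
    dm_safety_max: float                 # "auto" resolves to theta3 * 0.8
    min_stable_time: float = 30 * 60.0   # seconds (30m)
    rho_threshold: float = 0.25
    delta_h_threshold: float = 0.5
    icp_threshold: float = 0.3

def auto_dm_safety_max(theta3: float) -> float:
    """Default safety ceiling when dm_safety_max is 'auto'."""
    return theta3 * 0.8

def may_habituate(stable_for, d_m, rho, delta_h, icp, lim):
    """Hypothetical gate: every condition must hold for habituation to proceed."""
    return (stable_for >= lim.min_stable_time
            and d_m <= lim.dm_safety_max
            and rho <= lim.rho_threshold
            and abs(delta_h) <= lim.delta_h_threshold
            and icp <= lim.icp_threshold)
```

Requiring all gates at once is the conservative choice: a single failing indicator is enough to keep the existing baseline.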

8. Webhook Configuration

Webhooks are opportunistic — dispatched when network connectivity is available, but never required for the agent's primary function. If the webhook endpoint is unreachable, the event is logged locally and the agent continues operating.

webhooks:
  enabled: true
  endpoints:
    - url: "https://hooks.slack.com/services/T00/B00/xxx"
      min_level: 2     # only send Level 2+ events
      format: slack
    - url: "https://events.pagerduty.com/v2/enqueue"
      min_level: 3     # only send Level 3+ events
      format: pagerduty
      auth_token_env: "PAGERDUTY_TOKEN"
    - url: "http://localhost:9093/api/v2/alerts"
      min_level: 1
      format: alertmanager
  timeout: 5s
  retry_count: 2
  retry_delay: 1s
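
The min_level filter and the retry settings can be sketched as follows. The sketch assumes retry_count counts retries after the first attempt (so 2 means up to three attempts) and that a failed delivery never raises into the detection loop; the actual HTTP call is abstracted behind a `send` callable. All names are hypothetical.

```python
import time

def endpoints_for(level, endpoints):
    """URLs of endpoints whose min_level is at or below the event level."""
    return [e["url"] for e in endpoints if level >= e.get("min_level", 0)]

def deliver(send, retry_count=2, retry_delay=1.0, sleep=time.sleep):
    """Call send() until it returns True or retries run out; never raises OSError."""
    for attempt in range(retry_count + 1):    # assumed: 1 initial try + retries
        try:
            if send():
                return True
        except OSError:
            pass                              # endpoint unreachable: try again
        if attempt < retry_count:
            sleep(retry_delay)
    return False                              # caller logs the event locally
```

A `False` result maps to the behaviour described above: the event is logged locally and the agent carries on.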

Supported formats:

Format	Description
`json`	Generic JSON payload with full state vector (default)
`alertmanager`	Prometheus Alertmanager-compatible alert format
`slack`	Slack incoming webhook format with formatted message
`pagerduty`	PagerDuty Events API v2 format
`opsgenie`	Opsgenie Alert API format

9. Thalamic Filter

Parameter	YAML Path	Default	Description
Enabled	`thalamic_filter.enabled`	`true`	Suppress redundant telemetry during homeostasis. Disable if external monitoring requires continuous metric flow.
Heartbeat interval	`thalamic_filter.heartbeat_interval`	`60s`	Interval between minimal heartbeat emissions during homeostasis.
Metrics endpoint	`thalamic_filter.metrics_port`	`9100`	Port for the Prometheus-compatible `/metrics` endpoint. Set `0` to disable.

10. Environment Detection and Override

HOSA auto-detects the environment class during proprioception to select the appropriate quarantine strategy (see §4, Quarantine Modes). The auto-detection can be overridden with the environment.type key or the --environment CLI flag:

Environment	Auto-Detection Method	Quarantine Strategy
`bare-metal`	IPMI interface in `/sys/class/net/`	Deactivate all interfaces except IPMI
`cloud`	Metadata service at 169.254.169.254	XDP drop + cloud-native signaling
`kubernetes`	`KUBERNETES_SERVICE_HOST` env var	cgroup containment + taint + K8s Event
`edge-physical`	Explicit config only	Full network deactivation
`edge-remote`	Explicit config only	Network deactivation + watchdog timer
`airgap`	Explicit config only	Full isolation, no external communication

11. Self-Containment (Agent Resource Limits)

HOSA practices what it preaches: the agent itself operates within a dedicated cgroup v2 with hard resource limits. If the agent exceeds its own limits, the kernel contains it before it can affect the system.

self_containment:
  memory_max: 128M     # hard memory ceiling for the agent
  cpu_max: "50000 100000"     # 50% of one CPU core (50ms per 100ms period)
  cgroup_path: /sys/fs/cgroup/hosa.service

The Agent That Limits Itself

This is a deliberate architectural decision, not just good practice. The most common objection to autonomous agents is: "What if the agent itself becomes the problem?" By operating within hard kernel-enforced limits, HOSA cannot consume more than 128MB of memory or 50% of a CPU core, regardless of bugs or unexpected conditions. The kernel enforces these limits — not the agent itself.
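
The cpu_max value uses the cgroup v2 cpu.max format, "<quota> <period>" in microseconds. A small sketch of how to read it as a fraction of one core (the function name is hypothetical):

```python
def parse_cpu_max(value: str):
    """Convert a cgroup v2 cpu.max string into a CPU fraction, or None if unlimited."""
    parts = value.split()
    quota = parts[0]
    # The kernel's default period is 100000 microseconds when none is given.
    period = int(parts[1]) if len(parts) > 1 else 100000
    if quota == "max":
        return None                  # no bandwidth limit
    return int(quota) / period

limit = parse_cpu_max("50000 100000")   # 0.5, i.e. 50% of one core
```

Values above 1.0 are possible and mean more than one full core, for example "200000 100000" for two cores.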

12. Logging Configuration

Parameter	YAML Path	Default	Description
Directory	`logging.directory`	`/var/log/hosa`	Directory for decision logs and audit trail.
Decision log	`logging.decision_file`	`decisions.log`	File for structured decision log (JSON Lines format).
Max file size	`logging.max_size`	`100M`	Maximum size before log rotation.
Retention	`logging.max_files`	`5`	Number of rotated files to retain.
Level	`logging.level`	`info`	Agent operational log level: `debug`, `info`, `warn`, `error`.
Include state vector	`logging.include_state_vector`	`true`	Include the full x(t) vector in every decision log entry. Disable to reduce log volume.

13. Kubernetes Deployment

13.1. DaemonSet Manifest

HOSA is deployed as a DaemonSet — one instance per node, with access to the host's cgroup filesystem and network namespace.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hosa
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: hosa
  template:
    metadata:
      labels:
        app: hosa
    spec:
      hostPID: true
      hostNetwork: true
      tolerations:
        - operator: Exists
      containers:
        - name: hosa
          image: ghcr.io/bricio-sr/hosa:latest
          securityContext:
            privileged: false
            capabilities:
              add: [BPF, SYS_ADMIN, NET_ADMIN]
          volumeMounts:
            - name: sys
              mountPath: /sys
            - name: cgroup
              mountPath: /sys/fs/cgroup
          resources:
            limits:
              memory: 128Mi
              cpu: 500m
      volumes:
        - name: sys
          hostPath:
            path: /sys
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup

Kubernetes DaemonSet — Minimal Manifest

13.2. Capabilities and Security Context

Following the principle of least privilege, deploy with only the capabilities required for the configured deployment mode:

Mode	Required Capabilities	Host Access
`dry-run`	`CAP_BPF`	`hostPID: true`, read-only `/sys`
`partial` (max_level: 2)	`CAP_BPF`, `CAP_SYS_ADMIN`	`hostPID: true`, read-write `/sys/fs/cgroup`
`partial` (max_level: 3–4)	`CAP_BPF`, `CAP_SYS_ADMIN`, `CAP_NET_ADMIN`	`hostPID: true`, `hostNetwork: true`, read-write `/sys/fs/cgroup`
`full`	`CAP_BPF`, `CAP_SYS_ADMIN`, `CAP_NET_ADMIN`	`hostPID: true`, `hostNetwork: true`, read-write `/sys/fs/cgroup`, K8s API access for taints

Start Small, Escalate with Confidence

The recommended deployment path: start with mode: dry-run and CAP_BPF only. Review decision logs for 1–2 weeks to validate detection quality. Then escalate to mode: partial, max_level: 2 for soft containment. Only enable mode: full after building confidence in the agent's behavior on your specific workload.

14. CLI Reference

Flag	Short	Default	Description
`--config`	`-c`	`auto`	Path to configuration file.
`--mode`	`-m`	`dry-run`	Deployment mode: `dry-run`, `partial`, `full`.
`--max-level`		`5`	Maximum response level (0–5). Only used with `--mode=partial`.
`--warmup`	`-w`	`5m`	Warm-up duration (e.g., `300s`, `5m`, `10m`).
`--environment`	`-e`	`auto`	Environment override.
`--log-level`		`info`	Operational log level.
`--log-dir`		`/var/log/hosa`	Log output directory.
`--metrics-port`		`9100`	Prometheus-compatible metrics endpoint port. `0` to disable.
`--version`	`-v`		Print version and exit.
`--validate`			Validate configuration file and exit without starting the agent.
`--dump-baseline`			Run warm-up, dump the computed (μ, Σ) baseline to file, and exit. Useful for seeding new nodes.

Examples:

# Observe and log, no actuation
sudo ./hosa --mode=dry-run

# Partial actuation up to Level 3, custom warm-up
sudo ./hosa --mode=partial --max-level=3 --warmup=10m

# Full actuation with explicit config file
sudo ./hosa --mode=full -c /etc/hosa/production.yaml

# Validate configuration without starting
./hosa --validate -c /etc/hosa/hosa.yaml

# Export baseline for seeding other nodes
sudo ./hosa --dump-baseline --warmup=30m -c /etc/hosa/hosa.yaml

15. Full Configuration Reference

Below is the complete annotated configuration file with all parameters and their default values. It serves as both documentation and a starting template: copy it, keep the sections you need, and adjust.

# ============================================================
# HOSA Configuration — Full Reference (all defaults shown)
# ============================================================

# --- Deployment Mode ---
mode: dry-run                 # dry-run | partial | full
max_level: 5                 # max response level (partial mode)

# --- Warm-Up ---
warmup:
  duration: 5m
  min_samples: 500
  conservative: true
  seed_file: ""             # path to pre-computed baseline

# --- EWMA ---
ewma:
  alpha: auto               # auto | 0.0–1.0
  alpha_min: 0.05
  alpha_max: 0.3

# --- Thresholds ---
thresholds:
  mode: sigma               # sigma | absolute
  theta1_sigma: 2.0
  theta2_sigma: 3.0
  theta3_sigma: 4.0
  theta4_sigma: 5.0

# --- Covariance ---
covariance:
  tikhonov_lambda: 1e-6

# --- Response ---
response:
  hold_time_1_to_0: 10s
  hold_time_2_to_1: 30s
  hold_time_3_to_2: 60s
  hold_time_4_to_3: 5m

# --- XDP ---
xdp:
  enabled: true
  mode: native              # native | generic
  interface: auto
  healthcheck_cidrs: []

# --- Safelist ---
safelist:
  auto_detect: true
  processes: []
  cgroups: []
  labels: []

# --- Habituation ---
habituation:
  enabled: true
  min_stable_time: 30m
  decay_rate: 0.001
  dm_safety_max: auto
  rho_threshold: 0.25
  delta_h_threshold: 0.5
  icp_threshold: 0.3

# --- Webhooks ---
webhooks:
  enabled: false
  endpoints: []
  timeout: 5s
  retry_count: 2
  retry_delay: 1s

# --- Thalamic Filter ---
thalamic_filter:
  enabled: true
  heartbeat_interval: 60s
  metrics_port: 9100

# --- Environment ---
environment:
  type: auto
  cloud_provider: auto
  self_termination: false
  self_termination_timeout: 5m

# --- Quarantine ---
quarantine:
  enabled: false            # Level 5 requires explicit opt-in
  watchdog_timeout: 30m      # edge-remote only

# --- Self-Containment ---
self_containment:
  memory_max: 128M
  cpu_max: "50000 100000"

# --- Logging ---
logging:
  directory: /var/log/hosa
  decision_file: decisions.log
  max_size: 100M
  max_files: 5
  level: info
  include_state_vector: true

# --- Seasonal Profiles ---
seasonal:
  enabled: true
  min_observation_days: 7
  autocorrelation_threshold: 0.3

Complete hosa.yaml — All Defaults
