§7

Configuration

Deployment modes, core parameters, response level tuning, safelist, webhooks, environment detection, Kubernetes DaemonSet deployment, self-containment, logging, and CLI reference.

Reference Implementation — Phase 1 · ~20 min read

1. Configuration Philosophy

HOSA is designed to operate with zero mandatory configuration. Every parameter has a carefully chosen default that works for general-purpose Linux servers. The agent discovers the hardware topology, calibrates its own thresholds during warm-up, and adapts its baselines through habituation — all automatically.

Configuration exists for three purposes:

  1. Constraining autonomy. Selecting the deployment mode (dry-run, partial, full) to control what the agent is allowed to do.
  2. Environment context. Providing information the agent cannot discover on its own — webhook URLs, safelist entries, event calendars.
  3. Expert tuning. Overriding auto-calibrated parameters for operators who understand their workload characteristics and want finer control.

Principle: Sensible Defaults, Explicit Overrides

Every parameter documented in this section has a default value that is calibrated during the warm-up phase or set to a conservative constant. Overriding a parameter is always optional and should be done with understanding of the trade-offs involved. When in doubt, use the defaults.

2. File Format and Location

HOSA reads configuration from a single YAML file. The default search order is:

  1. ./hosa.yaml — current working directory
  2. /etc/hosa/hosa.yaml — system-wide configuration
  3. $HOME/.config/hosa/hosa.yaml — user-level configuration

The path can be overridden via the --config CLI flag. All CLI flags take precedence over file-based configuration.

Precedence (highest → lowest):

CLI flags → Environment variables → Config file → Auto-calibrated defaults
Configuration Precedence

Environment variables follow the pattern HOSA_<SECTION>_<KEY> in uppercase with underscores. For example, HOSA_WARMUP_DURATION=300s overrides warmup.duration in the YAML file.
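
The precedence chain and the environment-variable naming rule can be sketched in a few lines of Python. This is an illustration only, not the agent's code: the helper names `env_var_name` and `resolve`, and the idea of flattening the YAML file into dotted keys, are assumptions made for the example.

```python
def env_var_name(yaml_path: str) -> str:
    """Map a dotted YAML path such as 'warmup.duration' to HOSA_WARMUP_DURATION."""
    return "HOSA_" + yaml_path.replace(".", "_").upper()


def resolve(yaml_path, cli_flags, env, file_config, defaults):
    """Return (value, source), checking CLI, then environment, then file, then defaults."""
    if yaml_path in cli_flags:
        return cli_flags[yaml_path], "cli"
    name = env_var_name(yaml_path)
    if name in env:
        return env[name], "env"
    if yaml_path in file_config:
        return file_config[yaml_path], "file"
    return defaults[yaml_path], "default"


# The environment variable wins over the file entry because no CLI flag is set.
value, source = resolve(
    "warmup.duration",
    cli_flags={},
    env={"HOSA_WARMUP_DURATION": "300s"},
    file_config={"warmup.duration": "10m"},
    defaults={"warmup.duration": "5m"},
)
print(value, source)
```

With the inputs shown, the lookup stops at the environment layer, so the file value of 10m is never consulted.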

3. Deployment Modes

The deployment mode controls the maximum level of autonomy granted to the agent. This is the single most important configuration decision.

3.1. Dry-Run (Observe Only)

mode: dry-run

The agent performs the full detection pipeline — eBPF collection, state vector construction, Welford updates, DM calculation, derivative estimation, regime classification — but never executes any actuation. Every decision that would have been made is logged with full mathematical context.

Recommended for: Initial evaluation, validation of detection quality, building confidence before enabling actuation.

Capability | Status
eBPF metric collection | ✓ Active
Mahalanobis Distance calculation | ✓ Active
Derivative estimation | ✓ Active
Regime classification | ✓ Active
Decision logging | ✓ Active (logs what would happen)
Webhooks | ✓ Active (notifications only)
cgroup throttling | ✗ Disabled
XDP load shedding | ✗ Disabled
Process signaling | ✗ Disabled
Network isolation | ✗ Disabled

Required capabilities: CAP_BPF only.

3.2. Partial Actuation

mode: partial
max_level: 3    # ceiling for autonomous action

The agent is permitted to actuate up to a configurable maximum response level. Levels above the ceiling are logged as recommendations but not executed.

Recommended for: Production environments where the operator wants automated containment up to the configured ceiling (Levels 0–3 in the example above) but reserves severe containment and quarantine (Levels 4–5) for human decision.

Required capabilities: CAP_BPF, CAP_SYS_ADMIN, and CAP_NET_ADMIN if max_level ≥ 3.

3.3. Full Actuation

mode: full

The agent has full autonomy across all six response levels (0–5), including autonomous quarantine. This is the mode that fully implements the reflex arc architecture.

Recommended for: Environments where autonomous survival is critical — edge/IoT with intermittent connectivity, air-gapped networks, or nodes that must survive the Lethal Interval without human intervention.

Level 5 Requires Explicit Opt-In

Even in mode: full, the Level 5 quarantine (network isolation) requires an explicit quarantine.enabled: true flag. This is a deliberate safety measure: network isolation is the only irreversible autonomous action, and should never be activated by accident.
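
The interaction between mode, max_level, and the quarantine opt-in can be summarized as a single ceiling computation. The sketch below is a hedged illustration: the function name `executable_level` is hypothetical, Level 0 is treated as the no-actuation baseline, and applying the quarantine opt-in to partial mode as well as full mode is an assumption of this example.

```python
def executable_level(requested, mode, max_level=5, quarantine_enabled=False):
    """Return the highest response level the agent may actually execute.

    Levels above the returned value would only be logged as recommendations.
    """
    if mode == "dry-run":
        ceiling = 0              # observe only, nothing is actuated
    elif mode == "partial":
        ceiling = max_level      # operator-defined ceiling
    elif mode == "full":
        ceiling = 5
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if not quarantine_enabled:
        ceiling = min(ceiling, 4)  # Level 5 needs quarantine.enabled: true
    return min(requested, ceiling)


print(executable_level(5, "full"))                           # capped at 4 without opt-in
print(executable_level(5, "full", quarantine_enabled=True))  # quarantine permitted
print(executable_level(4, "partial", max_level=3))           # capped by max_level
```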

4. Core Parameters

4.1. Warm-Up and Proprioception

Parameter | YAML Path | Default | Description
Duration | warmup.duration | 5m | Time spent collecting baseline samples before detection activates. Longer warm-up → more accurate baseline. Shorter warm-up → faster time-to-protection.
Min samples | warmup.min_samples | 500 | Minimum number of samples before the covariance matrix is considered reliable for inversion.
Conservative mode | warmup.conservative | true | During warm-up, log decisions but do not actuate (equivalent to dry-run for the warm-up period).

Cold Start Trade-Off

During warm-up, the agent does not have a reliable baseline and operates in conservative mode. This is an acknowledged vulnerability window. For nodes where even the warm-up period is critical (high-value targets), the operator can pre-seed the baseline by providing a previously computed (μ, Σ) pair via warmup.seed_file — exported from a prior run or from a similar node.

4.2. EWMA Smoothing (α)

Parameter | YAML Path | Default | Description
Alpha | ewma.alpha | auto | EWMA smoothing factor. auto calibrates per-resource during warm-up based on observed variance. Manual override: 0.0–1.0 (higher = more responsive, noisier; lower = smoother, slower detection).
Alpha range | ewma.alpha_min, ewma.alpha_max | 0.05, 0.3 | Bounds for auto-calibrated α. Prevents the algorithm from selecting extreme values.
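
To make the responsiveness trade-off concrete, here is a minimal sketch of a standard EWMA update with the documented bounds applied to a candidate α. The auto-calibration rule itself is not described in this section, so the candidate values passed to `clamp_alpha` are arbitrary, and both function names are hypothetical.

```python
def clamp_alpha(alpha, alpha_min=0.05, alpha_max=0.3):
    """Keep a candidate (auto-calibrated) alpha inside the configured bounds."""
    return max(alpha_min, min(alpha_max, alpha))


def ewma(samples, alpha):
    """Standard EWMA: s_t = alpha * x_t + (1 - alpha) * s_(t-1), seeded with the first sample."""
    smoothed = []
    s = samples[0]
    for x in samples:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed


# A step from 0 to 1: the larger alpha tracks the change faster.
step = [0.0] * 3 + [1.0] * 5
fast = ewma(step, clamp_alpha(0.9))    # candidate 0.9 is clamped to 0.3
slow = ewma(step, clamp_alpha(0.01))   # candidate 0.01 is clamped to 0.05
print(round(fast[-1], 3), round(slow[-1], 3))
```

Five samples after the step, the α = 0.3 series has covered most of the distance to the new value while the α = 0.05 series has covered less than a quarter of it, which is the "responsive but noisier" versus "smoother but slower" trade-off described in the table.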

4.3. Adaptive Thresholds (θ₁–θ₄)

Threshold | YAML Path | Default (σ multiplier) | Triggers
θ₁ | thresholds.theta1_sigma | 2.0 | Level 0 → Level 1 (Vigilance)
θ₂ | thresholds.theta2_sigma | 3.0 | Level 1 → Level 2 (Soft Containment)
θ₃ | thresholds.theta3_sigma | 4.0 | Level 2 → Level 3 (Active Containment)
θ₄ | thresholds.theta4_sigma | 5.0 | Level 3 → Level 4 (Severe Containment)

Thresholds are expressed as multiples of σ_DM, the baseline standard deviation of DM computed during warm-up. The absolute threshold values are derived as θₙ = kₙ × σ_DM, where kₙ is the configured multiplier for threshold n. This ensures that thresholds automatically adapt to the node's behavioral characteristics.
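
As a worked illustration of the sigma-based derivation, the sketch below turns the default multipliers into absolute thresholds and maps a DM value to a level by counting how many thresholds it exceeds. It is deliberately simplified: it ignores derivatives and hysteresis, the use of the population standard deviation is an assumption, and the function names are hypothetical.

```python
import statistics

SIGMA_MULTIPLIERS = (2.0, 3.0, 4.0, 5.0)  # defaults for theta1..theta4


def absolute_thresholds(baseline_dm, multipliers=SIGMA_MULTIPLIERS):
    """theta_n = multiplier_n * sigma, with sigma taken from warm-up DM samples."""
    sigma = statistics.pstdev(baseline_dm)
    return [k * sigma for k in multipliers]


def level_for(dm, thresholds):
    """Number of thresholds exceeded by dm (0 means no threshold crossed)."""
    return sum(dm > t for t in thresholds)


baseline = [1.0, 3.0, 1.0, 3.0]            # toy warm-up samples, sigma = 1.0
thetas = absolute_thresholds(baseline)      # [2.0, 3.0, 4.0, 5.0]
print(thetas, level_for(3.5, thetas))
```

Because σ is exactly 1.0 for this toy baseline, the absolute thresholds equal the multipliers, and a DM of 3.5 crosses θ₁ and θ₂ only.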

For operators who prefer absolute thresholds, the auto-calibration can be overridden:

thresholds:
  mode: absolute     # "sigma" (default) or "absolute"
  theta1: 3.0
  theta2: 5.0
  theta3: 7.0
  theta4: 9.0

4.4. Tikhonov Regularization (λ)

Parameter | YAML Path | Default | Description
Lambda | covariance.tikhonov_lambda | 1e-6 | Regularization constant added to the diagonal of Σ before inversion. Prevents singularity in systems with collinear variables. Increase if Cholesky decomposition fails.
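
The effect of the regularization constant is easiest to see on a perfectly collinear covariance matrix. The following pure-Python sketch is not the agent's linear-algebra code; `cholesky` and `regularize` are hypothetical helpers written for this example. It shows the decomposition failing on a singular Σ and succeeding once λ is added to the diagonal.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T = a; raise ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def regularize(sigma, lam=1e-6):
    """Return sigma + lam * I (Tikhonov regularization of the diagonal)."""
    return [[v + (lam if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(sigma)]


collinear = [[1.0, 1.0], [1.0, 1.0]]   # two perfectly correlated variables

try:
    cholesky(collinear)
    failed_without_lambda = False
except ValueError:
    failed_without_lambda = True

factor = cholesky(regularize(collinear))  # succeeds with the default lambda
print(failed_without_lambda, factor[1][1] > 0)
```

A larger λ makes the decomposition more robust at the cost of slightly distorting the distances computed from Σ, which is why the table suggests increasing it only when the decomposition fails.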

5. Response Level Tuning

5.1. Hysteresis Hold Times

Hold times control how long DM must remain below the de-escalation threshold before the response level decreases. Longer hold times prevent oscillation; shorter hold times allow faster recovery.

Parameter | YAML Path | Default
Level 1→0 | response.hold_time_1_to_0 | 10s
Level 2→1 | response.hold_time_2_to_1 | 30s
Level 3→2 | response.hold_time_3_to_2 | 60s
Level 4→3 | response.hold_time_4_to_3 | 5m
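
The hold-time behavior can be sketched as a small timer that restarts whenever DM rises back above the de-escalation threshold. This is an illustrative model under stated assumptions: the `Deescalator` class is hypothetical, a single threshold is passed in by the caller, and levels without a configured hold time (0 and 5) are left unchanged.

```python
HOLD_SECONDS = {1: 10, 2: 30, 3: 60, 4: 300}  # current level -> hold time before stepping down


class Deescalator:
    def __init__(self, level):
        self.level = level
        self.below_since = None   # time at which DM last dropped below the threshold

    def observe(self, now, dm, threshold):
        """Feed one sample (time in seconds); return the possibly lowered level."""
        hold = HOLD_SECONDS.get(self.level)
        if hold is None:
            return self.level             # no configured hold time for this level
        if dm >= threshold:
            self.below_since = None       # any excursion restarts the timer
            return self.level
        if self.below_since is None:
            self.below_since = now
        if now - self.below_since >= hold:
            self.level -= 1
            self.below_since = None       # the next level starts its own timer
        return self.level


d = Deescalator(level=2)
trace = [d.observe(t, dm, threshold=2.0)
         for t, dm in [(0, 1.0), (20, 1.0), (25, 5.0), (30, 1.0), (59, 1.0), (60, 1.0)]]
print(trace)
```

In this trace the spike at t = 25 resets the timer, so the step from Level 2 to Level 1 happens only at t = 60, after 30 uninterrupted seconds below the threshold.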

5.2. XDP Load Shedding

Parameter | YAML Path | Default | Description
Enabled | xdp.enabled | true | Enable XDP-based network load shedding at Level 3+. Disable if the NIC driver does not support XDP.
Mode | xdp.mode | native | native (driver-level, fastest) or generic (SKB-based, universal fallback).
Healthcheck sources | xdp.healthcheck_cidrs | [] | CIDR blocks that are never dropped, even during full inbound block (Level 4). Typically: load balancer IPs, Kubernetes API server.
Interface | xdp.interface | auto | Network interface for XDP attachment. auto selects the default route interface.

6. Safelist Configuration

The safelist defines processes and cgroups that are never targeted for throttling (see §4 — Safelist). Kernel processes and the HOSA agent itself are always protected regardless of configuration.

safelist:
  auto_detect: true     # auto-detect kubelet, containerd, dockerd
  processes:
    - name: "postgres"
    - name: "etcd"
    - pid_file: "/var/run/nginx.pid"
  cgroups:
    - "/system.slice/sshd.service"
    - "/kubepods/burstable/pod-kube-proxy-*"
  labels:
    - "hosa.io/protected=true"     # cgroup label match

Safelist entries support four matching modes:

  • Process name: matched against /proc/[pid]/comm
  • PID file: reads the PID from the specified file
  • cgroup path: glob pattern matched against the cgroup hierarchy (supports * wildcards)
  • cgroup label: matched against labels/annotations on the cgroup (Kubernetes pod labels are propagated to cgroup labels)

Always Protected (Implicit Safelist)

The following are protected regardless of configuration and cannot be removed from the safelist: all kernel threads (kthreadd descendants), the HOSA agent process itself, and the init process (PID 1).
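
A rough model of the matching rules, using the entries from the example above, might look like the following. It is a sketch rather than the agent's matcher: `is_safelisted` is a hypothetical helper, PID-file matching and the implicit safelist are omitted, labels are modeled as a set of "key=value" strings, and Python's `fnmatchcase` lets `*` match across `/`, which may differ from the agent's glob semantics.

```python
from fnmatch import fnmatchcase

SAFELIST = {
    "processes": ["postgres", "etcd"],
    "cgroups": ["/system.slice/sshd.service", "/kubepods/burstable/pod-kube-proxy-*"],
    "labels": ["hosa.io/protected=true"],
}


def is_safelisted(comm, cgroup_path, labels, safelist=SAFELIST):
    """Return True if any safelist rule matches the process."""
    if comm in safelist["processes"]:                       # process name
        return True
    if any(fnmatchcase(cgroup_path, pattern) for pattern in safelist["cgroups"]):
        return True                                          # cgroup path glob
    return any(label in labels for label in safelist["labels"])  # cgroup label


print(is_safelisted("postgres", "/user.slice", set()))
print(is_safelisted("stress-ng", "/user.slice/user-1000.slice", set()))
```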

7. Habituation Parameters

Parameter | YAML Path | Default | Description
Enabled | habituation.enabled | true | Enable automatic baseline recalibration. Disable for environments where the baseline should never change after warm-up.
Min stabilization | habituation.min_stable_time | 30m | T_min: minimum continuous stabilization before habituation activates.
Decay rate (λ) | habituation.decay_rate | 0.001 | Exponential decay rate for weighted Welford. Higher = faster adaptation.
Safety ceiling | habituation.dm_safety_max | auto | DM_safety: maximum DM that permits habituation. auto sets it to θ₃ × 0.8.
ρ threshold | habituation.rho_threshold | 0.25 | Maximum covariance deformation ratio that permits habituation.
ΔH threshold | habituation.delta_h_threshold | 0.5 | Maximum syscall entropy change that permits habituation.
ICP threshold | habituation.icp_threshold | 0.3 | Maximum propagation index that permits habituation.
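
Read together, the parameters above act as a set of gates on baseline recalibration. The sketch below shows one plausible reading in which every condition must hold at once; that conjunction, the inclusive comparisons, and the function name `may_habituate` are assumptions of this example rather than documented behavior.

```python
def may_habituate(stable_seconds, dm, rho, delta_h, icp, theta3,
                  min_stable_s=30 * 60, rho_max=0.25, delta_h_max=0.5, icp_max=0.3):
    """Return True only if every habituation guard is satisfied."""
    dm_safety_max = 0.8 * theta3          # the documented "auto" rule
    return (stable_seconds >= min_stable_s
            and dm <= dm_safety_max
            and rho <= rho_max
            and delta_h <= delta_h_max
            and icp <= icp_max)


# One hour of stability with all indicators well inside their limits.
print(may_habituate(3600, dm=2.0, rho=0.1, delta_h=0.2, icp=0.1, theta3=4.0))
# Same indicators, but only ten minutes of stability.
print(may_habituate(600, dm=2.0, rho=0.1, delta_h=0.2, icp=0.1, theta3=4.0))
```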

8. Webhook Configuration

Webhooks are opportunistic — dispatched when network connectivity is available, but never required for the agent's primary function. If the webhook endpoint is unreachable, the event is logged locally and the agent continues operating.

webhooks:
  enabled: true
  endpoints:
    - url: "https://hooks.slack.com/services/T00/B00/xxx"
      min_level: 2     # only send Level 2+ events
      format: slack
    - url: "https://api.pagerduty.com/v2/enqueue"
      min_level: 3     # only send Level 3+ events
      format: pagerduty
      auth_token_env: "PAGERDUTY_TOKEN"
    - url: "http://localhost:9093/api/v1/alerts"
      min_level: 1
      format: alertmanager
  timeout: 5s
  retry_count: 2
  retry_delay: 1s

Supported formats:

Format | Description
json | Generic JSON payload with full state vector (default)
alertmanager | Prometheus Alertmanager-compatible alert format
slack | Slack incoming webhook format with formatted message
pagerduty | PagerDuty Events API v2 format
opsgenie | Opsgenie Alert API format
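
The min_level filter in the endpoint list determines which destinations receive a given event. A minimal sketch of that selection, using the three endpoints from the example configuration, is shown here; `endpoints_for` is a hypothetical helper, and treating a missing min_level as 0 is an assumption.

```python
ENDPOINTS = [
    {"url": "https://hooks.slack.com/services/T00/B00/xxx", "min_level": 2, "format": "slack"},
    {"url": "https://api.pagerduty.com/v2/enqueue", "min_level": 3, "format": "pagerduty"},
    {"url": "http://localhost:9093/api/v1/alerts", "min_level": 1, "format": "alertmanager"},
]


def endpoints_for(level, endpoints=ENDPOINTS):
    """Return the formats of all endpoints whose min_level admits this event level."""
    return [e["format"] for e in endpoints if level >= e.get("min_level", 0)]


for level in (1, 2, 3):
    print(level, endpoints_for(level))
```

A Level 1 event reaches only Alertmanager, a Level 2 event also reaches Slack, and PagerDuty is paged from Level 3 upward.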

9. Thalamic Filter

Parameter | YAML Path | Default | Description
Enabled | thalamic_filter.enabled | true | Suppress redundant telemetry during homeostasis. Disable if external monitoring requires continuous metric flow.
Heartbeat interval | thalamic_filter.heartbeat_interval | 60s | Interval between minimal heartbeat emissions during homeostasis.
Metrics endpoint | thalamic_filter.metrics_port | 9100 | Port for the Prometheus-compatible /metrics endpoint. Set to 0 to disable.

10. Environment Detection and Override

HOSA auto-detects the environment class during proprioception to select the appropriate quarantine strategy (see §4 — Quarantine Modes). The auto-detection can be overridden:

environment:
  type: auto     # auto | bare-metal | cloud | kubernetes | edge-physical | edge-remote | airgap
  cloud_provider: auto     # auto | aws | gcp | azure | none
  self_termination: false     # allow cloud instance self-termination (Level 5)
  self_termination_timeout: 5m

Environment | Auto-Detection Method | Quarantine Strategy
bare-metal | IPMI interface in /sys/class/net/ | Deactivate all interfaces except IPMI
cloud | Metadata service at 169.254.169.254 | XDP drop + cloud-native signaling
kubernetes | KUBERNETES_SERVICE_HOST env var | cgroup containment + taint + K8s Event
edge-physical | Explicit config only | Full network deactivation
edge-remote | Explicit config only | Network deactivation + watchdog timer
airgap | Explicit config only | Full isolation, no external communication

11. Self-Containment (Agent Resource Limits)

HOSA practices what it preaches: the agent itself operates within a dedicated cgroup v2 with hard resource limits. If the agent exceeds its own limits, the kernel contains it before it can affect the system.

self_containment:
  memory_max: 128M     # hard memory ceiling for the agent
  cpu_max: "50000 100000"     # 50% of one CPU core (50ms per 100ms period)
  cgroup_path: /sys/fs/cgroup/hosa.service

The Agent That Limits Itself

This is a deliberate architectural decision, not just good practice. The most common objection to autonomous agents is: "What if the agent itself becomes the problem?" By operating within hard kernel-enforced limits, HOSA cannot consume more than 128MB of memory or 50% of a CPU core, regardless of bugs or unexpected conditions. The kernel enforces these limits — not the agent itself.
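
The two limit values follow the cgroup v2 conventions for cpu.max (quota and period in microseconds) and memory.max. The sketch below converts them into a CPU fraction and a byte count to show where the "50% of one core" figure comes from. The helper names are hypothetical, and reading the M suffix as a power-of-1024 multiplier follows the kernel's convention rather than anything stated in this section.

```python
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def cpu_fraction(cpu_max):
    """Convert a cgroup v2 cpu.max string ('<quota> <period>') into a fraction of one core."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None                      # no CPU limit configured
    return int(quota) / int(period)


def size_bytes(text):
    """Convert a size such as '128M' into bytes."""
    suffix = text[-1].upper()
    if suffix in _UNITS:
        return int(text[:-1]) * _UNITS[suffix]
    return int(text)


print(cpu_fraction("50000 100000"))   # 0.5, i.e. 50 ms of CPU time per 100 ms period
print(size_bytes("128M"))             # 134217728
```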

12. Logging Configuration

Parameter | YAML Path | Default | Description
Directory | logging.directory | /var/log/hosa | Directory for decision logs and audit trail.
Decision log | logging.decision_file | decisions.log | File for structured decision log (JSON Lines format).
Max file size | logging.max_size | 100M | Maximum size before log rotation.
Retention | logging.max_files | 5 | Number of rotated files to retain.
Level | logging.level | info | Agent operational log level: debug, info, warn, error.
Include state vector | logging.include_state_vector | true | Include the full x(t) vector in every decision log entry. Disable to reduce log volume.

13. Kubernetes Deployment

13.1. DaemonSet Manifest

HOSA is deployed as a DaemonSet — one instance per node, with access to the host's cgroup filesystem and network namespace.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hosa
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: hosa
  template:
    metadata:
      labels:
        app: hosa
    spec:
      hostPID: true
      hostNetwork: true
      tolerations:
        - operator: Exists
      containers:
        - name: hosa
          image: ghcr.io/bricio-sr/hosa:latest
          securityContext:
            privileged: false
            capabilities:
              add: [BPF, SYS_ADMIN, NET_ADMIN]
          volumeMounts:
            - name: sys
              mountPath: /sys
            - name: cgroup
              mountPath: /sys/fs/cgroup
          resources:
            limits:
              memory: 128Mi
              cpu: 500m
      volumes:
        - name: sys
          hostPath:
            path: /sys
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
Kubernetes DaemonSet — Minimal Manifest

13.2. Capabilities and Security Context

Following the principle of least privilege, deploy with only the capabilities required for the configured deployment mode:

Mode | Required Capabilities | Host Access
dry-run | CAP_BPF | hostPID: true, read-only /sys
partial (max_level: 2) | CAP_BPF, CAP_SYS_ADMIN | hostPID: true, read-write /sys/fs/cgroup
partial (max_level: 3–4) | CAP_BPF, CAP_SYS_ADMIN, CAP_NET_ADMIN | hostPID: true, hostNetwork: true, read-write /sys/fs/cgroup
full | CAP_BPF, CAP_SYS_ADMIN, CAP_NET_ADMIN | hostPID: true, hostNetwork: true, read-write /sys/fs/cgroup, K8s API access for taints

Start Small, Escalate with Confidence

The recommended deployment path: start with mode: dry-run and CAP_BPF only. Review decision logs for 1–2 weeks to validate detection quality. Then escalate to mode: partial, max_level: 2 for soft containment. Only enable mode: full after building confidence in the agent's behavior on your specific workload.
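
When generating manifests for different stages of this rollout, the capability table can be encoded as a small lookup. The sketch below is one way to do that: `required_capabilities` is a hypothetical helper, and its handling of partial mode with max_level below 2 or equal to 5 extrapolates from the table rather than quoting it.

```python
def required_capabilities(mode, max_level=5):
    """Return the Linux capabilities needed for a deployment mode, per the table above."""
    if mode == "dry-run":
        return ["CAP_BPF"]
    if mode not in ("partial", "full"):
        raise ValueError(f"unknown mode: {mode!r}")
    caps = ["CAP_BPF", "CAP_SYS_ADMIN"]
    if mode == "full" or max_level >= 3:
        caps.append("CAP_NET_ADMIN")     # needed once network-level actions are allowed
    return caps


def manifest_capabilities(caps):
    """Kubernetes securityContext lists capabilities without the CAP_ prefix."""
    return [c[len("CAP_"):] for c in caps]


print(manifest_capabilities(required_capabilities("dry-run")))
print(manifest_capabilities(required_capabilities("partial", max_level=2)))
print(manifest_capabilities(required_capabilities("full")))
```

The output of `manifest_capabilities` can be placed in the `capabilities.add` list of the DaemonSet shown in 13.1 in place of the full set.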

14. CLI Reference

Flag | Short | Default | Description
--config | -c | auto | Path to configuration file.
--mode | -m | dry-run | Deployment mode: dry-run, partial, full.
--max-level | (none) | 5 | Maximum response level (0–5). Only used with --mode=partial.
--warmup | -w | 5m | Warm-up duration (e.g., 300s, 5m, 10m).
--environment | -e | auto | Environment override.
--log-level | (none) | info | Operational log level.
--log-dir | (none) | /var/log/hosa | Log output directory.
--metrics-port | (none) | 9100 | Prometheus-compatible metrics endpoint port. 0 to disable.
--version | -v | (none) | Print version and exit.
--validate | (none) | (none) | Validate configuration file and exit without starting the agent.
--dump-baseline | (none) | (none) | Run warm-up, dump the computed (μ, Σ) baseline to file, and exit. Useful for seeding new nodes.

Examples:

# Observe and log, no actuation
sudo ./hosa --mode=dry-run

# Partial actuation up to Level 3, custom warm-up
sudo ./hosa --mode=partial --max-level=3 --warmup=10m

# Full actuation with explicit config file
sudo ./hosa --mode=full -c /etc/hosa/production.yaml

# Validate configuration without starting
./hosa --validate -c /etc/hosa/hosa.yaml

# Export baseline for seeding other nodes
sudo ./hosa --dump-baseline --warmup=30m -c /etc/hosa/hosa.yaml

15. Full Configuration Reference

Below is the complete annotated configuration file with all parameters and their default values. It serves as both documentation and a starting template: copy it, keep the sections you need, and adjust the values.

# ============================================================
# HOSA Configuration — Full Reference (all defaults shown)
# ============================================================

# --- Deployment Mode ---
mode: dry-run                 # dry-run | partial | full
max_level: 5                 # max response level (partial mode)

# --- Warm-Up ---
warmup:
  duration: 5m
  min_samples: 500
  conservative: true
  seed_file: ""             # path to pre-computed baseline

# --- EWMA ---
ewma:
  alpha: auto               # auto | 0.0–1.0
  alpha_min: 0.05
  alpha_max: 0.3

# --- Thresholds ---
thresholds:
  mode: sigma               # sigma | absolute
  theta1_sigma: 2.0
  theta2_sigma: 3.0
  theta3_sigma: 4.0
  theta4_sigma: 5.0

# --- Covariance ---
covariance:
  tikhonov_lambda: 1e-6

# --- Response ---
response:
  hold_time_1_to_0: 10s
  hold_time_2_to_1: 30s
  hold_time_3_to_2: 60s
  hold_time_4_to_3: 5m

# --- XDP ---
xdp:
  enabled: true
  mode: native              # native | generic
  interface: auto
  healthcheck_cidrs: []

# --- Safelist ---
safelist:
  auto_detect: true
  processes: []
  cgroups: []
  labels: []

# --- Habituation ---
habituation:
  enabled: true
  min_stable_time: 30m
  decay_rate: 0.001
  dm_safety_max: auto
  rho_threshold: 0.25
  delta_h_threshold: 0.5
  icp_threshold: 0.3

# --- Webhooks ---
webhooks:
  enabled: false
  endpoints: []
  timeout: 5s
  retry_count: 2
  retry_delay: 1s

# --- Thalamic Filter ---
thalamic_filter:
  enabled: true
  heartbeat_interval: 60s
  metrics_port: 9100

# --- Environment ---
environment:
  type: auto
  cloud_provider: auto
  self_termination: false
  self_termination_timeout: 5m

# --- Quarantine ---
quarantine:
  enabled: false            # Level 5 requires explicit opt-in
  watchdog_timeout: 30m      # edge-remote only

# --- Self-Containment ---
self_containment:
  memory_max: 128M
  cpu_max: "50000 100000"

# --- Logging ---
logging:
  directory: /var/log/hosa
  decision_file: decisions.log
  max_size: 100M
  max_files: 5
  level: info
  include_state_vector: true

# --- Seasonal Profiles ---
seasonal:
  enabled: true
  min_observation_days: 7
  autocorrelation_threshold: 0.3
Complete hosa.yaml — All Defaults