Changelog

All notable changes to Dativo Talon are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

Release Note Quality Bar

For user-facing entries, include:

why this change matters (problem solved),
who should care (operator/developer persona),
how to verify quickly (command or path),
any upgrade/migration impact,
at least one share artifact reference (screenshot, GIF, or snippet) when applicable.

Fixed

Streams are no longer hard-cut at request_timeout; stream_idle_timeout is now enforced (#217). gateway.timeouts.stream_idle_timeout was parsed and validated but never used, and the only bound on a streaming response was the whole-request request_timeout (default 120s), which killed healthy long generations — routine coding-agent traffic. A live SSE stream is now bounded by silence instead: chunks keep it alive past request_timeout, and a gap longer than stream_idle_timeout (default 60s) aborts it with the family-correct terminal error event, classified as a transient timeout so pre-commit failover still engages. Non-streaming requests keep the exact request_timeout contract. Upgrade impact: a stream that previously survived 60–120s of provider silence is now aborted at 60s — raise stream_idle_timeout for slow local providers (CPU inference can pause >60s before the first token), or set 0 to disable idle enforcement. Verify: go test ./internal/gateway/ -run 'TestForward_HealthyStreamOutlivesRequestTimeout|TestForward_StreamIdleTimeoutAborts'.
Response-PII scanning now covers stream error paths (#392). When a PII-scanned stream (response_pii_action ≠ allow) died mid-way — upstream failure or idle abort — the truncated buffer was flushed to the client raw and unscanned, bypassing the enforced response control. The partial buffer now passes through the same scan as a complete stream: PII-bearing truncated deltas are redacted or blocked per policy, clean partials and buffered error envelopes pass through unchanged, and scanner-unavailable stays fail-closed for block/redact. Covers all stream families — including Responses-API string-shaped deltas — and redaction of a dead stream preserves the gateway's terminal error event instead of presenting the truncation as a completed generation. Who cares: any operator relying on response_pii_action: block|redact — the control previously did not cover error paths. Verify: go test ./internal/gateway/ -run 'TestGateway_StreamingPIIBlock_UpstreamDiesMidStream'.
Terminal SSE error events can no longer merge into a truncated partial event (#393). When a stream died mid-event, the unterminated partial was flushed and the terminal error event (event: error / response.failed) followed with no separator — SSE parsers folded the error lines into the pending partial and never dispatched it, defeating the failure signal for exactly the clients it exists for. The terminal event is now always preceded by a blank-line separator (protocol-neutral when no partial is pending), on every emission path including deferred failover delivery. Verify: go test ./internal/gateway/ -run 'TestStreamCopy_MidEventAbort_TerminalEventStaysParseable'.
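The silence-based bound from #217 can be pictured with a small Go sketch. It is not the gateway's code: copyWithIdleTimeout, errStreamIdle, and runStream are hypothetical names, and a channel stands in for the upstream SSE body. The point it illustrates is that the timer restarts on every chunk, so total stream duration is unbounded while the gap between chunks is not.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStreamIdle is a hypothetical sentinel for "no chunk arrived in time".
var errStreamIdle = errors.New("stream idle timeout")

// copyWithIdleTimeout drains chunks until the channel closes. The timer is
// reset on every chunk, so only silence longer than idle aborts the stream.
// An idle of 0 disables enforcement, mirroring the documented setting.
func copyWithIdleTimeout(chunks <-chan string, idle time.Duration, emit func(string)) error {
	if idle <= 0 {
		for c := range chunks {
			emit(c)
		}
		return nil
	}
	timer := time.NewTimer(idle)
	defer timer.Stop()
	for {
		select {
		case c, ok := <-chunks:
			if !ok {
				return nil // upstream finished normally
			}
			emit(c)
			// Stop and drain before Reset so a stale tick cannot fire later.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(idle)
		case <-timer.C:
			return errStreamIdle
		}
	}
}

// runStream is a demo helper: it sends one chunk after each gap (in
// milliseconds) and reports how many chunks were delivered.
func runStream(gapsMs []int, idleMs int) (int, error) {
	chunks := make(chan string, len(gapsMs)) // buffered so the producer never blocks
	go func() {
		defer close(chunks)
		for i, g := range gapsMs {
			time.Sleep(time.Duration(g) * time.Millisecond)
			chunks <- fmt.Sprintf("chunk-%d", i)
		}
	}()
	n := 0
	err := copyWithIdleTimeout(chunks, time.Duration(idleMs)*time.Millisecond, func(string) { n++ })
	return n, err
}

func main() {
	// Steady chunks: the stream completes however long it runs in total.
	n, err := runStream([]int{10, 10, 10, 10, 10}, 250)
	fmt.Println(n, err)
	// One long gap after the first chunk: the stream is aborted.
	n, err = runStream([]int{5, 1000}, 100)
	fmt.Println(n, err)
}
```

In this sketch the caller decides what an abort looks like on the wire; the changelog entry describes the gateway emitting a family-correct terminal error event at that point.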

[1.9.3] - 2026-07-20

Governance-completeness patch, shaped by real-integrator feedback from the talon-full-demo validation: the MCP endpoints become spec-conformant and fully consumable by standard clients, every denial carries a stable machine code, the vendor path gets vault-backed upstream auth, and the last evidence-correctness gaps close. Issues #356, #357, #358, #360, #351, #367, #368, #369, #370; follow-ups filed as #363, #371–#373, #378. Release gated on a live 7-point walkthrough.

BREAKING

MCP proxy method surface is closed (#356). Non-lifecycle, non-tools methods (resources/read, prompts/get, …) are rejected with -32601 + error.data.talon_code: TALON_METHOD_NOT_ALLOWED + an attributed proxy_method_rejected deny record, in every mode — previously they were forwarded verbatim with no policy, no PII scanning, and no evidence. Who cares: anyone whose vendor uses MCP resources — that lane was silently ungoverned. Governed resource reads are a future feature. Verify: POST resources/read to /mcp/proxy → -32601 with the talon_code.
One proxied call = one request-class record (#357). The allowed PII "note" record is folded into the call's terminal record, correcting talon agents traffic counts and session summaries (previously a PII-carrying allowed call counted twice). Upstream failures — transport, non-JSON body, vendor JSON-RPC error — now produce proxy_upstream_error records (policy-ALLOWED with Status: failed, so vendor outages don't inflate denial rates); transport failures deliberately carry no data-flow item (a signed flow entry must never assert delivery the wire may not have made).
NewProxyHandler constructor changed (#358): it now takes the secrets store. Embedders must update their call sites; there is no compatibility period (no installed base).
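For integrators adapting to the closed method surface (#356), a client-side sketch of reading the machine code from a rejected call may help. Only the -32601 code and the error.data.talon_code field come from this changelog; the type names, the talonCode helper, and the message text in the sample body are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcError mirrors the JSON-RPC 2.0 error object. Data.TalonCode maps the
// error.data.talon_code field named in the changelog.
type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
	Data    struct {
		TalonCode string `json:"talon_code"`
	} `json:"data"`
}

type rpcResponse struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Error   *rpcError       `json:"error,omitempty"`
}

// talonCode returns the stable machine code of a rejected call, or "" when
// the response is not an error or carries no code.
func talonCode(body []byte) (string, error) {
	var resp rpcResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if resp.Error == nil {
		return "", nil
	}
	return resp.Error.Data.TalonCode, nil
}

func main() {
	// Sample body: the message text is invented for this sketch.
	body := []byte(`{"jsonrpc":"2.0","id":1,"error":{"code":-32601,` +
		`"message":"method not allowed","data":{"talon_code":"TALON_METHOD_NOT_ALLOWED"}}}`)
	code, err := talonCode(body)
	fmt.Println(code, err)
}
```

Branching on the returned code rather than on the message string keeps a client stable if the human-readable wording changes.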

Added

MCP handshake (#367). Both /mcp and /mcp/proxy answer initialize locally (tools capability only, client protocolVersion echoed, build version in serverInfo, never forwarded upstream) and accept notifications/initialized with HTTP 202 — spec-conformant clients (Copilot CLI, Claude Code, MCP Inspector, SDKs) now connect. This unblocks the talon-full-demo Copilot scene. Verify: initialize → notifications/initialized → tools/call sequence against either endpoint.
Vault-backed upstream auth for the MCP proxy (#358). proxy.upstream.auth: {secret_name, header, scheme} — per-request vault resolution (rotation via talon secrets set lands on the next request, proven live), fail-closed on retrieval failure (generic Service configuration error to the vendor, typed reason in signed evidence, nothing egressed), vault ACL keyed to the proxy's own agent.name, upstream_auth_mode: "secret" in evidence. The dead ProxyRuntimeConfig.AuthHeader is retired. Who cares: every real vendor endpoint requires auth — this was the vendor-path capability gap.
Stable denial codes (#369). Every Talon-shaped proxy error carries error.data.talon_code (TALON_TOOL_FORBIDDEN, TALON_POLICY_DENIED, TALON_PII_BLOCKED, TALON_SCANNER_UNAVAILABLE, TALON_METHOD_NOT_ALLOWED, TALON_UPSTREAM_ERROR) — integrators key on codes, not prose. Table in ARCHITECTURE_MCP_PROXY.md. Gateway-side codes remain phase 2 of #369.
talon serve --gateway-mode shadow|enforce|log_only (#368). Validated override of gateway.mode — flipping enforcement no longer requires YAML surgery (the downstream demo's worst config bug becomes a non-problem). Typos and missing --gateway fail before any file loads.
talon doctor warns on unrecognized config keys (#351). Advisory WARN (never changes exit code) naming each dead top-level talon.config.yaml key with its real surface (tenants → agent files) or a nearest-key typo hint; the consumed-key set is shared with the pack-template CI pin (config.ConsumedTopLevelKeys()) so guard and pin cannot drift.
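The local handshake behaviour described under #367 can be sketched as a pure function. This is an illustration under stated assumptions, not Talon's handler: handleLifecycle, protocolVersionOf, and the serverInfo name "talon" are hypothetical, and the sample protocol version is just a placeholder value a client might send.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type request struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id,omitempty"`
	Method  string          `json:"method"`
	Params  struct {
		ProtocolVersion string `json:"protocolVersion"`
	} `json:"params"`
}

// handleLifecycle answers the two lifecycle messages locally. The third
// result is false for any other method so the caller can apply its own
// policy (for example, rejecting or forwarding).
func handleLifecycle(raw []byte, buildVersion string) (int, []byte, bool) {
	var req request
	if err := json.Unmarshal(raw, &req); err != nil {
		return http.StatusBadRequest, nil, true
	}
	switch req.Method {
	case "initialize":
		resp := map[string]any{
			"jsonrpc": "2.0",
			"id":      req.ID,
			"result": map[string]any{
				"protocolVersion": req.Params.ProtocolVersion, // echoed from the client
				"capabilities":    map[string]any{"tools": map[string]any{}},
				// The server name is an assumption of this sketch.
				"serverInfo": map[string]any{"name": "talon", "version": buildVersion},
			},
		}
		body, err := json.Marshal(resp)
		if err != nil {
			return http.StatusInternalServerError, nil, true
		}
		return http.StatusOK, body, true
	case "notifications/initialized":
		return http.StatusAccepted, nil, true // HTTP 202 with no body
	}
	return 0, nil, false
}

// protocolVersionOf extracts result.protocolVersion from a response body.
func protocolVersionOf(body []byte) string {
	var resp struct {
		Result struct {
			ProtocolVersion string `json:"protocolVersion"`
		} `json:"result"`
	}
	_ = json.Unmarshal(body, &resp)
	return resp.Result.ProtocolVersion
}

func main() {
	initReq := []byte(`{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26"}}`)
	status, body, _ := handleLifecycle(initReq, "1.9.3")
	fmt.Println(status, protocolVersionOf(body))

	note := []byte(`{"jsonrpc":"2.0","method":"notifications/initialized"}`)
	status, _, _ = handleLifecycle(note, "1.9.3")
	fmt.Println(status)
}
```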

Fixed

graph_governance registered as a canonical explanation stage (#360); the stage set is now closed and pinned (source-walking literal test + constant registration test).

Documentation

Canonical "verify a running gateway" snippet (#370): talon agents --url existed with exactly the right semantics but no doc showed it — QUICKSTART and both coding-agent guides now carry it.
Method surface + denial-code contract documented in ARCHITECTURE_MCP_PROXY.md, the vendor guide, and the minimal example.

Known issues

MCP notifications other than notifications/initialized still receive JSON-RPC error bodies (#363); SDK-client conformance test (#372), evidence schema reference (#371), budget-calibration note (#373), and talon secrets delete + unknown-subcommand exit codes (#378) are filed follow-ups.

[1.9.2] - 2026-07-20

Governance-integrity and adoption patch, continuing the v1.9.1 trust theme: the MCP proxy no longer fails open on an unset mode, its evidence tells the truth about what was blocked versus forwarded and who actually made the call, the pack templates no longer ship configuration keys nothing reads, macOS users get release binaries for the first time, and four remaining false-path docs/examples are fixed. Issues #342, #345, #346, #350, #326, #328, #329, #330, #359; follow-up guard filed as #351.

BREAKING — MCP proxy mode fails closed (#346)

An unset proxy.mode now enforces (intercept) instead of silently forwarding, and a value outside intercept | passthrough | shadow refuses to start. Previously the serve loader never defaulted the mode and every enforcement gate compared literals, so a config that omitted mode: forwarded explicitly forbidden tools to the upstream while recording them as "blocked" — a false audit trail around a real data exposure. Who cares: anyone running talon serve --proxy-config; if you relied (knowingly or not) on the fail-open behavior, set mode: "passthrough" explicitly. Verify: remove mode: from a proxy config with forbidden_tools and confirm a forbidden tools/call returns a JSON-RPC policy error; set mode: "intercpt" and confirm talon serve refuses to start. Both loaders now share one contract (constants in internal/policy), the handler constructor normalizes unknown modes to intercept, and the forbidden-tools gate forwards only under explicit passthrough — fail-closed at three layers.
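A minimal sketch of the fail-closed contract above, assuming a string-typed mode. The three mode values come from this entry; the constant names and the loadMode and forwardForbidden functions are hypothetical and do not reflect the actual code in internal/policy.

```go
package main

import "fmt"

const (
	ModeIntercept   = "intercept"
	ModePassthrough = "passthrough"
	ModeShadow      = "shadow"
)

// loadMode is the loader-side rule: an unset mode defaults to intercept,
// and a value outside the three known modes is a startup error.
func loadMode(raw string) (string, error) {
	switch raw {
	case "":
		return ModeIntercept, nil
	case ModeIntercept, ModePassthrough, ModeShadow:
		return raw, nil
	}
	return "", fmt.Errorf("proxy.mode %q: must be intercept, passthrough, or shadow", raw)
}

// forwardForbidden is the gate-side rule: a forbidden tool reaches the
// upstream only under explicit passthrough. Any other value, including an
// empty or unknown one, blocks.
func forwardForbidden(mode string) bool {
	return mode == ModePassthrough
}

func main() {
	for _, raw := range []string{"", "shadow", "passthrough", "intercpt"} {
		mode, err := loadMode(raw)
		fmt.Printf("raw=%q mode=%q err=%v forward=%v\n", raw, mode, err, forwardForbidden(mode))
	}
}
```

The gate compares against the one permissive value instead of the restrictive ones, so a value nobody anticipated lands on the safe side even if the loader check is bypassed.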

Fixed

Shadow and passthrough modes now record would-have-denied decisions (#346). Policy and PII denials in non-intercept modes previously produced no evidence at all — shadow mode's entire purpose. They now land as proxy_shadow_violation records: ALLOWED (the call was forwarded), with ObservationModeOverride and a ShadowViolations entry saying what enforce mode would have done — the gateway's shadow vocabulary, never a fake "blocked" record. Registered as a non-request record class so traffic counts in talon agents stay one-call-one-request. Verify: shadow mode, call a tool absent from allowed_tools, then talon audit list shows the shadow violation alongside the forwarded call.
MCP proxy evidence attributes to the real caller (#350). Records previously carried a hardcoded agent_id: "mcp-proxy" and a fresh correlation ID per record — a request authenticated with the coding-assistant agent key produced evidence that could not answer which use case made the tool call, and the intent/result records of one call were not even joinable with each other. Now: authenticated agent identity (key → agent → tenant, #266) wins, falling back to the proxy config's agent.name; a validated X-Talon-Session-ID lands in session_id (client-asserted — attribution, not authentication; never synthesized, never a policy input); an inbound X-Correlation-ID is preserved or one request-scoped ID is shared by every record of the call and echoed on the response; X-Talon-Agent-ID/Parent/Client populate the orchestration block under the gateway's emission rule; denied tool calls carry the deterministic POLICY_DENIED_TOOL explanation. Header hygiene is the gateway contract (128-byte cap, RFC 7230 charset, reject-never-truncate) with a single shared implementation in internal/evidence — the two ingestion surfaces can no longer diverge. Who cares: anyone joining MCP tool governance to coding-session LLM traffic (the Copilot demo path). Upgrade note: dashboards or queries filtering agent_id = 'mcp-proxy' will see real agent attribution from this release. Verify: talon audit list --session <id> after sending X-Talon-Session-ID on a proxied tool call.
Pack templates no longer ship dead config keys (#342). llm_provider, evidence.type/path, secrets_key_env, and the tenants: block (with budgets and rate limits!) in the coding-agents and crewai templates were read by nothing — evidence.path was actively misleading since state location derives solely from data_dir, and tenants implied budgets live in the wrong file. Confirmed by an exhaustive audit of all five config-bearing templates against both real loaders; the legacy init templates were clean. The templates now state the real surfaces (data_dir, agent files, gateway.organization_policy) where the dead keys used to be. Regression pin: TestInitPack_GeneratedConfigKeysAreConsumed asserts every top-level key of every generated talon.config.yaml has a named reader — dead-key drift now fails CI, not the operator. Verify: talon init --pack coding-agents and grep the generated config for tenants: (absent).
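The header hygiene rule in the #350 entry (128-byte cap, RFC 7230 charset, reject-never-truncate) can be sketched as a single validator. This is an illustration only: it assumes "RFC 7230 charset" means the token character set (tchar), which the changelog does not spell out, and acceptHeader and isTchar are hypothetical names rather than the shared implementation in internal/evidence.

```go
package main

import "fmt"

const maxHeaderBytes = 128

// isTchar reports whether b is an RFC 7230 "tchar" (token character).
// Treating the attribution headers as tokens is an assumption of this sketch.
func isTchar(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	}
	switch b {
	case '!', '#', '$', '%', '&', '\'', '*', '+', '-', '.', '^', '_', '`', '|', '~':
		return true
	}
	return false
}

// acceptHeader returns the value unchanged when it is valid, and ok=false
// otherwise. It never shortens or rewrites a value: an oversized or
// malformed header is dropped as a whole.
func acceptHeader(v string) (string, bool) {
	if len(v) == 0 || len(v) > maxHeaderBytes {
		return "", false
	}
	for i := 0; i < len(v); i++ {
		if !isTchar(v[i]) {
			return "", false
		}
	}
	return v, true
}

func main() {
	for _, v := range []string{"sess-42", "has space", "tab\there"} {
		got, ok := acceptHeader(v)
		fmt.Printf("%q -> %q ok=%v\n", v, got, ok)
	}
}
```

Rejecting instead of truncating matters for evidence: a truncated session ID could collide with a different, valid one and misattribute records.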

Added

macOS release binaries (#359, #326). Every release from this one ships talon_<version>_darwin_arm64.tar.gz and _darwin_amd64.tar.gz (native CGO builds on a macOS runner, per-file .sha256 checksums) alongside the linux_amd64 archive and Docker images. Who cares: Mac evaluators previously had no release binary and a documented go install linker failure — the coding-agents install path now starts with a download. The release workflow comments now state truthfully what builds where; Windows and linux/arm64 remain unshipped (tracked in #359's thread). Verify: the v1.9.2 release page lists four archives.

Documentation

LangChain example base URLs fixed (#345). The three OpenAI-client base_urls gain the required trailing /v1 (the client appends /chat/completions; without it every request 404s and skips Talon's path-gated handling). The Anthropic line is annotated as deliberately without /v1 — that client appends /v1/messages itself. Verify: python examples/langchain-integration/langchain_stateless.py against a running gateway.
Starter policy library reframed to shipped truth (#328). examples/policies/README.md claimed custom Rego is copied to policies/rego/ and "loaded automatically" — no such mechanism exists (policies are compiled in; custom Rego is a v2 surface). The README now says exactly that, names the .talon.yaml controls that deliver the same outcomes today, and keeps standalone opa eval/opa test usage.
Stale source pointers fixed (#329, #330). what-talon-does-to-your-request.md and ARCHITECTURE_MCP_PROXY.md pointed at nonexistent classifier files; both now reference the real ones (patterns.go, pii.go, redact_guard.go), and every internal/*.go path in both docs is verified to exist.

Known issues

talon doctor does not yet warn on unrecognized keys in operator-authored talon.config.yaml files (the root-cause guard for the #342 class) — tracked as #351, post-freeze.

[1.9.1] - 2026-07-19

Trust-blocker patch: no new features. Closes the credibility issues an evaluator hits first — a demo that contaminated real local state, vendor/MCP documentation describing unshipped functionality, stale coding-agent guides, and runtime errors buried under usage dumps. Issues #319, #332, #336, #337, #339 plus three defects found while fixing them (#340, #341, #342 — #342 remains open).

BREAKING — unknown keys in `--proxy-config` files now fail closed (#332)

talon serve --proxy-config rejects YAML keys that are not part of the shipped schema (ProxyPolicyConfig, internal/policy/proxy.go). Previously a pasted proxy.auth, proxy.tls, or misnested pii_handling block loaded silently — the operator believed a security control was active when nothing enforced it, the worst failure mode for a governance product. Who cares: anyone running the MCP proxy. Upgrade impact: a proxy config carrying unknown keys now fails at startup with a parse error naming the field — delete or fix the flagged keys (the shipped schema is documented in docs/ARCHITECTURE_MCP_PROXY.md, working examples in examples/vendor-proxy/ and examples/mcp-proxy-minimal/). Verify: add auth: {required: true} under proxy: in any proxy config and talon serve --proxy-config <file> refuses to start. Same fail-closed discipline as agents_dir discovery.
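The fail-closed parsing idea can be shown with the Go standard library. Note the substitution: Talon's proxy configs are YAML, and this sketch uses encoding/json with DisallowUnknownFields purely as a stand-in so the example has no external dependency. The proxyConfig type is a hypothetical, heavily reduced schema and loadStrict is an illustrative name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// proxyConfig is a hypothetical, reduced stand-in for the shipped schema.
type proxyConfig struct {
	Proxy struct {
		Mode string `json:"mode"`
	} `json:"proxy"`
}

// loadStrict decodes data and fails on any key the schema does not declare,
// so a pasted but unenforced control cannot load silently.
func loadStrict(data []byte) (*proxyConfig, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var cfg proxyConfig
	if err := dec.Decode(&cfg); err != nil {
		return nil, fmt.Errorf("parse proxy config: %w", err)
	}
	return &cfg, nil
}

func main() {
	good := []byte(`{"proxy":{"mode":"shadow"}}`)
	bad := []byte(`{"proxy":{"mode":"shadow","auth":{"required":true}}}`)

	cfg, err := loadStrict(good)
	fmt.Println(cfg != nil, err)

	cfg, err = loadStrict(bad)
	fmt.Println(cfg != nil, err)
}
```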

Fixed

fleet-ops demo no longer writes into the real ~/.talon evidence store (#319). The demo exported TALON_HOME, which nothing reads; its signed evidence (throwaway demo key) landed in the operator's real store, showed up in real cost/health projections, and rendered as a false ✗ INVALID (tampered) record in talon audit show. The demo now pins data_dir inside its temp workspace (env + generated config, the product-demo pattern). Who cares: anyone who ran or will run examples/fleet-ops/demo.sh. Verify: run the demo, then confirm ~/.talon/evidence.db mtime is unchanged and the demo's fleet shows $0.00 COST. Note: records leaked by pre-1.9.1 runs remain in the real store; removing them is an operator decision (talon audit list --tenant acme).
Runtime errors print only the error — no more 30-line cobra usage dump (#339). SilenceUsage on the root command; flag parse errors (unknown flag) still get usage via a FlagErrorFunc, unknown commands keep the --help hint. Who cares: every CLI user; the operator's fix hint (an evidence id, a path) is now the last line on screen, not buried. Verify: talon audit verify --session does-not-exist prints one error line; talon serve --no-such-flag still prints usage.
examples/mcp-proxy-minimal never loaded (#340). The "smallest working proxy" config used plain-string allowed_tools (schema: name/upstream_name mappings) and nested pii_handling under proxy: where the loader dropped it — talon serve failed on the example's own happy path. Rewritten to the real schema; both shipped proxy examples are now pinned to the strict parser by TestLoadProxyConfig_ShippedExamples, so this drift class fails CI, not the user. Verify: talon serve --port 8080 --proxy-config examples/mcp-proxy-minimal/proxy.talon.yaml starts with mcp_proxy=true.
talon init prints the files it actually wrote (#341). The hardcoded "Created files" list omitted pack-declared files (coding-agents: agents/codex/agent.talon.yaml; crewai: both role agents). Packs with declared files now print their real list. Verify: talon init --pack coding-agents lists four files.

Documentation

docs/VENDOR_INTEGRATION_GUIDE.md rewritten to shipped truth (#336). The "Webhook Interception" pattern documented an entirely unshipped feature (webhook_interceptor agents, forward/redact config, a fabricated talon logs --follow transcript) and the guide's launch commands did not exist (talon server, --mcp-proxy). Now: the shipped LLM API Gateway is the second pattern, Shadow Mode documents the real proxy.mode: shadow semantics, every command is runnable as printed (talon serve --proxy-config, POST /mcp/proxy), and the guide states honestly when Talon cannot govern a vendor (no passive vendor-log monitoring exists). Who cares: evaluators following the README's "MCP / vendor proxy" front door.
docs/ARCHITECTURE_MCP_PROXY.md no longer presents never-shipped config as usable (#332). --mcp-proxy, proxy.auth vendor tokens, proxy.tls mTLS, per-vendor rate limits with burst, upstream.auth, and PostgreSQL evidence storage are gone or explicitly marked Roadmap; mode descriptions match internal/mcp/proxy.go (shadow blocks forbidden tools; PII-scanner failure is fail-closed in every mode); the design-sketch code listing is labeled as such. Inbound auth is documented as it ships: agent/admin-key middleware on POST /mcp/proxy.
Coding-agent guides match the pack (#337). "Creates three files" → four (pricing/models.yaml included) in all three guides; the stale "#235 wizard omits /v1" claim removed (the wizard prints the correct base URL); the single-file pricing-resolution surprise is documented with the three ways to make pricing edits count (fleet mode resolves from the project root; single-file mode resolves from the policy file's directory — the pre-fleet contract deliberately kept in #267), plus the same caveat as a comment in the pack's talon.config.yaml template.

Known issues

The coding-agents pack template still ships config keys nothing reads (evidence.type/path, secrets_key_env, tenants, llm_provider) — tracked as #342; the real state-location surface is data_dir.

[1.9.0] - 2026-07-17

Fleet Operations v1 (#265 milestone): discover, inspect, stop, and safely reconfigure multiple AI use cases through one control plane. Gate: the 8-point walkthrough (discovery, attention queue, disable enforcement across gateway and native runs, live reload, last-known-good rejection, generation consistency, signed lifecycle evidence) — codified in TestFleetOps_Walkthrough, TestReload_EndToEnd, and TestAgentsDir_* (go test -tags integration ./tests/integration), and runnable live via examples/fleet-ops/demo.sh.

BREAKING — pack layouts move to the agents_dir discovery convention (#308)

talon init packs now scaffold secondary agents as agents/<name>/agent.talon.yaml (was flat agents/<name>.talon.yaml). agents_dir discovery (#267) matches only files named exactly agent.talon.yaml, so the old pack layout was invisible to the fleet serving it exists to enable — talon validate --dir . on a scaffolded coding-agents pack silently found only the primary agent. Affected: coding-agents (codex → agents/codex/agent.talon.yaml) and crewai (crew-writer, crew-reviewer → agents/<role>/agent.talon.yaml); both pack configs ship a commented agents_dir: "." fleet-mode toggle. No compatibility period (no installed base): projects scaffolded from earlier packs must move each flat agents/<name>.talon.yaml to agents/<name>/agent.talon.yaml (or re-scaffold) before enabling agents_dir; single-file default_policy/TALON_DEFAULT_POLICY activation keeps working at the new paths. Who cares: anyone who scaffolded coding-agents or crewai and wants one gateway serving the whole pack. Verify: talon init --pack coding-agents --skip-verify && talon validate --dir . → 2 agents valid. Also fixed under #309: runtime strings (doctor policy_valid, the talon costs offline note, pack/example/demo comments) no longer describe shipped agents_dir discovery as future work.
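For projects scaffolded from earlier packs, the path mapping described above is mechanical. The sketch below only computes the target path and does not move files; newLayoutPath is a hypothetical helper, not a Talon command or API.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// newLayoutPath maps a flat pack path such as agents/codex.talon.yaml to the
// discovery layout agents/codex/agent.talon.yaml. ok is false when p is not
// a *.talon.yaml file or is already named agent.talon.yaml.
func newLayoutPath(p string) (string, bool) {
	const suffix = ".talon.yaml"
	dir, file := path.Split(p)
	if !strings.HasSuffix(file, suffix) {
		return "", false
	}
	name := strings.TrimSuffix(file, suffix)
	if name == "" || name == "agent" {
		return "", false
	}
	return path.Join(dir, name, "agent.talon.yaml"), true
}

func main() {
	for _, p := range []string{
		"agents/codex.talon.yaml",
		"agents/crew-writer.talon.yaml",
		"agents/codex/agent.talon.yaml",
		"talon.config.yaml",
	} {
		to, ok := newLayoutPath(p)
		fmt.Printf("%s -> %q ok=%v\n", p, to, ok)
	}
}
```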

Added

make product-demo: the canonical product demo — three AI use cases operated through one Talon. A new examples/product-demo/ runs customer-support, coding-assistant, and document-summary as three agent.talon.yaml files under one agents_dir, on real providers (OpenAI + Anthropic + a local model), and walks the four pillars in one operating period: a policy-valid failover when the local model is down (it skips a healthy provider the use case isn't allowed to use — the skip is proven from the signed failover.skipped_candidates), PII redaction before the provider ([EMAIL]/[IBAN], classification preserved), an organization tool boundary the agent cannot weaken (constraints.forbidden_tools: [admin_*]), a projected-cost session-budget stop (spend + estimate vs limit) before spend, a fleet talon agents view where the exhausted use case shows blocked, a session drill-down (audit list --session), and a signed export + offline verify --file close. Every on-screen receipt is parsed from Talon's own signed evidence — nothing is faked — and strict mode fails the run if any beat's outcome is unexpected. About $0.02–0.05/run on cheap models (denials cost $0); no Docker; state stays in a throwaway dir. Who cares: anyone evaluating Talon as the operating layer for a company's AI use cases rather than a single gateway feature. Verify: export OPENAI_API_KEY=… ANTHROPIC_API_KEY=… (stop Ollama), then make product-demo. The README hero GIF is recorded from this demo (scripts/record-hero.sh).
Documentation reset to the product story. The README front door now leads with one operating layer for a company's AI use cases — three named use cases, one organization policy, one operating view — with the four pillars (cost control, reliability, shared policy, session understanding) over a signed-evidence proof layer, and positions four demos (product demo, no-key quickstart, governed-session, hero). Correctness sweep across the docs to match shipped Fleet Operations v1: the fleet view / agents_dir / agent.enabled / periodic reload are now described as shipped (control-plane.md, ROADMAP.md, gateway-dashboard.md, codex-cli-integration.md); stale "periodic reload is #269" rotation parentheticals and a pinned v1.8.0 install example are corrected; the honest not-yet-shipped list (same-provider retries #139, cost-warning webhooks #144, complete session-summary contract #271, dashboard projection #143) is preserved.
talon agents attention queue: see which AI use cases need attention, and why (closes #270, completes Fleet Operations v1). talon agents is now the primary fleet view — every discovered agent with STATE (the configured enabled/stopped), HEALTH (the evaluated healthy/needs-attention/stopped/blocked), COST (month-to-date spend vs the effective cap), and WHY (the concrete cause). STATE and HEALTH are deliberately distinct: STATE is what the operator set, HEALTH is what the runtime observed. HEALTH is never an opaque score — every needs-attention cause has a fixed window, threshold, and recovery rule (budget ≥80% of a cap; ≥3 fallback dispatches in 1h; denials ≥20% of ≥10 requests in 1h; ≥1 failed/timed-out session in 24h; a current config rejected by the last reload) shown in priority order with +N more; blocked is persistent-only (a period cap exhausted, or agent-wide policy invalidity) — a single PII block or model-allowlist deny never blocks an agent. talon agents show <name> adds the operational summary (state/health/causes, spend vs daily+monthly caps, rolling-window signal counts, last run, config path/digest); the layered (unflattened) effective-policy summary and the recent-session/denial/fallback/retry lists are a scoped follow-up (#305). It is server-first: an explicit --url is authoritative (a reachable-but-failing server is a hard error, never a silent local view), else an implicitly-detected localhost Talon server (the #293 /health marker) is authoritative, else the local config is projected offline, prominently labeled OFFLINE — CONFIG VIEW (no running gateway found; runtime state may differ). Crucially, the CLI, the GET /v1/agents/fleet endpoint, and the dashboard all compute health/budget/session state through one shared projection (internal/fleet, over a single request-class record taxonomy) — a parity test asserts the endpoint's rows equal a direct projection, so the numbers can never disagree. 
Never-valid files stay a separate fleet issues section addressed by path, never synthesized agent rows; --json emits the typed rows; --tenant filters. Who cares: operators running more than one AI use case — this turns talon agents from a registry into a triage queue. Verify: go test ./internal/fleet/ && go test ./internal/server/ -run Fleet && go test -tags=integration ./tests/integration -run TestFleetOps_Walkthrough; or live: talon agents against a running talon serve, then talon agents disable <name> and watch it flip to STOPPED, or talon agents with no server for the labeled offline view (examples/fleet-ops/demo.sh).
agent.enabled + talon agents enable/disable + periodic safe reload: stop and reconfigure a live fleet without restarts (closes #268 and #269, Fleet Operations v1). The new agent.enabled field (default true) is the config-backed operational kill switch: false denies NEW work at every entry point — gateway requests get an attributed 403 with machine code agent_disabled in every mode (shadow/log_only never bypass an operator decision), native runs, the run API, and trigger dispatch are refused before any lifecycle state exists — while in-flight work finishes; every refusal is signed evidence. talon agents enable/disable <name> toggles it host-locally (remote administration stays out of scope): an atomic, comment-preserving structural YAML edit, verified by re-parse before commit, wrapped in intent + completion signed records — a failed completion rolls the file back so recorded and actual state can never silently diverge; valid agents stay toggleable even when a sibling file is broken, and never-valid files are addressed by path, not by a raw-parsed name. talon serve now re-scans the agent source every agents_reload_interval (default 30s; unchanged scans are digest-compare-free): a valid edit activates as ONE atomic generation swap (catalog + compiled bundles + identity registry together) recorded as a signed config_reload fact — and rolled back if that record cannot be written — while an invalid edit never takes a working fleet offline: last-known-good keeps serving, the rejection is recorded once per distinct broken state, and reverting the edit recovers explicitly. GET /v1/agents/fleet (admin) reports the active generation, membership with enabled flags, and the last rejection with per-path causes from ONE coherent read (a rolled-back generation is never reported active). 
Correctness hardening from review: reload registry construction is mode-aware and vault-independent for unchanged bindings — an emergency disable reuses the existing key from the previous generation, so it works even after the vault secret is deleted, and native-only single-file agents (no key binding) reload without a registry; duplicate agent names fail closed for the whole name (neither file produces a valid agent; the CLI refuses to toggle an ambiguous name and modifies nothing); a failed rejection-evidence write is retried on the next tick (never permanently lost); a disabled native run is recorded as blocked/agent_disabled, not an internal failure; the enable/disable YAML rewrite is crash-durable (parent-dir fsync) and concurrency-safe (refuses to clobber a change another process made since the read), with intent/completion/rollback sharing one correlation ID; agents_reload_interval rejects negatives and floors sub-second values. Removed agents never become retention orphans: memory and session sweeps age orphaned rows out under a fixed org-level floor (orphan_retention_days, default 90), independent of the live fleet — so orphaned data can never persist indefinitely. Hot vs restart-required is documented (docs/reference/configuration.md): trigger/webhook definitions stay restart-only (#297); key rotation still requires a restart or a file touch. Who cares: operators running fleets — talon agents disable followed by "now restart the gateway" is not a control plane; this closes the loop. Verify: go test ./internal/agentcatalog/ -run TestReloader && go test -tags=integration ./tests/integration -run TestReload_EndToEnd; or live: talon serve --gateway with a 1s interval, flip enabled: false on disk, watch the 403 arrive with no restart.
Multi-agent native runtime: every execution surface now resolves the discovered fleet (closes #267, Fleet Operations v1). A resolved agent is a compiled runtime bundle — its own OPA engine (compiled once per generation, no longer once per run: the per-run Rego recompile of 11 modules is gone), its own policy-aware PII scanner (semantic enrichment at build time; external engines derive their entity set from THAT agent's policy), and its own router (routing rules + cost limits over the shared provider clients) — published with the identity registry as ONE atomic RuntimeSnapshot behind one pointer. talon run --agent <name>, /v1/agents/run, native chat, trigger/schedule dispatch, and plan dispatch all resolve agents from that catalog by name; the pinned PolicyPath plumbing is gone, and a run captures one fleet generation at entry and completes under it — a reload activating a new generation never changes the engine, scanner, or routing of an in-flight run or request (the gateway's mid-request registry re-read for cache tenant scoping is fixed the same way). Schedules and webhook routes register at startup for EVERY discovered agent (webhook names are one fleet-wide namespace — duplicates fail closed naming both agents; definition changes stay restart-required, #297); memory retention now sweeps per agent under that agent's own retention policy (agent A's retention_days can no longer purge agent B's rows), and the session sweep uses the fleet's maximum declared retention so a global purge never deletes rows an agent still retains. Unknown agents fail loudly before any lifecycle state, listing the discovered agents; the ""/"default" sentinel resolves when exactly one agent is discovered and errors as ambiguous otherwise; a declared agent.tenant_id stays authoritative. 
Review hardening (same PR): gateway and server authentication read the registry through a VIEW over the one runtime holder (no independently swappable registry — a stale gen-A authentication is explicitly rejected at run resolution when gen B moved the agent); circuit-breaker and tool-failure thresholds are each agent's own rate_limits (never another policy's); manual talon plan execute ignores the path captured at plan creation in fleet mode (an approved plan can never bypass later policy tightening); pricing resolves from the project root in fleet mode (CLI and server sign identical estimates); relative shared-context mounts resolve beneath the DECLARING agent's directory with traversal protection (two agents' ./context never cross); session retention purges per agent under its own audit.retention_days. Who cares: anyone operating more than one AI use case — this completes the fleet object model (#265) across the gateway AND native execution. Verify: go test ./internal/agent/ -run 'TestRun_CatalogResolvesPerAgentBundle|TestRun_GenerationConsistency|TestRunFromTrigger_CatalogResolution'; or end-to-end: set agents_dir, talon run --agent <each> and watch each run route per its own policy.
agents_dir discovery: one gateway now serves a fleet of AI use cases (Fleet Operations v1, first slice of #267). Set agents_dir: ./agents in talon.config.yaml (or TALON_AGENTS_DIR) and Talon recursively discovers every file named exactly agent.talon.yaml — one file per AI use case, each with its own vault-bound key — and builds the gateway identity registry from the full set, so each discovered agent's key routes and attributes to its own agent in signed evidence. The scan is fail-closed: a schema-invalid file, an unknown key (a typo that would silently drop a control), or two files sharing an agent.name reject the whole scan with an error naming the offending paths — startup refuses rather than serving a partial fleet; per-file causes are reported by path, and no identity is ever synthesized from a broken file. When set, agents_dir is authoritative for fleet membership (default_policy no longer defines an agent; no mode merging); without it, single-file mode is unchanged. talon validate --dir (or plain talon validate when agents_dir is configured) validates the whole directory with one ✓/✗ line per file, and talon doctor preflights the identical scan + full-set registry dry-run serve startup runs. The new internal/agentcatalog package is the ONE catalog every execution surface resolves against; the multi-agent native runtime (talon run --agent, triggers, server runs) ships in this same release — see the entry above, which closes #267. Who cares: any operator running more than one AI use case — this is the object model of the #265 reset (one agent.talon.yaml = one AI use case = one Talon traffic identity = one active key) made real at the gateway. Verify: go test ./internal/agentcatalog/ && go test -tags=integration ./tests/integration -run TestAgentsDir; or end-to-end: create agents/<name>/agent.talon.yaml files, set agents_dir, mint keys, talon serve --gateway, curl with each key and read talon audit attribution.
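A rough sketch of the fail-closed scan described above, assuming an `fs.FS` view of the directory. `discoverAgentFiles`, `checkUniqueNames`, and `sampleFleet` are hypothetical names, and YAML parsing is left out: the duplicate check takes an already-parsed path-to-name map.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"sort"
	"strings"
	"testing/fstest"
)

const agentFileName = "agent.talon.yaml"

// discoverAgentFiles walks the tree and keeps files named exactly
// agent.talon.yaml. Any walk error aborts the whole scan (fail closed).
func discoverAgentFiles(fsys fs.FS) ([]string, error) {
	var found []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if !d.IsDir() && path.Base(p) == agentFileName {
			found = append(found, p)
		}
		return nil
	})
	if err != nil {
		return nil, err
	}
	return found, nil
}

// checkUniqueNames rejects the scan when two files share an agent.name,
// naming the offending paths. Input maps file path to the parsed name.
func checkUniqueNames(nameByPath map[string]string) error {
	pathsByName := make(map[string][]string)
	for p, name := range nameByPath {
		pathsByName[name] = append(pathsByName[name], p)
	}
	var problems []string
	for name, paths := range pathsByName {
		if len(paths) > 1 {
			sort.Strings(paths)
			problems = append(problems,
				fmt.Sprintf("agent.name %q declared by %s", name, strings.Join(paths, " and ")))
		}
	}
	if len(problems) > 0 {
		sort.Strings(problems)
		return fmt.Errorf("agents_dir scan rejected: %s", strings.Join(problems, "; "))
	}
	return nil
}

// sampleFleet is an in-memory directory used only for this illustration.
func sampleFleet() fstest.MapFS {
	return fstest.MapFS{
		"agents/billing/agent.talon.yaml":     {Data: []byte("agent:\n  name: billing\n")},
		"agents/support/agent.talon.yaml":     {Data: []byte("agent:\n  name: support\n")},
		"agents/support/agent.talon.yaml.bak": {Data: []byte("ignored: not an exact name match\n")},
		"agents/README.md":                    {Data: []byte("ignored\n")},
	}
}

func main() {
	files, err := discoverAgentFiles(sampleFleet())
	fmt.Println(files, err)
	fmt.Println(checkUniqueNames(map[string]string{
		"agents/a/agent.talon.yaml": "billing",
		"agents/b/agent.talon.yaml": "billing",
	}))
}
```

Returning a single error for the whole scan, rather than a partial result, mirrors the "startup refuses rather than serving a partial fleet" behavior the entry describes.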

Fixed

talon costs budget lines: sub-cent caps no longer render as $0.00 (closes #323, the #311 defect class). The budget denominator was hard-coded to two decimals while the used amount kept full precision, so a $0.001 daily cap printed Daily budget: 42.1% ($0.000421 / $0.00), that is, a non-zero percent of a zero-looking budget. Used and limit now share the one costs money formatter, so the same line reads 42.1% ($0.000421 / $0.001000). Found while triaging the v1.9.0 release-candidate smoke run. Who cares: FinOps/operators setting per-request-scale caps. Verify: go test ./internal/cmd/ -run TestPrintBudgetUtilization_SubCentCapKeepsPrecision.
Fleet money readability: sub-dime amounts no longer round into meaningless equality (closes #311). talon agents COST cells and WHY cause details rendered every amount at a fixed two decimals, so real LLM spend — routinely under a cent per request — degraded into daily budget exhausted ($0.00 / $0.00); a field run with spend $0.0126 against a cap of $0.0114 displayed ($0.01 / $0.01), visually claiming spend equals cap. Fleet money formatting is now adaptive: two decimals for ordinary amounts, four for non-zero amounts under $0.10 — the same row now reads ($0.0126 / $0.0114). Who cares: operators triaging the talon agents attention queue at real per-request LLM cost scale. Verify: go test ./internal/fleet/ -run TestFormatMoneyAdaptivePrecision.
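The adaptive rule stated above (two decimals for ordinary amounts, four for non-zero amounts under $0.10) fits in a few lines. The function name `formatMoneyAdaptive` is hypothetical and the handling of negative amounts is not addressed by the entry, so this sketch simply leaves them at two decimals.

```go
package main

import "fmt"

// formatMoneyAdaptive renders non-zero amounts under $0.10 with four
// decimals and everything else with two. Hypothetical name; the real
// formatter lives in Talon's fleet package.
func formatMoneyAdaptive(amount float64) string {
	if amount > 0 && amount < 0.10 {
		return fmt.Sprintf("$%.4f", amount)
	}
	return fmt.Sprintf("$%.2f", amount)
}

func main() {
	// The field-run example from the entry: spend vs. cap.
	fmt.Printf("(%s / %s)\n", formatMoneyAdaptive(0.0126), formatMoneyAdaptive(0.0114))
	// prints: ($0.0126 / $0.0114)
	fmt.Println(formatMoneyAdaptive(12.5)) // prints: $12.50
}
```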
Plan review store timestamps are UTC-normalized (#292), closing the store's exemption from the #264 invariant. mattn/go-sqlite3 serializes a time.Time keeping its offset and SQLite compares those strings lexicographically, so the plan store's local-time writes (timeout_at, reviewed_at, dispatched_at) and local-time binds (GetPending, GetApprovedUndispatched) mis-compared across host timezone changes and DST transitions — a pending plan could appear expired (or an expired one dispatchable) by up to the offset delta, and the dashboard review history could mis-order. Every write and query bind now normalizes to UTC (Save also normalizes the plan struct in place so plan_json agrees with the DATETIME columns). Existing rows keep their stored offset; comparisons against them behave as before on an unchanged host. Who cares: anyone running plan review (human_oversight) on a non-UTC host or moving the SQLite file between hosts. Verify: go test ./internal/agent/ -run TestPlanReviewStore_ -count=1 (new timezone regression tests mirror internal/evidence/timezone_test.go).
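To see why offset-bearing strings mis-compare, consider two instants serialized with their offsets and compared as text. The sketch below uses a layout that approximates the driver's serialization (an assumption; fractional seconds are omitted), and `stored`, `storedUTC`, and `sampleInstants` are illustrative names only.

```go
package main

import (
	"fmt"
	"time"
)

// layout approximates an offset-preserving DATETIME text form.
const layout = "2006-01-02 15:04:05-07:00"

// stored keeps whatever offset the value carries (the pre-fix behavior).
func stored(t time.Time) string { return t.Format(layout) }

// storedUTC normalizes to UTC before serializing (the post-fix behavior).
func storedUTC(t time.Time) string { return t.UTC().Format(layout) }

// sampleInstants returns two instants where `later` is three hours after
// `earlier` in real time but carries a UTC-8 wall clock.
func sampleInstants() (later, earlier time.Time) {
	utcMinus8 := time.FixedZone("UTC-8", -8*3600)
	later = time.Date(2026, 7, 8, 18, 0, 0, 0, utcMinus8) // 2026-07-09 02:00 UTC
	earlier = time.Date(2026, 7, 8, 23, 0, 0, 0, time.UTC)
	return later, earlier
}

func main() {
	later, earlier := sampleInstants()
	fmt.Println("real order: later is after earlier:", later.After(earlier))
	fmt.Println("offset strings say later < earlier:", stored(later) < stored(earlier))
	fmt.Println("UTC strings say later < earlier:   ", storedUTC(later) < storedUTC(earlier))
}
```

With offsets preserved, the text comparison orders the two rows backwards; once both sides are UTC, lexicographic order and chronological order agree.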
talon costs identifies Talon before trusting the default :8080 probe (#293). /health now carries a product marker (X-Talon-Service: talon header + "service":"talon" body), and the implicit localhost budget probe uses it to classify a reachable-but-failing answer: a responder that identifies as Talon and still rejects the query (e.g. 401 without TALON_ADMIN_KEY) is now a hard error — a real runtime's refusal can no longer end in local numbers that may describe a different deployment — while a non-Talon port squatter keeps the loudly-warned local fallback, so offline talon costs still works. An explicit --url was already authoritative and is unchanged; --url "" skips the server path entirely for forced local resolution. This completes the #288 authority model end-to-end. Who cares: FinOps/operators reading budget denominators next to a running gateway. Verify: go test ./internal/cmd/ -run TestResolveBudgetUsage_ServerAuthority && go test ./internal/server/ -run TestHealth.
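The probe contract above reduces to a small decision table. The following is a sketch under stated assumptions: the `Probe` fields and `Outcome` names are hypothetical, the `--url ""` forced-local path is not modeled, and a non-Talon responder on the implicit probe is always treated as a squatter regardless of its status code.

```go
package main

import "fmt"

type Outcome string

const (
	UseServerAnswer Outcome = "use_server_answer"
	HardError       Outcome = "hard_error"
	LocalFallback   Outcome = "local_fallback"
)

// Probe summarizes one attempt to reach a server. All fields are
// illustrative; they are not Talon's types.
type Probe struct {
	ExplicitURL bool // operator passed --url with a value
	Reachable   bool // a responder answered at all
	IsTalon     bool // /health carried the Talon product marker
	Answered    bool // the budget query itself succeeded
}

func classify(p Probe) Outcome {
	if p.ExplicitURL {
		// An explicit URL is authoritative: it answers or it is an error.
		if p.Reachable && p.Answered {
			return UseServerAnswer
		}
		return HardError
	}
	// Implicit localhost probe.
	if !p.Reachable || !p.IsTalon {
		return LocalFallback // offline use, or a non-Talon port squatter
	}
	if p.Answered {
		return UseServerAnswer
	}
	return HardError // a real Talon refused (e.g. 401): never guess locally
}

func main() {
	fmt.Println(classify(Probe{Reachable: true, IsTalon: true}))  // Talon rejected the query
	fmt.Println(classify(Probe{Reachable: true, IsTalon: false})) // squatter on the default port
	fmt.Println(classify(Probe{}))                                // nothing listening
}
```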

1.8.1 - 2026-07-13

Security

govulncheck reports zero reachable vulnerabilities again (GO-2026-5856, GO-2026-5764). Two findings were reachable from Talon code: the crypto/tls Encrypted Client Hello privacy leak (hit by every TLS dial — gateway upstreams, the egress guard, external scanner health checks) and a panic-DoS in the AWS SDK's EventStream decoder on the Bedrock path. Fixes: Go toolchain 1.25.12 (go.mod directive + CI/security workflows), aws-sdk-go-v2/service/bedrockruntime v1.50.4 with aws/protocol/eventstream v1.7.8, and release builds move to goreleaser-cross v1.26.4 (Go 1.26.4) because no 1.25.12 cross image exists — which also means release binaries stop being built with Go 1.25.7, several stdlib patches behind the tested toolchain. Who cares: anyone running talon serve with TLS upstreams (everyone) or routing to Bedrock; v1.8.0 binaries carry both vulnerabilities — upgrade. Verify: govulncheck ./... (0 reachable), talon version on a release binary shows Go ≥ 1.26.4.

1.8.0 - 2026-07-13

BREAKING — organization policy split into defaults vs constraints (#287, #282, #283)

gateway.organization_policy now states which rule governs each field: defaults: (per-agent baselines an agent override may replace) vs constraints: (organization-wide hard bounds an override can only tighten within). No compatibility period (no installed base): every pre-split flat key fails config load with a migration error naming its new home — default_pii_action → defaults.pii_action, max_daily_cost → defaults.daily_cost, response_pii_action/tool_policy_action/attachment_policy → defaults.*, allowed_providers/allowed_models/blocked_models/max_data_tier/forbidden_tools/egress → constraints.*; the operational scalars (log_prompts, log_responses, log_response_preview_chars, scan_tool_content) stay top-level. Note the semantic shift on max_daily_cost/max_monthly_cost: the old flat keys were per-agent baselines and map to defaults.daily_cost/monthly_cost; under constraints.max_daily_cost/max_monthly_cost those names now mean org budget ceilings — new Rego rules deny alongside the per-agent cap with an org-attributed reason (budget_exceeded: request would exceed organization daily cost limit), so an agent declaring a bigger budget than the org allows is stopped at the org line and the signed record never blames the agent. Ceiling-vs-default consistency is validated at load (an explicit default above the org's own ceiling fails; the implicit 100/2000 baseline clamps). tool_policy_action is now monotonic at the agent layer: an agent may tighten filter → block but can no longer loosen an org/provider block back to filter (operator layers still merge most-specific). 
New org hard constraints close the two deliberate #266 gaps: constraints.allowed_tools (#282) — when non-empty, a tool must pass the org allowlist AND the most-specific allowed list AND every forbidden list, on the primary route and every failover candidate — and org session budgets (#283): defaults.session_cost (baseline every agent inherits; agent session_limits.max_cost replaces) plus constraints.max_session_cost (ceiling with its own org-attributed session rule, same fail-open contract on synthetic sessions as #198). Budget displays (dashboard caps, talon costs) denominate against the binding cap — the tightest of the agent cap and org ceiling — so they can never show more headroom than enforcement grants. Also (#284): constraints.allowed_providers entries must name configured gateway.providers at load (a case typo like OpenAI used to silently deny every request at runtime). Who cares: every gateway operator — this is the config surface the fleet issues (#267+) build on. Verify: go test ./internal/gateway/ -run 'TestResolveEffectivePolicyContract|TestLoadGatewayConfigRejectsPreSplitOrgKeys|TestValidateBudgetBounds' ./internal/policy/ -run TestGatewayEngine_OrgBudgetCeilings; or load a pre-split config and read the migration error.
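The three-way tool check for constraints.allowed_tools can be illustrated as below. This is a simplified sketch: `toolAllowed` is a hypothetical name, matching is by exact tool name (any pattern support Talon has is ignored here), and an empty allowlist is assumed to mean "no restriction at that layer".

```go
package main

import "fmt"

func contains(list []string, tool string) bool {
	for _, t := range list {
		if t == tool {
			return true
		}
	}
	return false
}

// toolAllowed requires the tool to pass the org allowlist (when non-empty),
// the most-specific allowed list (when non-empty), and every forbidden list.
func toolAllowed(tool string, orgAllowed, specificAllowed []string, forbidden ...[]string) bool {
	if len(orgAllowed) > 0 && !contains(orgAllowed, tool) {
		return false
	}
	if len(specificAllowed) > 0 && !contains(specificAllowed, tool) {
		return false
	}
	for _, list := range forbidden {
		if contains(list, tool) {
			return false
		}
	}
	return true
}

func main() {
	org := []string{"search", "fetch"}
	agent := []string{"search", "shell"}
	orgForbidden := []string{"fetch"}
	for _, tool := range []string{"search", "shell", "fetch"} {
		fmt.Printf("%s allowed=%v\n", tool, toolAllowed(tool, org, agent, orgForbidden))
	}
}
```

In this sketch the same function would be called for the primary route and again for each failover candidate, so a candidate cannot widen the tool set.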

BREAKING — agent-only identity (#266)

The legacy caller identity model is removed — one agent.talon.yaml = one AI use case = one Talon traffic identity = one active vault-bound key. There is no compatibility period (Talon has no installed base; decision recorded on #266). What this replaces: gateway.callers[] with inline tenant_keys and policy_overrides, source-IP identification, and anonymous fallback. What it becomes: the agent policy carries agent.key.secret_name — a vault reference, never raw material (talon secrets set <name> "$(openssl rand -hex 24)"; the schema rejects any inline value) — plus optional agent.tenant_id (default default); at startup the gateway resolves every agent's key through the vault into an immutable identity registry (fail-closed on duplicate names/keys and missing/ACL-denied/empty secrets; --gateway with zero keyed agents refuses to start); per request, the presented Authorization: Bearer / x-api-key value resolves constant-time to exactly one agent or the request is 401-rejected. The quickstart facade's synthetic identity is the only non-key path. The same agent key also authenticates the tenant-scoped APIs (/v1/costs, /v1/audit, …), scoped to the tenant derived key → agent → tenant_id — which is now authoritative for native runs too (talon run --tenant errors on mismatch with the file). Effective policy is computed in exactly one place — organization baseline (gateway.organization_policy, renamed from default_policy) → the agent's one override (expressed in the agent file's existing vocabulary: cost_limits, session_limits.max_cost, capabilities tool fields, data_classification booleans, plus new policies.models{allowed,blocked}, policies.allowed_providers, policies.egress, data_classification.max_data_tier, metadata.team) → provider destination constraints — and that one computation (ResolveEffectivePolicy) feeds enforcement, every failover candidate, talon costs, and the dashboard budget endpoint, so they can never disagree again. 
Removed config keys fail validation with an explicit error (never silently ignored): callers, default_policy, require_caller_id, trusted_proxy_cidrs, identify_by: source_ip, per_caller_requests_per_min (→ per_agent_requests_per_min). Rego policy inputs rename caller_* → agent_*; the OTel metric label caller → agent (update Grafana dashboards). Who cares: every operator — this is the foundational identity/policy model the control-plane MVP (#265) builds on (#267 agents_dir discovery, #268 enable/disable, #269 reload, #270 fleet view). Verify: talon doctor (agent-identity preflight); go test ./internal/gateway/ -run 'TestResolveEffectivePolicy|TestBuildIdentityRegistry' and go test ./tests/ -run TestNoLegacyCallerNouns (the vocabulary guard); or end-to-end: mint a key, talon serve --gateway, curl with the key (evidence carries agent_id + derived tenant), curl with a random key (401).
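One way to picture "resolves constant-time to exactly one agent" is to hash the presented value and compare it against every registered key hash without exiting early. The sketch below is an assumption about technique, not a description of Talon's registry: the `registry` type, `add`, `resolve`, and `sampleRegistry` are hypothetical, and startup checks such as duplicate-key rejection are omitted.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

type identity struct {
	agentID string
	keyHash [sha256.Size]byte
}

type registry struct{ entries []identity }

func (r *registry) add(agentID, key string) {
	r.entries = append(r.entries, identity{agentID: agentID, keyHash: sha256.Sum256([]byte(key))})
}

// resolve compares the presented key against every entry. The loop never
// breaks early, so its duration does not reveal which entry matched.
func (r *registry) resolve(presented string) (string, bool) {
	h := sha256.Sum256([]byte(presented))
	match := -1
	for i := range r.entries {
		eq := subtle.ConstantTimeCompare(h[:], r.entries[i].keyHash[:])
		match = subtle.ConstantTimeSelect(eq, i, match)
	}
	if match < 0 {
		return "", false // the caller would answer 401
	}
	return r.entries[match].agentID, true
}

// sampleRegistry is fixture data for the illustration only.
func sampleRegistry() *registry {
	r := &registry{}
	r.add("billing", "key-billing")
	r.add("support", "key-support")
	return r
}

func main() {
	r := sampleRegistry()
	fmt.Println(r.resolve("key-support"))
	fmt.Println(r.resolve("not-a-key"))
}
```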
Security/consistency hardening from the #266 review (same release, same cutover): (1) PII actions are monotonic — the organization default_pii_action/response_pii_action is a floor an agent can only tighten (block > redact > warn > allow); bare input_scan/output_scan flags no longer synthesize a warn override, so turning on scanning can never weaken an org-wide block. (2) Organization hard constraints — organization_policy.allowed_providers, .allowed_models/.blocked_models, and .max_data_tier bind every agent regardless of its override (models enforced by dedicated Rego rules on org_* input keys; the tier cap is a ceiling agents can only lower); org-wide allowed_tools and session limits are deliberate follow-ups, not silent gaps. (3) allowed_providers rides the resolver — the agent's provider allowlist moved into PolicyOverride/EffectivePolicy.ProviderAllowed, consumed identically by the primary route and every failover candidate (failover skip reason renamed caller_allowlist → agent_allowlist). (4) upstream_auth_mode: client_bearer is rejected outside --proxy-quickstart — in a normal gateway the presented bearer is a Talon agent key, and client_bearer would have forwarded it verbatim to the upstream provider; the quickstart profile flag is unexported, so no YAML config can enable it. (5) An agent key equal to TALON_ADMIN_KEY fails startup in every serve mode that loads agent keys — gateway and plain serve; --proxy-quickstart never builds the registry, so no agent key is loaded there at all (the tenant-or-admin middleware checks the admin bearer first, so the collision would have silently elevated that workload to operator authority); talon doctor runs the same check. (6) The legacy --caller audit-export flag and the caller JSON alias on /v1/evidence/export + /v1/costs/export are deleted (use --agent/agent_id), and the legacy-noun CI guard gained the vocabulary that survived the first sweep (callers:, identify_by, caller key, per caller, …). 
Verify: go test ./internal/gateway/ -run 'TestResolveEffectivePolicy|TestBuildIdentityRegistryAdminKeyCollision|TestGatewayConfigValidate' and go test ./internal/policy/ -run TestGatewayEngine_EvaluateGateway_OrgModelConstraints.
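The monotonic PII rule in item (1) (block > redact > warn > allow, with the organization value as a floor) can be sketched like this. `effectivePIIAction` is a hypothetical helper; clamping a looser override to the floor, rather than rejecting it, is an assumption.

```go
package main

import "fmt"

// strictness orders the actions named in the entry.
var strictness = map[string]int{"allow": 0, "warn": 1, "redact": 2, "block": 3}

// effectivePIIAction returns the stricter of the org floor and the agent
// override. An empty override inherits the floor; unknown values fail closed.
func effectivePIIAction(orgFloor, agentOverride string) (string, error) {
	floor, ok := strictness[orgFloor]
	if !ok {
		return "", fmt.Errorf("unknown organization PII action %q", orgFloor)
	}
	if agentOverride == "" {
		return orgFloor, nil
	}
	override, ok := strictness[agentOverride]
	if !ok {
		return "", fmt.Errorf("unknown agent PII action %q", agentOverride)
	}
	if override > floor {
		return agentOverride, nil // tightening is allowed
	}
	return orgFloor, nil // loosening is ignored: the floor holds
}

func main() {
	fmt.Println(effectivePIIAction("block", "warn"))  // floor holds
	fmt.Println(effectivePIIAction("warn", "redact")) // agent tightens
}
```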
Config-contract and preflight hardening (#266 review, round 2): (1) schemas/talon.config.schema.json is synchronized with the runtime — the organization hard constraints (allowed_providers, allowed_models/blocked_models, max_data_tier) validate, client_bearer is no longer offered for file configs, and TestConfigSchema_RuntimeParity runs the same YAML through the schema validator AND LoadGatewayConfig so the two contracts can never diverge silently again. (2) The gateway block is strictly decoded (KnownFields): a typo'd security-boundary key (e.g. allowed_provider:) fails config load instead of silently disabling the intended constraint; removed legacy keys keep their friendly breaking-change errors. (3) talon doctor and talon enforce enable share gateway startup's fail-closed identity preflight — a missing agent policy, missing agent.key.secret_name, unminted/ACL-denied/empty/duplicate/admin-colliding key is now a doctor FAIL (was warn) and blocks enforce enable, with a parity regression test per condition. (4) Signed denial reasons name the layer whose rule fired: provider denials record provider not allowed: organization_provider_allowlist vs agent_provider_allowlist (failover skip filters use the same identifiers), and data-tier caps ride per-layer policy-input keys (org_max_data_tier / agent_max_data_tier) with distinct Rego messages (exceeds organization restriction vs exceeds agent restriction) — the record never blames the agent for an organization rule. Verify: go test ./internal/gateway/ -run 'TestConfigSchema_RuntimeParity|TestBuildServeIdentityRegistryModeMatrix' ./internal/doctor/ -run TestGatewayIdentityParity.
Second-order hardening (#266 review, round 3): (1) Root-layout gateway configs are removed — gateway fields at the file root (instead of under gateway:) fail load with a migration error; that layout was the last permissive decode path, where a typo'd security key was silently ignored. Operator-only configs (no gateway vocabulary) keep loading a disabled gateway. (2) doctor/serve share ONE policy→agent adapter — internal/agentbridge.LoadedAgentFromPolicy is now used by gateway startup, talon doctor, and talon enforce enable, so the preflight validates the FULL identity including the policy override (a schema-valid but gateway-invalid egress rule — tier with no destination lists — previously passed doctor and failed serve; now both reject, regression-tested at both layers). (3) Model-less requests fail closed under model policies — blocked_models: ["*"] (agent or org) now denies a request that omits its model, and any active model allow/block policy denies model-less requests with model_required_for_policy_evaluation (the extractor never required a model, so the prompt used to cross the provider boundary unevaluated); HTTP-level test proves zero upstream calls. (4) Schema/runtime parity completed — org max_data_tier accepts the named tier aliases the runtime accepts, timeouts.response_header_timeout added (existed in runtime only), and every gateway-subtree schema object now carries additionalProperties: false to mirror strict decoding. (5) The quickstart "relocated" tenant agent chat route is removed along with its startup log and docs — it was unreachable in production (quickstart never builds the identity registry) and its documentation advertised a dead path (#285). Verify: go test ./internal/policy/ -run TestGatewayEngine ./internal/gateway/ -run 'TestConfigSchema_RuntimeParity|TestBlockedPath_Modelless' ./internal/doctor/ -run TestGatewayIdentityParity.

BREAKING — #266 follow-ups: agent-scoped mutations, atomic identity snapshots, authoritative runtime budgets (#286, #288, #289, #290)

Two agents in one tenant are now fully isolated on every agent-key-reachable surface (#286). Sessions, pending plans, and trigger history now join evidence/costs/memory in the agent-scoped read contract (store-level agent_id filters; another agent's record is a 404, indistinguishable from a missing one), and the isolation extends to MUTATION: POST /v1/sessions/{id}/complete enforces agent ownership inside the UPDATE's WHERE clause, so agent A can no longer close agent B's session by id (found in review: reads alone were not the boundary). A session is owned by the one agent that created its row; external session ids were already unique per (tenant, agent) tuple, so filtering by owner is equivalent to filtering by participation. Plan/trigger handlers additionally honor the admin ?tenant_id view. Who cares: multi-agent tenants (CrewAI-style crews, coding-agent fleets); per-agent blast radius now holds across the whole tenant API. Verify: go test ./internal/server/ -run TestSessionsPlansTriggersAgentScope ./internal/session/ -run TestComplete_TenantScoped.
The identity registry has ONE atomic snapshot seam (#289, groundwork for #269 reload). A shared RegistryHolder (atomic pointer over the immutable registry) now feeds the gateway data plane, server agent-key auth, the dashboard caps lookup, and the metrics scope — previously each captured its own startup copy, so a future reload would have updated some consumers and not others. Auth decisions are single-snapshot by construction: AuthenticateAgentKey returns the resolution AND the keys-configured (dev-open) fact from one registry read, closing the empty→non-empty swap race; metrics capture ONE combined scope (tenant filter + budget denominators) per snapshot, so a reload can never denominate one registry generation's spend against another's caps. Who cares: operators planning key rotation/reload (#269) — the swap is now one pointer store away. Verify: go test ./internal/cmd/ -run TestHolderKeyResolver ./internal/gateway/ -run TestRegistryHolder ./internal/metrics/ -run TestScopeFn.
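A minimal sketch of the single-snapshot idea, assuming Go's `sync/atomic.Pointer`. The type and method names echo the entry (RegistryHolder, AuthenticateAgentKey), but the fields, the `AuthResult` shape, and the plain map lookup are illustrative assumptions, not Talon's implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Registry is immutable once published. Fields are illustrative.
type Registry struct {
	generation int
	agentByKey map[string]string
}

// RegistryHolder is the one seam every consumer reads through.
type RegistryHolder struct{ ptr atomic.Pointer[Registry] }

func (h *RegistryHolder) Store(r *Registry) { h.ptr.Store(r) }

type AuthResult struct {
	AgentID        string
	Resolved       bool
	KeysConfigured bool // false means the dev-open case: no keys at all
	Generation     int
}

// AuthenticateAgentKey derives BOTH facts from one Load, so a swap between
// "is anything configured?" and "does this key resolve?" cannot interleave.
// A plain map lookup is used here for brevity.
func (h *RegistryHolder) AuthenticateAgentKey(key string) AuthResult {
	snap := h.ptr.Load()
	if snap == nil {
		return AuthResult{}
	}
	agent, ok := snap.agentByKey[key]
	return AuthResult{
		AgentID:        agent,
		Resolved:       ok,
		KeysConfigured: len(snap.agentByKey) > 0,
		Generation:     snap.generation,
	}
}

func main() {
	h := &RegistryHolder{}
	h.Store(&Registry{generation: 1, agentByKey: map[string]string{"k1": "billing"}})
	fmt.Printf("%+v\n", h.AuthenticateAgentKey("k1"))

	// A reload is one pointer store; later reads see generation 2 only.
	h.Store(&Registry{generation: 2, agentByKey: map[string]string{"k2": "billing"}})
	fmt.Printf("%+v\n", h.AuthenticateAgentKey("k1"))
}
```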
Budget numbers come from the running runtime, never guessed files (#288). The dashboard gauge sums per-agent binding effective caps over the identity registry (registry + ResolveEffectivePolicy, the exact path enforcement uses — the tightest of the agent cap and org ceiling); /v1/costs/budget answers unknown_agent / unresolved_multi_agent explicitly instead of reporting the default agent file's caps for an agent it doesn't know; and talon costs queries the running server first (new --url flag, TALON_ADMIN_KEY auth) with a strict tri-state contract — a server ANSWER is final (including no-cap answers), an EXPLICIT --url that is unreachable or rejects the query is a hard error naming the network failure (DNS/refused/timeout/TLS), and only the implicit localhost probe may fall back to local resolution for offline use (follow-up on the probe contract: #293). Budget lines print their source ([server_agent_effective_cap] vs local) in human output. Who cares: FinOps and operators — the CLI/dashboard denominator can no longer disagree with what enforcement gated on, and it says where every number came from. Verify: go test ./internal/cmd/ -run 'TestResolveBudgetUsage_ServerAuthority|TestAgentCapsLookupFor' ./internal/server/ -run TestCostsBudget_RuntimeResolvedContract.
The single-agent-policy limitation is explicit, and attribution can no longer diverge from governance (#290). Until agents_dir discovery (#267), exactly one agent policy is loaded per process (TALON_DEFAULT_POLICY/--policy): a native run, trigger, or server run request naming any OTHER agent fails loudly (unknown agent … (#267)) BEFORE any lifecycle state exists — no session row, no run-registry entry, only signed early-termination evidence — and "default" is the unset sentinel on every surface (CLI, runner, and the run API), resolving to the loaded policy's agent instead of being rejected as spoofing. Unnamed runs now attribute their session, trace, registry entry, and evidence to the governing policy's agent, never the "default" placeholder. Documented in LIMITATIONS.md §8, talon doctor, and --help. Verify: go test ./internal/agent/ -run TestRun_AgentIdentitySettledBeforeLifecycle ./internal/server/ -run TestResolveRunAttribution.

Changed

Repositioned README, ROADMAP, and docs around the control plane for company AI use cases. The product category is now "control plane for company AI use cases" with four pillars (cost control, reliability, shared policy, session understanding) and evidence as the cross-cutting proof layer; EU sovereignty and compliance support are framed as differentiators rather than the category. New canonical explainer: docs/explanation/control-plane.md. docs/README.md is reorganized by operator jobs with pillar tags; compliance and memory/plan-review docs are reframed as proof-layer/optional-layer content (no capability claims changed). Removed docs/reference/dashboard-competitor-benchmark.md (competitor-parity matrix) and the stale ADOPTION_SCENARIOS.md (#276), and merged the duplicate docs/policy-cookbook.md into docs/guides/policy-cookbook.md. Who cares: anyone quoting Talon's positioning or navigating docs. Verify: bash scripts/check-claim-discipline.sh; browse docs/README.md. The GitHub roadmap was normalized the same day (pinned epic #265, milestones "MVP: control plane for company AI use cases" and "Parked — not on active roadmap").

Fixed

Server-side agent runs silently skipped sovereignty routing (#261). The agent runner applies compliance-aware routing only when RunRequest.SovereigntyMode is set; the CLI set it, but the two server run handlers (/v1/agents/run, native chat) built their RunRequest without it — so a server-side run for the same agent ignored the data_sovereignty_mode the CLI honored, and no RoutingDecision (rejected + selected candidates) reached signed evidence. serve now threads cfg.EffectiveSovereigntyMode() into both handlers (WithSovereigntyMode), and the governed-session demo gained the single-stack sovereignty story (eu_preferred + confidential-tier routing: one Talon process, one evidence trail, a healthy US primary pre-emptively rejected in favor of LOCAL with both candidates evidenced). Who cares: anyone running EU-strict/preferred deployments through the HTTP API rather than the CLI. Verify: go test ./internal/agent/ -run TestResolveProvider_SovereigntyRoutes_USRejectedLocalSelected and go test -tags=integration ./tests/integration -run TestSovereigntyRouting_ServerHTTP.
Scaffolded policies shipped a self-contradictory tier_2 route: bedrock_only: true with non-Bedrock models (#207, #277). Every fresh talon serve logged the same routing warning twice, and per that warning the router then forced the Bedrock provider for a model Bedrock cannot serve — scaffolded tier_2 (PII) routing was broken, not merely noisy, across 10 shipped templates. The default/minimal scaffold and generic/langchain/crewai packs drop the broken flag (tier_2 now routes to the provider the scaffold actually configures); the EU packs and gdpr/dora compliance overlays carry residency via location: eu + sovereignty mode and document the correct Bedrock-pin path — which also fixes the compliance-overlay merge that would have forced Bedrock onto OpenAI models. The double routing-warning log itself is gone too: serve re-loaded the policy file just to re-read routing/cost config it already had. Who cares: everyone on the onboarding path — first serve is now clean. Verify: talon init --scaffold && talon serve (no routing warning); go test ./internal/cmd/ -run TestInit_GeneratedPolicyHasNoRoutingWarnings.
Budget enforcement and talon costs now agree on where "today" starts, and budget-utilization honors per-caller caps (#216, closing the timezone + alert halves). Two defects made the dashboard disagree with what the runtime enforced. (1) Timezone: evidence timestamps were persisted with the server's local UTC offset, and the mattn/go-sqlite3 driver serializes a time.Time keeping that offset (e.g. 2026-07-08 18:00:00-08:00), so cost-window queries — which compare those strings lexicographically — bucketed a request into the wrong UTC day for the offset hours around midnight on any non-UTC host. An operator could watch talon costs report a caller under its daily cap while the gateway denied (or vice versa). The evidence store now normalizes every record's timestamp to UTC at write time (before signing, so records stay self-consistent) and every time-window query bound — including the audit timeline neighbor lookup, whose target comes from a record's own JSON and so could still carry an old offset — to UTC at read time, so enforcement (callerCostTotals) and reporting (talon costs, both already UTC-windowed) compare apples to apples. Forward-looking: already-written records keep their stored offset, but every read path now buckets them by UTC instant; no back-migration. (2) Budget-utilization ignored overrides: the talon.budget.utilization gauge and 80%/95% alerts divided spend by ServerDefaults caps only, so a caller with a per-caller max_daily_cost/max_monthly_cost override saw a dashboard denominator that didn't match the cap enforcement actually gated on. A single shared ResolveCostCaps helper (server default overlaid by override when set) now feeds both the utilization metrics/alerts and the policy input, so they can't drift. Who cares: any operator on a non-UTC server, or any deployment using per-caller budget overrides — the "dashboard shows what runtime enforced" promise now holds at the day boundary and under overrides. 
Verify: go test ./internal/evidence/ -run TestStore_NormalizesTimestampToUTC ./internal/gateway/ -run TestResolveCostCaps. No config or CLI changes.

1.7.1 - 2026-07-07

Added

Pricing tables declare their currency, and every cost surface honors it (#216 currency slice, #257). pricing/models.yaml (and the embedded default) gain a top-level currency: field — a 3-letter ISO-4217 code, defaulting to USD because the shipped tables' values always were USD. The gateway stamps the code into each signed evidence record at write time (execution.currency), and the declared unit now renders everywhere a hardcoded €/EUR label used to lie: talon audit list/show/verify --session, session rollups, talon costs (labels, budget-utilization lines, JSON payload gains currency), talon metrics, talon report, the signed export, the /api/v1/metrics snapshot, and the gateway dashboard. Budget caps (max_daily_cost/max_monthly_cost/max_session_cost) are documented as denominated in the pricing-table currency — resolving the "USD numbers compared against caps operators read as EUR" half of #216 (its timezone and budget-alert defects remain open). Who cares: every operator reading talon costs or handing evidence to FinOps/auditors — the unit is now honest and self-described; EU teams wanting true EUR set currency: EUR with EUR rates. Migration (CSV consumers): talon costs export renames the cost_eur column to cost and adds a currency column; JSON payloads keep their legacy *_eur keys with currency as the authoritative unit. Verify: talon costs (amounts render $…), talon audit export --format json | jq '.records[-1].currency'.
Governed-session demo: one real two-provider agent session under one budget (#107 Act II, #257). examples/governed-session/ drives a real Anthropic planner + real OpenAI executors (bring-your-own keys, cheap models, ~$0.05/run) through the enforce-mode gateway: a prompt-cache write then read on the same ~2.3k-token prefix (cache_control), OpenAI cached_tokens on repeat executor calls, a forbidden admin_* tool stripped from the request body, an IBAN denied pre-forward mid-session, and an executor loop that runs until real cross-provider session spend closes the max_session_cost gate with a pre-forward 403 — then prints the money story from the signed export (live-verified: naïve all-tokens-at-input-rate accounting overstated a real session 65%, $0.0577 vs Talon's cache-aware $0.0350) and finishes with talon audit verify --session → all records valid. Who cares: evaluators who want proof on live traffic, not mocks, and FinOps teams pricing cached agent workloads. Verify: export ANTHROPIC_API_KEY=… OPENAI_API_KEY=…; make governed-session && cd examples/governed-session && ./demo.sh all. CI: the six-proof mock shortlist demo gains a rot-guard workflow (path-gated PRs, main pushes, nightly); the governed-session demo verifies nightly when DEMO_ANTHROPIC_API_KEY/DEMO_OPENAI_API_KEY repo secrets are configured (skips cleanly otherwise, never on fork PRs).

Fixed

Budget deny reasons were illegible at real-money scale and broken at zero spend (#255, #257). The three budget rules rendered amounts with %.2f, so real sub-cent API costs printed 0.00 in the deny body and signed evidence — and OPA's sprintf refuses %f for integral JSON numbers entirely, so a first-request session denial rendered session spend %!f(int=0000). Amounts now use %v with 4-decimal rounding: always valid, legible at sub-cent scale (live: session spend 0.035 + estimate 0.0063 exceeds limit 0.04). Verify: opa test internal/policy/rego/ and a first-request deny against a max_session_cost below the pre-request estimate.
talon audit show hid the fields that explain cache-aware costs (#256, #257). Evidence records carry cache_read/cache_write token counts and pricing_basis (#196), but the single-record view — what an auditor drills into — dropped them, making a corrected cost look inexplicable next to naïve input×rate math. The Tokens line now includes cache_read=…/cache_write=… when non-zero, plus a Pricing Basis: line, and the Cost line renders the record's stamped currency. Verify: run any prompt-cached request, then talon audit show <id>.
Shortlist demo Proof 6 failed on Linux hosts — bind-mounted ./out was unwritable, then unreadable (#258, #257). Two stages, both invisible on macOS (Docker Desktop uid-mapping): the container couldn't write the compliance exports into the host-owned ./out mount (first container-side write in the demo), and once writable, the exports landed 0600 under the container uid so the demo's host-side jq read-back died — silently, because the failing pipeline's stderr was discarded under set -euo pipefail. The up script now makes the throwaway artifact dir world-writable, the demo chmods its exports readable and keeps compliance stderr in out/compliance.stderr (dumped on failure), and the new rot-guard workflow — which caught this on its first run — keeps the demo honest on Linux. Verify: make verify-shortlist-demo on any Linux host.
Mid-stream upstream death now emits a family-correct terminal SSE event (#195). Previously a provider dying mid-stream (or the gateway's own request timeout firing) simply truncated the SSE stream with a 200 already on the wire — Codex retry-loops waiting for a response.completed that never comes; Anthropic SDKs hang until their own timeout. Anthropic-wire streams now end with the documented event: error (api_error), Responses streams with event: response.failed (upstream_error); healthy streams are byte-identical to before. Chat Completions has no standard mid-stream error event — Talon does not fabricate [DONE], and the remaining truncation is documented in LIMITATIONS. Terminal-event messages are gateway-authored constants; upstream error text is never forwarded. Who cares: anyone running Codex or long Claude generations through the gateway — a dead upstream is now an explicit, machine-readable stream ending. Verify: go test ./internal/gateway/ -run TestStreamCopy_ -v.
Gateway error envelopes are now provider-native on three more paths (#195). (1) Anthropic-family denials without a machine code carried "type": "error" — not a member of Anthropic's error enum, so typed SDK error handling fell through; the fallback now maps the HTTP status to the correct enum member (400→invalid_request_error, 401→authentication_error, 403→permission_error, 404→not_found_error, 413→request_too_large, 429→rate_limit_error, 529→overloaded_error, other 5xx→api_error). Machine codes from the documented prefix convention (e.g. session_budget_exceeded) still travel in error.type; the final enum contract table is #142's. (2) Response-PII blocks (HTTP 451) and scanner-unavailable blocks (502) — streaming and non-streaming — previously returned a bare {"error":{…}} on both wire families; they now render through the shared per-family envelopes (Anthropic {"type":"error","error":{…}}; OpenAI envelope gains its code field). (3) Semantic-cache hits on anthropic routes returned an OpenAI chat completion; they now return an Anthropic Messages object. Also documented: pre-route errors (unknown provider prefix) intentionally use the OpenAI shape since no wire family is resolved yet. Who cares: anyone whose Claude-family client parses gateway denials — typed error handling and retry logic now see valid shapes. Verify: go test ./internal/gateway/ -run 'TestWriteAnthropicError_StatusMappedTypes|TestScanResponseForPII_BlockBodyPerFamily|TestWriteCachedCompletion_AnthropicShape' -v.
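The status-to-enum fallback listed in (1) can be sketched in Go as below. The function and type names are hypothetical, and treating statuses outside the listed table as api_error is this sketch's assumption; the entry only specifies the listed codes and other 5xx.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicErrorType maps an HTTP status to an Anthropic error enum member,
// following the fallback table in the entry above.
func anthropicErrorType(status int) string {
	switch status {
	case 400:
		return "invalid_request_error"
	case 401:
		return "authentication_error"
	case 403:
		return "permission_error"
	case 404:
		return "not_found_error"
	case 413:
		return "request_too_large"
	case 429:
		return "rate_limit_error"
	case 529:
		return "overloaded_error"
	}
	// Other 5xx map to api_error per the entry; using the same value for any
	// remaining unlisted status is an assumption of this sketch.
	return "api_error"
}

type errorBody struct {
	Type    string `json:"type"`
	Message string `json:"message"`
}

type envelope struct {
	Type  string    `json:"type"`
	Error errorBody `json:"error"`
}

// anthropicEnvelope renders the {"type":"error","error":{...}} shape.
func anthropicEnvelope(status int, message string) string {
	b, err := json.Marshal(envelope{
		Type:  "error",
		Error: errorBody{Type: anthropicErrorType(status), Message: message},
	})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(anthropicEnvelope(429, "slow down"))
	fmt.Println(anthropicEnvelope(503, "upstream unavailable"))
}
```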

1.7.0 - 2026-07-06

Added

talon secrets set --tenant/--agent: scope CLI-set secrets per tenant (#237). talon secrets set wrote an empty ACL, which means allow-all — any authenticated tenant's gateway traffic could cause retrieval of any CLI-set secret, so multi-tenant secret isolation silently did not exist unless secrets were seeded programmatically. New repeatable --tenant and --agent flags (glob patterns allowed) scope a secret to specific tenants/agents. The default stays allow-all for backward compatibility but now prints a stderr notice pointing at the flags, and scoped sets echo the stored ACL; talon secrets audit shows the per-tenant allow/deny decisions. Single-tenant deployments are unaffected. Who cares: MSP/multi-tenant operators — provider-key isolation between customers is now a one-flag change. Verify: talon secrets set k v --tenant acme, then confirm an unscoped talon secrets set prints the allow-all notice; go test ./internal/cmd/ -run TestSecretsSet -v. Docs: multi-tenant/MSP guide → "Scope vault secrets per tenant".
Coding-agents adoption surface: policy pack, integration guides, reproducible demo (#200 docs, #201, #202, #203 — epic #192 PR-I). Three pieces turn the epic's machinery into a 10-minute rollout. (1) talon init --pack coding-agents scaffolds a governed two-caller gateway (Claude Code on the Anthropic wire, Codex CLI on the Responses wire) with honest defaults: response_pii_action: allow (anything else buffers whole SSE streams today), soft max_session_cost, raised coding timeouts, and four high-precision credential recognizers (PEM private-key block, AWS AKIA…, GitHub ghp_/github_pat_, Anthropic/OpenAI sk-… keys) that fire in the real scan path (fixture-tested) — Talon is not a secret scanner; the pack says so and points at gitleaks/trufflehog. The OpenClaw pack's "credential scanning" claim is now backed by the same recognizers. (2) Guides: claude-code-integration.md, codex-cli-integration.md (the docs half of #200), and governing-coding-agents.md — the canonical neutral-metadata-contract reference (generic X-Talon-* headers, vendor adapters as data, precedence, hygiene, provenance) — plus a new LIMITATIONS.md §7 stating every coding-agent sharp edge with its backing test (attribution≠authentication, local tools invisible, subscription billing ungovernable, stream buffering, cache-price fallback, soft caps, store semantics, tool-content evidence-only). README gains a coding-agents integration row; the policy cookbook gains the session-budget/recognizer recipe. (3) make coding-agents-demo (#203): one command, fully offline — the mock provider now speaks both wire families incl. SSE (Anthropic message_start/content_block_delta/message_delta with cache-token usage; Responses response.completed with cached_tokens), and demo.sh walks a cross-provider session with subagent attribution, a PII event, a provider-native session_budget_exceeded denial, and a signed export that verifies. 
The same sequence is CI-smoke-tested without Docker (TestCodingAgentsDemo_EndToEnd builds and drives the real mock binary). Who cares: platform teams who want the epic's governance running against their coding fleet this afternoon, and skeptics who want the receipts first. Verify: go test -tags=integration ./tests/integration -run TestCodingAgentsDemo_EndToEnd and go test ./internal/cmd/ -run TestInitPack_CodingAgents -v.
Dashboard: orchestration session drill-down (#199, epic #192 PR-H). Operators now see coding sessions and subagents, not just callers. The gateway dashboard gains a Coding Sessions panel: the most recently active client/vendor-asserted sessions with request/allow/deny counts, providers, models, token totals, cost, and a click-to-expand per-subagent breakdown (generator, judge ← generator, …). A mixed-provider session renders as one session — the point of the neutral session contract (#194). The numbers are produced by the same pure function behind talon audit list --session (evidence.BuildSessionSummary), re-derived from signed evidence on every snapshot — the dashboard and the CLI are structurally incapable of disagreeing and the destructive 30s ReconcileFromStore rebuild can't change them (tested). Denials are now bucketed by machine-code reason (denials_by_reason: session_budget_exceeded, budget_exceeded, egress_*, …) instead of lumping under policy_deny; the evidence→metrics projection carries session_id + orchestration attribution (previously dropped). Panel is hidden without orchestration data; every client-asserted string is HTML-escaped (hostile input — enforced by tests); the old metrics feed card is renamed "Gateway Activity Feed" to end the naming collision. No new endpoints — sessions and denials_by_reason ride the existing /api/v1/metrics JSON + SSE (schema documented in gateway-dashboard.md). Who cares: platform operators running Claude Code/Codex fleets — "which session is burning money, on which subagent, across which providers" is now one glance. Verify: go test ./internal/metrics/ -run 'TestFillSessions|TestDenialsByReason' -v.
Session budget enforcement at the gateway (#198, epic #192 PR-G; fixes #214, #215). A runaway coding session is now denied as a unit, synchronously, in the policy hot path. Set policy_overrides.max_session_cost on a caller and the gateway denies a new request once accumulated session spend + the pre-request estimate exceeds the cap — cross-provider (€6 on anthropic + €5 on openai against a €10 cap → the next request is denied on either route), with reason session_budget_exceeded: … rendered provider-native and a structured session_budget: {limit, spent, estimate} block in signed evidence (evidence spec 1.8, additive). This is a soft cap: one in-flight request can overshoot (estimate < real cost); the overshoot is caught on the next request; atomic reservation stays #144. Shadow mode records would-have-denied; a session-store failure fails open with a session_budget_unavailable evidence annotation; hot-path cost is one caller-scoped SQLite read (~0.03 ms measured, BenchmarkSessionBudgetLookup). Underneath, the session lifecycle is fixed (#214): asserted session ids create-if-absent under the caller-scoped tuple (tenant_id, caller_id, external_session_id) (unique index; additive sessions columns external_session_id/caller_id/source), synthetic ids never create session rows (the orphan-row-per-request growth is gone), usage actually accumulates, and rows follow audit.retention_days via a daily sweep. Isolation is structural (#215): two callers asserting the same session id get separate sessions and budgets, session reads in policy input go through the tuple (never the raw client-supplied id), and GET/POST /v1/sessions/{id}[/complete] + session listing now enforce tenant ownership — another tenant's session is indistinguishable from a missing one (404). Who cares: platform teams putting Claude Code/Codex fleets behind Talon — a runaway orchestrator burning budget in one session is stopped at the gateway with signed proof. 
Verify: go test ./internal/gateway/ -run TestSessionBudget -v and go test ./internal/session/ -v.
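A minimal sketch of the soft-cap check described in the session-budget entry, assuming spend is summed across providers and compared with the cap before each request. The sessionBudget type and its methods are hypothetical, not Talon's internals.

```go
package main

import "fmt"

// sessionBudget holds a cap and the spend accumulated per provider route.
type sessionBudget struct {
	Limit           float64
	SpentByProvider map[string]float64
}

// spent sums session spend across all provider routes.
func (b sessionBudget) spent() float64 {
	total := 0.0
	for _, v := range b.SpentByProvider {
		total += v
	}
	return total
}

// deny reports whether accumulated spend plus the pre-request estimate
// exceeds the cap. An in-flight request can still overshoot (soft cap).
func (b sessionBudget) deny(estimate float64) bool {
	return b.spent()+estimate > b.Limit
}

func main() {
	b := sessionBudget{
		Limit:           10,
		SpentByProvider: map[string]float64{"anthropic": 6, "openai": 5},
	}
	// 6 + 5 already exceeds the cap of 10, so the next request is denied
	// regardless of which route it targets.
	fmt.Println(b.deny(0.01))
}
```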
Session-scoped audit and cost rollups (#197, epic #192 PR-F). A multi-model coding session — an orchestrator on one provider delegating to executors/judges on another, all sharing one session_id (#194) — is now auditable as a unit. talon audit list --session <id> prints a caller-scoped session summary (window, request/allow/deny/error counts, providers, models, token totals incl. cache read/write, total cost) plus a per-subagent breakdown keyed on the client-asserted orchestration.agent_id (falling back to the caller), then lists the session's records. talon audit export --session <id> scopes any export format to one session; talon audit verify --session <id> HMAC-verifies every record in a session and exits non-zero on any failure; talon costs --session <id> (with --json) gives the cost rollup. The aggregation is a single pure function, evidence.BuildSessionSummary, reused verbatim by the dashboard sessions panel (#199) so CLI and UI can never drift. Caller-scoped by construction: --tenant/--caller filters drop records that are not the caller's, and the summary surfaces every distinct caller that touched a session_id so a cross-caller collision is visible rather than silently merged. No new tables — it reads existing signed evidence via Store.ListBySessionID. Who cares: platform/FinOps/DPO teams governing coding-agent rollouts — "what did this whole coding session cost, across which models, and did every record verify" is now one command. Verify: go test ./internal/evidence/ -run TestBuildSessionSummary -v and go test ./internal/cmd/ -run 'TestAuditListCmd_SessionScoped|TestAuditVerifyCmd_Session' -v.
Provider-aware usage-detail extraction and cache-aware pricing (#196, epic #192 PR-E). Signed cost evidence is now correct for prompt-cached and streamed traffic — the traffic class coding agents generate on nearly every call. The gateway parses prompt-cache tokens per provider family (Anthropic cache_creation_input_tokens/cache_read_input_tokens, which are separate counts; OpenAI prompt_tokens_details.cached_tokens / Responses input_tokens_details.cached_tokens, which are a subset of input and are normalized to input = prompt − cached), and reads OpenAI Responses streaming usage from the terminal response.completed event (Codex always streams — previously its cost was estimate-only). The cost estimator contract is now cache-aware and provider-keyed (CostEstimator func(provider, model string, Usage) CostResult), so the routed provider's real pricing is used instead of a max-across-providers guess, and evidence records how the number was derived (pricing_basis: table | cache_fallback_input_rate | default_estimate; pricing_known) so a signed cost is never silently a fallback. Pricing schema gains optional cache_read_per_1m/cache_write_per_1m (absent → cache tokens priced at the input rate, fail-conservative — never below pre-change); current Anthropic (write 1.25×, read 0.1×) and OpenAI (cached 0.1×, no write premium) models refreshed. Evidence execution.tokens gains cache_read/cache_write, execution gains pricing_basis/pricing_known (evidence spec 1.7, additive); talon audit export gains cache_read_tokens/cache_write_tokens/pricing_basis columns. Chat-completions streaming requests get stream_options.include_usage injected (per-provider inject_stream_usage, default true) so their usage is captured. Who cares: anyone reading talon costs/signed FinOps evidence for Claude Code or Codex traffic — cached-prompt spend was materially misstated before. 
Verify: go test ./internal/gateway/ -run TestGatewayCacheCost_EndToEnd -v and go test ./internal/pricing/ -run TestEstimateCached -v.
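The normalization and fallback rules in the cache-aware pricing entry can be illustrated with a short Go sketch. The usage and rates types, the function names, and the exact point at which the basis flips to cache_fallback_input_rate are assumptions made for illustration; only the field meanings and basis labels come from the entry.

```go
package main

import "fmt"

// usage separates uncached input from cache reads and writes.
type usage struct{ Input, Output, CacheRead, CacheWrite int }

// rates are per-million-token prices; nil cache rates mean "not in the table".
type rates struct {
	InputPer1M, OutputPer1M         float64
	CacheReadPer1M, CacheWritePer1M *float64
}

func ptr(f float64) *float64 { return &f }

// normalizeOpenAI treats cached tokens as a subset of prompt tokens:
// input = prompt - cached.
func normalizeOpenAI(promptTokens, cachedTokens, completionTokens int) usage {
	return usage{Input: promptTokens - cachedTokens, Output: completionTokens, CacheRead: cachedTokens}
}

// estimate prices cache tokens at their own rate when present and falls back
// to the input rate otherwise, recording which basis was used.
func estimate(u usage, r rates) (cost float64, basis string) {
	basis = "table"
	readRate, writeRate := r.InputPer1M, r.InputPer1M
	if r.CacheReadPer1M != nil {
		readRate = *r.CacheReadPer1M
	} else if u.CacheRead > 0 {
		basis = "cache_fallback_input_rate"
	}
	if r.CacheWritePer1M != nil {
		writeRate = *r.CacheWritePer1M
	} else if u.CacheWrite > 0 {
		basis = "cache_fallback_input_rate"
	}
	cost = (float64(u.Input)*r.InputPer1M +
		float64(u.Output)*r.OutputPer1M +
		float64(u.CacheRead)*readRate +
		float64(u.CacheWrite)*writeRate) / 1e6
	return cost, basis
}

func main() {
	u := normalizeOpenAI(1_000_000, 800_000, 0)
	withCache, basis1 := estimate(u, rates{InputPer1M: 3, OutputPer1M: 15, CacheReadPer1M: ptr(0.3)})
	fallback, basis2 := estimate(u, rates{InputPer1M: 3, OutputPer1M: 15})
	fmt.Println(withCache, basis1)
	fmt.Println(fallback, basis2)
}
```

The illustrative rates (3, 15, 0.3) are made up; the second call shows the fail-conservative path, where the cost is never below what input-rate pricing would give.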
Provider-neutral orchestration metadata contract (#194, epic #192 PR-D). Coding agents (Claude Code, Codex, any client) now get per-subagent attribution in signed evidence. The gateway ingests session/subagent/parent identity from generic X-Talon-Session-ID / -Agent-ID / -Parent-Agent-ID / -Client headers, or a vendor adapter (Claude Code's x-claude-code-*, Codex's session-id/x-openai-subagent) — adapters are a data table, not code branches, so a new client needs no core change. Precedence is generic > vendor > absent; one session_id groups a coding session across provider routes (an anthropic orchestrator delegating to an openai executor). Recorded as an orchestration block (evidence spec 1.6), flattened into talon audit export (orch_agent_id, orch_client, orch_session_source), and shown by talon audit show. Evidence-only and caller-scoped by construction: identity is provenance: client_asserted, never a policy input (attestation is #149), and one caller can never join another caller's session by asserting its id. Per-caller accept_client_metadata (default true) gates recording; hostile header values (oversized, non-token charset, HTML injection) are rejected with a 400 before reaching evidence. Also closes the #219 spec drift by backfilling the failover field into the integrity-spec field table. Who cares: security/DPO teams and platform leads governing coding-agent rollouts — "which subagent, in which session, sent what" is now answerable from signed evidence. Verify: send a request with X-Claude-Code-Agent-Id: reviewer, then talon audit export --format json | jq '.records[-1] | {orch_agent_id, orch_client}'.
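A hedged sketch of the generic > vendor > absent precedence with adapters kept as a data table. Only header names that appear in the entry are used; the identity type, the source labels, and the choice to apply precedence to the whole block rather than per field are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
)

// vendorAdapter is one row of the adapter table; empty fields are skipped.
type vendorAdapter struct {
	Client        string
	SessionHeader string
	AgentHeader   string
}

// adapters lists only the vendor headers named in the entry above.
var adapters = []vendorAdapter{
	{Client: "claude-code", AgentHeader: "X-Claude-Code-Agent-Id"},
	{Client: "codex", SessionHeader: "session-id", AgentHeader: "x-openai-subagent"},
}

type identity struct{ SessionID, AgentID, Client, Source string }

// resolve prefers generic X-Talon-* headers, then the first matching vendor
// adapter, and otherwise reports the identity as absent.
func resolve(h http.Header) identity {
	s, a := h.Get("X-Talon-Session-ID"), h.Get("X-Talon-Agent-ID")
	if s != "" || a != "" {
		return identity{SessionID: s, AgentID: a, Client: h.Get("X-Talon-Client"), Source: "generic"}
	}
	for _, ad := range adapters {
		var vs, va string
		if ad.SessionHeader != "" {
			vs = h.Get(ad.SessionHeader)
		}
		if ad.AgentHeader != "" {
			va = h.Get(ad.AgentHeader)
		}
		if vs != "" || va != "" {
			return identity{SessionID: vs, AgentID: va, Client: ad.Client, Source: "vendor"}
		}
	}
	return identity{Source: "absent"}
}

// headers builds an http.Header from key/value pairs for the examples.
func headers(kv ...string) http.Header {
	h := http.Header{}
	for i := 0; i+1 < len(kv); i += 2 {
		h.Set(kv[i], kv[i+1])
	}
	return h
}

func main() {
	fmt.Printf("%+v\n", resolve(headers("X-Claude-Code-Agent-Id", "reviewer")))
	fmt.Printf("%+v\n", resolve(headers("X-Talon-Agent-ID", "judge", "X-Claude-Code-Agent-Id", "reviewer")))
	fmt.Printf("%+v\n", resolve(headers()))
}
```

Adding a client in this shape means appending a row to adapters rather than adding a branch to resolve.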

Fixed

connect_timeout doubled as the response-header budget — long non-streaming requests were killed at 10s (#230). The gateway set http.Transport.ResponseHeaderTimeout from connect_timeout (default 10s), so any upstream whose time-to-first-byte exceeded 10s was aborted (http2: timeout awaiting response headers) regardless of request_timeout=120s — long-prompt non-streaming Responses/Messages calls hit this routinely. Dialing, meanwhile, was not bounded at all (no DialContext). Now connect_timeout bounds connection establishment (TCP dial + TLS handshake) via a real net.Dialer, and a new gateway.timeouts.response_header_timeout bounds the header wait, defaulting to request_timeout so slow-TTFB calls get the full request budget. The coding-agents pack and both integration guides drop their connect_timeout: 60s workaround. No config change is required to benefit; set response_header_timeout explicitly only to tighten it. Who cares: any operator running non-streaming traffic with large inputs or high reasoning effort. Verify: go test ./internal/gateway/ -run TestHTTPClientForGateway -v. Docs: per-phase timeout table in configuration reference.
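The per-phase split described above maps onto standard net/http knobs. The sketch below is an illustration under assumptions, not the gateway's code: the timeouts struct and helper names are hypothetical, and giving the TLS handshake its own connect_timeout budget (rather than sharing one budget with the dial) is a simplification.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// timeouts mirrors the three settings discussed in the entry above.
type timeouts struct {
	Connect        time.Duration
	ResponseHeader time.Duration // zero means "default to Request"
	Request        time.Duration
}

// fromSeconds is a small convenience constructor for the examples.
func fromSeconds(connect, header, request int) timeouts {
	return timeouts{
		Connect:        time.Duration(connect) * time.Second,
		ResponseHeader: time.Duration(header) * time.Second,
		Request:        time.Duration(request) * time.Second,
	}
}

// newTransport bounds connection establishment with Connect and the wait for
// response headers with ResponseHeader, falling back to Request when unset.
func newTransport(t timeouts) *http.Transport {
	headerWait := t.ResponseHeader
	if headerWait == 0 {
		headerWait = t.Request
	}
	dialer := &net.Dialer{Timeout: t.Connect}
	return &http.Transport{
		DialContext:           dialer.DialContext,
		TLSHandshakeTimeout:   t.Connect,
		ResponseHeaderTimeout: headerWait,
	}
}

func main() {
	tr := newTransport(fromSeconds(10, 0, 120))
	fmt.Println(tr.TLSHandshakeTimeout, tr.ResponseHeaderTimeout)
}
```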
Test and CI hardening (#234, #246, #236 guard, #242). Two order-dependent test flakes fixed: TestToolApprovalStore_Cleanup (a 50 ms approval timeout could expire before the poll loop observed the pending request under parallel load — #234) and TestBudgetAlertClaimFire (a package-global 1 h-cooldown dedupe table was never reset, so any -count>1 run failed deterministically — #246, found while validating #234). A new integration guard (TestExampleComposeHostPathsAreTracked) asserts every host path bind-mounted by an example compose file is git-tracked, closing the #236 loop so a .gitignore rule can never again silently drop a demo's config. And CI now runs shellcheck over every tracked *.sh (#242) — the demo/ops scripts were previously invisible to CI, which is how three shell-only bugs (#239, #240, #241) shipped in sequence; 65 findings were fixed or triaged to a clean tree, including a real unquoted-expansion hazard in run-benchmarks.sh. Contributor-facing; no runtime behavior change.
.gitignore swallowed every talon.config.yaml — five example stacks referenced configs that were never in git (#236). The repo-wide ignore (meant for user-local root configs) silently kept intentional example/template configs out of commits and out of //go:embed: the README's 60-second demo compose, the copaw/gateway-minimal/scanner examples — and it nearly shipped the new coding-agents pack with its config template missing (caught by PR-I's pre-merge adversarial review; a fresh clone would have failed talon init --pack coding-agents at runtime). .gitignore now carves out internal/pack/templates/** and examples/**, and all six previously-swallowed configs (verified secret-free — they carry vault secret names only) are committed. Also fixed en route: pack wizard post-init base URLs gain the required trailing /v1 for OpenAI-SDK/Codex clients (#235), and the OpenClaw template's recognizer set now matches the coding-agents pack exactly.
talon init scaffold wrote a stale pricing table and schema-invalid numeric agent names (#231, #232). The scaffold embedded a third, drifting copy of the pricing table — missing every model added since (incl. gpt-5.3-codex, claude-sonnet-5) and all prompt-cache rates — which silently shadowed the binary's current embedded table (LoadOrDefault prefers a loadable file), so freshly scaffolded projects priced current models as default_estimate. talon init now writes the embedded default's exact bytes (pricing.DefaultModelsYAML(); the drifting template is deleted; an equality test prevents recurrence). Separately, all init/pack templates rendered name: {{ .Name }} unquoted, so --name 192 produced a YAML integer and the generated agent.talon.yaml failed schema validation immediately after init printed success — names are now rendered with %q quoting across all ten templates. Found during end-user verification of epic #192. Verify: go test ./internal/cmd/ -run TestInitScaffold -v.
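The numeric-name failure is easy to reproduce with text/template. This sketch assumes a one-line template; the constants and the render helper are hypothetical stand-ins for the real init templates.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// unquotedTmpl renders the name bare, so "192" becomes a YAML integer.
const unquotedTmpl = "name: {{ .Name }}\n"

// quotedTmpl renders the name through %q, so it is always a YAML string.
const quotedTmpl = "name: {{ printf \"%q\" .Name }}\n"

// render executes a template with the given agent name.
func render(tmpl, name string) (string, error) {
	t, err := template.New("agent").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, struct{ Name string }{Name: name}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, tmpl := range []string{unquotedTmpl, quotedTmpl} {
		out, err := render(tmpl, "192")
		if err != nil {
			fmt.Println("render failed:", err)
			return
		}
		fmt.Print(out)
	}
}
```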
Gateway force-overwrote the client's Responses API store field (#213). ensureResponsesStore unconditionally set store: true, silently reversing an explicit client store: false — 30-day provider retention against the client's stated intent. This matters for Codex CLI, which sends store: false and resends the full transcript each turn. Now governed by gateway.providers.<id>.responses_store_mode: preserve (new default) honors client intent for every client; force_if_absent sets store: true only when the field is absent (opt-in for previous_response_id continuity — OpenClaw/quickstart use it); force_true still forces but records the override of explicit client intent in signed evidence (annotation responses_store_overridden). Migration: OpenClaw-style deployments relying on the old forcing must set responses_store_mode: force_if_absent. Verify: go test ./internal/gateway/ -run TestConformanceResponses_StoreModes -v.
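The three responses_store_mode values can be read as a small decision function over the request body. This is a sketch under assumptions: applyStoreMode is a hypothetical name, the body is modeled as a decoded JSON map, and unknown modes are treated as preserve.

```go
package main

import "fmt"

// applyStoreMode rewrites body["store"] according to the mode and reports
// whether an explicit client store:false was overridden (the case the entry
// says is annotated in evidence).
func applyStoreMode(body map[string]any, mode string) (overridden bool) {
	val, present := body["store"]
	switch mode {
	case "force_if_absent":
		// Only fill the field in when the client did not state an intent.
		if !present {
			body["store"] = true
		}
	case "force_true":
		if b, ok := val.(bool); present && ok && !b {
			overridden = true
		}
		body["store"] = true
	default:
		// "preserve": leave the client's field exactly as sent.
	}
	return overridden
}

func main() {
	for _, mode := range []string{"preserve", "force_if_absent", "force_true"} {
		body := map[string]any{"store": false}
		overridden := applyStoreMode(body, mode)
		fmt.Println(mode, body["store"], overridden)
	}
}
```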
Anthropic streaming cost evidence was silently input-only (#211). Real message_delta SSE events carry top-level usage, which matched the OpenAI parsing branch first — streaming output tokens were never captured, so signed cost undercounted every streamed Anthropic response and TPOT was never computed. Typed Anthropic events are now parsed before the generic branch. Who cares: anyone reading talon costs or signed FinOps evidence for streamed Anthropic traffic. Verify: go test ./internal/gateway/ -run TestConformanceAnthropic_Fixtures/streaming_sse -v.
count_tokens recorded fabricated spend (#218). The free /v1/messages/count_tokens endpoint returns no usage wrapper, so evidence fell back to the fixed pre-request estimate and the invented cost counted against caller budgets. Now classified as invocation_type: "gateway_count_tokens" with cost 0 and zero budget estimate — still fully governed (PII scan + policy run; the token count is recorded in evidence). Verify: go test ./internal/gateway/ -run TestConformanceAnthropic_Fixtures/count_tokens -v.
Block-array system prompts could not be redacted. The form Claude Code sends on every request (with cache_control) was extracted for detection but only string-form system was rewritten, so PII + pii_action: redact failed closed with HTTP 400 on every such request. Block arrays are now redacted; cache_control and untouched blocks survive byte-identically.
Client backoff headers were dropped. Retry-After, request-id/anthropic-request-id, and the Anthropic token-remaining/reset rate-limit headers are now forwarded to callers — coding-agent 429 backoff depends on them.

Added

Anthropic protocol conformance suite (#193, epic #192 PR-A). Recorded Claude-Code-shaped fixtures (streaming SSE, block-array system + cache_control, tool_use/tool_result round-trips, count_tokens, image blocks, tool_choice, ~50KB system prompts) replayed through the full gateway pipeline against a canned upstream, plus a transform-determinism guarantee: identical input yields byte-identical rewritten bodies (non-determinism would silently break provider-side prompt caching for clients). Fixtures are sanitized (synthetic keys, corpus emails) with a scripted recapture procedure (scripts/record-conformance-fixtures.sh) and a pinned last-verified client version (internal/gateway/testdata/conformance/README.md).
Pricing table refreshed to the current Anthropic and OpenAI lineups (verified against vendor pricing pages, 2026-07): Claude Fable 5 / Opus 4.8-4.5 / Sonnet 5 / Sonnet 4.6-4.5 / Haiku 4.5, and GPT-5.5/5.4 families + gpt-5.3-codex. Fixes unknown model for cost estimation warnings (and the resulting flat-fallback cost evidence) for current-model traffic. Legacy entries retained; operators can still override in pricing/models.yaml. Cache read/write rates land with the cache-aware pricing schema (#196).
Large-prompt pipeline benchmark. BenchmarkGatewayPipelineOverheadLargePrompt runs a ~50KB PII-bearing system prompt through the full pipeline (informational row in docs/reference/benchmarks.md; not regression-gated yet).
Observation-only PII scan of tool-related request content (#212, epic #192 PR-B). Agentic loops feed tool output (file contents, query results) back through the gateway on every turn — previously invisible to PII detection, redaction, AND the residual-PII verifier on both wire families. Talon now scans tool_use inputs, tool_result outputs, function-call arguments (Chat Completions tool_calls[].function.arguments, Responses function_call/function_call_output) and records findings in signed evidence (classification.tool_content, evidence spec 1.5) without changing enforcement: tool content cannot be redacted yet, so acting on the signal would fail-close every redact-mode deployment on agentic traffic. Config: gateway.default_policy.scan_tool_content: evidence_only (default) | off. Who cares: security/DPO teams governing coding agents and any agentic caller — "which sessions moved PII through tool results" is now answerable from evidence. Verify: send a request with PII only inside a tool_result block, then talon audit export --format json | jq '[.records[] | select(.invocation_type=="gateway")] | sort_by(.timestamp) | last | {tool_content_scanned, tool_content_has_pii, tool_content_entity_types}' (the flat export carries trailing tool_content_* fields; the full nested block is in --format signed-json records and talon audit show). Limitation stated in LIMITATIONS.md §3; enforcement is future work gated on per-block-type tool redaction.
Responses API instructions is now governed as prompt text. The system-prompt equivalent of the Responses API was previously never extracted — PII in instructions was forwarded verbatim, unscanned. It now joins the main scanned text and is redactable like any other prompt content.

1.6.8 - 2026-07-04

Added

feat(scanner): external EntityScanner adapters and local scanner engines (#181, #204). Operators can now replace Talon's built-in regex PII scanner with an out-of-process engine — a Microsoft Presidio sidecar, any custom detector speaking the Presidio wire format (HTTP or Unix domain socket), or a local LLM (scanner.type: llm, flagship: Ollama) — without changing gateway, MCP, agent, evidence, or redaction paths. Who cares: operators who need detection quality beyond regexes (names, addresses, fuzzy identifiers) or who must keep scanning on their own hardware (air-gap/sovereignty). The core stays deterministic and fail-closed: adapter output is untrusted (size-capped, one invalid entity rejects the scan, rune→byte offset normalization verified against the text), engine timeouts/errors block egress in enforce mode with truthful evidence (classification.scanner with engine identity, version, scan duration, and typed failure kind — spec v1.4; flattened as scanner_engine/scanner_type/scanner_version/scanner_failure in talon audit export), and a residual-PII block is never conflated with an unverifiable scan. Startup health probes refuse to serve against a dead or unrunnable engine (the llm probe warm-loads the model). The llm engine never trusts model offsets: it prompts for verbatim values (fixed versioned prompt llm-ner/v1), relocates every occurrence to byte offsets itself, drops hallucinated and placeholder-shaped values, token-bounds generation, and tolerates field-observed small-model reply shapes (bare arrays, string-array values, unterminated envelopes) while staying fail-closed on anything murkier. scanner.entities narrows the NER prompt for CPU-constrained hosts. Built-in regex remains the zero-config default. Verify quickly: cd examples/scanners/presidio && docker compose up, send a PII prompt through the gateway, then talon audit export --format json | jq '.records[-1].scanner_engine'; or the Ollama variant in examples/scanners/ollama/. 
Docs: external scanners reference, local scanner engines cookbook. Smoke: tests/smoke_sections/36_external_scanner.sh (hermetic llama stand-in; TALON_SMOKE_OLLAMA_URL opts into real Ollama); nightly scanner-ollama-smoke workflow.
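One way to picture the rune-to-byte offset normalization mentioned in the scanner entry: convert the adapter's character offsets to byte offsets, then reject the entity unless the resulting slice equals the value it claims. The function names below are hypothetical, and the sketch assumes half-open [start, end) rune offsets.

```go
package main

import "fmt"

// runeSpanToBytes converts half-open rune offsets into byte offsets.
// It returns ok=false when the span does not fit inside the text.
func runeSpanToBytes(text string, start, end int) (bs, be int, ok bool) {
	if start < 0 || end < start {
		return 0, 0, false
	}
	bs, be = -1, -1
	runes := 0
	for byteIdx := range text {
		if runes == start {
			bs = byteIdx
		}
		if runes == end {
			be = byteIdx
		}
		runes++
	}
	// Offsets equal to the rune count point at the end of the text.
	if start == runes {
		bs = len(text)
	}
	if end == runes {
		be = len(text)
	}
	if bs < 0 || be < 0 {
		return 0, 0, false
	}
	return bs, be, true
}

// normalizeEntity accepts an entity only if the converted span matches the
// value the adapter reported for it.
func normalizeEntity(text string, start, end int, value string) (int, int, bool) {
	bs, be, ok := runeSpanToBytes(text, start, end)
	if !ok || text[bs:be] != value {
		return 0, 0, false
	}
	return bs, be, true
}

func main() {
	text := "Zoë lives in Köln"
	fmt.Println(normalizeEntity(text, 13, 17, "Köln"))
	fmt.Println(normalizeEntity(text, 13, 17, "Bonn"))
}
```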

1.6.7 - 2026-07-03

Added

feat(reliability): error-driven, sovereignty-respecting provider fallback chains (#138, #191). Operators can now keep agent traffic flowing through provider outages without giving up governance guarantees. On a transient upstream failure (timeout, connection error, HTTP 429/5xx) Talon walks an ordered fallback chain — on both the gateway proxy (gateway.providers.<name>.fallback, optional per-target model rewrite) and the talon run path (policies.model_routing.tier_N.fallback_chain). Chains are same-wire-format (OpenAI-compatible ↔ OpenAI-compatible, Anthropic ↔ Anthropic; validated at load). Permanent errors (401/4xx) pass through unchanged; once a chain is engaged only success ends it; exhaustion fails closed and the refusal is recorded as a successful governance outcome. Every candidate re-runs the caller's full policy surface (provider allowlist, model lists, target tool policy, budgets, session context) so failover can never become a policy bypass, and under eu_strict a non-EU/LOCAL candidate is never dispatched — shadow mode included, where would-be denials are recorded as shadow violations without changing runtime behavior. Each engagement produces signed evidence: one gateway_failover_attempt/llm_failover_attempt record per failed provider plus exactly one terminal record (fallback decision or fail-closed), linked by correlation_id and a per-engagement failover_group_id. The new api_family provider field lets aliased Anthropic-compatible endpoints get correct parsing, PII redaction, tool filtering, auth conventions (x-api-key + anthropic-version), and error shape. OTel spans expose talon.provider.original / talon.provider.selected / talon.provider.fallback_reason. Verify quickly: point a provider's base_url at a dead port, add fallback: [{provider: backup}], POST through the gateway → 200 served by the backup, then talon audit verify --failover. Docs: configuration reference.
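The chain-walking rules in this entry can be condensed into a short Go sketch. All names here are hypothetical, the upstream call is stubbed as a function returning an HTTP status, and two details are assumptions: any transport error counts as transient, and success means a status below 400.

```go
package main

import (
	"errors"
	"fmt"
)

// provider is a stand-in for one upstream target.
type provider struct {
	Name string
	Call func() (int, error)
}

// outcome records who answered and which providers failed on the way.
type outcome struct {
	Provider   string
	Status     int
	Failed     []string
	FailClosed bool
}

// isTransient: timeouts and connection errors, HTTP 429, and 5xx.
func isTransient(status int, err error) bool {
	return err != nil || status == 429 || status >= 500
}

// forward tries the primary, passes permanent errors through unchanged, and
// otherwise walks the chain until a policy-allowed candidate succeeds.
func forward(primary provider, chain []provider, allowed func(string) bool) outcome {
	status, err := primary.Call()
	if !isTransient(status, err) {
		return outcome{Provider: primary.Name, Status: status}
	}
	out := outcome{Failed: []string{primary.Name}}
	for _, c := range chain {
		// Every candidate re-runs policy; a denied candidate is never dispatched.
		if !allowed(c.Name) {
			continue
		}
		s, cerr := c.Call()
		if cerr == nil && s < 400 {
			out.Provider, out.Status = c.Name, s
			return out
		}
		out.Failed = append(out.Failed, c.Name)
	}
	// Exhaustion fails closed.
	out.FailClosed = true
	return out
}

func main() {
	dead := provider{Name: "primary", Call: func() (int, error) { return 0, errors.New("connection refused") }}
	backup := provider{Name: "backup", Call: func() (int, error) { return 200, nil }}
	allowAll := func(string) bool { return true }
	fmt.Printf("%+v\n", forward(dead, []provider{backup}, allowAll))
}
```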

Fixed

fix(config): project-local talon.config.yaml takes precedence over ~/.talon (#191). Viper searched the home directory first, contradicting the documented --config default (./talon.config.yaml or ~/.talon/talon.config.yaml) — a machine-wide config silently overrode per-project sovereignty mode, cache settings, and compliance controller declarations. Upgrade impact: operators who (perhaps unknowingly) relied on ~/.talon/talon.config.yaml overriding a project-local file now get the local file; pass --config ~/.talon/talon.config.yaml explicitly to keep the old behavior. Verify quickly: run talon config show in a directory with its own talon.config.yaml.
fix(gateway): Anthropic plain-string message content is now PII-redacted (#191). The redactor only handled content-block arrays; the Messages API's plain-string content form reached the post-redaction verifier unredacted and failed closed in redact mode.
fix(server): /v1/dashboard/governance-alerts returns "alerts": [] instead of JSON null when no alerts exist (#191).
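For readers unfamiliar with the underlying Go behavior: encoding/json marshals a nil slice as null and an empty, non-nil slice as []. The sketch below shows the usual guard; the response type and the string element type are assumptions, not the server's actual structs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alertsResponse is a hypothetical response shape with a single list field.
type alertsResponse struct {
	Alerts []string `json:"alerts"`
}

// marshalAlerts normalizes a nil slice to an empty one before encoding so
// clients always receive a JSON array.
func marshalAlerts(alerts []string) string {
	if alerts == nil {
		alerts = []string{}
	}
	b, err := json.Marshal(alertsResponse{Alerts: alerts})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	raw, _ := json.Marshal(alertsResponse{}) // nil slice, no guard
	fmt.Println(string(raw))                 // {"alerts":null}
	fmt.Println(marshalAlerts(nil))          // {"alerts":[]}
}
```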

1.6.6 - 2026-06-30

Added

feat(sovereignty): air-gap deployment mode with egress guard (#132, #185). Operators in regulated EU environments can now deploy Talon with provable in-region operation: sovereignty.deployment_mode: air_gap implies eu_strict, applies deny-by-default gateway egress (EU/LOCAL only when no custom rules are set), wraps the upstream HTTP client with an allowlist derived from declared Ollama/gateway endpoints and optional allowed_egress_hosts, and hard-fails startup when TALON_SECRETS_KEY / TALON_SIGNING_KEY are missing or still the generated defaults. Verify quickly: cp examples/airgap/talon.config.airgap.yaml ~/.talon/talon.config.yaml, set explicit 64-hex keys, talon doctor --gateway-config ~/.talon/talon.config.yaml --skip-upstream. Docs: air-gapped deployment guide, examples/airgap.
feat(compliance): talon compliance sovereignty posture report (#133, #186). Security and DPO reviewers can now export a sovereignty posture document (HTML or JSON) that merges declared facts (sovereignty.mode, deployment_mode, gateway provider regions, operator env keys) with observed egress from signed evidence — including providers that were declared but excluded under eu_strict (excluded_declared / gateway excluded posture). Verify quickly: talon compliance sovereignty --format html --output sovereignty.html --from 2020-01-01. Docs: configuration reference.
feat(demo): reproducible shortlist demo bundle (#107, #184). A self-contained examples/shortlist-demo/ bundle (config, agent policy, demo.sh, docker-compose) for repeatable buyer shortlist walkthroughs without ad-hoc setup. Verify quickly: cd examples/shortlist-demo && ./demo.sh.

Changed

feat(sovereignty): non-fatal eu_strict provider gate with runtime gateway denial (#111, #188). Under eu_strict, Talon no longer refuses startup when non-EU/LOCAL providers are declared alongside compliant ones — they are excluded from routing with ERROR logs and talon.sovereignty.provider_excluded_total, while EU/LOCAL providers keep the process running. The gateway denies direct requests to excluded providers at runtime (HTTP 403 + signed evidence + talon.sovereignty.provider_denied_total; shadow mode records violations but still forwards). Region-aware gating (Bedrock, Azure OpenAI, Vertex) uses the configured region (AWS_REGION, provider config) — e.g. us-east-1 excludes Bedrock even when metadata lists EU regions. talon doctor warns (exit 0) when exclusions exist but something EU/LOCAL remains routable; it fails only when nothing is routable, with gateway vs native checks separated so a compliant native provider does not mask an all-US gateway. serve / run / plan call ApplySovereigntyGate instead of hard-failing ValidateSovereignty. Breaking change: operators who relied on startup failure to discover misconfigured US providers must now check ERROR logs, talon doctor, gateway 403 responses, or talon compliance sovereignty. Recommended: run talon doctor --gateway-config ... --skip-upstream in CI. Verify quickly: declare OpenAI (US) + Ollama (LOCAL), talon serve --gateway starts, OpenAI proxy returns 403, Ollama returns 200, doctor warns with exit 0. Docs: air-gapped deployment guide, configuration reference.

1.6.5 - 2026-06-15

Changed

feat(compliance): RoPA now distinguishes redacted from raw PII at each recipient, and cross-checks declared residency against observed transfers. Two accuracy gaps surfaced during field testing. (1) Section 5 (Recipients) listed identifier types per destination (e.g. email → openai) without saying whether the raw values actually reached the recipient — misleading when redact_pii was on and the provider only ever received placeholders. Types that were redacted in every flow to a destination are now annotated (redacted before egress); a type forwarded raw even once stays unannotated (no overstatement in either direction). The JSON export gains a redacted_entity_types field per destination. (2) Declaring compliance.data_residency: eu while running llm.routing.data_sovereignty_mode: eu_preferred/global let non-EU transfers happen silently relative to the declaration; the RoPA now adds a consistency: warning when EU residency is declared but non-EU/LOCAL destinations appear in the data-flow evidence, pointing at the two honest resolutions — enforce eu_strict, or document the transfer mechanism (SCCs/adequacy) with your DPO. Verify quickly: declare data_residency: eu, run traffic through a US provider, regenerate talon compliance ropa and see the warning; docs: RoPA declarations guide, configuration reference.
feat(cmd): talon audit show now renders the Data Flow section. The data_flow evidence section was signed and exported but invisible in the human-readable view — operators had to fall back to audit export --format signed-json + jq to see where a request's data went. audit show <id> now prints one line per flow item: source → destination (kind, name, model, region), disposition (forwarded/redacted/blocked/surfaced), data tier, and detected entity types. The PII Redacted line now labels both directions (input=… output=…): it previously showed only the output flag, which read as a contradiction next to a redacted input flow ("PII Redacted: false" while the prompt was in fact redacted before egress).
feat(evidence): data-flow evidence now covers all governed traffic, not only classified data. Previously the data_flow evidence section was recorded only when PII or tier > 0 data was detected, and only on the gateway path — so a clean talon run against OpenAI produced a RoPA with empty Recipients (Art. 30(1)(d)) and Transfers (Art. 30(1)(e)) sections despite real egress to a US provider. Now every request that egresses records at least its prompt → destination flow (provider, model, region): gateway requests, CLI/scheduled/webhook agent runs (new), and MCP proxy tool calls. Provider regions for agent runs resolve from registered provider metadata (e.g. openai → US, mistral → EU, ollama → LOCAL). Blocked flows are recorded as evidence but no longer counted as RoPA recipients/transfers — blocked data never reached the destination. Verify quickly: talon run "hello" then talon compliance ropa --format html --output ropa.html — Section 5 lists your provider and Section 6 flags non-EU transfers with the SCC/adequacy note. No migration impact: data_flow remains optional in the integrity spec (requests denied before egress still omit it); records signed under earlier spec versions verify unchanged.
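
The recipient rules in the RoPA and data-flow entries above (a type is annotated only when it was redacted in every flow to a destination, and blocked flows do not make a destination a recipient) can be sketched like this. The Flow type and recipientRedactions are hypothetical names; treating every disposition other than redacted and blocked as raw is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Flow is a hypothetical, simplified data-flow item.
type Flow struct {
	Destination string
	EntityType  string
	Disposition string // "forwarded", "redacted", "blocked", ...
}

// recipientRedactions returns one entry per recipient, listing the entity
// types that were redacted in every flow that reached that recipient.
func recipientRedactions(flows []Flow) map[string][]string {
	// sawRaw[dest][type] becomes true once the type reached dest unredacted.
	sawRaw := map[string]map[string]bool{}
	for _, f := range flows {
		if f.Disposition == "blocked" {
			continue // blocked data never reached the destination
		}
		if sawRaw[f.Destination] == nil {
			sawRaw[f.Destination] = map[string]bool{}
		}
		if f.Disposition != "redacted" {
			sawRaw[f.Destination][f.EntityType] = true // assumption: anything else is raw
		} else if _, seen := sawRaw[f.Destination][f.EntityType]; !seen {
			sawRaw[f.Destination][f.EntityType] = false
		}
	}
	out := map[string][]string{}
	for dest, types := range sawRaw {
		out[dest] = []string{}
		for t, raw := range types {
			if !raw {
				out[dest] = append(out[dest], t)
			}
		}
		sort.Strings(out[dest]) // stable output for reports
	}
	return out
}

func main() {
	flows := []Flow{
		{"openai", "email", "redacted"},
		{"openai", "email", "redacted"},
		{"openai", "phone", "redacted"},
		{"openai", "phone", "forwarded"}, // raw once, so phone stays unannotated
		{"mistral", "iban", "blocked"},   // blocked, so mistral is not a recipient
	}
	fmt.Println(recipientRedactions(flows))
}
```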

Added

feat(scanner): Epic #112 PII trust-path hardening (#182). Talon now normalizes built-in and external scanner output through a Presidio-compatible boundary contract (byte-offset canonicalization, classifier.Facade seam) and enforces fail-closed residual PII verification (VerifyEgress / RedactGuard) on gateway, MCP proxy/server, and agent tool args/results — including blocks when redaction produces invalid JSON. Evidence spec 1.3 adds optional compact entity_attributions on data-flow items (field path + spans, no raw values). Tool-approval remediation (re_redact_rescan) re-scans before approval without bypassing residual blocks. Verify quickly: make proof-gates. Docs: Presidio compatibility matrix, LIMITATIONS.md. Closes #112 and child issues #134–#137; external runtime adapters remain #181.
feat(ci): PII benchmark regression gate on every PR; make proof-gates on main push and nightly. Committed testdata/benchmarks/pii_scan_baseline.<goos>.<goarch>.json artifacts are validated and enforced in CI (make benchmark-regression on ubuntu-latest); full Epic #112 proof gates (matrix, egress, fuzz, benchmark) run on main and on a 03:00 UTC schedule.
docs(plan-review): operator guide and phased E2E test case. Step-by-step Plan Review operations (plan-review-operators.md, plan-review-e2e-testcase.md) for CLI, serve auto-dispatch, dashboard, and TC-PR-001–012 pass criteria.
feat(server): compliance HTTP API — /v1/compliance/{coverage,ropa,annex-iv,report} (#109). The talon compliance generators are now exposed over admin-authenticated HTTP, so a DPO or an automation can pull framework coverage and auditor documents from a running server without CLI access to the host. All four endpoints accept tenant, agent, from/to (YYYY-MM-DD), and format=html|json; report additionally takes framework=gdpr|eu-ai-act|nis2|dora|iso-27001. Declarations are re-read from talon.config.yaml / the default agent policy on every request, so declaration edits apply without a restart. Every export records a signed control-plane evidence record (compliance_export_ropa / _annex_iv / _report) carrying the export format and scope — the act of generating an auditor document is itself auditable. Verify quickly: curl -H "X-Talon-Admin-Key: $TALON_ADMIN_KEY" localhost:8080/v1/compliance/coverage | jq '.frameworks[].framework'. Docs: export runbook, auth and key scopes. Tenant keys and anonymous callers are rejected (admin-only); output remains supporting documentation, not a compliance determination.
feat(dashboard): compliance mode in /dashboard (#129). The unified governance dashboard gains a Compliance tab so framework posture is reviewable where the evidence already lives: per-framework coverage cards (each control mapping with its article, Talon control, source, and supporting-evidence count), declaration warnings listing what is still missing for a complete RoPA / Annex IV pack, recent signed evidence in scope, and one-click exports (RoPA HTML/JSON, Annex IV HTML/JSON, framework-filtered report) that honor the active tenant/agent/date filters. Verify quickly: open http://localhost:8080/dashboard?talon_admin_key=$TALON_ADMIN_KEY, select Compliance, click RoPA (HTML). Docs: tutorial — turnkey compliance reports. Closes the gap where compliance posture required the CLI while everything else in the epic was dashboard-first.
feat(dashboard): unified FinOps view on /dashboard (#109). The FinOps & Runtime tab now answers "which tenants / apps / agents are spending money" without leaving the governance dashboard: budget utilization and semantic-cache cards (hits, hit rate, cost saved), and spend breakdowns by caller, model, and provider — all mapped from the existing /api/v1/metrics snapshot (no second metrics pipeline). The Evidence tab's governance quadrant gains a store-wide denial summary from the new GET /v1/dashboard/denials-by-reason endpoint (pii_block, policy_deny, attachment_block, tool_filtered). /gateway/dashboard remains available as a deep link for full gateway telemetry. Verify quickly: run gateway traffic, open the FinOps tab, and cross-check curl -s -H "X-Talon-Admin-Key: $TALON_ADMIN_KEY" localhost:8080/api/v1/metrics | jq .budget_status. Docs: gateway dashboard reference.
feat(init): EU-first compliance policy packs (#128). talon init can now apply curated policy packs for GDPR, NIS2, DORA, and the EU AI Act on every init path: a multi-select step in the interactive wizard, --compliance gdpr,nis2 (or all) with --pack/--scaffold/scripted init, and --list-compliance to browse the catalog. Each applied pack merges its policy defaults into the generated agent.talon.yaml and annotates the header with the articles it supports (supports: gdpr Art. 30 — <source> (<control>)), linked one-to-one to internal/compliance/mapping.go — a build-time link-integrity test fails if an annotation drifts from the mapping table, so generated policies cannot claim support that the coverage report would not back. Verify quickly: talon init --scaffold --compliance gdpr,eu-ai-act --skip-verify && head -30 agent.talon.yaml. Docs: policy packs guide, configuration reference. Packs configure controls that support these articles; they are not a certification or a compliance determination. No migration impact: omitting --compliance generates the same files as before.
feat(evidence): governance parity across all entry paths — MCP server and graph adapter now record data flow; a runtime guardrail prevents future drift. Two paths lagged behind the consolidated data-flow posture and are now reconciled. (1) The embedded MCP server (talon serve → POST /mcp) classifies tool arguments and results for PII and records a data_flow section on every tools/call — including policy-denied calls (disposition: blocked) — with destination region LOCAL (embedded tools execute in-process). (2) The graph adapter (POST /v1/graph/events) records an orchestrator-reported prompt → external:<framework> flow on run_end whenever the external runtime reported a model or non-zero cost; content never transits Talon on this path, so the item carries no entity types and region unknown — Talon never guesses, and the unresolved region deliberately surfaces in RoPA Section 6 as a prompt to gateway the traffic. The shared contract is now enforced in three layers: evidence.ValidateGovernedRecord runs on every store and logs governance_parity_violation warnings (fail-open — evidence is never dropped), TestGovernanceParity_EntryPathContract enumerates all five entry paths in CI, and smoke section 29 verifies black-box that every model-call record in the live evidence DB carries data_flow. New reference doc: Governance control matrix — which controls run on which path, by-design limitations, and the checklist for adding new entry paths. Verify quickly: call any embedded tool via POST /mcp and check talon audit show <id> for the data_flow section. No migration impact: data_flow remains optional in the integrity spec; existing signatures verify unchanged.
feat(compliance): talon compliance annex-iv — EU AI Act Annex IV technical-documentation pack (#126). CTOs and DPOs preparing for the AI Act (high-risk obligations apply from 2 August 2026) can now generate an Annex IV-shaped pack (HTML or JSON) combining declared system facts (compliance.declarations.system in agent.talon.yaml: description, intended purpose, oversight arrangements) with runtime records from signed evidence: models/providers observed, policy denials and reasons (Art. 9 risk controls), plan-review human-oversight events (Art. 14), routing/egress decisions, audited memory writes, and post-market monitoring coverage (Art. 72). The pack explicitly lists items Talon cannot produce (model development process, performance metrics, declaration of conformity) with their owners — honest scoping for deployers. Verify quickly: talon compliance annex-iv --format html --output annex-iv.html, or see examples/auditor-pack/annex-iv.html. Docs: export runbook. Supporting documentation for Annex IV review, not a conformity assessment.
feat(compliance): talon compliance ropa — GDPR Art. 30 Record of Processing Activities export (#125). DPOs and platform teams can now generate an Art. 30(1)-shaped RoPA (HTML print-to-PDF-ready, or JSON) that merges declared facts (controller identity from talon.config.yaml compliance.controller; purposes/retention/legal basis from agent.talon.yaml compliance.declarations) with runtime facts from the signed evidence store (processing activities observed, personal-data identifiers detected, recipients and regions, third-country transfers). Missing declarations never fail the export — they are listed as warnings and rendered as flagged "DECLARATION MISSING" sections so the document tells you what to complete before auditor handoff. Every document carries an evidence-linkage block (record count, sample IDs, talon audit verify command) and a claims-discipline footer: supporting records for review, not a legal filing. Verify quickly: talon compliance ropa --format html --output ropa.html after any governed traffic, or see the committed sample in examples/auditor-pack/ropa.html. Docs: export runbook, configuration reference. No migration impact: both declaration blocks are optional.
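
For the compliance HTTP API entry (#109) above, a client call from Go might be built as in the sketch below. The endpoint path, the query parameters, and the X-Talon-Admin-Key header come from that entry; newReportRequest, the base URL, and the key value are placeholders, and the request is only constructed here, not sent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newReportRequest builds a GET request for /v1/compliance/report.
// The helper name and its parameter list are hypothetical.
func newReportRequest(baseURL, adminKey, framework, from, to string) (*http.Request, error) {
	q := url.Values{}
	q.Set("framework", framework) // gdpr|eu-ai-act|nis2|dora|iso-27001
	q.Set("format", "json")
	q.Set("from", from) // YYYY-MM-DD
	q.Set("to", to)     // YYYY-MM-DD
	req, err := http.NewRequest(http.MethodGet, baseURL+"/v1/compliance/report?"+q.Encode(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Talon-Admin-Key", adminKey) // admin-only endpoint
	return req, nil
}

func main() {
	req, err := newReportRequest("http://localhost:8080", "example-key", "gdpr", "2026-01-01", "2026-06-15")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to an http.Client would perform the export; per the entry above, each export also records a signed control-plane evidence record.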

Fixed

fix(dashboard): gateway link 404, unstable Blocked card, Detail implicitly verifying. Three UX bugs from manual testing of the unified dashboard. (1) The "Gateway telemetry" links rendered even when the server ran without --gateway, navigating to a plain-text 404; the dashboard now probes /api/v1/metrics on load and, on a 404, hides the links and shows a restart hint instead (auth errors keep the links so a key fix restores access). (2) The Blocked card was recounted from the visible evidence rows, so clicking it — which applies the Denied filter — refilled the table and made the number jump (e.g. 22 → 50); it is now fed by the store-wide denied total from /v1/dashboard/denials-by-reason, relabeled Blocked (all evidence), and stays stable while drilling down. (3) The Detail button fetched /verify alongside the record, flipping the Integrity column exactly like Verify; Detail is now read-only and the detail pane shows the already-known verification state or "Not checked (use the Verify button)". Verify quickly: start talon serve without --gateway and confirm the dashboard shows the hint instead of a dead link. Docs: gateway dashboard reference.
fix(classifier,agent): canonical types on normalization fallback and JSON validity after agent-tool redaction. When Presidio normalization fails, fallback entities now use canonical type strings (email, not EMAIL_ADDRESS). Agent tool args/results now match the MCP fail-closed posture: the call is blocked when redaction turns valid JSON into invalid JSON, even if VerifyEgress still passes.
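
The JSON-validity check in the classifier/agent fix above can be illustrated with a short sketch. guardRedactedJSON and the error value are hypothetical names; the only library call used is json.Valid from the standard library, and the residual-PII check (VerifyEgress) is deliberately left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errInvalidAfterRedaction is a hypothetical sentinel error for this sketch.
var errInvalidAfterRedaction = errors.New("redaction produced invalid JSON")

// guardRedactedJSON fails closed when a payload that was valid JSON
// before redaction is no longer valid JSON afterwards.
func guardRedactedJSON(original, redacted []byte) error {
	if json.Valid(original) && !json.Valid(redacted) {
		return errInvalidAfterRedaction
	}
	return nil
}

func main() {
	original := []byte(`{"to":"alice@example.com"}`)
	broken := []byte(`{"to":"[EMAIL]}`) // the closing quote was lost during redaction
	clean := []byte(`{"to":"[EMAIL]"}`)

	fmt.Println(guardRedactedJSON(original, broken)) // non-nil: block the call
	fmt.Println(guardRedactedJSON(original, clean))  // nil: safe to forward
}
```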

1.6.0 - 2026-06-10

Added

feat(gateway): egress allow/deny by destination and data classification (#130). Operators can now declare which destinations (providers and/or regions) each data tier may egress to via gateway.default_policy.egress (per-caller override under callers[].policy_overrides.egress). Denials happen in the policy step — before secrets retrieval and before any bytes reach the upstream — return HTTP 403 with machine codes egress_tier_destination_disallowed / egress_destination_disallowed, and map to the new POLICY_DENIED_EGRESS explanation code. This supports data-transfer controls (e.g. GDPR Chapter V transfer policies) for CTO/DPO personas; Talon enforces and evidences the rule, it does not make the compliance determination. Verify quickly: add a tier_2 rule with allowed_regions: ["EU", "LOCAL"], send a payload containing an IBAN to a US-region provider, and expect a 403 plus an egress_decision evidence section. Unconfigured deployments are unchanged (egress is not evaluated); in shadow mode violations are recorded but forwarded.
feat(evidence): egress_decision evidence section (integrity spec v1.2). Signed evidence records now carry an optional egress_decision object (tier, provider, region, decision, matched_rule, reason) whenever an egress policy is configured. The field is additive and appended after data_flow: records signed under spec 1.0/1.1 verify unchanged.
feat(gateway): named data-tier aliases in config. Tier fields in the gateway config (egress.rules[].tier, callers[].policy_overrides.max_data_tier) now accept public/internal/confidential (case-insensitive) interchangeably with 0/1/2, following the ascending-sensitivity convention used by ISO 27001 practice and Microsoft Purview/AGT. This makes policies self-documenting for operators without changing tier semantics: evidence records, Rego inputs, and the JSON schema keep numeric tiers (schema accepts both forms). No migration needed — numeric configs remain valid.
feat(observability): egress decision telemetry. New counter talon.gateway.egress.decisions (tenant_id, tier, gen_ai.system, region, decision) and talon.egress.* span attributes on gateway request spans; egress denials emit a structured gateway_egress_denied log line with correlation_id, tenant_id, tier, destination, and reason.
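
The tier aliases and the egress rule check from the entries above can be sketched together. parseTier, EgressRule, and egressAllowed are hypothetical names, and allowing a tier that has no matching rule is an assumption of this sketch rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTier accepts numeric tiers or the named aliases, case-insensitively.
func parseTier(s string) (int, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "0", "public":
		return 0, nil
	case "1", "internal":
		return 1, nil
	case "2", "confidential":
		return 2, nil
	}
	return 0, fmt.Errorf("unknown data tier %q", s)
}

// EgressRule is a hypothetical in-memory form of one egress rule.
type EgressRule struct {
	Tier           int
	AllowedRegions []string
}

// egressAllowed reports whether data at the given tier may go to region.
// A tier without a matching rule is allowed here (an assumption).
func egressAllowed(rules []EgressRule, tier int, region string) bool {
	for _, r := range rules {
		if r.Tier != tier {
			continue
		}
		for _, allowed := range r.AllowedRegions {
			if allowed == region {
				return true
			}
		}
		return false // a rule exists for this tier and the region is not listed
	}
	return true
}

func main() {
	tier, err := parseTier("Confidential")
	if err != nil {
		panic(err)
	}
	rules := []EgressRule{{Tier: 2, AllowedRegions: []string{"EU", "LOCAL"}}}
	fmt.Println(egressAllowed(rules, tier, "US"), egressAllowed(rules, tier, "EU"))
}
```

Evidence records keep numeric tiers, so normalizing aliases at config load time (as parseTier does) keeps the rest of the pipeline unchanged.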

Changed

fix(config): removed phantom config keys that the runtime never read. talon init no longer generates tenants:, evidence:, llm_provider:, or secrets_key_env: blocks in talon.config.yaml — none of these were parsed by any loader, which misled operators into believing budgets/rate limits or evidence paths were configured there (they live in agent.talon.yaml and {data_dir}/evidence.db respectively). Existing configs with these keys keep working (keys are ignored, as before); regenerate with talon init or delete the blocks to clean up.
feat(config): log_level / log_format in talon.config.yaml now take effect. Previously only the --log-level/--log-format flags worked and the YAML values were silently ignored. Precedence: flag > config file > default.
feat(cache): cache.ttl_by_tier is now enforced. The documented per-tier TTL overrides (public/internal/confidential, seconds) were parsed but never applied; cache entries now use the tier-specific TTL and record their real data tier (previously always public). talon doctor validates the keys. Verify: set ttl_by_tier.internal: 900, store a tier-1 entry, and check its expires_at.
feat(policy): one canonical agent schema. talon validate previously used an embedded schema that had drifted from the documented schemas/agent.talon.schema.json. The embedded schema (now internal/policy/agent.talon.schema.json) is canonical and backfilled with all parsed sections (tool_policies, copaw, semantic_enrichment, session_limits, compliance.plan_review, extended rate/resource limits, destructive_patterns); schemas/agent.talon.schema.json is an exact synced copy enforced by a test.
feat(policy): unknown-key warnings on policy load. Misspelled or misplaced keys in agent.talon.yaml were silently ignored (e.g. policies.plan_review instead of compliance.plan_review). The loader now logs a structured warning naming the unknown field; loading still succeeds for backward compatibility. A test guards that all shipped examples and pack overlays are warning-free.
feat(schema): talon.config.schema.json now covers the full Go config surface — top-level fields (data_dir, secrets_key, signing_key, default_policy, max_attachment_mb, ollama_base_url, log_level, log_format), the cache block, and previously missing gateway fields (upstream_auth_mode, dashboard_listen, response_scanning, network_interception, tool/attachment governance, full caller overrides).
fix(policy): proxy compliance accepts data_residency: "eu". The proxy Rego only matched the literal "eu-only", so the "eu" token that talon init writes was silently unenforced. Both tokens now require EU upstream regions.
feat(otel): routing spans emit talon.routing.* attributes. llm.route/llm.graceful_route spans now carry talon.data.tier, talon.routing.sovereignty_mode, talon.provider.jurisdiction, talon.provider.region, talon.routing.rejected_count, and talon.routing.selection_reason (constants existed but were never emitted; the old non-namespaced data.tier key is replaced).
docs: model_routing.*.location documented as declarative. The field is informational; region enforcement comes from provider registry metadata + llm.routing.data_sovereignty_mode (and gateway egress rules). Documented defaults corrected: audit.retention_days (2555 when section omitted, not 90), attachment_handling.mode (permissive when omitted), memory defaults (max_entries 100, max_entry_size_kb 10, mode: active when enabled), action_on_detection value log_only (not log), and a new cache configuration reference section.

Fixed

fix(policy): compliance.plan_review.volume_threshold and mode were silently dropped on load. The YAML-facing policy.PlanReviewConfig lacked volume_threshold, so the documented volume-detection recipe never reached the runtime; the runner mapping also dropped mode. Both now flow through to plan review and talon intent classify.
fix(pack): EU AI Act overlay require_for_tier: "2" was a no-op. The parser accepts tier_0/tier_1/tier_2; the overlay now uses tier_2 so tier-based plan review actually triggers.
fix(schema): talon.config.schema.json caller field renamed source_cidrs → source_ip_ranges to match what the gateway actually parses, and the gateway mode schema default corrected from shadow to enforce (the runtime default when mode is omitted).
docs: consistency fixes across config docs. Quickstart demo claimed data tier 3 (tiers are 0–2; confidential = 2); policy cookbook caller example used nonexistent api_key (now tenant_key); human_oversight examples used invalid on_demand (canonical: on-demand); the tool-class governance recipe documented a nonexistent policies.plan_review path with unimplemented fields (now shows compliance.plan_review + built-in class defaults); add-talon-to-existing-app copy-paste config was missing the required base_url for the enabled openai provider.

1.5.5 - 2026-06-01

Added

feat(evidence): signed export and offline file verification. Added talon audit export --format signed-json|signed-ndjson and talon audit verify --file <path> so operators and compliance teams can verify evidence integrity outside the running instance. This matters for GDPR/NIS2 handoffs where auditors request portable, tamper-evident artifacts. Verify quickly with talon audit export --format signed-json --output signed.json && talon audit verify --file signed.json.
feat(dashboard): persistent evidence integrity UX. Evidence rows now expose explicit integrity states (Verified, Invalid, Not checked, Unable to verify), with a persistent detail/signature block that shows signed fields and trust/spend context in one view. This makes integrity obvious to CTO/DPO users without requiring CLI-first workflows.

Docs

docs(evidence): add 5-minute tamper-proof demo and signed export runbook updates. Added docs/tutorials/evidence-integrity-demo.md, updated the 60-second demo and compliance export runbook to distinguish reduced reporting exports from signed integrity exports, and documented /v1/evidence/{id}/verify response shape in the evidence store reference.

1.5.0 - 2026-06-01

Added

feat(serve): OpenAI-compatible quickstart proxy mode. Added talon serve --proxy-quickstart for dev/local host-root compatibility (POST /v1/chat/completions, POST /v1/responses) without gateway YAML, while keeping policy, PII redaction, and evidence active.
feat(gateway): upstream auth mode support for quickstart. Added provider upstream_auth_mode (secret default, client_bearer quickstart path) with client bearer forwarding, OPENAI_API_KEY fallback, and explicit 401 when no upstream key is available.
feat(evidence): quickstart upstream auth metadata. Evidence records now include additive fields upstream_auth_mode, upstream_key_source, upstream_key_fingerprint, and gateway_annotations (backward compatible with existing records).
feat(metrics): periodic reconciliation loop and status telemetry. Added bounded/idempotent collector reconciliation (ReconcileFromStore + loop), OTel reconcile metrics, and /v1/status fields for reconcile runs/recovered events/errors.
feat(server): consolidated SSOT gate suite. Added internal/server/ssot_gate_test.go plus make test-ssot-gate and wired it into make check as an explicit release gate.
feat(events): sanitized reasons[] on operational events. /api/v1/events/recent and /api/v1/events/stream now include deterministic, deduped, length-bounded reasons[] derived from policy decision reasons, explanation reasons, and execution errors. This improves operator context without exposing raw payloads. Verify quickly with curl -s -H "X-Talon-Admin-Key: $TALON_ADMIN_KEY" "http://localhost:8080/api/v1/events/recent?limit=1" | jq '.events[0].reasons'.

Changed

change(server): dev-mode route relocation under quickstart. When --proxy-quickstart is enabled, host-root OpenAI-compatible paths are handled by the quickstart facade. Tenant agent chat is available at POST /v1/agents/chat/completions only when the operator has configured real tenant keys; in default quickstart (no tenant keys), that route is not mounted and returns 404 Not Found to preserve a strict facade-only boundary.
change(serve): quickstart no longer registers a synthetic tenant key. Quickstart mode is strictly a host-root OpenAI-compatibility facade; it will not silently unlock tenant APIs. When tenant keys are configured, the relocated tenant endpoint sits behind standard tenant-auth middleware and returns 401 Unauthorized without a valid key.
change(serve): --gateway-config exclusivity check uses explicit flag set. --proxy-quickstart is rejected alongside --gateway or any explicitly passed --gateway-config, detected via cobra's cmd.Flags().Changed rather than by comparing against the default string value.
change(gateway): quickstart unsafe-listen signal threaded via config. The quickstart_unsafe_listen evidence annotation is driven by GatewayConfig.QuickstartUnsafeListen, populated from --unsafe-listen through QuickstartOptions, instead of a process environment variable.
change(events/metrics): evidence-first projection parity hardening. Operational event reason fields now prefer deterministic explanation payloads, evidence/event ordering is stabilized on timestamp DESC, id DESC, and metrics conversion is unified through evidence-driven projection paths for stronger CLI/API/dashboard parity.
change(dashboard/cli): reliability signals surfaced in routine flows. Dashboard and gateway pages now expose degraded/reliability warning chips, and talon metrics / talon events tail print preflight warnings when /v1/status reports degradation.
change(observability/events): SSOT scope contract locked. /api/v1/metrics is documented as all-activity (gateway and agent-run evidence-backed runtime), and /api/v1/events/* is documented as one event per persisted evidence row, including terminal outcomes plus evidence-backed lifecycle subset records (plan_review, graph runtime). Endpoint shapes remain backward-compatible.
change(metrics/evidence): pragmatic SSOT live-feed unification. Dashboard live metrics are now fed from evidence.Store.Store() post-commit observer notifications (all invocation types), while periodic reconciliation remains bounded/idempotent repair. Degraded evidence-write signaling is centralized in the evidence store path, and production serve wiring no longer double-emits via direct gateway metrics recorder attachment.

Fixed

fix(session): auto-migrate legacy sessions schema on startup. Session store initialization now adds missing max_cost and reasoning columns when older SQLite tables are detected, preventing run/session creation failures on upgraded installs. Verify with go test ./internal/session -run MigratesLegacySessionsTable.
fix(agent): preserve audit trail on evidence write failures. Runner paths that previously ignored evidence/step write errors now log structured failures (correlation_id, tenant_id, agent_id) so silent audit-loss conditions are observable during denied, dry-run, cached, and tool-step flows.
fix(memory): redact low-risk PII before memory governance checks. Memory observations now sanitize person/location entities before validation, so safe, useful memories can be stored while sensitive PII still fails closed under governance policy.
fix(events): expand stream reliability telemetry. Event stream handling now increments disconnect and backlog-drop counters (in addition to gap/replay signals) and exposes them in status output for faster operator diagnosis.
fix(gateway/metrics): no metrics emission without persisted evidence. Gateway collector events are now emitted only after successful evidence persistence, preventing runtime telemetry drift when evidence writes fail.
fix(metrics): surface collector backpressure drops. Collector channel overflow drops now increment dropped_events, emit OTel counter talon.metrics.events_dropped.total, and appear in /v1/status as metrics_events_dropped.
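
The backpressure accounting in the collector fix above follows a common Go pattern: a non-blocking channel send with a drop counter. The Collector type below is a hypothetical stand-in, not Talon's collector, and it leaves out the OTel counter and the /v1/status wiring.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Collector is a hypothetical metrics collector with a bounded queue.
type Collector struct {
	dropped int64 // updated atomically; kept first for 64-bit alignment
	events  chan string
}

// Emit enqueues an event without blocking. When the queue is full the
// event is dropped and the drop is counted instead of silently lost.
func (c *Collector) Emit(ev string) bool {
	select {
	case c.events <- ev:
		return true
	default:
		atomic.AddInt64(&c.dropped, 1)
		return false
	}
}

// Dropped returns the number of events dropped so far.
func (c *Collector) Dropped() int64 {
	return atomic.LoadInt64(&c.dropped)
}

func main() {
	c := &Collector{events: make(chan string, 2)}
	for i := 0; i < 5; i++ {
		c.Emit(fmt.Sprintf("event-%d", i))
	}
	fmt.Println("queued:", len(c.events), "dropped:", c.Dropped())
}
```

A status endpoint can then report the counter directly, which is the role metrics_events_dropped plays in the entry above.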

1.4.6 - 2026-04-14

Added

feat(explanation): deterministic explanation normalization. Added canonical normalization for deterministic policy explanation tokens so equivalent outcomes converge to stable, reusable phrasing across runs and audit surfaces. This helps operators compare evidence reliably and reduces explanation drift in dashboards and tests. Verify quickly with go test ./internal/explanation/....

Fixed

fix(explanation): stage taxonomy and token collapse consistency. Aligned explanation stage taxonomy (including MCP PII semantics) and fixed edge cases where fully-collapsed tokens were not returned as a single deduplicated canonical token. This improves consistency between policy decisions and rendered explanations.
fix(gateway): canonical explanation stage propagation. Gateway explanation output now uses the canonical explanation stage instead of pipeline-stage values, preventing mismatched stage labels in downstream evidence and UI surfaces.
fix(graphadapter): preserve graph evidence identity fields. Graph adapter run evidence now retains session and model fields on graph execution paths, improving traceability for stateful graph runs and downstream audit analysis.

Docs

docs(quickstart): add verification snippet. Quickstart now includes an explicit verification snippet so operators can validate a governed setup immediately after onboarding.

1.4.5 - 2026-04-12

Added

feat(graphadapter): graph runtime governance control plane. Added graph-aware governance execution with event-aware policy checks, lineage-aware evidence hooks, and integration points for LangChain/LangGraph stateful flows. Operators and framework integrators get first-class graph execution visibility while preserving existing run governance semantics. Verify quickly with tests/smoke_sections/30_graph_events.sh and go test ./tests/integration -run Graph.
feat(policy): graph governance Rego policies and tests. Added dedicated graph governance policy modules and policy tests to enforce graph-specific constraints and deny handling at runtime, including deterministic explanation rendering for governance outcomes.
docs(integration): LangChain/LangGraph integration guide and examples. Added end-to-end integration docs and runnable examples under examples/langchain-integration/ to demonstrate stateless and stateful adapter usage patterns with Talon governance.

Fixed

fix(graphadapter): tenant binding and denial propagation hardening. Tightened tenant binding checks, stabilized run-end denial handling, and improved explanation/evidence consistency under denied branches and error paths.
fix(graphadapter): concurrency and lint hardening. Addressed run-state race conditions, aligned request construction with context-aware patterns, and added regression tests for concurrent denial tracking and retry guardrails.

Test

test(graphadapter): full graph governance test pyramid. Added broad unit, handler, policy, integration, and smoke coverage for graph event execution and governance decisions, reducing regression risk for graph-enabled agent pipelines.

1.4.0 - 2026-03-31

Added

feat(agent): operational control plane. Run lifecycle state machine (QUEUED → RUNNING → COMPLETED|FAILED|TERMINATED|BLOCKED|DENIED) with structured failure taxonomy (cost_exceeded, llm_error, tool_timeout, policy_deny, operator_kill, etc.) in evidence records. New admin API surfaces: GET /v1/runs (list active), POST /v1/runs/{id}/kill (terminate), POST /v1/runs/kill-all?tenant_id=X (tenant-wide kill), POST /v1/runs/{id}/pause / resume (mid-execution pause). Operator overrides: POST /v1/overrides/{tenant_id}/lockdown (reject all new runs + kill active), dynamic tool disable (/v1/overrides/{tenant_id}/tools/disable), runtime policy tightening (/v1/overrides/{tenant_id}/policy). Pre-tool approval gates: tools listed in resource_limits.require_approval pause for human decision via POST /v1/tool-approvals/{id}/decide (5 min default timeout). Single-shot cost check catches expensive LLM calls that exceed per-request budget. Per-run tool failure escalation auto-disables tools after 3 consecutive failures. All new endpoints are admin-only (X-Talon-Admin-Key). See Operational control plane reference.
feat(agent): input prompt PII redaction. New redact_input / redact_output fields in data_classification config give granular control over whether PII is redacted from the prompt (before the LLM call) and from the response (before it is returned). The legacy redact_pii field is preserved as a shorthand that sets the default for both. Evidence now includes input_pii_redacted for audit. Schema, template, init merge, smoke test (section 26), and PII enrichment quality test updated.
feat(classifier): PII semantic enrichment. Optional semantic attributes on PII placeholders: PERSON → gender (from title/honorific), LOCATION → scope (city/region/country). Canonical entity model and adapter from current detector; built-in enricher; Rego policy semantic_enrichment.rego (mode off/shadow/enforce, allowed_attributes). Placeholder renderer: legacy [TYPE] or XML-style <PII type="..." id="..." .../>. Config: policies.semantic_enrichment (enabled, mode, confidence_threshold, allowed_attributes). Metrics: talon.pii.enrichment.attempts.total, talon.pii.enrichment.attributes.emitted.total, talon.pii.enrichment.fallback_unknown.total. Smoke section 26 (5+5 runs with enrichment off/enforce). Docs: PII semantic enrichment reference, policy cookbook snippet, Presidio migration note.
feat(evidence): deterministic policy explanations. Policy explanation rendering is now deterministic across evidence generation and surfaces, reducing ordering drift and making repeated runs easier to compare in audits and tests.
chore(legal): add LICENSE file. Repository now includes a root LICENSE file for explicit distribution terms.
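
The per-run tool failure escalation described in the operational control plane entry can be pictured as a streak counter per tool. The following is a minimal sketch, not Talon's implementation: the type failureTracker, its fields, and the rule that a success resets the streak are assumptions drawn from the word "consecutive" in the entry.

```go
package main

import "fmt"

// failureTracker is a hypothetical per-run tracker; names and fields are
// illustrative and not taken from Talon's source.
type failureTracker struct {
	threshold   int
	consecutive map[string]int
	disabled    map[string]bool
}

func newFailureTracker(threshold int) *failureTracker {
	return &failureTracker{
		threshold:   threshold,
		consecutive: make(map[string]int),
		disabled:    make(map[string]bool),
	}
}

// Record notes the outcome of one tool call and reports whether the tool is
// disabled afterwards. Assumption: a success resets the streak for that tool.
func (t *failureTracker) Record(tool string, ok bool) bool {
	if t.disabled[tool] {
		return true
	}
	if ok {
		t.consecutive[tool] = 0
		return false
	}
	t.consecutive[tool]++
	if t.consecutive[tool] >= t.threshold {
		t.disabled[tool] = true
	}
	return t.disabled[tool]
}

func main() {
	t := newFailureTracker(3)
	for _, ok := range []bool{false, false, true, false, false, false} {
		fmt.Println(t.Record("sql_query", ok))
	}
}
```

Only the last call in main reports the tool as disabled, because the success in the middle breaks the first streak.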

Fixed

fix(security): governance hardening. Governance pipeline checks were tightened based on adversarial audit findings to reduce bypass risk under hostile or malformed inputs.

Changed

fix(readme): improve trust signals. Status and metadata links now render as badge images; the previous "Trust Signals" text block was removed for a more scannable project header.

Test

test(classifier): enrichment quality comparison script. Added a dedicated semantic enrichment quality comparison script to support repeatable validation of enrichment behavior.

1.3.0 - 2026-03-18

Added

feat(dashboard): Mission Control UX. Governance and Gateway dashboards unified under a shared Mission Control layout with consistent 3-band information architecture, new widgets (posture, interventions, fleet risk, drift/PII signals), session timeline and compliance report preview panels (#35).
feat(agent): intent governance tooling. New talon intent CLI (classify/classes) backed by internal/agent/intent.go infers operation class, risk, and bulk signals from tool names and JSON params to determine plan review requirements (#36).
feat(agent): tool safety gaps T7, T8, T9. T7: per-tool max_row_count and require_dry_run with Rego deny and pre-execution row count guard; T8: IdempotencyStore (SQLite) deduplicates tool calls by (agent_id, correlation_id, tool_name, argument_hash) with pending/completed lifecycle; T9: forbidden_argument_values in ToolPIIPolicy with Rego deny for specific argument values (e.g. mode=overwrite). Session governance Rego (cost, max_candidates, max_judge_calls), session store, evidence session/stage fields, tool registry schema validation (#37).
feat(agent): tool_governance idempotency config. New tool_governance policy section for per-tool idempotency: scope (request_id/session_id), cache_ttl, duplicate handling (return_cached/fail), strict_mode. Runner applies idempotency only to listed tools; keys use correlation_id or session_id; cached results stored after PII redaction. IdempotencyStore supports TTL-based expiration (#38).
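
The deduplication tuple from T8, (agent_id, correlation_id, tool_name, argument_hash), can be sketched as a single derived key. This is an illustration under assumptions: the function dedupKey, the choice of SHA-256, and the JSON encoding of the tuple are hypothetical and may differ from what IdempotencyStore actually does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// dedupKey derives one lookup key from the four dedup fields.
// encoding/json sorts map keys when marshaling, so two argument maps with
// the same entries produce the same argument hash regardless of literal order.
func dedupKey(agentID, correlationID, toolName string, args map[string]any) (string, error) {
	argBytes, err := json.Marshal(args)
	if err != nil {
		return "", err
	}
	argSum := sha256.Sum256(argBytes)

	// Encoding the tuple as a JSON array avoids separator ambiguity
	// (for example an agent ID that itself contains the separator).
	tuple, err := json.Marshal([]string{
		agentID, correlationID, toolName, hex.EncodeToString(argSum[:]),
	})
	if err != nil {
		return "", err
	}
	keySum := sha256.Sum256(tuple)
	return hex.EncodeToString(keySum[:]), nil
}

func main() {
	key, err := dedupKey("billing-agent", "corr-42", "update_records",
		map[string]any{"mode": "append", "rows": 10})
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

A retry that carries the same correlation ID and arguments maps to the same key, which is what lets a store answer "pending" or "completed" instead of executing the tool again.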

Fixed

fix(agent): Idempotency cache now stores PII-scanned results and handles pending status explicitly so cached results are redacted and non-idempotent tools are not double-executed on retry (#37).

Changed

chore(build): Go bumped to 1.25.8 for stdlib vulnerability fixes (govulncheck: GO-2026-4603, GO-2026-4602, GO-2026-4601).
feat(init): Pack validation derived from pack.ValidPackIDs(), additional industry packs in wizard, dedicated langchain/generic agent templates (#36).
docs: Policy cookbook update_records hardening example; talon intent output fields (#36, #37).

1.2.0 - 2026-03-13

Added

feat(evidence): session_id in export and API. Evidence records and audit export (CSV, JSON, NDJSON) now include session_id for lifecycle session correlation. Plan-gated runs and their auto-dispatch share the same session; export and GET /v1/evidence/{id} include it when present.

Fixed

fix(smoke): Section 24 plan-dispatch: accept HTTP 202 for plan_pending (human_oversight); use section-local response file and admin key for evidence read when serve runs without gateway; relax rate limit (requests_per_minute=300) to avoid OPA deny from shared evidence DB; capture plan execute stderr and dispatch evidence session_id diagnostics on failure.

Changed

docs: Evidence store: document session_id, fix HMAC key (TALON_SIGNING_KEY), retention in agent.talon.yaml, CSV/export columns. Auth: note that serve without --gateway has no tenant keys (admin key only). Agent planning: plan stores session_id, dispatcher reuses it. Compliance export runbook and config reference (TALON_ADMIN_KEY) updated.

1.1.0 - 2026-03-09

Added

feat(cache): governed semantic cache. Optional semantic cache for LLM requests: SQLite store, BM25 embedder, PII scrubber, OPA policy (internal/cache, cache.rego). Config section cache (disabled by default), wizard and doctor support, init templates. Integration in agent runner and gateway (lookup/store, policy, evidence). Evidence: CacheHit, CacheEntryID, CacheSimilarity, CostSaved; CacheEvent for erasure. CLI: talon cache config|stats|list|erase; talon audit, talon costs, talon report show cache savings. Docs: cache vs memory, policy cookbook, config reference; smoke test section for cache.
ci: CodeQL workflow. .github/workflows/codeql.yml for Go analysis with advanced config; .github/codeql-config.yml to exclude go/weak-sensitive-data-hashing (SHA-2 used for cache key derivation, not secrets).

Fixed

fix(cache): Record actual similarity score in evidence instead of threshold; centralize cache key derivation in cache.DeriveEntryKey; gateway uses config-derived tenant ID for cache key (CodeQL taint); remove dead code and clarify cache key hashing docs.
fix(server): HEAD support for dashboard so curl -I returns 200 (health checks / smoke tests).
fix(cmd): Cache prompt now shows (y/N) to match the default of n and the [n] shown by readLine.
fix(lint): Resolve golangci-lint gosec and noctx (agent postBudgetAlert ctx, enforce path validation, mounts/retention nolint, gateway tests with NewRequestWithContext); gofmt gateway.go, noctx in otel chi_test and MCP tests.

Changed

ci: Coverage threshold lowered to 65%; enforce.go nolint G703 for validated path; response_pii_test noctx.
docs(gateway): Clarify cacheKeyHash is cache lookup, not password hashing (CodeQL).

1.0.0 - 2026-03-06

Added

feat(docs): self-adoption overhaul (Gates 1–5). README hero shows talon audit list with blocked tool + blocked PII; one-line mechanism and inline 60-second demo. "What it stops" replaces "Why Talon?" with four failure-first bullets (LiteLLM, CloakLLM, DIY proxy). QUICKSTART simplified to 3-path job-to-be-done (existing app / new agent / understand first). New guide Add Talon to your existing app (Gate 4, first real request). Quickstart-demo: "What you just proved", "Now wire this to your app" (Python/Node/curl), "You're done". "You're done" + next-steps table added to all guides. New comment-playbook (internal Reddit/HN templates) and Why not just a PII proxy?. Docs index updated; P8 buzzwords removed from reader-facing copy.

Changed

chore(build): make test and make test-e2e now run with -count=1 so the test cache is disabled and results are always fresh.

0.9.5 - 2026-03-04

Added

feat(copaw): CoPaw integration. Govern CoPaw (AgentScope/Alibaba DAMO personal AI assistant) via Talon's LLM API gateway. One URL change in CoPaw (Base URL → Talon, API Key → caller key) routes all LLM traffic through Talon for PII scanning, cost limits, and audit. New init pack talon init --pack copaw, caller copaw-main / talon-gw-copaw-001, DashScope support in wizard, CoPaw dashboard tab and /v1/copaw/stats, /v1/copaw/alerts API, OTel span attributes copaw.caller and copaw.channel, MCP-to-CoPaw skill bridge (internal/copaw/bridge.go), memory governor (internal/copaw/memory_governor.go), Rego policy copaw_skills.rego and .talon.yaml copaw.skills schema. Docs: CoPaw integration guide, Docker primer, examples/copaw. Design doc: internal_docs/copaw_integration_design_doc.md.

Fixed

fix(copaw): /v1/copaw/alerts now returns "alerts": [] instead of "alerts": null when no matching evidence records are found, consistent with the no-store path and clients expecting an array.
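
The null-versus-array difference in this fix follows from how Go's encoding/json treats slices: a nil slice marshals as null, an empty non-nil slice as []. The sketch below shows the pattern with hypothetical types (alert, alertsResponse); it is not the handler's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alert and alertsResponse are illustrative types, not Talon's.
type alert struct {
	Message string `json:"message"`
}

type alertsResponse struct {
	Alerts []alert `json:"alerts"`
}

// encodeNaive marshals whatever it is given; a nil slice becomes null.
func encodeNaive(found []alert) string {
	b, _ := json.Marshal(alertsResponse{Alerts: found})
	return string(b)
}

// encodeAsArray normalizes nil to an empty slice so clients always see an array.
func encodeAsArray(found []alert) string {
	if found == nil {
		found = []alert{}
	}
	b, _ := json.Marshal(alertsResponse{Alerts: found})
	return string(b)
}

func main() {
	fmt.Println(encodeNaive(nil))   // {"alerts":null}
	fmt.Println(encodeAsArray(nil)) // {"alerts":[]}
}
```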

0.9.2 - 2026-03-03

Added

feat(init): zero-config init wizard. In a terminal, talon init runs an interactive wizard: choose workload type (agent/proxy), framework pack (OpenClaw, generic, etc.), primary LLM provider, region (if applicable), data residency (EU strict / preferred / global), and compliance features (PII, audit, cost, injection, EU AI Act, DORA). Non-interactive options: talon init --scaffold for quick defaults, talon init --pack <id> for starter packs, or scripted talon init --provider openai --name my-agent with optional --data-sovereignty, --features. New list commands: --list-providers, --list-packs, --list-features. When stdin is not a TTY, init prints guidance instead of running the wizard. Pack and feature registries (internal/pack, internal/feature) drive wizard choices; post-init verification reuses talon doctor; next steps are vault-first (TALON_SECRETS_KEY then talon secrets set).

Fixed

fix(init): gosec nolint for init wizard (G705/G703/G115 false positives). Unit tests added for coverage ≥70% (packName, providerName, dataResidencyLabel, readLine, readChoice, BuildConfigs branches, marshalWithHeader, WriteConfigs, PostInitVerify, runList*).

Changed

docs: All user-facing docs updated for init wizard (README, QUICKSTART, configuration reference, first-governed-agent tutorial, persona guides, OpenClaw guides, provider-registry, ADOPTION_SCENARIOS, ROADMAP).

0.9.1 - 2026-03-02

Changed

Version bump to 0.9.1.

0.9.0 - 2026-02-27

Added

feat(community): implement PROMPT_10 launch track and quality track. Full community adoption plan build-out with a launch-first approach — 36 new files across docs, examples, schemas, deploy templates, and community governance.

Launch Track (demo-first for HN virality)

Mock OpenAI provider (examples/docker-compose/mock-provider/main.go): Standalone server with streaming + non-streaming support, realistic token counts, canned PII-triggering responses. No API key needed.
Docker Compose demo stack (examples/docker-compose/): docker compose up starts Talon + mock provider. 60-second demo from clone to evidence record.
README hero rewrite: Terminal output of talon audit list is now the first visible content. Proxy-as-hook framing, Flow 0 commands, CI/license badges. Compliance language moved below the fold.
Show HN post updated (internal_docs/show-hn.md): Reframed around "intercept all AI API calls with one URL change" narrative.
Request lifecycle doc (docs/explanation/what-talon-does-to-your-request.md): 10-step gateway pipeline breakdown, latency budget table (<15ms overhead), "What Talon Does NOT Do" section, streaming behavior, source code pointers.
Verification scripts: scripts/verify-flow0.sh (automated end-to-end Flow 0 test) and scripts/demo-recorder.sh (generates 10 varied evidence records for screenshots/GIFs).

Quality Track (examples, docs, governance)

examples/gateway-minimal/: Smallest working LLM gateway config with run.sh and README.
examples/mcp-proxy-minimal/: Smallest working MCP proxy config with tool filtering.
examples/plan-review/: Human-in-the-loop demo for EU AI Act Article 14 compliance.
examples/policies/: Starter OPA/Rego library — cost-budget, pii-block, model-allowlist, data-residency.
docs/explanation/evidence-store.md: HMAC signing, progressive disclosure, storage, export, compliance mapping.
docs/tutorials/quickstart-demo.md: Flow 0 tutorial (no API key, Docker Compose).
schemas/: JSON Schema for talon.config.yaml and agent.talon.yaml — enables editor autocomplete and CI validation.
deploy/: systemd unit file (hardened, non-root) and production docker-compose (Talon + PostgreSQL + OTel Collector).
Community files: CODE_OF_CONDUCT.md (Contributor Covenant v2.1), MAINTAINERS.md, ROADMAP.md, .github/CODEOWNERS.
Makefile targets: demo-gateway, demo-full, demo-clean, verify-flow0.
docs/README.md: Updated index with all new tutorials, explanations, examples, and policy reference.

0.8.14 - 2026-02-26

Added

feat(audit): show tool governance in talon audit show. Gateway evidence records now display a "Tool Governance (gateway)" section with Requested, Filtered, and Forwarded tool names when the request included a tools array, so operators can verify which tools were stripped by forbidden_tools before the LLM saw them.
docs(gateway): Added gateway-default-policy-tool-governance-snippet.yaml in the OpenClaw primer for pasting forbidden_tools and tool_policy_action into talon.config.yaml.

Fixed

fix(gateway): persist tool governance when any of requested/filtered/forwarded is non-empty. Previously RecordGatewayEvidence only set tool_governance when ToolsRequested had length; it now persists whenever any of the three slices is non-empty.

Test

test(gateway): TestRecordGatewayEvidence_ToolGovernanceRoundTrip ensures tool governance is stored and returned by Get() (same path as talon audit show).

0.8.13 - 2026-02-26

(No notable changes in this release.)

0.8.12 - 2026-02-26

Added

feat(gateway): attachment scanning for base64-encoded file blocks (#23). The gateway now detects base64-encoded file blocks in OpenAI (Chat Completions file/image_url + Responses API input_file) and Anthropic (document/image with source.type: "base64") requests. Text is extracted from supported formats (PDF, TXT, CSV, HTML), scanned for PII and prompt injection, and governed by a new attachment_policy with four actions: allow, warn (default — log findings, forward unchanged), strip (remove file blocks before forwarding), block (reject request with HTTP 400). Per-caller overrides via policy_overrides.attachment_policy. Images are logged for evidence but skip text-based scanning.
feat(gateway): enforce PII actions on streaming responses. handleStreamingPIIScan now buffers the SSE stream, scans the completed content, and either forwards as-is (warn), rewrites the SSE payload with redacted content (redact), or returns HTTP 451 (block). Default response_pii_action is warn.

Changed

refactor(gateway): decompose openclaw_incident_test.go by testing pyramid. The 1134-line monolith is now split into layered test files: gateway_test_helpers_test.go, response_pii_test.go, extract_test.go, forward_test.go, gateway_integration_test.go, responses_api_test.go, evidence_test.go.

Test

test(gateway): Extensive attachment scanning coverage: multi-file requests, size/type enforcement, Responses API input_file, Anthropic base64 document/image blocks, multi-turn string content tolerance, corrupt/empty/unsupported formats, warn/strip/block/allow modes, per-caller override propagation, and full gateway integration tests.
test(attachment): PDF extraction tests with buildTestPDF helper generating valid PDFs; ExtractBytesWithLimit override tests.
test(gateway): Streaming response PII tests covering warn/redact/block behaviours with real SSE format.

0.8.11 - 2026-02-26

Fixed

fix(gateway): streaming response PII scanning no longer breaks SSE clients. The v0.8.10 approach of forcing stream:false on upstream requests caused OpenClaw (and any SSE-expecting client) to hang — it received a plain JSON response but was waiting for SSE events. The gateway now buffers the full SSE stream from the upstream, extracts the completed response from the response.completed event (Responses API) or delta accumulation (Chat Completions), scans for PII, and either forwards the original buffered events (no PII) or returns a redacted response wrapped in valid SSE format. Streaming is preserved when PII action is allow.
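
The "delta accumulation" step for Chat Completions streams can be sketched as follows. This is a simplified illustration, assuming a fully buffered stream and the public Chat Completions chunk shape (choices[].delta.content); the names chatChunk and accumulateDeltas are hypothetical, and the real gateway handles more event types.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// chatChunk models only the fields this sketch reads from a streamed chunk.
type chatChunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// accumulateDeltas joins the delta content of every "data:" event in a
// buffered SSE stream. Note: bufio.Scanner rejects lines over 64 KiB by
// default; a production reader would raise that limit.
func accumulateDeltas(stream string) (string, error) {
	var out strings.Builder
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators, comments, other SSE fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "" || payload == "[DONE]" {
			continue
		}
		var c chatChunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			return "", err
		}
		for _, choice := range c.Choices {
			out.WriteString(choice.Delta.Content)
		}
	}
	return out.String(), sc.Err()
}

func main() {
	stream := "data: " + `{"choices":[{"delta":{"content":"Hel"}}]}` + "\n\n" +
		"data: " + `{"choices":[{"delta":{"content":"lo"}}]}` + "\n\n" +
		"data: [DONE]\n\n"
	text, err := accumulateDeltas(stream)
	fmt.Println(text, err)
}
```

The accumulated text is what a scanner would inspect; the original buffered events can then be forwarded untouched when nothing is found.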

Test

test(gateway): Replaced disableStreaming-based tests with SSE-native tests: TestGateway_ResponsesAPI_StreamingResponsePIIRedacted (redact mode with SSE), TestGateway_ResponsesAPI_StreamingNoPII (clean passthrough), TestGateway_StreamingAllowed_WhenPIIActionAllow, and TestGateway_ResponsesAPI_StreamingPIIBlock. All tests use real SSE response format.

0.8.10 - 2026-02-26

Fixed

fix(gateway): response PII scanning now works when clients send stream:true (superseded by v0.8.11 — see above). This version forced stream:false which broke SSE clients.

Test

test(gateway): Added streaming PII scanning tests (updated in v0.8.11).

0.8.9 - 2026-02-26

Fixed

fix(gateway): Refactored extractResponseContentText and redactResponseContentFields in response_pii.go to reduce cyclomatic complexity below the linter threshold (gocyclo > 15). Extracted Anthropic and Responses API parsing into dedicated helpers.
fix(gateway): redactOpenAIBody no longer injects content: null into Responses API input array items that have no content field (e.g. item_reference entries). Previously this caused 400 Unknown parameter: 'input[N].content' from OpenAI.
fix(gateway): openAIContentToText and redactOpenAIContent now recognize input_text and output_text block types in addition to text, covering all Responses API content block formats.

Test

test(gateway): Added 8 full-pipeline integration tests for the Responses API path: request PII redaction (string input, array content, input_text blocks), item_reference preservation (no content:null injection), response PII redaction and blocking, clean passthrough, and block-mode request rejection. These tests exercise the complete gateway handler including routing, store:true injection, PII scanning, evidence recording, and upstream forwarding.

0.8.8 - 2026-02-26

Fixed

fix(gateway): PII scanning and redaction now handles the OpenAI Responses API format (output[].content[].text with type: "output_text") in addition to Chat Completions (choices[].message.content) and Anthropic (content[].text). Previously, emails and other PII in Responses API output passed through unredacted.
fix(gateway): Request-path PII extraction and redaction now handles the Responses API input field (string or array of message objects), in addition to Chat Completions messages[]. All other request fields (store, previous_response_id, etc.) are preserved during redaction.

Test

test(gateway): Added Responses API test cases for response PII scanning (email, IBAN in output[].content), content extraction (single/multiple outputs, non-text outputs ignored), request extraction (input as string/array/content blocks), and request redaction (string input, array input, field preservation).

0.8.7 - 2026-02-26

Fixed

fix(gateway): Force store: true on OpenAI Responses API requests instead of only adding it when missing. OpenClaw (and other clients) may send store: false explicitly; the gateway now overwrites it so multi-turn conversations work through the proxy.
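
A minimal sketch of the overwrite behaviour, assuming the request body is a JSON object: the function forceStore is hypothetical and only illustrates setting store to true while leaving every other field byte-for-byte intact via json.RawMessage.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// forceStore sets "store": true on a JSON object body, overwriting any
// existing value. Other fields are kept as raw JSON and not reinterpreted.
func forceStore(body []byte) ([]byte, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return nil, err
	}
	if fields == nil {
		// A literal null body unmarshals to a nil map.
		return nil, errors.New("request body is not a JSON object")
	}
	fields["store"] = json.RawMessage("true")
	return json.Marshal(fields)
}

func main() {
	out, err := forceStore([]byte(`{"model":"m","store":false,"input":"hi"}`))
	fmt.Println(string(out), err)
}
```

Because json.Marshal writes map keys in sorted order, the output field order may differ from the input; JSON object order carries no meaning, so this is harmless for the upstream API.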

0.8.6 - 2026-02-26

Fixed

fix(gateway): Automatically inject store: true into OpenAI Responses API requests (/v1/responses) when not explicitly set. Without this, OpenAI does not persist response items, causing 404 errors on multi-turn conversations when the client (e.g. OpenClaw) references previous response IDs. Explicit store: false from the client is preserved.

Test

test(gateway): Added TestIsResponsesAPIPath and TestEnsureResponsesStore — path detection for Responses API, store injection with field preservation, explicit store override, and invalid JSON safety.

0.8.5 - 2026-02-26

Fixed

fix(gateway): Strip Accept-Encoding from headers forwarded to upstream providers. Go's http.Transport only auto-decompresses gzip responses when it manages the header itself; forwarding the client's Accept-Encoding caused raw gzip bytes to be written back to the client, producing "404 + binary garbage" in OpenClaw and other clients. Also strip stale Content-Length (invalid after PII redaction). Defensive strip added in both the gateway handler and the Forward() function.
fix(version): talon version and OTel service.version resource now use runtime/debug.ReadBuildInfo() as fallback when ldflags are not injected (e.g. go install ...@v0.8.5), so the correct module version is displayed instead of "dev" in both CLI output and trace spans.
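
The header stripping in the Accept-Encoding fix can be illustrated with net/http alone. This is a sketch under assumptions: upstreamHeaders is a hypothetical helper, not the gateway's Forward() function. With Accept-Encoding absent from the outgoing request, Go's http.Transport requests gzip itself and decompresses the response transparently.

```go
package main

import (
	"fmt"
	"net/http"
)

// upstreamHeaders copies the client's headers and removes the two that must
// not be forwarded: Accept-Encoding (so the transport manages compression)
// and Content-Length (stale once the body has been rewritten).
func upstreamHeaders(client http.Header) http.Header {
	out := client.Clone()
	out.Del("Accept-Encoding")
	out.Del("Content-Length")
	return out
}

func main() {
	in := http.Header{}
	in.Set("Accept-Encoding", "gzip, br")
	in.Set("Content-Length", "512")
	in.Set("Authorization", "Bearer example")

	out := upstreamHeaders(in)
	fmt.Println(out.Get("Accept-Encoding") == "", out.Get("Authorization"))
}
```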

Docs

docs(openclaw): Added troubleshooting entry for "Talon dev" version string after go install.

Test

test(gateway): Added TestForward_GzipErrorDecompressed and TestForward_GzipSuccessDecompressed — verify that gzip-compressed upstream responses (both 404 and 200) are transparently decompressed for the client, PII scanner, and token usage parser.

0.8.4 - 2026-02-25

Fixed

fix(gateway): Response PII scanner now scans only LLM-generated content fields (choices[].message.content for OpenAI, content[].text for Anthropic) instead of the entire JSON body. Prevents false positives on API envelope fields (created timestamp, token counts, id, system_fingerprint). The [NATIONAL_ID] false positive on created timestamps is eliminated.
fix(init): talon init --pack openclaw now shows TALON_SECRETS_KEY as step 1 before talon secrets set, preventing vault key mismatch errors.
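
The content-only scanning described in the response PII fix amounts to extracting the generated text before scanning, so envelope fields such as created never reach the scanner. The sketch below covers only the OpenAI Chat Completions shape; chatResponse and contentTexts are hypothetical names, not the gateway's extractResponseContentText.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatResponse models only choices[].message.content; envelope fields
// (id, created, usage, system_fingerprint) are deliberately not decoded.
type chatResponse struct {
	Choices []struct {
		Message struct {
			Content json.RawMessage `json:"content"`
		} `json:"message"`
	} `json:"choices"`
}

// contentTexts returns the string-valued content fields. Null, missing, or
// non-string content (for example multimodal arrays) is skipped in this sketch.
func contentTexts(body []byte) ([]string, error) {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var texts []string
	for _, c := range resp.Choices {
		var s string
		if err := json.Unmarshal(c.Message.Content, &s); err == nil && s != "" {
			texts = append(texts, s)
		}
	}
	return texts, nil
}

func main() {
	body := []byte(`{"id":"chatcmpl-1","created":1740480000,` +
		`"choices":[{"message":{"content":"Contact: a@example.com"}}]}`)
	texts, err := contentTexts(body)
	fmt.Println(texts, err)
}
```

Only the returned strings would be handed to a PII scanner, so a ten-digit created timestamp cannot be mistaken for a national ID.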

Docs

docs: macOS go install linker error (unsupported tapi file type) workaround added to README, OpenClaw integration guide, and first-governed-agent tutorial.

Test

test(gateway): Comprehensive response PII false-positive prevention suite — 12 envelope-only subtests (timestamps, large tokens, fingerprints, Anthropic format, multi-choice, multimodal, empty/null content), 4 content-PII-with-envelope-preserved subtests, 9 extractResponseContentText unit tests, 5 scanResponseForPII mode tests.

0.8.2 - 2026-02-25

Added

feat(init): talon init --pack openclaw generates OpenClaw gateway starter (agent.talon.yaml + talon.config.yaml) with post-init instructions.
docs(openclaw): Integration guide — baseUrl with trailing /v1 for correct upstream paths; two-keys clarification (TALON_SECRETS_KEY vs caller api_key); troubleshooting (404, binary garbage, vault key); diagnostics script; recommended sequence (secrets then serve). Standardized caller api_key to talon-gw-openclaw-001 across examples and guides; install instructions (go install, install.gettalon.dev).

Fixed

fix(gateway): Error responses (4xx/5xx) from upstream are no longer streamed; body is read and forwarded so clients receive readable JSON instead of raw binary/gzip (fixes OpenClaw "404 + garbage" when upstream returned error with SSE content-type).

Test

test(gateway): Forward-level tests for error responses (404/500/429/400/401 with SSE or JSON) not streamed; success stream unchanged. Gateway pipeline tests: upstream 404/500 readable, 404 with SSE content-type, evidence recorded on upstream error, PII redact then upstream 404, 429 rate-limit forwarded with headers.

0.8.1 - 2026-02-25

Added

feat(governance): Tool-aware PII redaction with per-tool, per-argument policies — allow/redact/audit/block categories (Gap T1).
feat(gateway): Response-path PII scanning with redact/block/warn modes for both MCP proxy and LLM gateway (Gap F).
feat(agent): Kill switch via ActiveRunTracker.Kill() Go API (Gap D). CLI and HTTP wrappers planned for next release.
feat(agent): Circuit breaker with half-open recovery for repeated policy denials, configurable via circuit_breaker_threshold and circuit_breaker_window in .talon.yaml (Gap C).
feat(policy): Destructive operation detection in tool_access.rego — blocks delete, drop, remove patterns (Gap A).
feat(policy): Per-agent rate limit isolation in rate_limits.rego with requests_last_minute_agent policy input (Gap B).
feat(agent): Contextual volume detection in plan review — flags high-volume operations (Gap E).
feat(evidence): SanitizeForEvidence defense-in-depth — scrubs PII from evidence payloads before storage (Gap G).
feat(memory): Optional HMAC signing for memory entries (Gap H).
feat(evidence): Pre-execution pending evidence for tool calls — writes "pending" step record before tool.Execute(), updates to "completed"/"failed" after. A kill or crash never creates an unaudited action (Gap T2).
feat(mcp): tools/list filtering in MCP proxy — agents only see tools in their allowed_tools list (Gap T3).
feat(agent): Separate tool failure tracking — tool execution errors feed ToolFailureTracker with operator alerting, not the circuit breaker. Configurable via tool_failure_threshold and tool_failure_window (Gap T4).
feat(agent): Per-tool execution timeouts — reads ToolPIIPolicy.Timeout and wraps tool.Execute() with context.WithTimeout (Gap T5).
feat(agent): Tool argument validation interface — tools implementing ArgumentValidator get pre-execution validation. Full JSON Schema validation planned for Phase 2 (Gap T6).
feat(gateway): Per-caller and global rate limiting enforced via token bucket (golang.org/x/time/rate). Configured via global_requests_per_min and per_caller_requests_per_min.
fix(agent): Wire circuit breaker into Runner execution — checks before policy evaluation, records denials/successes.
fix(agent): Pass requests_last_minute_agent to OPA policy input — per-agent rate limiting now functional.
test: Comprehensive E2E governance test suite covering OpenClaw incident failure modes.
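
The per-tool execution timeout entry (Gap T5) describes wrapping tool execution with context.WithTimeout. A minimal sketch of that pattern follows; toolFunc, executeWithTimeout, slowTool, and runDemo are hypothetical names, and the tool is assumed to honour context cancellation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// toolFunc stands in for a tool's Execute method in this sketch.
type toolFunc func(ctx context.Context) (string, error)

// executeWithTimeout runs the tool under a deadline derived from the parent.
func executeWithTimeout(parent context.Context, timeout time.Duration, tool toolFunc) (string, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel() // release the timer even when the tool finishes early
	return tool(ctx)
}

// slowTool simulates work that takes d but stops when the context ends.
func slowTool(d time.Duration) toolFunc {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(d):
			return "done", nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// runDemo returns the tool result and whether the call timed out.
func runDemo(timeoutMs, workMs int) (string, bool) {
	res, err := executeWithTimeout(context.Background(),
		time.Duration(timeoutMs)*time.Millisecond,
		slowTool(time.Duration(workMs)*time.Millisecond))
	return res, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(runDemo(10, 200)) // tool slower than its timeout
	fmt.Println(runDemo(200, 1))  // tool finishes in time
}
```

The timeout only takes effect if the tool observes ctx.Done(); a tool that ignores its context keeps running in the background even after the caller has given up.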

0.8.0 - 2026-02-24

Added

Memory Phase 1: Input-hash deduplication; memory.governance.dedup_window_minutes; per-run --no-memory; talon audit show without ID shows latest; retention/max_entries enforcement. See docs/MEMORY_GOVERNANCE.md.
Memory Phase 2: Consolidation pipeline (ADD/UPDATE/INVALIDATE/NOOP); temporal invalidation (preserved for audit); point-in-time AsOf (CLI talon memory as-of <RFC3339> and API GET /v1/memory/as-of). See docs/MEMORY_GOVERNANCE.md.
Memory Phase 3: Three-type memory (semantic, episodic, procedural) and relevance-scored retrieval (relevance × recency × type weight × trust); enhanced input fingerprint (prompt + attachment hashes). See docs/MEMORY_GOVERNANCE.md.

0.7.6 - 2026-02-23

Changed

CLI: When talon run is invoked without --agent, the runtime agent ID (evidence, memory, secrets) is now taken from the loaded policy file (agent.name in the YAML) instead of the CLI default "default". Explicit --agent <name> continues to override. This aligns config file and runtime identity when using the default policy.

Added

CLI: resolveRunAgentName and unit tests for default vs explicit agent name resolution; --agent flag description updated; QUICKSTART and PERSONA_GUIDES note the behavior when --agent is omitted.

0.7.5 - 2026-02-23

Added

Policy: policies.data_classification.block_on_pii — when true, runs are denied (no LLM call) when the user prompt or any attachment content contains PII; prompt and attachment text are scanned and evidence is recorded on deny. Documented in policy cookbook.

Fixed

Agent: Deterministic ordering of PIIDetected / pii_detected in evidence and logs (merged PII entity names are now sorted to avoid flaky tests and unstable serialized evidence).
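
The ordering fix relies on a general Go property: map iteration order is unspecified, so names merged through a set must be sorted before serialization. A minimal sketch, with the hypothetical helper mergeEntityNames:

```go
package main

import (
	"fmt"
	"sort"
)

// mergeEntityNames de-duplicates entity names from several scans and returns
// them sorted, so repeated runs serialize the same list.
func mergeEntityNames(groups ...[]string) []string {
	seen := make(map[string]struct{})
	for _, g := range groups {
		for _, name := range g {
			seen[name] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for name := range seen { // iteration order is random here
		out = append(out, name)
	}
	sort.Strings(out) // fix the order before it reaches evidence or logs
	return out
}

func main() {
	prompt := []string{"IBAN", "EMAIL"}
	attachment := []string{"EMAIL", "PERSON"}
	fmt.Println(mergeEntityNames(prompt, attachment)) // [EMAIL IBAN PERSON]
}
```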

0.7.2 - 2026-02-23

Fixed

CI: Dockerfile now uses Go 1.24 to match go.mod; GoReleaser skips linux/arm64 (CGO assembler incompatibility in goreleaser-cross); gitleaks allowlist added for test/doc placeholders.

0.7.1 - 2026-02-23

Fixed

Release: Use goreleaser-cross for CGO cross-compilation (fix darwin/arm64 build from Linux). GoReleaser archive deprecations (format → formats).
Security: Run gitleaks CLI instead of gitleaks-action@v2 to avoid org license requirement. Dependency upgrades for govulncheck: OpenTelemetry v1.28 → v1.40 (GO-2026-4394), OPA v0.62 → v0.68 (GO-2024-3141), golang.org/x/net → v0.38 (GO-2025-3595). Go 1.22 → 1.23 for stdlib fixes.

0.7.0 - 2026-02-23

Added

Bootstrap & CLI: Cobra CLI with OpenTelemetry integration; zerolog structured logging with OTel bridge; Makefile, Dockerfile, docker-compose, CI workflows.
Policy engine: Embedded OPA with v2.0 schema; Rego policies for cost limits, rate limits, time restrictions, resource limits, tool access, secret access, memory governance, data classification; talon init and talon validate (strict mode); template-based init.
MCP proxy: Architecture and onboarding docs; proxy Rego policies (tool allowlists, rate limits, PII redaction, high-risk blocking).
PII, attachments, LLM: Regex-based PII classifier (EU patterns); attachment scanner with extraction, instruction detection, sandboxing; multi-provider LLM router (OpenAI, Anthropic, Bedrock EU, Ollama); cost estimation and tier-based routing.
Agent pipeline: Full runner (policy → classify → scan attachments → OPA → secrets → route LLM → evidence); execution plan generation and plan review gate (EU AI Act Art. 11/13); pipeline hooks (webhook delivery); MCP tool registry; talon run with --dry-run, --agent, --tenant, --attach, --policy.
Secrets & evidence: AES-256-GCM secrets vault with per-secret ACL; secret rotation and audit log; SQLite evidence store with HMAC-SHA256; progressive disclosure (list → timeline → detail); talon audit list/verify, talon secrets set/list/audit/rotate.
Cost & PII: Graceful cost degradation (fallback model when budget threshold reached); expanded EU PII patterns.
Testing: Test pyramid (unit, integration, e2e); shared internal/testutil (mock provider, policy helpers, constants); e2e CLI flows (init, run, validate, audit, costs, secrets, memory); fuzz and benchmarks; CI coverage threshold 70%.
Memory, context, triggers: Governed agent memory (Constitutional AI, allowed/forbidden categories, PII scan); shared enterprise context mounts with privacy tags; cron scheduler and webhook handler; memory CLI and search.
SMB governance: Onboarding and governance improvements for SMB use cases.
Agent planning: Bounded agentic loop; step-level evidence; loop containment policy; tests and docs.
Observability & CLI: Config show, doctor, costs/report commands; examples and docs.
HTTP API & MCP: REST API with 15+ endpoints; MCP JSON-RPC 2.0 server; MCP proxy for vendor integration; embedded dashboard (evidence, plan review, memory); per-tenant rate limits.
CI/CD & release: Golden tests for policy engine; integration full-flow and gateway stub tests; gofmt, vet, OPA policy tests, Codecov in CI; security workflow (govulncheck, gitleaks, SBOM); docs workflow (markdown link check); install script with checksum verification; GoReleaser with SBOM and Docker (GHCR); SECURITY.md; issue and PR templates.

Fixed

Policy engine post-review fixes (PR #4).
Memory: prevent data race on shared Governance OPA evaluator.

Security

AES-256-GCM encryption for secrets at rest.
HMAC-SHA256 signatures for evidence integrity.
Timing-safe API key comparison; per-agent/tenant ACL; fail-closed policy evaluation.
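
The two integrity primitives named above, HMAC-SHA256 signatures and timing-safe key comparison, are both available in Go's standard library. The sketch below shows the general pattern only; sign, verify, and keysEqual are hypothetical helpers, and the record format and key handling in Talon's evidence store are not shown here.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of payload under key.
func sign(key, payload []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(key, payload []byte, sigHex string) bool {
	want, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return hmac.Equal(mac.Sum(nil), want)
}

// keysEqual compares two API keys without an early exit on the first
// differing byte. ConstantTimeCompare still returns immediately when the
// lengths differ, so only the key length can leak through timing.
func keysEqual(presented, expected string) bool {
	return subtle.ConstantTimeCompare([]byte(presented), []byte(expected)) == 1
}

func main() {
	key := []byte("example-signing-key")
	record := []byte(`{"id":"ev_1","decision":"deny"}`)
	sig := sign(key, record)

	fmt.Println(verify(key, record, sig))
	fmt.Println(verify(key, []byte(`{"id":"ev_1","decision":"allow"}`), sig))
	fmt.Println(keysEqual("caller-key", "caller-key"))
}
```

Any change to the record after signing makes verification fail, which is the property an audit trail depends on.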

Compliance

ISO 27001: policy, classification, audit, secrets controls.
GDPR: controller obligations, privacy by design, processing records, security.
NIS2: risk management, incident reporting via evidence timeline.
EU AI Act: risk management, transparency, human oversight (Art. 9, 13, 14).
Data residency: tier-based EU model routing.
