
Reproducible Benchmarks

Status: stable · Scope: gateway pipeline overhead, PII scan latency, evidence write throughput.

The README states that pipeline overhead is typically under 15 ms excluding upstream latency. This document defines how to reproduce the micro-benchmarks behind that claim, what each number measures, and what is intentionally out of scope.

The authoritative numbers for a given machine are whatever make benchmarks prints when you run it on that machine. Results vary with CPU, Go version, SQLite build, and system load; do not treat a single snapshot as an SLA.

Quick start

make benchmarks

Or, to save the snapshot to a file:

scripts/run-benchmarks.sh -o /tmp/talon-benchmarks.md

Requirements: Go 1.22+ (the project pins 1.25.x in CI), CGO enabled (required for SQLite), and commands run from the repository root.

What we measure

| Metric | Go benchmark | Package | What it includes |
|---|---|---|---|
| Gateway pipeline overhead | BenchmarkGatewayPipelineOverhead | internal/gateway | One non-streaming ServeHTTP round trip: route, caller auth, request extract, PII scan, OPA policy evaluation, forward to a local httptest mock upstream, response PII scan, signed evidence write, metrics. The representative payload includes EU email and IBAN patterns. |
| PII scan latency | BenchmarkPIIScan | internal/classifier | One Scanner.Scan on fixed text (email, IBAN, card). Isolates classifier cost without HTTP or SQLite. |
| Evidence write throughput | BenchmarkEvidenceStore | internal/evidence | One Generator.Generate (HMAC-signed SQLite insert) per iteration. Isolates the evidence path without gateway HTTP. |

What is excluded

  • WAN upstream RTT — the gateway benchmark uses an in-process mock server; add your provider latency separately.
  • Retry / fallback routing — not benchmarked until Epic #113 (#138 / #139) lands.
  • Streaming responses — benchmarks use non-streaming JSON completions only.
  • Attachment extraction / injection scan — not in the default payload; add fixtures if you need that dimension.

Method

  1. Toolchain: go test -bench=… -benchmem -benchtime=2s -count=5 -run=^$ over ./internal/gateway/..., ./internal/classifier/..., and ./internal/evidence/....
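  2. Runs and caching: -count=5 runs each benchmark five times. Because -count is not a cacheable test flag, results are never served from the go test cache. The script reports the last ns/op line per benchmark. For a stabler figure, compare it against the median of the five runs, and inspect the raw output on stderr for spread.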
  3. Hardware: scripts/run-benchmarks.sh records go version, uname, and CPU model in the emitted table. Paste that block when publishing numbers externally.
  4. Comparison to the 15 ms budget: see the step table in "What Talon does to your request". With a local upstream on a modern laptop or desktop, gateway overhead should stay below 15 ms. Production adds network latency, disk contention, and concurrent load.
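Step 2 suggests comparing against the median of the five -count runs, which the script does not compute. A small post-processing pass over the raw go test output is enough. The sketch below assumes the standard benchmark line layout: name, iteration count, value, then "ns/op". It strips the numeric GOMAXPROCS suffix that go test appends to benchmark names. medianNsPerOp is a hypothetical helper, not part of scripts/run-benchmarks.sh. For proper statistical comparison between two snapshots, golang.org/x/perf/cmd/benchstat is the usual tool.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// medianNsPerOp parses raw `go test -bench` output and returns the median
// ns/op per benchmark name, with a trailing -N GOMAXPROCS suffix removed.
func medianNsPerOp(raw string) map[string]float64 {
	samples := map[string][]float64{}
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected shape: BenchmarkName-8  <iterations>  <value> ns/op  [...]
		if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") || fields[3] != "ns/op" {
			continue // skip goos/pkg/PASS lines and anything unexpected
		}
		v, err := strconv.ParseFloat(fields[2], 64)
		if err != nil {
			continue
		}
		name := fields[0]
		if i := strings.LastIndex(name, "-"); i > 0 {
			if _, err := strconv.Atoi(name[i+1:]); err == nil {
				name = name[:i]
			}
		}
		samples[name] = append(samples[name], v)
	}
	medians := make(map[string]float64, len(samples))
	for name, vs := range samples {
		sort.Float64s(vs)
		mid := len(vs) / 2
		if len(vs)%2 == 1 {
			medians[name] = vs[mid]
		} else {
			medians[name] = (vs[mid-1] + vs[mid]) / 2
		}
	}
	return medians
}

func main() {
	// Synthetic input to show the parsing; these are not Talon measurements.
	raw := `BenchmarkExample-8   1000   21000 ns/op   512 B/op   9 allocs/op
BenchmarkExample-8   1000   19000 ns/op   512 B/op   9 allocs/op
BenchmarkExample-8   1000   35000 ns/op   512 B/op   9 allocs/op
BenchmarkExample-8   1000   20500 ns/op   512 B/op   9 allocs/op
BenchmarkExample-8   1000   20000 ns/op   512 B/op   9 allocs/op`
	for name, ns := range medianNsPerOp(raw) {
		fmt.Printf("%s median %.0f ns/op\n", name, ns)
	}
}
```

With this input, the last line (what "report the last line" would pick) is 20000 ns/op and the median is 20500 ns/op. The 35000 ns/op outlier moves neither figure.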

Interpreting results

  • Gateway ms/req — wall-clock per governed request with mock upstream. If this is consistently above 15 ms on your hardware, profile before citing the README claim in customer-facing material.
  • PII ms/scan — scales with input length and pattern density; the fixed benchmark string is a regression anchor, not a worst case.
  • Evidence writes/s — 1e9 divided by the ns/op of BenchmarkEvidenceStore (one write per iteration); useful for capacity planning on evidence-heavy workloads.

Source locations