Cap AI spend for a Slack/support bot in 10 minutes
Talon gives EMEA SMB operators a simple promise:
You can cap AI spend before it runs away, and prove every allow/deny decision with signed evidence.
This guide shows the fastest path to that outcome for a support bot caller.
What this guide covers
- Hard daily/monthly spend caps for one caller (support-slack-bot)
- Deny before any upstream provider call when the next request would exceed cap
- Signed evidence for both allowed and denied decisions
- Dashboard + CLI visibility for today/month, by caller/model/provider
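The "deny before any upstream call" rule in the list above can be pictured as a small pre-call check. This is only a sketch of the idea, not Talon's implementation: the `Caps` class, the `check_budget` function, and the exact reason strings are hypothetical, and it assumes the gateway has an estimated cost for the next request before calling the provider.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Caps:
    """Hard caps in EUR for one caller (field names mirror the config keys)."""
    max_daily_cost: Decimal
    max_monthly_cost: Decimal


def check_budget(spent_today: Decimal, spent_month: Decimal,
                 estimated_cost: Decimal, caps: Caps) -> Optional[str]:
    """Return None to allow the request, or a denial reason string.

    Hypothetical helper: the check uses an estimate because it runs
    before the provider is called, so no actual cost exists yet.
    """
    if spent_today + estimated_cost > caps.max_daily_cost:
        return "budget_exceeded: daily"
    if spent_month + estimated_cost > caps.max_monthly_cost:
        return "budget_exceeded: monthly"
    return None


# Demo-sized daily cap of 0.01 EUR, as suggested for a quick denial.
caps = Caps(Decimal("0.01"), Decimal("200.00"))

# First request: nothing spent yet, so it fits under both caps.
first = check_budget(Decimal("0"), Decimal("0"), Decimal("0.004"), caps)

# Later request: 0.008 already spent, 0.004 more would pass the daily cap.
second = check_budget(Decimal("0.008"), Decimal("0.008"), Decimal("0.004"), caps)
```

`Decimal` is used instead of `float` so that small EUR amounts compare exactly against the cap.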
1. Add a caller with hard EUR caps
Use a gateway config like this (from examples/gateway/talon.config.gateway.yaml):
gateway:
  enabled: true
  listen_prefix: "/v1/proxy"
  mode: "enforce"
  callers:
    - name: "support-slack-bot"
      tenant_key: "talon-gw-support-xyz"
      tenant_id: "default"
      policy_overrides:
        max_daily_cost: 10.00
        max_monthly_cost: 200.00
        pii_action: "warn"
        allowed_models: ["gpt-4o-mini"]
For a demo, temporarily lower max_daily_cost (for example 0.01) so you can trigger a denial quickly.
2. Start Talon gateway
talon serve --gateway --gateway-config=path/to/talon.config.yaml
Make sure provider keys are stored in the vault (for example, talon secrets set openai-api-key "sk-...").
3. Run the 6-step demo flow
- Configure a very low max_daily_cost for support-slack-bot
- Send one request that is allowed
- Send a second request that would exceed budget
- Verify Talon denies before provider call
- Verify signed evidence exists for the denial
- Verify dashboard budget utilization reflects the event
Example calls:
# Allowed request (first call)
curl -sS "http://localhost:8080/v1/proxy/openai/v1/chat/completions" \
-H "Authorization: Bearer talon-gw-support-xyz" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Summarize ticket #123"}]}'
# Denied request (second call, over cap)
curl -sS "http://localhost:8080/v1/proxy/openai/v1/chat/completions" \
-H "Authorization: Bearer talon-gw-support-xyz" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Summarize ticket #124"}]}'
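The same two calls can be scripted, which helps when the demo needs to be repeated. The sketch below uses only the Python standard library and reuses the URL, tenant key, and model from the curl examples; `build_request` and `send` are hypothetical helper names, and the script assumes the gateway is listening on localhost:8080 as shown above.

```python
import json
import urllib.error
import urllib.request

# Values copied from the curl examples in this guide.
GATEWAY_URL = "http://localhost:8080/v1/proxy/openai/v1/chat/completions"
TENANT_KEY = "talon-gw-support-xyz"


def build_request(prompt: str, model: str = "gpt-4o-mini") -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {TENANT_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    urllib raises HTTPError for 4xx/5xx responses, so the status of a
    denied call is read from the exception instead of being re-raised.
    """
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

With a very low daily cap, calling `send(build_request("Summarize ticket #123"))` and then `send(build_request("Summarize ticket #124"))` should give a success status followed by 403, assuming the first call uses up the cap.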
Expected deny characteristics:
- HTTP 403
- Machine-readable error code/reason contains budget_exceeded
- No upstream provider execution for denied call
- Signed evidence row recorded with denial reason and estimated pre-call cost
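A client or test harness can check the first two characteristics automatically. This guide does not specify the JSON shape of the error body, so the sketch below makes no assumption about field names: it searches every string value for `budget_exceeded`. The function names are hypothetical.

```python
import json


def _contains(node, needle: str) -> bool:
    """Recursively search string values in a decoded JSON structure."""
    if isinstance(node, str):
        return needle in node
    if isinstance(node, dict):
        return any(_contains(value, needle) for value in node.values())
    if isinstance(node, list):
        return any(_contains(item, needle) for item in node)
    return False


def is_budget_denial(status: int, body: str) -> bool:
    """True when a response looks like a budget denial.

    Relies only on what the guide states: HTTP 403 and an error
    code/reason that contains "budget_exceeded".
    """
    if status != 403:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Body is not JSON: fall back to a plain substring check.
        return "budget_exceeded" in body
    return _contains(payload, "budget_exceeded")
```

Checking the status code first keeps other 403 causes, such as a model outside allowed_models, from being mistaken for a budget denial only when their body does not mention `budget_exceeded`.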
4. Verify with CLI and dashboard
CLI:
talon costs --tenant default
talon costs --caller support-slack-bot --json
talon costs --by-provider --tenant default
Export cost rows (joinable to signed evidence by evidence_id):
talon costs export --tenant default --caller support-slack-bot --format csv
talon audit export --tenant default --caller support-slack-bot --format signed-json
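Once both exports are on disk, the join on evidence_id takes a few lines. The sketch below is illustrative only: evidence_id is the one column this guide names, while `cost_eur`, `decision`, and the assumption that the signed-json export is a JSON array of objects are hypothetical and should be checked against the real export files.

```python
import csv
import io
import json
from typing import Dict, List


def join_costs_to_evidence(costs_csv: str, evidence_json: str) -> List[Dict[str, object]]:
    """Attach the matching evidence decision to each cost row.

    Assumes (hypothetically) that the evidence export is a JSON array
    of objects, each with an "evidence_id" field.
    """
    evidence = {row["evidence_id"]: row for row in json.loads(evidence_json)}
    joined = []
    for cost in csv.DictReader(io.StringIO(costs_csv)):
        match = evidence.get(cost["evidence_id"])
        joined.append({
            **cost,
            "evidence_found": match is not None,
            "decision": match.get("decision") if match else None,
        })
    return joined


# Tiny in-memory sample with hypothetical column names.
costs_csv = (
    "evidence_id,caller,cost_eur\n"
    "ev_1,support-slack-bot,0.004\n"
    "ev_2,support-slack-bot,0\n"
    "ev_3,support-slack-bot,0.002\n"
)
evidence_json = (
    '[{"evidence_id": "ev_1", "decision": "allow"},'
    ' {"evidence_id": "ev_2", "decision": "deny"}]'
)
rows = join_costs_to_evidence(costs_csv, evidence_json)
```

A cost row without matching evidence, like `ev_3` in the sample, is kept and flagged rather than dropped, since an unmatched row is exactly what an audit needs to surface.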
HTTP/API equivalents:
- GET /v1/costs
- GET /v1/costs/budget
- POST /v1/costs/export
Dashboard:
- /gateway/dashboard for real-time caller/model/provider operational view
- /dashboard for evidence-backed governance view and drill-down
Cost visibility vs caps vs evidence attribution
- Cost visibility: today/month totals and breakdowns by tenant/caller/model/provider
- Hard budget caps: deny requests that would exceed daily/monthly cap before provider call
- Evidence-backed attribution: every allow/deny is signed and traceable by evidence ID, including budget-denied rows with zero provider cost and denial reason
This is where cost control connects to evidence integrity:
Verified evidence proves both governance decisions and cost attribution.
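To make "signed and traceable" concrete, here is a generic illustration of why a signed record is tamper-evident. This guide does not say which signature scheme Talon uses, so HMAC-SHA256 over canonical JSON is chosen purely as an assumption for the example; the record fields and the key are also hypothetical.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Sign a record with HMAC-SHA256 over a canonical JSON encoding.

    Sorting keys and fixing separators makes the byte string stable,
    so the same record always produces the same signature.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_record(record, key), signature)


# Hypothetical budget-denied evidence row with zero provider cost.
key = b"demo-signing-key"
record = {
    "evidence_id": "ev_2",
    "caller": "support-slack-bot",
    "decision": "deny",
    "reason": "budget_exceeded",
    "cost_eur": "0",
}
signature = sign_record(record, key)

# Changing any field after signing makes verification fail.
tampered = dict(record, decision="allow")
```

Here `verify_record(record, signature, key)` succeeds while `verify_record(tampered, signature, key)` fails, which is the property that lets a denial row stand as proof of the decision.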
Native agents (no gateway)
If you run agents directly with talon run instead of going through the gateway, set cost limits in .talon.yaml:
policies:
  cost_limits:
    per_request: 0.50
    daily: 20.00
    monthly: 400.00
Next steps
| I want to… | Doc |
|---|---|
| Verify signed exports and tamper checks | How to export evidence for auditors |
| Understand dashboard metrics schema | Gateway dashboard |
| Add more callers/models | Configuration and environment |
| Apply additional governance snippets | Policy cookbook |