The $100K LLM Bill: Tracking AI Agent Costs with Arc

Last month, I got a call from a friend running a startup. "Nacho, I need help. Our December AI bill was $127,000."
I almost dropped my coffee. "Wasn't it like $12K in November?"
"Yeah. Our CFO is losing his mind."
The culprit? Their customer support agent had a feature that let users ask follow-up questions. When it went viral on Product Hunt, each conversation spawned 15-20 API calls instead of the expected 3-4. No one noticed until the invoice arrived.
This is the story of how we built a cost monitoring system that would have caught this in hours, not weeks.
The Hidden Cost Explosion Problem
AI agents are different from traditional software. They make decisions autonomously, and those decisions have direct cost implications.
Consider what can go wrong:
Runaway loops. An agent decides it needs "more context" and keeps calling the API until it hits a token limit. I've seen a single user session burn $50 in minutes.
Verbose prompts. Someone adds "be thorough and detailed" to a system prompt. Suddenly every response is 4x longer. That's 4x the output tokens, 4x the cost.
Model upgrades without testing. "Let's try GPT-4 instead of GPT-4o-mini for better quality." Cool, that's a 66x price increase on output tokens.
Retry storms. Rate limits trigger retries. Each retry costs money. Under load, you can spend more on failed requests than successful ones.
Traditional APM tools track latency, errors, and throughput. They don't track token economics. You find out about cost problems when the invoice arrives—usually 30 days too late.
Why LLM Telemetry is Time-Series Data
Every LLM API call generates the same type of data:
- Timestamp: When did this call happen?
- Tokens: How many input/output tokens?
- Cost: What did this specific call cost?
- Model: Which model was used?
- Metadata: Which agent, user, session?
This is textbook time-series data. High volume, append-only, time-ordered. And it has high cardinality—thousands of agent sessions across multiple models, users, and API keys.
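To make that concrete, here is a minimal sketch of one call's telemetry record and its cost. The field names mirror the collector schema used later in this post; the per-million-token prices are hypothetical placeholders, not real pricing:

```python
# One telemetry record per LLM API call.
record = {
    "time": 1735689600000,    # epoch milliseconds
    "provider": "openai",
    "model": "gpt-4o-mini",
    "input_tokens": 1200,
    "output_tokens": 350,
    "agent": "support-bot",   # metadata: which agent/user/session
    "session_id": "abc123",
}

# Hypothetical prices in USD per 1M tokens -- check your provider's price sheet.
PRICE_PER_M = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def call_cost(rec: dict) -> float:
    """Compute the cost of a single call from its token counts."""
    p = PRICE_PER_M[rec["model"]]
    return (rec["input_tokens"] * p["input"]
            + rec["output_tokens"] * p["output"]) / 1_000_000

print(f"${call_cost(record):.6f}")
```

Multiply that by thousands of sessions per day and you have exactly the kind of stream a time-series database is built for.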
Arc handles exactly this workload. Let's build the monitoring system.
Setting Up the Infrastructure
We'll use Arc for storage and Grafana for visualization. Here's the Docker Compose setup:
```yaml
version: '3.8'
services:
  arc:
    image: ghcr.io/basekick-labs/arc:25.12.1
    container_name: arc
    ports:
      - "8000:8000"
    volumes:
      - arc-data:/app/data
    environment:
      - STORAGE_BACKEND=local
    restart: unless-stopped
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS=basekick-arc-datasource
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - arc
    restart: unless-stopped
volumes:
  arc-data:
  grafana-data:
```

Start it up:

```bash
docker compose up -d
```

Grab your Arc admin token from the logs:

```bash
docker logs arc | grep "Initial admin API token"
```

Save this token; you'll need it for API calls and Grafana configuration.
The Data Collection Script
We collect two types of data from OpenAI and store them in separate tables:
- Usage API → `llm_usage` table: hourly token counts per model
- Costs API → `llm_costs` table: daily costs per line item
This keeps the raw data intact—no aggregation in the collector. All analysis happens in SQL at query time.
Important: You need an Admin API key with the `api.usage.read` scope. Create one at platform.openai.com/settings/organization/admin-keys.
Here's the Python script:
```python
#!/usr/bin/env python3
"""
LLM Cost Collector for Arc - Polls OpenAI Usage + Costs APIs and stores data in Arc.

Run hourly via cron: 0 * * * * /path/to/llm_cost_collector.py

Stores two separate tables:
- llm_usage: hourly token counts from Usage API
- llm_costs: daily costs from Costs API
"""
import os
from datetime import datetime, timezone, timedelta

import msgpack
import requests

ARC_URL = os.environ.get("ARC_URL", "http://localhost:8000")
ARC_TOKEN = os.environ.get("ARC_TOKEN")
OPENAI_ADMIN_KEY = os.environ.get("OPENAI_ADMIN_KEY")


def fetch_openai_usage(start_time: datetime, end_time: datetime) -> list:
    """Fetch token usage from OpenAI Usage API."""
    if not OPENAI_ADMIN_KEY:
        return []
    response = requests.get(
        "https://api.openai.com/v1/organization/usage/completions",
        headers={"Authorization": f"Bearer {OPENAI_ADMIN_KEY}"},
        params={
            "start_time": int(start_time.timestamp()),
            "end_time": int(end_time.timestamp()),
            "bucket_width": "1h",
            "group_by": ["model"],
        },
    )
    response.raise_for_status()
    data = response.json()

    records = []
    for bucket in data.get("data", []):
        ts = bucket["start_time"] * 1000  # Arc expects epoch milliseconds
        for result in bucket.get("results", []):
            records.append({
                "time": ts,
                "provider": "openai",
                "model": result.get("model") or "unknown",
                "input_tokens": result.get("input_tokens", 0),
                "output_tokens": result.get("output_tokens", 0),
                "num_requests": result.get("num_model_requests", 0),
            })
    return records


def fetch_openai_costs(start_time: datetime, end_time: datetime) -> list:
    """Fetch costs from OpenAI Costs API."""
    if not OPENAI_ADMIN_KEY:
        return []
    # The Costs API uses daily buckets, so align the window to day boundaries.
    day_start = start_time.replace(hour=0, minute=0, second=0, microsecond=0)
    day_end = (end_time + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    response = requests.get(
        "https://api.openai.com/v1/organization/costs",
        headers={"Authorization": f"Bearer {OPENAI_ADMIN_KEY}"},
        params={
            "start_time": int(day_start.timestamp()),
            "end_time": int(day_end.timestamp()),
            "bucket_width": "1d",
            "group_by": ["line_item"],
        },
    )
    response.raise_for_status()
    data = response.json()

    records = []
    for bucket in data.get("data", []):
        ts = bucket["start_time"] * 1000
        for result in bucket.get("results", []):
            line_item = result.get("line_item") or "unknown"
            amount = float(result.get("amount", {}).get("value", 0.0))
            if amount > 0:
                records.append({
                    "time": ts,
                    "provider": "openai",
                    "line_item": line_item,
                    "cost_usd": amount,
                })
    return records


def send_usage_to_arc(records: list):
    """Send usage records to Arc."""
    if not records:
        return
    data = {
        "m": "llm_usage",
        "columns": {
            "time": [r["time"] for r in records],
            "provider": [r["provider"] for r in records],
            "model": [r["model"] for r in records],
            "input_tokens": [r["input_tokens"] for r in records],
            "output_tokens": [r["output_tokens"] for r in records],
            "num_requests": [r["num_requests"] for r in records],
        },
    }
    response = requests.post(
        f"{ARC_URL}/api/v1/write/msgpack",
        headers={
            "Authorization": f"Bearer {ARC_TOKEN}",
            "Content-Type": "application/msgpack",
            "x-arc-database": "llm_costs",
        },
        data=msgpack.packb(data),
    )
    print(f"Sent {len(records)} usage records to Arc (status: {response.status_code})")


def send_costs_to_arc(records: list):
    """Send cost records to Arc."""
    if not records:
        return
    data = {
        "m": "llm_costs",
        "columns": {
            "time": [r["time"] for r in records],
            "provider": [r["provider"] for r in records],
            "line_item": [r["line_item"] for r in records],
            "cost_usd": [r["cost_usd"] for r in records],
        },
    }
    response = requests.post(
        f"{ARC_URL}/api/v1/write/msgpack",
        headers={
            "Authorization": f"Bearer {ARC_TOKEN}",
            "Content-Type": "application/msgpack",
            "x-arc-database": "llm_costs",
        },
        data=msgpack.packb(data),
    )
    print(f"Sent {len(records)} cost records to Arc (status: {response.status_code})")


def main():
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=24)
    print(f"Fetching data from {start_time} to {end_time}")

    # Fetch and store usage (tokens)
    usage_records = fetch_openai_usage(start_time, end_time)
    print(f"Found {len(usage_records)} usage records")
    send_usage_to_arc(usage_records)

    # Fetch and store costs (USD)
    cost_records = fetch_openai_costs(start_time, end_time)
    print(f"Found {len(cost_records)} cost records")
    send_costs_to_arc(cost_records)


if __name__ == "__main__":
    main()
```

Save this as llm_cost_collector.py and set up your environment:

```bash
export ARC_TOKEN="your-arc-token"
export OPENAI_ADMIN_KEY="your-openai-admin-key"  # Admin key, not regular API key

pip install requests msgpack
python llm_cost_collector.py
```

Set up a cron job to run hourly:

```bash
# Edit crontab
crontab -e

# Add this line (adjust paths as needed)
0 * * * * /usr/bin/python3 /path/to/llm_cost_collector.py >> /var/log/llm_costs.log 2>&1
```

Cost Tracking Queries
Now for the fun part—SQL queries that give you instant visibility into your LLM spending.
We have two tables:
- `llm_usage`: hourly token counts (from Usage API)
- `llm_costs`: daily costs in USD (from Costs API)
Note: Since the collector may run multiple times and re-fetch overlapping data, we use DISTINCT to deduplicate records. Let Grafana handle aggregation (Sum, Avg, etc.) in the panel settings.
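To see why deduplication matters, here is a tiny sketch (with made-up numbers) of two overlapping collector runs and the key-based dedup that DISTINCT performs at query time:

```python
# Two collector runs with overlapping 24h windows re-fetch the same bucket.
run_1 = [
    {"time": 1, "line_item": "gpt-4o-mini, input", "cost_usd": 0.42},
    {"time": 2, "line_item": "gpt-4o-mini, input", "cost_usd": 0.55},
]
run_2 = [
    {"time": 2, "line_item": "gpt-4o-mini, input", "cost_usd": 0.55},  # duplicate
    {"time": 3, "line_item": "gpt-4o-mini, input", "cost_usd": 0.61},
]

stored = run_1 + run_2  # writes are append-only; Arc keeps every row

# SELECT DISTINCT time, line_item, cost_usd -- keep one copy of each full row
deduped = {(r["time"], r["line_item"], r["cost_usd"]) for r in stored}

naive_total = sum(r["cost_usd"] for r in stored)      # overcounts the duplicate
true_total = sum(cost for _, _, cost in deduped)
print(naive_total, true_total)
```

Without the DISTINCT subquery, every overlapping collector run would inflate your totals.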
Daily Burn Rate
How much are we spending per day?
```sql
SELECT time, line_item, cost_usd
FROM (SELECT DISTINCT time, line_item, cost_usd FROM llm_costs.llm_costs) d
WHERE time > NOW() - INTERVAL '30 days'
ORDER BY time DESC;
```

Configure the Grafana panel to use a Sum transformation to aggregate by day.
Token Usage by Model
Which models are consuming the most tokens?
```sql
SELECT time, model, input_tokens, output_tokens, num_requests
FROM (SELECT DISTINCT time, model, input_tokens, output_tokens, num_requests FROM llm_costs.llm_usage) d
WHERE time > NOW() - INTERVAL '7 days'
ORDER BY time DESC;
```

Cost by Line Item
OpenAI breaks down costs by model and direction (input/output):
```sql
SELECT time, line_item, cost_usd
FROM (SELECT DISTINCT time, line_item, cost_usd FROM llm_costs.llm_costs) d
WHERE time > NOW() - INTERVAL '7 days'
ORDER BY cost_usd DESC;
```

Week-over-Week Cost Growth
Is spending accelerating?
```sql
SELECT
  week,
  weekly_cost,
  LAG(weekly_cost) OVER (ORDER BY week) as prev_week_cost,
  ROUND(
    (weekly_cost - LAG(weekly_cost) OVER (ORDER BY week)) /
    NULLIF(LAG(weekly_cost) OVER (ORDER BY week), 0) * 100,
    1
  ) as growth_percent
FROM (
  SELECT
    DATE_TRUNC('week', time) as week,
    SUM(cost_usd) as weekly_cost
  FROM (SELECT DISTINCT time, line_item, cost_usd FROM llm_costs.llm_costs) d
  WHERE time > NOW() - INTERVAL '8 weeks'
  GROUP BY DATE_TRUNC('week', time)
) weekly_data
ORDER BY week DESC;
```

Hourly Token Spikes (Anomaly Detection)
Find hours where token usage spiked abnormally:
```sql
SELECT
  hour,
  hourly_tokens,
  avg_tokens,
  ROUND((hourly_tokens - avg_tokens) / NULLIF(stddev_tokens, 0), 2) as z_score
FROM (
  SELECT
    DATE_TRUNC('hour', time) as hour,
    SUM(input_tokens + output_tokens) as hourly_tokens,
    AVG(SUM(input_tokens + output_tokens)) OVER () as avg_tokens,
    STDDEV(SUM(input_tokens + output_tokens)) OVER () as stddev_tokens
  FROM (SELECT DISTINCT time, model, input_tokens, output_tokens FROM llm_costs.llm_usage) d
  WHERE time > NOW() - INTERVAL '7 days'
  GROUP BY DATE_TRUNC('hour', time)
) hourly_data
WHERE (hourly_tokens - avg_tokens) / NULLIF(stddev_tokens, 0) > 2
ORDER BY hour DESC;
```

A z-score above 2 means the hourly token usage was more than 2 standard deviations above the average for the window, which is worth investigating.
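As a sanity check on the SQL above, here is the same z-score logic in plain Python with synthetic hourly totals (note that STDDEV in most SQL engines is the sample standard deviation, matching statistics.stdev):

```python
import statistics

# Synthetic hourly token totals: steady traffic with one spike at the end.
hourly_tokens = [10_000, 11_000, 9_500, 10_500, 9_800, 10_200, 48_000]

mean = statistics.mean(hourly_tokens)
stddev = statistics.stdev(hourly_tokens)  # sample stddev, like SQL STDDEV

# Flag hours more than 2 standard deviations above the mean.
spikes = [(i, t, round((t - mean) / stddev, 2))
          for i, t in enumerate(hourly_tokens)
          if (t - mean) / stddev > 2]
print(spikes)
```

Only the final hour is flagged; the quiet hours all sit below the mean and never trip the threshold.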
Grafana Dashboard
Let's visualize this data. First, install the Arc datasource plugin:
```bash
# Download and install
wget https://github.com/basekick-labs/grafana-arc-datasource/releases/latest/download/basekick-arc-datasource-1.0.0.zip
unzip basekick-arc-datasource-1.0.0.zip
docker cp basekick-arc-datasource grafana:/var/lib/grafana/plugins/
docker restart grafana
```

Configure the Arc datasource in Grafana:
- Go to Configuration → Data Sources → Add data source
- Select Arc
- Set URL to `http://arc:8000` (container name)
- Enter your API token
- Set database to `llm_costs`
Dashboard Panels
Here's what your dashboard should include:
1. Total Spend (Last 7 Days) - Stat Panel
```sql
SELECT cost_usd
FROM (SELECT DISTINCT time, line_item, cost_usd FROM llm_costs.llm_costs) d
WHERE time > NOW() - INTERVAL '7 days'
```

Set the stat panel calculation to Total to sum all values.
2. Daily Cost Trend - Time Series
```sql
SELECT time, cost_usd
FROM (SELECT DISTINCT time, line_item, cost_usd FROM llm_costs.llm_costs) d
WHERE $__timeFilter(time)
ORDER BY time ASC
```
3. Cost by Line Item - Bar Chart
```sql
SELECT line_item, cost_usd
FROM (SELECT DISTINCT time, line_item, cost_usd FROM llm_costs.llm_costs) d
WHERE $__timeFilter(time)
ORDER BY cost_usd DESC
```

4. Token Usage by Model - Bar Chart
```sql
SELECT model, input_tokens, output_tokens
FROM (SELECT DISTINCT time, model, input_tokens, output_tokens FROM llm_costs.llm_usage) d
WHERE $__timeFilter(time)
```
5. Hourly Token Trend - Time Series
```sql
SELECT time, input_tokens, output_tokens
FROM (SELECT DISTINCT time, model, input_tokens, output_tokens FROM llm_costs.llm_usage) d
WHERE $__timeFilter(time)
ORDER BY time ASC
```
Alerting on Cost Spikes
Here's the query that would have saved my friend $115,000. Set up a Grafana alert that fires when daily spending exceeds a threshold.
Create the Alert Rule
- Go to Alerting → Alert rules → New alert rule
- Set the query:

```sql
SELECT cost_usd
FROM (SELECT DISTINCT time, line_item, cost_usd FROM llm_costs.llm_costs) d
WHERE time > NOW() - INTERVAL '1 day'
```

- Set the alert condition to fire when Total is above 500
- Configure notification channel (Slack, PagerDuty, email)

The Alert That Saves Money
With this alert in place, here's what would have happened to my friend's startup:
- Day 1 of viral spike: Daily cost hits $800 by midday (projected $1,600+). Alert fires.
- Within an hour: Team investigates, finds the follow-up question loop.
- Same day: Hotfix deployed—rate limit on follow-ups.
- Total damage: ~$2,000 instead of $115,000.
That's a 98% cost reduction from a single alert.
Taking It Further
Once you have the basic monitoring in place, consider these enhancements:
Per-user cost tracking. Add user_id to your data model. Find out which users (or API consumers) are driving costs. Implement usage limits.
Cost allocation. Tag API calls with team/department. Generate monthly chargebacks. Make cost visible to teams that create it.
Prompt optimization alerts. Track average tokens per request by endpoint. Alert when a prompt starts using significantly more tokens—someone might have added "be comprehensive" to a system prompt.
Budget forecasting. Use your historical data to project month-end spend. Alert at 50%, 75%, 90% of budget.
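A minimal forecasting sketch along those lines: a straight linear projection of month-end spend from the month-to-date total, with hypothetical numbers. Real traffic is rarely this linear, so treat it as a starting point:

```python
from datetime import date
import calendar

def project_month_end(spend_to_date: float, today: date) -> float:
    """Linearly project month-end spend from the month-to-date total."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month

def budget_alerts(projected: float, budget: float) -> list[str]:
    """Return the budget thresholds the projection has crossed."""
    return [f"{int(t * 100)}%" for t in (0.5, 0.75, 0.9) if projected >= budget * t]

# Hypothetical: $5,400 spent by Dec 10 against a $12,000 monthly budget.
projected = project_month_end(5_400, date(2024, 12, 10))
print(projected, budget_alerts(projected, 12_000))
```

Here the projection already blows past every threshold ten days into the month, which is exactly the kind of early signal the invoice never gives you.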
Conclusion
AI costs are the new infrastructure costs. They can scale from $12K to $127K in a month without anyone noticing—unless you're watching.
Arc makes this easy:
- High-throughput ingestion for real-time cost data
- SQL queries for instant cost analysis
- Grafana integration for dashboards and alerts
- Portable Parquet storage so your cost data isn't locked in
The script we built today takes 30 minutes to set up. It could save you six figures.
Don't wait for the invoice.
Ready to handle billion-record workloads?
Deploy Arc in minutes. Own your data in Parquet.