Real-Time LLM Cost Tracking: Instrument Your AI Agents with Arc

In our previous post on LLM cost tracking, we built a system that polls the OpenAI and Anthropic Usage APIs to collect cost data. That approach works great for historical analysis, but it has limitations: you need admin API keys, and there's a delay before data appears.
What if you want to see costs the moment they happen? What if you want to track costs per user, per agent, per conversation—metadata that the Usage APIs don't provide?
The answer: instrument your LLM calls directly.
The Power of Application-Level Instrumentation
Every LLM API response includes token counts in the usage field. OpenAI returns prompt_tokens and completion_tokens. Anthropic returns input_tokens and output_tokens. This data is available immediately, on every single call.
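As a concrete sketch of those shapes: the two providers name the same data differently, so a tiny normalizer can map both onto one schema. The field names below match the providers' JSON responses; the `normalize_usage` helper itself is just an illustration, not part of any SDK.

```python
# Normalize provider-specific usage fields onto one schema.
# OpenAI reports prompt_tokens/completion_tokens; Anthropic reports
# input_tokens/output_tokens. normalize_usage is a hypothetical
# helper for illustration only.

def normalize_usage(provider: str, usage: dict) -> dict:
    """Map provider-specific token fields to input/output counts."""
    if provider == "openai":
        return {"input_tokens": usage["prompt_tokens"],
                "output_tokens": usage["completion_tokens"]}
    if provider == "anthropic":
        return {"input_tokens": usage["input_tokens"],
                "output_tokens": usage["output_tokens"]}
    raise ValueError(f"unknown provider: {provider}")

print(normalize_usage("openai", {"prompt_tokens": 12, "completion_tokens": 34}))
# {'input_tokens': 12, 'output_tokens': 34}
```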
By wrapping your LLM clients, you can:
- Track costs in real-time - See spending the moment it happens
- Add custom metadata - Tag by user, agent, session, feature, experiment
- Work with any API key - No admin keys required
- Get per-request granularity - Analyze individual calls, not hourly buckets
Let's build it.
Setting Up the Infrastructure
The infrastructure is the same as before, with Arc for storage and Grafana for visualization:
```yaml
version: '3.8'

services:
  arc:
    image: ghcr.io/basekick-labs/arc:25.12.1
    container_name: arc
    ports:
      - "8000:8000"
    volumes:
      - arc-data:/app/data
    environment:
      - STORAGE_BACKEND=local
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS=basekick-arc-datasource
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - arc
    restart: unless-stopped

volumes:
  arc-data:
  grafana-data:
```

```bash
docker compose up -d
docker logs arc | grep "Initial admin API token"
```
The Instrumentation Library
Here's a Python module that wraps OpenAI and Anthropic clients to automatically track usage:
```python
#!/usr/bin/env python3
"""
LLM Cost Tracker - Application-level instrumentation for Arc

Wrap your LLM calls with this module to track costs in real-time.
"""
import os
import requests
import msgpack
from datetime import datetime, timezone
from typing import Optional

from openai import OpenAI
from anthropic import Anthropic

# Configuration
ARC_URL = os.environ.get("ARC_URL", "http://localhost:8000")
ARC_TOKEN = os.environ.get("ARC_TOKEN")

# Current pricing (December 2025) - per 1M tokens
PRICING = {
    # OpenAI
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-2024-11-20": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o-mini-2024-07-18": {"input": 0.15, "output": 0.60},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4": {"input": 30.00, "output": 60.00},
    # Anthropic
    "claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
    "claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
    "claude-3-opus-20240229": {"input": 15.00, "output": 75.00},
}


def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Calculate cost in USD for a given model and token counts."""
    prices = PRICING.get(model, {"input": 0, "output": 0})
    input_cost = (input_tokens / 1_000_000) * prices["input"]
    output_cost = (output_tokens / 1_000_000) * prices["output"]
    return round(input_cost + output_cost, 6)


def send_to_arc(record: dict) -> bool:
    """Send a single usage record to Arc."""
    if not ARC_TOKEN:
        return False
    data = {
        "m": "llm_usage",
        "columns": {
            "time": [record["time"]],
            "provider": [record["provider"]],
            "model": [record["model"]],
            "input_tokens": [record["input_tokens"]],
            "output_tokens": [record["output_tokens"]],
            "cost_usd": [record["cost_usd"]],
            "project": [record.get("project", "default")],
            "agent": [record.get("agent", "default")],
            "user_id": [record.get("user_id", "")],
            "session_id": [record.get("session_id", "")],
        }
    }
    try:
        response = requests.post(
            f"{ARC_URL}/api/v1/write/msgpack",
            headers={
                "Authorization": f"Bearer {ARC_TOKEN}",
                "Content-Type": "application/msgpack",
                "x-arc-database": "llm_costs",
            },
            data=msgpack.packb(data),
        )
        return response.status_code == 204
    except Exception as e:
        print(f"Error sending to Arc: {e}")
        return False


class TrackedOpenAI:
    """OpenAI client wrapper that tracks usage to Arc."""

    def __init__(
        self,
        project: str = "default",
        agent: str = "default",
        user_id: Optional[str] = None,
        session_id: Optional[str] = None,
    ):
        self.client = OpenAI()
        self.project = project
        self.agent = agent
        self.user_id = user_id or ""
        self.session_id = session_id or ""

    def chat_completion(self, **kwargs):
        """Make a chat completion and track usage."""
        response = self.client.chat.completions.create(**kwargs)
        # Extract usage from the response; it is present on every call
        usage = response.usage
        model = response.model
        record = {
            "time": int(datetime.now(timezone.utc).timestamp() * 1000),
            "provider": "openai",
            "model": model,
            "input_tokens": usage.prompt_tokens,
            "output_tokens": usage.completion_tokens,
            "cost_usd": calculate_cost(model, usage.prompt_tokens, usage.completion_tokens),
            "project": self.project,
            "agent": self.agent,
            "user_id": self.user_id,
            "session_id": self.session_id,
        }
        send_to_arc(record)
        return response

    def set_session(self, user_id: str, session_id: str):
        """Update user and session for subsequent calls."""
        self.user_id = user_id
        self.session_id = session_id


class TrackedAnthropic:
    """Anthropic client wrapper that tracks usage to Arc."""

    def __init__(
        self,
        project: str = "default",
        agent: str = "default",
        user_id: Optional[str] = None,
        session_id: Optional[str] = None,
    ):
        self.client = Anthropic()
        self.project = project
        self.agent = agent
        self.user_id = user_id or ""
        self.session_id = session_id or ""

    def message(self, **kwargs):
        """Make a message request and track usage."""
        response = self.client.messages.create(**kwargs)
        # Extract usage from the response
        usage = response.usage
        model = kwargs.get("model", "unknown")
        record = {
            "time": int(datetime.now(timezone.utc).timestamp() * 1000),
            "provider": "anthropic",
            "model": model,
            "input_tokens": usage.input_tokens,
            "output_tokens": usage.output_tokens,
            "cost_usd": calculate_cost(model, usage.input_tokens, usage.output_tokens),
            "project": self.project,
            "agent": self.agent,
            "user_id": self.user_id,
            "session_id": self.session_id,
        }
        send_to_arc(record)
        return response

    def set_session(self, user_id: str, session_id: str):
        """Update user and session for subsequent calls."""
        self.user_id = user_id
        self.session_id = session_id


# Example usage
if __name__ == "__main__":
    # OpenAI example with user tracking
    openai_client = TrackedOpenAI(
        project="customer-support",
        agent="ticket-classifier",
        user_id="user_123",
        session_id="session_abc",
    )
    response = openai_client.chat_completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
    )
    print(f"OpenAI response: {response.choices[0].message.content}")

    # Anthropic example
    anthropic_client = TrackedAnthropic(
        project="customer-support",
        agent="response-generator",
        user_id="user_123",
        session_id="session_abc",
    )
    response = anthropic_client.message(
        model="claude-3-5-haiku-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, how are you?"}],
    )
    print(f"Anthropic response: {response.content[0].text}")
```
Save this as `llm_tracker.py`, then test it:
```bash
export ARC_TOKEN="your-arc-token"
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"

pip install requests msgpack openai anthropic
python llm_tracker.py
```
Integrating Into Your Application
Replace your existing LLM client initialization:
```python
# Before
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

# After
from llm_tracker import TrackedOpenAI

client = TrackedOpenAI(
    project="my-app",
    agent="summarizer",
    user_id=current_user.id,
    session_id=session.id
)
response = client.chat_completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)
```
Every call is now automatically tracked with rich metadata.
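Before wiring the tracker into production, it is worth sanity-checking the pricing arithmetic. The sketch below is a standalone copy of `calculate_cost` with just the gpt-4o-mini rates ($0.15 input / $0.60 output per 1M tokens) from the pricing table above:

```python
# Standalone sanity check of the pricing math, using the
# gpt-4o-mini rates from the PRICING table in llm_tracker.py.

PRICING = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD; unknown models fall back to zero-cost pricing."""
    prices = PRICING.get(model, {"input": 0, "output": 0})
    return round(
        (input_tokens / 1_000_000) * prices["input"]
        + (output_tokens / 1_000_000) * prices["output"],
        6,
    )

# 1,000 input + 500 output tokens on gpt-4o-mini:
# (1000/1e6)*0.15 + (500/1e6)*0.60 = 0.00015 + 0.0003 = 0.00045
print(calculate_cost("gpt-4o-mini", 1000, 500))  # 0.00045
```

Note the fallback: an unrecognized model silently records a cost of zero, so keep the `PRICING` table current as providers release new model versions.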
New Queries: Per-User and Per-Session Analytics
With application-level instrumentation, you can answer questions the Usage APIs can't:
Cost Per User (Find Your Expensive Customers)
```sql
SELECT
    user_id,
    COUNT(*) as total_calls,
    SUM(cost_usd) as total_cost,
    SUM(input_tokens + output_tokens) as total_tokens,
    ROUND(SUM(cost_usd) / COUNT(*), 4) as avg_cost_per_call
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '7 days'
  AND user_id != ''
GROUP BY user_id
ORDER BY total_cost DESC
LIMIT 20;
```
Cost Per Session (Find Runaway Conversations)
```sql
SELECT
    session_id,
    user_id,
    agent,
    COUNT(*) as calls_in_session,
    SUM(cost_usd) as session_cost,
    MIN(time) as session_start,
    MAX(time) as session_end
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '24 hours'
  AND session_id != ''
GROUP BY session_id, user_id, agent
HAVING SUM(cost_usd) > 1.00  -- sessions costing more than $1
ORDER BY session_cost DESC
LIMIT 20;
```
Agent Performance Comparison
```sql
SELECT
    agent,
    model,
    COUNT(*) as calls,
    AVG(input_tokens) as avg_input_tokens,
    AVG(output_tokens) as avg_output_tokens,
    SUM(cost_usd) as total_cost,
    ROUND(SUM(cost_usd) / COUNT(*), 4) as cost_per_call
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY agent, model
ORDER BY total_cost DESC;
```
Detect Users Exceeding Quotas
```sql
WITH user_daily_costs AS (
    SELECT
        user_id,
        DATE_TRUNC('day', time) as day,
        SUM(cost_usd) as daily_cost
    FROM llm_costs.llm_usage
    WHERE time > NOW() - INTERVAL '7 days'
      AND user_id != ''
    GROUP BY user_id, DATE_TRUNC('day', time)
)
SELECT
    user_id,
    day,
    daily_cost
FROM user_daily_costs
WHERE daily_cost > 10.00  -- users spending more than $10/day
ORDER BY daily_cost DESC;
```
Real-Time Alerting
With real-time data, you can alert on anomalies as they happen—not hours later.
Alert on High-Cost Sessions
```sql
SELECT
    session_id,
    user_id,
    SUM(cost_usd) as session_cost
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '10 minutes'
  AND session_id != ''
GROUP BY session_id, user_id
HAVING SUM(cost_usd) > 5.00;  -- alert if a session exceeds $5 in 10 minutes
```
Alert on User Spending Spike
```sql
WITH current_hour AS (
    SELECT user_id, SUM(cost_usd) as cost
    FROM llm_costs.llm_usage
    WHERE time > NOW() - INTERVAL '1 hour'
    GROUP BY user_id
),
previous_avg AS (
    SELECT user_id, AVG(hourly_cost) as avg_cost
    FROM (
        SELECT user_id, DATE_TRUNC('hour', time) as hour, SUM(cost_usd) as hourly_cost
        FROM llm_costs.llm_usage
        WHERE time > NOW() - INTERVAL '7 days'
          AND time < NOW() - INTERVAL '1 hour'
        GROUP BY user_id, DATE_TRUNC('hour', time)
    ) hourly
    GROUP BY user_id
)
SELECT
    c.user_id,
    c.cost as current_hour_cost,
    p.avg_cost as typical_hourly_cost,
    ROUND(c.cost / NULLIF(p.avg_cost, 0), 1) as multiplier
FROM current_hour c
JOIN previous_avg p ON c.user_id = p.user_id
WHERE c.cost > p.avg_cost * 5  -- alert if 5x higher than usual
ORDER BY multiplier DESC;
```
Grafana Dashboard Additions
Add these panels to your dashboard:
Top Users by Spend (Table)
```sql
SELECT
    user_id,
    SUM(cost_usd) as total_cost,
    COUNT(*) as calls
FROM llm_costs.llm_usage
WHERE $__timeFilter(time)
  AND user_id != ''
GROUP BY user_id
ORDER BY total_cost DESC
LIMIT 10
```
Cost by Agent (Bar Chart)
```sql
SELECT
    agent,
    SUM(cost_usd) as cost
FROM llm_costs.llm_usage
WHERE $__timeFilter(time)
GROUP BY agent
ORDER BY cost DESC
```
Real-Time Cost Rate (Stat Panel)
```sql
SELECT
    SUM(cost_usd) / (EXTRACT(EPOCH FROM (MAX(time) - MIN(time))) / 3600) as cost_per_hour
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '1 hour'
```
Comparing Approaches
| Feature | Usage API Polling | Application Instrumentation |
|---|---|---|
| Data freshness | Hourly buckets | Real-time |
| API key required | Admin key | Regular API key |
| Custom metadata | Limited (project, model) | Unlimited (user, session, agent, etc.) |
| Per-request granularity | No | Yes |
| Setup complexity | Cron job | Code change |
| Works with streaming | Yes | Yes (with stream_options) |
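The streaming row in the table deserves a note. With OpenAI, a stream only carries token counts when you request them via `stream_options={"include_usage": True}`; the usage then arrives on a final chunk whose `choices` list is empty. The sketch below models chunks as plain dicts so the extraction logic runs without an API key; the real SDK yields chunk objects with the same fields.

```python
# Sketch: extracting usage from an OpenAI streaming response.
# With stream_options={"include_usage": True}, the final chunk
# carries a usage object and an empty choices list. Chunks are
# modeled as plain dicts here so the logic is testable offline.

def usage_from_stream(chunks):
    """Return the usage dict from the final chunk, or None if absent."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage

# Simulated stream: content chunks, then a usage-only chunk.
stream = [
    {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 2}},
]
print(usage_from_stream(stream))
```

Once the stream is exhausted, the extracted counts can feed the same `calculate_cost` and `send_to_arc` path as a non-streaming call.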
Use Usage API polling when:
- You want organization-wide historical data
- You don't control the application code
- You need data for compliance/auditing
Use application instrumentation when:
- You need real-time visibility
- You want per-user/per-session tracking
- You want to alert on anomalies immediately
- You don't have admin API keys
Best of Both Worlds
You can use both approaches together. The Usage API gives you authoritative billing data; application instrumentation gives you real-time operational visibility. Store them in separate tables and correlate when needed.
Conclusion
Application-level instrumentation transforms LLM cost tracking from a monthly accounting exercise into a real-time operational capability. You can:
- See costs the moment they happen
- Track spending per user, session, and agent
- Alert on anomalies in real-time
- Optimize prompts based on actual cost data
The wrapper pattern shown here adds minimal overhead and works with any LLM provider that returns token counts in responses.
Stop waiting for invoices. Start tracking costs in real-time.
Ready to handle billion-record workloads?
Deploy Arc in minutes. Own your data in Parquet.