Real-Time LLM Cost Tracking: Instrument Your AI Agents with Arc

In our previous post on LLM cost tracking, we built a system that polls the OpenAI and Anthropic Usage APIs to collect cost data. That approach works great for historical analysis, but it has limitations: you need admin API keys, and there's a delay before data appears.

What if you want to see costs the moment they happen? What if you want to track costs per user, per agent, per conversation—metadata that the Usage APIs don't provide?

The answer: instrument your LLM calls directly.

The Power of Application-Level Instrumentation

Every LLM API response includes token counts in the usage field. OpenAI returns prompt_tokens and completion_tokens. Anthropic returns input_tokens and output_tokens. This data is available immediately, on every single call.
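The field names differ slightly per provider, so a tracker needs a small normalization step. A minimal sketch over the raw JSON `usage` payloads (the token counts here are made-up example values):

```python
# Raw "usage" objects as they appear in each provider's JSON response
# (example token counts are invented for illustration).
openai_usage = {"prompt_tokens": 52, "completion_tokens": 18, "total_tokens": 70}
anthropic_usage = {"input_tokens": 52, "output_tokens": 18}

def normalize_usage(provider: str, usage: dict) -> dict:
    """Map provider-specific field names onto one common schema."""
    if provider == "openai":
        return {"input_tokens": usage["prompt_tokens"],
                "output_tokens": usage["completion_tokens"]}
    if provider == "anthropic":
        return {"input_tokens": usage["input_tokens"],
                "output_tokens": usage["output_tokens"]}
    raise ValueError(f"unknown provider: {provider}")

print(normalize_usage("openai", openai_usage))
print(normalize_usage("anthropic", anthropic_usage))
```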

By wrapping your LLM clients, you can:

  • Track costs in real-time - See spending the moment it happens
  • Add custom metadata - Tag by user, agent, session, feature, experiment
  • Work with any API key - No admin keys required
  • Get per-request granularity - Analyze individual calls, not hourly buckets

Let's build it.

Setting Up the Infrastructure

Same as before—Arc for storage, Grafana for visualization:

version: '3.8'
 
services:
  arc:
    image: ghcr.io/basekick-labs/arc:25.12.1
    container_name: arc
    ports:
      - "8000:8000"
    volumes:
      - arc-data:/app/data
    environment:
      - STORAGE_BACKEND=local
    restart: unless-stopped
 
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS=basekick-arc-datasource
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - arc
    restart: unless-stopped
 
volumes:
  arc-data:
  grafana-data:
Bring the stack up, then grab the admin API token from the logs:

docker compose up -d
docker logs arc | grep "Initial admin API token"

The Instrumentation Library

Here's a Python module that wraps OpenAI and Anthropic clients to automatically track usage:

#!/usr/bin/env python3
"""
LLM Cost Tracker - Application-level instrumentation for Arc
Wrap your LLM calls with this module to track costs in real-time.
"""
 
import os
import requests
import msgpack
from datetime import datetime, timezone
from typing import Optional
from openai import OpenAI
from anthropic import Anthropic
 
# Configuration
ARC_URL = os.environ.get("ARC_URL", "http://localhost:8000")
ARC_TOKEN = os.environ.get("ARC_TOKEN")
 
# Current pricing (December 2025) - per 1M tokens
PRICING = {
    # OpenAI
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-2024-11-20": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o-mini-2024-07-18": {"input": 0.15, "output": 0.60},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4": {"input": 30.00, "output": 60.00},
    # Anthropic
    "claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
    "claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
    "claude-3-opus-20240229": {"input": 15.00, "output": 75.00},
}
 
 
def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Calculate cost in USD for a given model and token counts."""
    # Unknown models fall back to zero cost rather than raising
    prices = PRICING.get(model, {"input": 0, "output": 0})
    input_cost = (input_tokens / 1_000_000) * prices["input"]
    output_cost = (output_tokens / 1_000_000) * prices["output"]
    return round(input_cost + output_cost, 6)
 
 
def send_to_arc(record: dict) -> bool:
    """Send a single usage record to Arc."""
    if not ARC_TOKEN:
        return False
 
    data = {
        "m": "llm_usage",
        "columns": {
            "time": [record["time"]],
            "provider": [record["provider"]],
            "model": [record["model"]],
            "input_tokens": [record["input_tokens"]],
            "output_tokens": [record["output_tokens"]],
            "cost_usd": [record["cost_usd"]],
            "project": [record.get("project", "default")],
            "agent": [record.get("agent", "default")],
            "user_id": [record.get("user_id", "")],
            "session_id": [record.get("session_id", "")],
        }
    }
 
    try:
        response = requests.post(
            f"{ARC_URL}/api/v1/write/msgpack",
            headers={
                "Authorization": f"Bearer {ARC_TOKEN}",
                "Content-Type": "application/msgpack",
                "x-arc-database": "llm_costs"
            },
            data=msgpack.packb(data),
            timeout=5  # don't let a slow Arc stall the request path
        )
        return response.status_code == 204
    except Exception as e:
        print(f"Error sending to Arc: {e}")
        return False
 
 
class TrackedOpenAI:
    """OpenAI client wrapper that tracks usage to Arc."""
 
    def __init__(
        self,
        project: str = "default",
        agent: str = "default",
        user_id: Optional[str] = None,
        session_id: Optional[str] = None
    ):
        self.client = OpenAI()
        self.project = project
        self.agent = agent
        self.user_id = user_id or ""
        self.session_id = session_id or ""
 
    def chat_completion(self, **kwargs):
        """Make a chat completion and track usage."""
        response = self.client.chat.completions.create(**kwargs)
 
        # Extract usage from response
        usage = response.usage
        model = response.model
 
        record = {
            "time": int(datetime.now(timezone.utc).timestamp() * 1000),
            "provider": "openai",
            "model": model,
            "input_tokens": usage.prompt_tokens,
            "output_tokens": usage.completion_tokens,
            "cost_usd": calculate_cost(model, usage.prompt_tokens, usage.completion_tokens),
            "project": self.project,
            "agent": self.agent,
            "user_id": self.user_id,
            "session_id": self.session_id,
        }
 
        send_to_arc(record)
        return response
 
    def set_session(self, user_id: str, session_id: str):
        """Update user and session for subsequent calls."""
        self.user_id = user_id
        self.session_id = session_id
 
 
class TrackedAnthropic:
    """Anthropic client wrapper that tracks usage to Arc."""
 
    def __init__(
        self,
        project: str = "default",
        agent: str = "default",
        user_id: Optional[str] = None,
        session_id: Optional[str] = None
    ):
        self.client = Anthropic()
        self.project = project
        self.agent = agent
        self.user_id = user_id or ""
        self.session_id = session_id or ""
 
    def message(self, **kwargs):
        """Make a message request and track usage."""
        response = self.client.messages.create(**kwargs)
 
        # Extract usage from response
        usage = response.usage
        model = response.model
 
        record = {
            "time": int(datetime.now(timezone.utc).timestamp() * 1000),
            "provider": "anthropic",
            "model": model,
            "input_tokens": usage.input_tokens,
            "output_tokens": usage.output_tokens,
            "cost_usd": calculate_cost(model, usage.input_tokens, usage.output_tokens),
            "project": self.project,
            "agent": self.agent,
            "user_id": self.user_id,
            "session_id": self.session_id,
        }
 
        send_to_arc(record)
        return response
 
    def set_session(self, user_id: str, session_id: str):
        """Update user and session for subsequent calls."""
        self.user_id = user_id
        self.session_id = session_id
 
 
# Example usage
if __name__ == "__main__":
    # OpenAI example with user tracking
    openai_client = TrackedOpenAI(
        project="customer-support",
        agent="ticket-classifier",
        user_id="user_123",
        session_id="session_abc"
    )
    response = openai_client.chat_completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, how are you?"}]
    )
    print(f"OpenAI response: {response.choices[0].message.content}")
 
    # Anthropic example
    anthropic_client = TrackedAnthropic(
        project="customer-support",
        agent="response-generator",
        user_id="user_123",
        session_id="session_abc"
    )
    response = anthropic_client.message(
        model="claude-3-5-haiku-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, how are you?"}]
    )
    print(f"Anthropic response: {response.content[0].text}")

Save this as llm_tracker.py, set your credentials, install the dependencies, and run it:

export ARC_TOKEN="your-arc-token"
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
 
pip install requests msgpack openai anthropic
python llm_tracker.py
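To sanity-check the pricing math: cost is just token count divided by one million, times the per-million rate. For example, a gpt-4o-mini call with 1,000 input tokens and 500 output tokens (rates from the PRICING table above):

```python
# gpt-4o-mini: $0.15 per 1M input tokens, $0.60 per 1M output tokens
input_cost = (1_000 / 1_000_000) * 0.15   # $0.00015
output_cost = (500 / 1_000_000) * 0.60    # $0.00030
total = round(input_cost + output_cost, 6)
print(total)  # 0.00045
```

Fractions of a cent per call, which is exactly why costs only become visible in aggregate.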

Integrating Into Your Application

Replace your existing LLM client initialization:

# Before
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)
 
# After
from llm_tracker import TrackedOpenAI
client = TrackedOpenAI(
    project="my-app",
    agent="summarizer",
    user_id=current_user.id,
    session_id=session.id
)
response = client.chat_completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

Every call is now automatically tracked with rich metadata.

New Queries: Per-User and Per-Session Analytics

With application-level instrumentation, you can answer questions the Usage APIs can't:

Cost Per User (Find Your Expensive Customers)

SELECT
    user_id,
    COUNT(*) as total_calls,
    SUM(cost_usd) as total_cost,
    SUM(input_tokens + output_tokens) as total_tokens,
    ROUND(SUM(cost_usd) / COUNT(*), 4) as avg_cost_per_call
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '7 days'
  AND user_id != ''
GROUP BY user_id
ORDER BY total_cost DESC
LIMIT 20;

Cost Per Session (Find Runaway Conversations)

SELECT
    session_id,
    user_id,
    agent,
    COUNT(*) as calls_in_session,
    SUM(cost_usd) as session_cost,
    MIN(time) as session_start,
    MAX(time) as session_end
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '24 hours'
  AND session_id != ''
GROUP BY session_id, user_id, agent
HAVING SUM(cost_usd) > 1.00  -- Sessions costing more than $1
ORDER BY session_cost DESC
LIMIT 20;

Agent Performance Comparison

SELECT
    agent,
    model,
    COUNT(*) as calls,
    AVG(input_tokens) as avg_input_tokens,
    AVG(output_tokens) as avg_output_tokens,
    SUM(cost_usd) as total_cost,
    ROUND(SUM(cost_usd) / COUNT(*), 4) as cost_per_call
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY agent, model
ORDER BY total_cost DESC;

Detect Users Exceeding Quotas

WITH user_daily_costs AS (
    SELECT
        user_id,
        DATE_TRUNC('day', time) as day,
        SUM(cost_usd) as daily_cost
    FROM llm_costs.llm_usage
    WHERE time > NOW() - INTERVAL '7 days'
      AND user_id != ''
    GROUP BY user_id, DATE_TRUNC('day', time)
)
SELECT
    user_id,
    day,
    daily_cost
FROM user_daily_costs
WHERE daily_cost > 10.00  -- Users spending more than $10/day
ORDER BY daily_cost DESC;

Real-Time Alerting

With real-time data, you can alert on anomalies as they happen—not hours later.

Alert on High-Cost Sessions

SELECT
    session_id,
    user_id,
    SUM(cost_usd) as session_cost
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '10 minutes'
  AND session_id != ''
GROUP BY session_id, user_id
HAVING SUM(cost_usd) > 5.00  -- Alert if session exceeds $5 in 10 minutes

Alert on User Spending Spike

WITH current_hour AS (
    SELECT user_id, SUM(cost_usd) as cost
    FROM llm_costs.llm_usage
    WHERE time > NOW() - INTERVAL '1 hour'
    GROUP BY user_id
),
previous_avg AS (
    SELECT user_id, AVG(hourly_cost) as avg_cost
    FROM (
        SELECT user_id, DATE_TRUNC('hour', time) as hour, SUM(cost_usd) as hourly_cost
        FROM llm_costs.llm_usage
        WHERE time > NOW() - INTERVAL '7 days'
          AND time < NOW() - INTERVAL '1 hour'
        GROUP BY user_id, DATE_TRUNC('hour', time)
    ) hourly
    GROUP BY user_id
)
SELECT
    c.user_id,
    c.cost as current_hour_cost,
    p.avg_cost as typical_hourly_cost,
    ROUND(c.cost / NULLIF(p.avg_cost, 0), 1) as multiplier
FROM current_hour c
JOIN previous_avg p ON c.user_id = p.user_id
WHERE c.cost > p.avg_cost * 5  -- Alert if 5x higher than usual
ORDER BY multiplier DESC;

Grafana Dashboard Additions

Add these panels to your dashboard:

Top Users by Spend (Table)

SELECT
    user_id,
    SUM(cost_usd) as total_cost,
    COUNT(*) as calls
FROM llm_costs.llm_usage
WHERE $__timeFilter(time)
  AND user_id != ''
GROUP BY user_id
ORDER BY total_cost DESC
LIMIT 10

Cost by Agent (Bar Chart)

SELECT
    agent,
    SUM(cost_usd) as cost
FROM llm_costs.llm_usage
WHERE $__timeFilter(time)
GROUP BY agent
ORDER BY cost DESC

Real-Time Cost Rate (Stat Panel)

SELECT
    SUM(cost_usd) / (EXTRACT(EPOCH FROM (MAX(time) - MIN(time))) / 3600) as cost_per_hour
FROM llm_costs.llm_usage
WHERE time > NOW() - INTERVAL '1 hour'

Real-time LLM cost dashboard with user and agent breakdown

Comparing Approaches

| Feature | Usage API Polling | Application Instrumentation |
| --- | --- | --- |
| Data freshness | Hourly buckets | Real-time |
| API key required | Admin key | Regular API key |
| Custom metadata | Limited (project, model) | Unlimited (user, session, agent, etc.) |
| Per-request granularity | No | Yes |
| Setup complexity | Cron job | Code change |
| Works with streaming | Yes | Yes (with stream_options) |
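The streaming row deserves a note: with OpenAI's stream_options={"include_usage": True}, token counts arrive only on the final chunk, whose usage field is populated and whose choices list is empty, so the tracker has to watch for it. A sketch over simulated chunk payloads (the chunk contents here are invented):

```python
# Simulated stream: content chunks carry usage=None; with
# stream_options={"include_usage": True}, the final chunk carries
# the usage totals and an empty choices list.
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"delta": {"content": "lo!"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 2}},
]

text, usage = "", None
for chunk in chunks:
    for choice in chunk["choices"]:
        text += choice["delta"].get("content", "")
    if chunk.get("usage"):      # only the final chunk has this set
        usage = chunk["usage"]

print(text)   # Hello!
print(usage)  # {'prompt_tokens': 9, 'completion_tokens': 2}
```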

Use Usage API polling when:

  • You want organization-wide historical data
  • You don't control the application code
  • You need data for compliance/auditing

Use application instrumentation when:

  • You need real-time visibility
  • You want per-user/per-session tracking
  • You want to alert on anomalies immediately
  • You don't have admin API keys

Best of Both Worlds

You can use both approaches together. The Usage API gives you authoritative billing data; application instrumentation gives you real-time operational visibility. Store them in separate tables and correlate when needed.

Conclusion

Application-level instrumentation transforms LLM cost tracking from a monthly accounting exercise into a real-time operational capability. You can:

  • See costs the moment they happen
  • Track spending per user, session, and agent
  • Alert on anomalies in real-time
  • Optimize prompts based on actual cost data

The wrapper pattern shown here adds minimal overhead and works with any LLM provider that returns token counts in responses.
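A provider-agnostic version of that pattern can be as small as a decorator parameterized by a usage extractor. A sketch: extract_usage and report here are stand-ins for your own normalization logic and send_to_arc, and fake_call is a hypothetical provider function used only for illustration:

```python
import functools
import time

def tracked(extract_usage, report):
    """Wrap any LLM-calling function. extract_usage maps its return
    value to (input_tokens, output_tokens); report receives a record."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            response = fn(*args, **kwargs)
            input_tokens, output_tokens = extract_usage(response)
            report({
                "time": int(time.time() * 1000),
                "model": kwargs.get("model", "unknown"),
                "input_tokens": input_tokens,
                "output_tokens": output_tokens,
            })
            return response
        return wrapper
    return decorator

# Usage with a fake provider call that returns a usage payload:
records = []

@tracked(lambda r: (r["usage"]["in"], r["usage"]["out"]), records.append)
def fake_call(model: str):
    return {"usage": {"in": 10, "out": 4}}

fake_call(model="any-model")
print(records[0]["input_tokens"], records[0]["output_tokens"])  # 10 4
```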

Stop waiting for invoices. Start tracking costs in real-time.


Questions? Reach out on Twitter or LinkedIn.

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in Parquet.
