Getting Started with the Python SDK for Arc

We've been getting requests for a Python SDK since day one. "Love Arc, but I don't want to craft HTTP requests manually." Fair enough.

With the release of Arc 26.01.1, the official Python SDK is now available on PyPI as arc-tsdb-client. It gives you high-performance MessagePack ingestion, query responses directly into pandas or polars DataFrames, buffered writes with automatic batching, and the full management API for retention policies, tokens, and continuous queries.

Let's walk through it.

Installation

The SDK is available via pip or uv. Install the base package or add optional dependencies for DataFrame support (quote the bracketed extras so zsh doesn't expand them as globs):

# Base installation
pip install arc-tsdb-client
 
# With pandas support
pip install "arc-tsdb-client[pandas]"

# With polars support
pip install "arc-tsdb-client[polars]"

# Everything (pandas, polars, PyArrow)
pip install "arc-tsdb-client[all]"

If you're using uv:

uv add "arc-tsdb-client[all]"

Quick Start

First, create a client and connect to your Arc instance:

from arc_client import ArcClient
 
client = ArcClient(
    host="localhost",
    port=8000,
    token="your-arc-token",
    database="default"
)
 
# Verify the connection
info = client.auth.verify()
print(f"Connected to Arc. Token: {info.token_info.name}")

That's it. You're connected.
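
In anything beyond a quick experiment, you won't want the token hardcoded. A minimal sketch that pulls connection settings from environment variables instead (the ARC_* variable names are my own convention, not something the SDK reads automatically):

import os

from arc_client import ArcClient

client = ArcClient(
    host=os.environ.get("ARC_HOST", "localhost"),
    port=int(os.environ.get("ARC_PORT", "8000")),
    token=os.environ["ARC_TOKEN"],  # fail fast if the token isn't set
    database=os.environ.get("ARC_DATABASE", "default"),
)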

Writing Data

Arc supports multiple ingestion formats. The SDK makes all of them straightforward.

MessagePack Columnar Writes

This is the fastest way to write data: 18M+ records per second with automatic gzip compression. Use write_columnar() for bulk ingestion:

client.write.write_columnar(
    measurement="temperature",
    columns={
        "time": [1705660800000000, 1705660801000000, 1705660802000000],
        "device": ["sensor_01", "sensor_02", "sensor_01"],
        "location": ["warehouse_a", "warehouse_a", "warehouse_b"],
        "value": [22.5, 23.1, 21.8]
    },
    database="iot"
)

The columns dictionary maps column names to lists of values. All lists must have the same length. Timestamps are in microseconds since epoch.
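
If your timestamps start life as Python datetimes, converting them to microseconds since epoch is one line. A small helper, nothing SDK-specific:

from datetime import datetime, timezone

def to_micros(dt: datetime) -> int:
    # Convert a timezone-aware datetime to microseconds since epoch
    return int(dt.timestamp() * 1_000_000)

print(to_micros(datetime(2026, 1, 19, 10, 0, tzinfo=timezone.utc)))
# 1768816800000000

Use timezone-aware datetimes here; naive ones are interpreted in the local timezone, which is a classic source of off-by-hours bugs.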

InfluxDB Line Protocol

If you're migrating from InfluxDB or want compatibility with existing tools, use Line Protocol:

# Single line
client.write.write_line_protocol(
    "temperature,device=sensor_01,location=warehouse_a value=22.5 1705660800000000000",
    database="iot"
)
 
# Multiple lines
lines = """
temperature,device=sensor_01,location=warehouse_a value=22.5 1705660800000000000
temperature,device=sensor_02,location=warehouse_a value=23.1 1705660801000000000
temperature,device=sensor_01,location=warehouse_b value=21.8 1705660802000000000
"""
client.write.write_line_protocol(lines, database="iot")

Note: Line Protocol timestamps are in nanoseconds, while MessagePack uses microseconds.
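
If you're generating Line Protocol yourself rather than relaying it from existing tools, a tiny builder keeps the formatting in one place. This sketch handles numeric fields only and skips the protocol's escaping rules (commas, spaces, and equals signs in tag values must be escaped), so treat it as a starting point:

import time

def to_line(measurement: str, tags: dict, fields: dict, ts_ns: int | None = None) -> str:
    # Numeric fields only; no escaping of special characters in tags
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    ts = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} {field_str} {ts}"

line = to_line("temperature", {"device": "sensor_01", "location": "warehouse_a"}, {"value": 22.5})
client.write.write_line_protocol(line, database="iot")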

Writing DataFrames

If you already have data in pandas or polars, write it directly:

import pandas as pd
 
df = pd.DataFrame({
    "time": pd.to_datetime(["2026-01-19 10:00:00", "2026-01-19 10:00:01"]),
    "device": ["sensor_01", "sensor_02"],
    "value": [22.5, 23.1]
})
 
client.write.write_dataframe(
    df,
    measurement="temperature",
    time_column="time",
    tag_columns=["device"],
    database="iot"
)

The SDK handles the conversion to MessagePack format automatically.
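
A nice side effect: one-off imports get trivial. For example, loading a CSV with pandas and writing it straight in (the file name and columns here are illustrative):

import pandas as pd

# readings.csv with columns: time, device, value
df = pd.read_csv("readings.csv", parse_dates=["time"])

client.write.write_dataframe(
    df,
    measurement="temperature",
    time_column="time",
    tag_columns=["device"],
    database="iot"
)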

Buffered Writes

For high-throughput scenarios where you're writing records one at a time (like processing a stream), use buffered writes. The buffer batches records and flushes automatically:

with client.write.buffered(batch_size=10000, flush_interval=5.0) as buffer:
    for record in incoming_records:
        buffer.write(
            measurement="events",
            tags={"source": record.source, "type": record.event_type},
            fields={"value": record.value, "count": record.count},
            timestamp=record.timestamp
        )
# Buffer automatically flushes on exit

This is useful when you're processing events in a loop and don't want to make an HTTP request for every single record.
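
In that snippet, incoming_records stands in for whatever your pipeline yields. To try it end to end, you can fake a stream with a dataclass and a generator; the Record shape below is illustrative, not an SDK type:

import random
import time
from dataclasses import dataclass

@dataclass
class Record:
    source: str
    event_type: str
    value: float
    count: int
    timestamp: int  # microseconds since epoch

def stream(n: int):
    for _ in range(n):
        yield Record(
            source=random.choice(["api", "worker"]),
            event_type="request",
            value=random.random(),
            count=1,
            timestamp=int(time.time() * 1_000_000),
        )

incoming_records = stream(100_000)  # plugs straight into the loop above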

Querying Data

Now let's read data back. The SDK supports multiple response formats depending on your use case.

JSON Response

The simplest option is plain JSON, which comes back as a dictionary with columns and data:

result = client.query.query(
    "SELECT * FROM temperature WHERE time > NOW() - INTERVAL '1 hour' LIMIT 10",
    database="iot"
)
 
print(result["columns"])  # ['time', 'device', 'location', 'value']
print(result["data"])     # List of rows
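
The rows are positional lists, so zipping them against the column names gives you dicts, which is handy for quick scripting:

rows = [dict(zip(result["columns"], row)) for row in result["data"]]
for r in rows[:3]:
    print(r["device"], r["value"])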

PyArrow Table

For zero-copy data interchange with other Arrow-based tools:

table = client.query.query_arrow(
    "SELECT * FROM temperature WHERE device = 'sensor_01' ORDER BY time DESC LIMIT 1000",
    database="iot"
)
 
print(table.num_rows)
print(table.column_names)

Arrow IPC delivers around 5.2M rows/sec—use this when performance matters.
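
Since this is a standard pyarrow.Table, handing it off to other Arrow-native tools is cheap. For example:

import polars as pl

pl_df = pl.from_arrow(table)  # zero-copy where the dtypes allow
pdf = table.to_pandas()       # converts to pandas (this one copies)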

pandas DataFrame

Query directly into a DataFrame for analysis:

df = client.query.query_pandas(
    """
    SELECT
        time_bucket(INTERVAL '1 hour', time) as hour,
        device,
        AVG(value) as avg_temp,
        MAX(value) as max_temp
    FROM temperature
    WHERE time > NOW() - INTERVAL '24 hours'
    GROUP BY hour, device
    ORDER BY hour DESC
    """,
    database="iot"
)
 
print(df.head())
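
From here it's ordinary pandas. For instance, pivoting so each device becomes its own column for side-by-side comparison:

wide = df.pivot(index="hour", columns="device", values="avg_temp")
print(wide.head())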

polars DataFrame

Same thing, but with polars:

pl_df = client.query.query_polars(
    "SELECT * FROM temperature WHERE location = 'warehouse_a' LIMIT 1000",
    database="iot"
)
 
print(pl_df.head())
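
And ordinary polars from there. A quick follow-up aggregation (group_by is the polars 1.x spelling; older releases call it groupby):

import polars as pl

summary = pl_df.group_by("device").agg(
    pl.col("value").mean().alias("avg_value"),
    pl.col("value").max().alias("max_value"),
)
print(summary)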

List Measurements

See what tables exist in a database:

measurements = client.query.list_measurements(database="iot")
 
for m in measurements:
    print(f"{m.measurement}: {m.file_count} files, {m.total_size_mb:.1f} MB")

Complete Example

Here's a full script that writes sensor data and queries it back:

from arc_client import ArcClient
import time
 
# Connect
client = ArcClient(
    host="localhost",
    port=8000,
    token="your-arc-token",
    database="sensors"
)
 
# Write some data
print("Writing sensor data...")
client.write.write_columnar(
    measurement="temperature",
    columns={
        "time": [
            int(time.time() * 1_000_000) - 3_000_000,
            int(time.time() * 1_000_000) - 2_000_000,
            int(time.time() * 1_000_000) - 1_000_000,
            int(time.time() * 1_000_000),
        ],
        "device": ["sensor_01", "sensor_02", "sensor_01", "sensor_02"],
        "location": ["floor_1", "floor_1", "floor_2", "floor_2"],
        "celsius": [21.5, 22.0, 20.8, 21.2],
    }
)
print("Data written.")
 
# Query it back
print("\nQuerying recent data...")
df = client.query.query_pandas(
    """
    SELECT
        time,
        device,
        location,
        celsius
    FROM temperature
    ORDER BY time DESC
    LIMIT 10
    """
)
 
print(df)
 
# Aggregate by location
print("\nAverage temperature by location:")
df_agg = client.query.query_pandas(
    """
    SELECT
        location,
        AVG(celsius) as avg_temp,
        COUNT(*) as readings
    FROM temperature
    GROUP BY location
    """
)
 
print(df_agg)

Save this as arc_example.py and run it:

python arc_example.py

Output:

Writing sensor data...
Data written.

Querying recent data...
                              time     device location  celsius
0 2026-01-19 16:32:53.165162+00:00  sensor_02  floor_2     21.2
1 2026-01-19 16:32:52.165162+00:00  sensor_01  floor_2     20.8
2 2026-01-19 16:32:51.165162+00:00  sensor_02  floor_1     22.0
3 2026-01-19 16:32:50.165161+00:00  sensor_01  floor_1     21.5

Average temperature by location:
  location  avg_temp  readings
0  floor_1     21.75         2
1  floor_2     21.00         2

Async Support

All operations have async variants via AsyncArcClient. If you're building async applications:

from arc_client import AsyncArcClient
import asyncio
 
async def main():
    async with AsyncArcClient(host="localhost", token="your-token", database="sensors") as client:
        # Query the data we wrote earlier
        df = await client.query.query_pandas(
            "SELECT * FROM temperature ORDER BY time DESC LIMIT 10"
        )
        print(df)
 
asyncio.run(main())
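
Because every call is awaitable, fan-out comes almost for free with asyncio.gather. A sketch running two queries concurrently, assuming the client handles concurrent requests the way async HTTP clients typically do:

from arc_client import AsyncArcClient
import asyncio

async def main():
    async with AsyncArcClient(host="localhost", token="your-token", database="sensors") as client:
        per_device, totals = await asyncio.gather(
            client.query.query_pandas(
                "SELECT device, AVG(celsius) AS avg_temp FROM temperature GROUP BY device"
            ),
            client.query.query_pandas(
                "SELECT COUNT(*) AS readings FROM temperature"
            ),
        )
        print(per_device)
        print(totals)

asyncio.run(main())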

What's Next

This is version 1.0 of the SDK. It covers the core workflows—writing data, querying, and basic management operations. But we're just getting started.

We'd love to hear from you:

  • What's working well? Let us know what you're building.
  • What's missing? Features you need that aren't there yet.
  • What's broken? Bugs, edge cases, confusing APIs.

The SDK is open source. If you want to contribute, check out the repo:

https://github.com/Basekick-Labs/arc-client-python

Open an issue, submit a PR, or just star the repo if you find it useful.

Resources

Questions? Drop by the Discord or reach out on Twitter.

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in Parquet.

Get Started ->