Arc 26.01.1: Python SDK, Azure Storage, and 10M Records Per Second

Last month we rewrote Arc in Go. We fixed the memory leaks, hit 9.47 million records per second, and thought—okay, now we can breathe for a bit.
That lasted about a week.
The feedback started coming in. "Love the performance, but I need a Python client." "We're on Azure, any chance you'll support Blob Storage?" "Can we run this with TLS without putting nginx in front?" The community wasn't waiting around.
So we got back to work. And honestly? This release might be my favorite so far. Not because of any single feature, but because it's shaped by what people actually asked for.
We crossed 10 million records per second. We shipped a Python SDK. Azure teams can finally run Arc on their infrastructure. And a community contributor—Adam Schroder—fixed something that had been bugging us for weeks: historical data landing in the wrong partitions.
Here's what's new.
Official Python SDK
We've been getting requests for this since day one. A proper Python client that doesn't require you to craft HTTP requests manually.
It's now on PyPI as arc-tsdb-client.
The SDK gives you high-performance MessagePack columnar ingestion, query responses in pandas, polars, or PyArrow, buffered writes with automatic batching, and the full management API for retention policies, continuous queries, and authentication.
We built it with httpx for async support and focused on making the common patterns feel natural. If you're already using pandas or polars for your data pipelines, Arc now fits in without friction.
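Here's a rough sketch of what a write-and-query round trip looks like. Treat it as illustrative: the class and method names below are shorthand and may differ slightly from the published API, so check the SDK docs before copying it.

```python
# Illustrative sketch -- class and method names here are assumptions,
# not necessarily the exact arc-tsdb-client API.
from arc_tsdb_client import ArcClient  # hypothetical import path

client = ArcClient(url="http://localhost:8000", token="my-api-token")

# Buffered write: records get batched and shipped as MessagePack columns.
client.write(
    measurement="cpu",
    records=[{"host": "web-01", "usage": 72.5, "timestamp": "2026-01-15T12:00:00Z"}],
)
client.flush()  # hypothetical explicit flush of the write buffer

# Query results can come back as pandas, polars, or PyArrow.
df = client.query("SELECT * FROM mydb.cpu LIMIT 10", format="pandas")
print(df.head())
```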
Azure Blob Storage Backend
Arc already supported S3 and local storage. Now it runs on Azure too.
This was a common request from teams running on Microsoft infrastructure. You can use connection strings, account keys, SAS tokens, or Managed Identity—whatever fits your security model.
The storage abstraction we built for S3 made this straightforward to add. Same partitioned Parquet files, same compaction behavior, just a different backend.
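And because it's plain Parquet, anything that speaks Azure Blob Storage can read the files outside Arc. A sketch using pandas with adlfs; the container name and partition path are made-up examples, not a layout guarantee:

```python
# Not Arc-specific: reading Arc's partitioned Parquet files directly from
# Azure Blob Storage with pandas + adlfs. The container name and partition
# path below are made-up examples.
import pandas as pd

df = pd.read_parquet(
    "abfs://arc-data/mydb/cpu/2026/01/15/",  # hypothetical partition path
    storage_options={
        "account_name": "mystorageaccount",
        "account_key": "...",  # or connection_string / sas_token
    },
)
print(df.describe())
```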
Native TLS Support
You can now run Arc with HTTPS directly, no reverse proxy required.
For teams running Arc from native packages on bare metal or VMs, this simplifies the deployment significantly. Point Arc at your certificate files, enable TLS, and you're done.
We also added automatic HSTS headers when TLS is enabled, so browsers know to always use HTTPS.
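From the client side there's nothing Arc-specific about it. A quick sanity check with httpx; the hostname, port, and health path here are placeholders, not a documented endpoint:

```python
# Plain httpx against a TLS-enabled Arc instance. The hostname, port, and
# /health path are placeholders, not a documented endpoint.
import httpx

resp = httpx.get(
    "https://arc.example.internal:8443/health",
    verify="/etc/arc/ca.pem",  # trust your internal CA if the cert isn't publicly signed
)
print(resp.status_code, resp.headers.get("strict-transport-security"))
```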
Data-Time Partitioning
This one came from the community. Thanks to Adam Schroder for the contribution.
Previously, when you backfilled historical data, it would land in today's partition—not where it actually belonged. That broke partition pruning and made time-range queries scan way more data than necessary.
Now Arc organizes files by the data's timestamp, not ingestion time. December 2024 data goes to December 2024 partitions, even if you're ingesting it in January 2026.
Batches that span multiple hours get automatically split. Data gets sorted by timestamp within each file. Partition pruning actually works for historical queries now.
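In practice a backfill looks like any other write; the record timestamps decide where the files land. A sketch, reusing the illustrative client from the SDK example above:

```python
# Backfill sketch, reusing the illustrative client from the SDK example.
# With data-time partitioning these rows land in 2024/12 partitions even
# though they're being ingested in January 2026.
from datetime import datetime, timezone

december_rows = [
    {"host": "web-01", "usage": 41.0,
     "timestamp": datetime(2024, 12, 3, 14, 0, tzinfo=timezone.utc)},
    {"host": "web-01", "usage": 57.5,
     "timestamp": datetime(2024, 12, 3, 15, 0, tzinfo=timezone.utc)},
]
client.write(measurement="cpu", records=december_rows)  # split and sorted per data hour
```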
Compaction API
Another community contribution from Adam. You can now trigger hourly and daily compaction manually via API, and the schedules are independently configurable.
Useful when you've just finished a large backfill and want to compact immediately rather than waiting for the next scheduled run.
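Something along these lines; the endpoint path and payload are illustrative, so check the API docs for the exact route:

```python
# Kicking off a compaction run over HTTP. The endpoint path and payload are
# assumptions for illustration, not the documented route.
import httpx

resp = httpx.post(
    "http://localhost:8000/api/compaction/run",  # hypothetical endpoint
    headers={"Authorization": "Bearer my-api-token"},
    json={"level": "hourly"},  # e.g. right after a large backfill
)
resp.raise_for_status()
```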
Configurable Ingestion Concurrency
For high-concurrency deployments—say, 50+ Telegraf agents hitting Arc simultaneously—we've exposed the concurrency settings that were previously hardcoded.
You can tune flush workers, queue sizes, and shard counts to match your workload. The defaults scale with CPU cores, but now you have the knobs if you need them.
DuckDB S3 Query Support
Arc now configures DuckDB's httpfs extension automatically when you're using S3 storage. Queries against S3-backed data just work—no manual extension setup required.
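For reference, this is the manual setup Arc now handles for you, shown with the duckdb Python package. The bucket and path are made-up examples; credentials come from your usual AWS environment or config:

```python
# The manual equivalent of what Arc now configures automatically, using the
# duckdb Python package. Bucket and path are made-up examples.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
con.execute("SET s3_region='us-east-1'")  # credentials come from env/config
count = con.execute(
    "SELECT count(*) FROM read_parquet('s3://arc-data/mydb/cpu/2026/01/**/*.parquet')"
).fetchone()
print(count)
```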
The Numbers
We've been pushing performance since the Go rewrite. This release crosses a milestone.
Ingestion: 10.1M records/sec with p99 latency at 6.73ms.
That's a 7% improvement from 25.12.1, with 63% lower p50 latency and 84% lower p99 latency. The improvements came from Zstd compression support for MessagePack payloads, single-pass timestamp normalization, and better column sorting during schema inference.
Query: Arrow IPC responses now deliver 5.2M rows/sec—an 80% improvement over the previous release.
We added caching at multiple layers: SQL-to-storage-path transformations, partition path lookups, and glob results. For dashboard scenarios where the same queries run repeatedly, these caches make a real difference.
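If you're pulling large result sets into Python, the Arrow path skips the JSON decode entirely. Another sketch using the illustrative client from earlier; the format argument may be named differently in the shipped SDK:

```python
# Hypothetical: pulling a result set as Arrow through the client sketched
# earlier. The method name and format argument are assumptions, not the
# documented SDK surface.
table = client.query(
    "SELECT time, host, usage FROM mydb.cpu WHERE host = 'web-01'",
    format="pyarrow",
)
print(table.num_rows, table.schema)
```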
Bug Fixes
A few notable ones:
CTEs work properly now. Previously, CTE names like WITH campaign AS (...) were incorrectly converted to storage paths, causing "No files found" errors.
JOINs work with full table references. JOIN database.table now resolves correctly.
String literals containing SQL keywords don't get mangled anymore. A WHERE clause like msg = 'SELECT * FROM mydb.cpu' stays intact.
Comments in queries don't confuse the parser. Both -- and /* */ style comments are handled correctly.
Storage paths use UTC consistently, preventing partition misalignment when servers run in different timezones.
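Put together, a query that mixes all of these shapes now parses cleanly. The SQL below just restates the cases above; the table names and the client call are illustrative:

```python
# One statement touching each fix above: a comment, a CTE, a string literal
# full of SQL keywords, and a JOIN on a full database.table reference.
# Table names and the client call are illustrative.
sql = """
-- dashboard query (comments no longer confuse the parser)
WITH campaign AS (
    SELECT * FROM mydb.events
    WHERE msg = 'SELECT * FROM mydb.cpu'  -- keyword-laden literal stays intact
)
SELECT c.time, c.msg, h.region
FROM campaign c
JOIN mydb.hosts h ON c.host = h.host
"""
df = client.query(sql, format="pandas")
```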
No Breaking Changes
This is a drop-in upgrade from 25.12.1. Your config files, data, and tokens all work unchanged.
If you're using S3, note that credentials now also get passed to DuckDB for httpfs queries. Make sure your AWS environment variables or config are set up.
What's Next
February is all about MQTT. For IoT and industrial deployments, MQTT is how devices talk. We're adding a native MQTT subscriber so Arc can pull data directly from your broker—no middleware required. Topic-to-measurement mapping, QoS support, the works.
We're also building the integration between Arc and Liftbridge—the lightweight Kafka alternative we took over last month. Liftbridge gives you durable message buffering with replay capability, which is exactly what you need when sensors dump hours of backlogged data or when Arc is down for maintenance. The goal is a complete IoT data platform: Telegraf collects, Liftbridge buffers, Arc stores, Grafana visualizes.
March brings bulk import APIs for CSV and Parquet files, plus migration guides for teams coming from InfluxDB, QuestDB, TimescaleDB, and ClickHouse. If you've been waiting to migrate, that's your window.
We're also working toward Arc Cloud beta. More on that soon.
If you're running Arc in production, we'd love to hear how it's going. Drop by the Discord or open an issue on GitHub.
Resources
- GitHub: https://github.com/basekick-labs/arc
- Documentation: https://docs.basekick.net/arc
- Discord: https://discord.gg/nxnWfUxsdm
- Python SDK: https://pypi.org/project/arc-tsdb-client/
Ready to handle billion-record workloads?
Deploy Arc in minutes. Own your data in Parquet.