Arc 26.02.1: MQTT, InfluxDB Drop-In Compatibility, and 480 Servers Later

#Arc #release #v26.02.1 #MQTT #InfluxDB #IoT #performance #time-series #BasekickLabs

Four months ago we open-sourced Arc and started building in public. I had no idea what to expect. Maybe a handful of stars, some curious developers kicking the tires, and a lot of silence.

That's not what happened.

Today Arc has over 470 stars on GitHub, 170+ pull requests merged, a Discord community that's genuinely engaged (people helping each other, filing detailed bug reports, suggesting features), and it's running on 480 servers in production. Not test environments. Production.

Every release we've shipped has been shaped by the people using Arc. This one is no different. In fact, this might be the most community-driven release yet.

Let me walk you through it.

InfluxDB Drop-In Client Compatibility

This was the most requested feature by far. People kept asking: "Can I just point my existing InfluxDB client at Arc?"

Now you can.

Arc's Line Protocol endpoints now match InfluxDB's API paths. The Go client, Python client, JavaScript, Java, C#, PHP, Ruby, Node-RED, Telegraf... they all work without changing a single line of code. Just swap the URL.

We also added support for all of InfluxDB's authentication methods: Bearer tokens, Token headers, API key headers, and even the old v1 query parameter style. If your Telegraf config has username and password fields, they'll work.

The Arc-native endpoint (/api/v1/write/line-protocol) is still there for teams that prefer it. But if you're migrating from InfluxDB, the barrier just dropped to zero.
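For Telegraf users, the swap is a one-line change in an existing output block. A sketch of what that might look like (the URL assumes Arc's default port of 8000; the token, organization, and bucket values are placeholders for your own deployment):

```toml
[[outputs.influxdb_v2]]
  ## Point an existing Telegraf InfluxDB v2 output at Arc instead.
  ## Only the URL changes; everything else stays as you had it.
  urls = ["http://localhost:8000"]
  token = "$ARC_TOKEN"
  organization = "my-org"
  bucket = "my-bucket"
```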

Native MQTT Ingestion

For IoT and edge deployments, MQTT is how devices talk. Until now, you needed middleware (like Telegraf) between your MQTT broker and Arc.

Not anymore.

Arc can now subscribe directly to MQTT topics and ingest data without any additional infrastructure. Wildcard topics, TLS connections, auto-reconnect with backoff, and topic-to-measurement mapping are all built in.

You manage subscriptions through a REST API, and they auto-start on server restart. Passwords are encrypted at rest. It's the kind of feature that makes Arc feel like it was built for IoT from the ground up. Because it was.
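The wildcard semantics are standard MQTT: + matches one topic level, # matches everything below. A minimal sketch of how a subscription filter matches incoming topics and could map them to measurement names (the function names and the mapping scheme here are illustrative, not Arc's actual API):

```python
def topic_matches(filter_: str, topic: str) -> bool:
    """Standard MQTT wildcard matching: '+' matches exactly one
    topic level, '#' (last level only) matches the remainder."""
    fparts = filter_.split("/")
    tparts = topic.split("/")
    for i, fp in enumerate(fparts):
        if fp == "#":
            return True
        if i >= len(tparts):
            return False
        if fp != "+" and fp != tparts[i]:
            return False
    return len(fparts) == len(tparts)

def measurement_for(topic: str) -> str:
    # One plausible topic-to-measurement mapping: flatten levels.
    return topic.replace("/", "_")
```

So a single subscription to sensors/+/temperature would ingest sensors/dev1/temperature and sensors/dev2/temperature, but not sensors/dev1/humidity.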

S3 File Caching

Contributed by https://github.com/khalid244.

If you're running Arc on S3 and your queries hit the same data repeatedly (Grafana dashboards, CTEs that read the same table multiple times), this one's for you. Arc now supports optional in-memory caching of S3 Parquet files via DuckDB's cache_httpfs extension.

The result? 5-10x faster queries for repeated file access. It's opt-in (disabled by default) and configurable. No surprises.

Smarter Partition Pruning

Queries using NOW() - INTERVAL now benefit from partition pruning. Before this, a query like WHERE time > NOW() - INTERVAL '20 days' would scan every single Parquet file because the partition pruner didn't know how to evaluate relative time expressions.

Now it does. The expression gets resolved to an absolute timestamp at query time, and only the relevant partitions get scanned. If you're running on S3, this directly reduces your costs and query latency.
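Conceptually, the fix works like this sketch: resolve the relative expression to an absolute cutoff once, then filter partitions against it. The daily YYYY/MM/DD partition layout here is an assumption for illustration; Arc's actual on-disk layout may differ.

```python
from datetime import datetime, timedelta, timezone

def prune_partitions(partitions, interval_days, now=None):
    """Resolve NOW() - INTERVAL '<n> days' to an absolute cutoff date,
    then keep only the daily partitions at or after that date."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=interval_days)).date()
    kept = []
    for name in partitions:  # assumed layout: "YYYY/MM/DD"
        part_date = datetime.strptime(name, "%Y/%m/%d").date()
        if part_date >= cutoff:
            kept.append(name)
    return kept
```

With NOW() at 2026-02-25 and a 20-day interval, the cutoff is 2026-02-05, so a January partition never gets opened.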

Query Performance

We shipped a bunch of optimizations that stack together:

Automatic time function optimization. Queries using time_bucket() and date_trunc() are automatically rewritten to epoch-based arithmetic. 2-2.5x faster GROUP BY queries, zero code changes required.

Parallel partition scanning. Queries spanning multiple time partitions now run concurrently. 2-4x speedup for time-range queries.

DuckDB engine tuning. Parquet metadata caching, prefetching, and pool-wide settings. 18-24% faster aggregation queries.

Database header optimization. Using the x-arc-database header skips regex parsing and gives you 4-17% faster queries depending on the pattern.
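The time-function rewrite in the first item is essentially integer arithmetic on epoch values. A sketch of the equivalence (not Arc's actual rewriter), which holds for fixed-width intervals like seconds, minutes, hours, and days, though not calendar months:

```python
from datetime import datetime, timezone

def bucket_epoch(epoch_s: int, width_s: int) -> int:
    # Align a timestamp down to the start of its bucket.
    # Equivalent to date_trunc / time_bucket for fixed-width
    # intervals, but it's a single modulo per row instead of
    # full date decomposition.
    return epoch_s - (epoch_s % width_s)

ts = datetime(2026, 2, 1, 13, 45, 10, tzinfo=timezone.utc)
hour_start = bucket_epoch(int(ts.timestamp()), 3600)
assert datetime.fromtimestamp(hour_start, timezone.utc) == ts.replace(minute=0, second=0)
```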

Compaction Got Reliable

This is the part of the release I'm most relieved about. Compaction at scale was causing pain for some users, and we threw a lot of effort at making it bulletproof.

OOM and segfault fixes. Large datasets (2B+ rows, thousands of files) would crash the compactor. We switched to streaming I/O, added file batching, and built adaptive batch sizing that automatically splits and retries when memory pressure hits.
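The adaptive sizing behaves roughly like this sketch (a simplification: the real compactor also streams I/O and watches memory pressure, rather than just catching allocation failures):

```python
def compact_adaptive(files, compact_batch, min_batch=1):
    """Try to compact a batch of files; on memory pressure, split it
    in half and retry each half, down to a minimum batch size."""
    pending = [list(files)]
    results = []
    while pending:
        batch = pending.pop()
        try:
            results.append(compact_batch(batch))
        except MemoryError:
            if len(batch) <= min_batch:
                raise  # can't split any further
            mid = len(batch) // 2
            pending.append(batch[mid:])
            pending.append(batch[:mid])
    return results
```

Oversized batches get halved until every piece fits, so a run that used to crash now finishes with smaller, retried batches.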

Crash recovery. Contributed by https://github.com/khalid244. If a compaction process crashes mid-operation, a manifest-based tracking system now picks up where it left off. No more data duplication from partial compactions.

Orphaned temp cleanup. Temp directories from crashed subprocesses used to pile up. Now they're cleaned up on startup and after each subprocess completes.

WAL-based S3 recovery. Also from https://github.com/khalid244. When S3 goes down, data in the buffer used to be lost. Now the WAL preserves it and replays on recovery. This is a big deal for anyone running Arc on object storage.

More Fixes Worth Mentioning

Non-UTF8 data. If you're ingesting rsyslog or data with binary content, Arc used to choke at query time. Now it sanitizes invalid UTF-8 during ingestion with minimal overhead.
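The sanitization amounts to decoding with replacement rather than failing. A minimal version of the idea:

```python
def sanitize_utf8(raw: bytes) -> str:
    # Invalid byte sequences become U+FFFD at ingestion time
    # instead of raising at query time; valid UTF-8 passes
    # through untouched.
    return raw.decode("utf-8", errors="replace")

assert sanitize_utf8(b"temp=21.5") == "temp=21.5"
assert "\ufffd" in sanitize_utf8(b"host=\xffweb01")
```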

Nanosecond timestamps. InfluxDB uses nanoseconds by default. Arc's MessagePack endpoint now handles them correctly instead of producing dates in the year 57,000.
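For reference, the unit mix-up looks like this: a nanosecond timestamp read as microseconds lands tens of thousands of years in the future. A conversion sketch (not Arc's internals):

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000

def from_ns(ts_ns: int) -> datetime:
    # Split into whole seconds plus sub-second nanoseconds to
    # avoid float precision loss on 19-digit values.
    seconds, ns = divmod(ts_ns, NS_PER_SECOND)
    return datetime.fromtimestamp(seconds, timezone.utc).replace(microsecond=ns // 1000)

ts_ns = 1_760_000_000_000_000_000  # an InfluxDB-style nanosecond timestamp
assert from_ns(ts_ns).year == 2025
# Misread as microseconds, that same value would be roughly 55,000
# years from now, which is where the year-57,000 dates came from.
```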

Multi-line SQL. WHERE clauses spanning multiple lines would skip partition pruning entirely. Fixed.

String literals with SQL keywords. A query containing LIKE '%GROUP BY%' would confuse the partition pruner. Fixed.

Retention policies on S3/Azure. They were silently doing nothing. Now they work across all storage backends.

Query timeouts. Contributed by https://github.com/khalid244. When S3 disconnects mid-query, you now get a 504 instead of an infinite hang.

Tiered storage routing. Cold-tier-only databases now show up in listings and respond to queries properly.

Security

API tokens are now hashed with bcrypt for storage. SHA256 is still used for fast cache lookups and database indexes, but the actual security layer is bcrypt. Legacy tokens continue to work.
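The pattern is a fast hash for lookups plus a slow, salted hash for the actual check. A sketch using only stdlib primitives, with scrypt standing in for bcrypt (bcrypt is a third-party package in Python; the storage schema here is illustrative):

```python
import hashlib
import hmac
import os

def store_token(token: str) -> dict:
    """Fast SHA-256 digest for cache/index lookups; slow salted
    scrypt digest (bcrypt stand-in) as the security layer."""
    lookup = hashlib.sha256(token.encode()).hexdigest()
    salt = os.urandom(16)
    slow = hashlib.scrypt(token.encode(), salt=salt, n=2**14, r=8, p=1)
    return {"lookup": lookup, "salt": salt, "hash": slow}

def verify_token(token: str, record: dict) -> bool:
    candidate = hashlib.scrypt(token.encode(), salt=record["salt"],
                               n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, record["hash"])
```

The fast digest narrows the candidate rows in microseconds; the slow hash is what an attacker with a stolen database actually has to break.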

One Breaking Change

The Line Protocol endpoint paths changed to match InfluxDB's API:

  • /api/v1/write is now /write
  • /api/v1/write/influxdb is now /api/v2/write

If you're using official InfluxDB client libraries, this is what they expect. If you were hitting the old Arc-specific paths directly, update your config. The MessagePack endpoint (/api/v1/write/msgpack) is unchanged.

Thank You

I want to take a moment to thank the people who made this release happen.

https://github.com/khalid244 contributed S3 file caching, manifest-based compaction recovery, WAL-based S3 recovery, query timeouts, the multi-line WHERE fix, and day-level S3 file verification. That's six significant contributions in a single release cycle.

https://github.com/schotime (Adam Schroder) fixed Azure SSL certificate issues on Linux and UTC consistency in compaction filenames.

And to everyone in the Discord who filed issues, tested pre-releases, and shared how they're using Arc in production: you're the reason this project moves as fast as it does. 470 stars and 480 servers in four months doesn't happen without a community that cares.

We're just getting started.

Get It

docker run -d \
  -p 8000:8000 \
  -e STORAGE_BACKEND=local \
  -v arc-data:/app/data \
  ghcr.io/basekick-labs/arc:latest

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in Parquet. Use it for analytics, observability, AI, IoT, or data warehousing.

Get Started ->