Arc on ClickBench: InfluxDB Isn't On It. So We Benchmarked What Powers It.

We've been running Arc against every major analytical database on ClickBench. First ClickHouse, then TimescaleDB. InfluxDB is the one people keep asking about.

There's a problem: InfluxDB isn't on ClickBench. InfluxDB 3 supports SQL through DataFusion, but nobody has submitted InfluxDB itself to the benchmark. DataFusion — the query engine underneath — is on ClickBench.

So we compared Arc against DataFusion directly, and benchmarked ingestion against InfluxDB 3 Core and Enterprise.

DataFusion: The Engine Behind InfluxDB 3.0

InfluxDB 3.0 (formerly IOx) is built on Apache DataFusion — an extensible query engine written in Rust, part of the Apache Arrow ecosystem. DataFusion reads Parquet, speaks SQL, and is the analytical core that InfluxDB 3.0 delegates to for query execution.

DataFusion is on ClickBench. It reads the same Parquet format Arc uses. It runs on the same hardware. This is the fairest comparison we can make — same storage format, same benchmark, same machine.

If DataFusion is the best InfluxDB's architecture can do on analytical queries, and Arc beats DataFusion, the conclusion writes itself.

The Results

Same methodology as our previous posts: Arc runs true cold runs (service restart + OS cache flush before every query). DataFusion is on the lukewarm run list (https://github.com/ClickHouse/ClickBench/issues/793). When that gets fixed, their numbers get worse. Ours don't move.
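For readers who want to see what a true cold run involves, here is a minimal sketch, assuming a Linux host with systemd. The service name `arc` and both helper names are hypothetical, and this is an illustration of the idea rather than the actual ClickBench run script.

```python
import subprocess


def cold_run_commands(service: str) -> list[list[str]]:
    """Return the commands that reset state before one cold query.

    Assumes Linux with systemd; `service` is a hypothetical unit name.
    """
    return [
        # Restart the database so no in-process caches survive.
        ["sudo", "systemctl", "restart", service],
        # Write dirty pages to disk so the cache drop below is complete.
        ["sync"],
        # Drop the OS page cache, dentries, and inodes.
        ["sudo", "sh", "-c", "echo 3 > /proc/sys/vm/drop_caches"],
    ]


def prepare_cold_run(service: str) -> None:
    """Run the reset commands in order, failing fast on any error."""
    for cmd in cold_run_commands(service):
        subprocess.run(cmd, check=True)
```

Calling `prepare_cold_run("arc")` before every query, and waiting for the service to accept connections again, is what separates a cold run from a lukewarm one in this sketch.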

Combined score — relative time, lower is better:

System	Machine	Score
Arc	c8g.metal-48xl	×1.04
Arc	c7a.metal-48xl	×1.13
Arc	c6a.4xlarge	×1.92
DataFusion (Parquet, single)	c6a.4xlarge	×2.99

Cold run — relative time, lower is better:

System	Machine	Score
Arc	c8g.metal-48xl	×1.15
Arc	c6a.4xlarge	×1.21
Arc	c7a.metal-48xl	×1.21
DataFusion (Parquet, single)	c6a.4xlarge	×1.79

Hot run — relative time, lower is better:

System	Machine	Score
Arc	c8g.metal-48xl	×1.01
Arc	c7a.metal-48xl	×1.15
Arc	c6a.4xlarge	×2.80
DataFusion (Parquet, single)	c6a.4xlarge	×5.10

On the same hardware (c6a.4xlarge), Arc is 1.6x faster on combined score and 1.8x faster on hot runs.

On cold runs — where DataFusion benefits from lukewarm methodology — the gap is already 1.5x. When issue #793 gets resolved and DataFusion switches to true cold runs, that gap widens.
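The speedup figures come straight from the c6a.4xlarge rows of the three tables. Since ClickBench scores are relative times, the speedup is DataFusion's score divided by Arc's. A short check (the variable and function names are ours, for illustration only):

```python
# Relative-time scores on c6a.4xlarge, copied from the tables above:
# (Arc, DataFusion (Parquet, single))
scores = {
    "combined": (1.92, 2.99),
    "cold": (1.21, 1.79),
    "hot": (2.80, 5.10),
}


def speedup(arc_score: float, other_score: float) -> float:
    """Lower relative time is better, so speedup = other / arc."""
    return other_score / arc_score


speedups = {run: round(speedup(arc, df), 1) for run, (arc, df) in scores.items()}
print(speedups)  # {'combined': 1.6, 'cold': 1.5, 'hot': 1.8}
```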

What This Means for InfluxDB

DataFusion is the query engine InfluxDB 3.0 is built on. InfluxDB 3.0 adds its own layers — schema management, retention policies, authentication, catalog — but the analytical query execution runs through DataFusion.

We're comparing Arc against DataFusion directly because that's what's on ClickBench. InfluxDB 3 Core and Enterprise are available self-hosted, but neither is on ClickBench — DataFusion is the closest proxy for their query performance. What we can say: on the query engine they're built on, Arc is 1.6x faster.

Ingestion: The Bigger Gap

ClickBench only measures queries. For the workloads InfluxDB targets — IoT telemetry, observability, product events — ingestion throughput matters just as much.

We ran sustained ingestion benchmarks on a MacBook Pro (M3 Max, 14 cores, 36 GB RAM, 1 TB NVMe). Both InfluxDB 3 Core (v3.8.3) and Enterprise (v3.8.4) were tested with Line Protocol over HTTP under a 60-second sustained load. WAL cannot be disabled in InfluxDB 3, so we tuned the WAL flush interval to 10ms, which gave the best results across all configurations we tested.

We gave InfluxDB every advantage: large batches (100K rows), pre-generated payloads, and multiple concurrent workers to find each edition's peak.
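To make "pre-generated payloads" concrete: the request bodies are built before the timer starts, so the load generator spends its time sending rather than formatting strings. A minimal sketch of that idea follows. The measurement name `cpu`, the `host` tag, and the `usage` field are made up for illustration and are not the schema used in the actual benchmark.

```python
def make_batch(n_rows: int, start_ns: int, step_ns: int = 1_000_000) -> bytes:
    """Build one Line Protocol request body with n_rows points.

    Each line has the shape: measurement,tag=value field=value timestamp_ns
    """
    lines = []
    for i in range(n_rows):
        host = f"host{i % 10}"          # 10 distinct tag values
        usage = 50.0 + (i % 100) / 10   # float field value
        ts = start_ns + i * step_ns     # nanosecond timestamp
        lines.append(f"cpu,host={host} usage={usage} {ts}")
    return "\n".join(lines).encode("utf-8")


# Build every payload up front; workers then only POST these bytes.
BASE_NS = 1_700_000_000_000_000_000
payloads = [make_batch(1_000, BASE_NS + b * 1_000_000_000) for b in range(5)]
```

Changing the first argument to `100_000` gives the large-batch shape used for the InfluxDB runs.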

Metric	Arc	InfluxDB 3 Core	InfluxDB 3 Enterprise
Throughput	17.3M rec/s	3.45M rec/s	1.71M rec/s
Batch size	1,000 rows	100,000 rows	100,000 rows
Workers	100	10	10
p50 latency	4.4ms	259ms	590ms
p99 latency	26ms	939ms	950ms
60s total records	~1.03B	~207M	~103M

Arc vs Core: 5x faster. Arc vs Enterprise: 10x faster. Both with 100x smaller batches.
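The ratios and 60-second totals follow from the throughput row. The same numbers also give a rough batch rate per second, which is a derived figure that assumes every request carried a full batch. A quick check (the dictionary layout and function name are ours):

```python
# Throughput and batch size, copied from the table above.
results = {
    "Arc": {"rec_per_s": 17_300_000, "batch_rows": 1_000},
    "InfluxDB 3 Core": {"rec_per_s": 3_450_000, "batch_rows": 100_000},
    "InfluxDB 3 Enterprise": {"rec_per_s": 1_710_000, "batch_rows": 100_000},
}


def derive(row: dict, seconds: int = 60) -> dict:
    """Derive the total records over the run and the implied batch rate."""
    return {
        "total_records": row["rec_per_s"] * seconds,
        # Assumes every request carried a full batch.
        "batches_per_s": row["rec_per_s"] / row["batch_rows"],
    }


derived = {name: derive(row) for name, row in results.items()}
arc_rate = results["Arc"]["rec_per_s"]
vs_core = round(arc_rate / results["InfluxDB 3 Core"]["rec_per_s"], 1)
vs_enterprise = round(arc_rate / results["InfluxDB 3 Enterprise"]["rec_per_s"], 1)

print(vs_core, vs_enterprise)           # 5.0 10.1
print(derived["Arc"]["total_records"])  # 1038000000
```

Under that full-batch assumption, Arc handled roughly 17,300 batches per second, against about 34.5 for Core and 17.1 for Enterprise.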

The Enterprise edition is roughly half the speed of Core — even in single-node mode. The clustering coordination layer, distributed WAL, and license validation add overhead on every write. This is the cost of distributed architecture when you don't need it.

InfluxDB 3 Core sustained ~3.45M rec/sec. But it needed 100K-row batches to get there, and WAL can't be disabled — even with the flush interval tuned to 10ms (the optimal setting we found), it adds latency on every write.

Arc sorts by time at ingestion and appends columnar Parquet files via Apache Arrow with no background merge pressure. No compaction storms. No write amplification. That's why it sustains 17M+ rec/sec with 1,000-row batches and a single-digit millisecond median (p50) latency.
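The "sort by time, then write columns" step can be illustrated in a few lines of plain Python. This is a conceptual sketch only, not Arc's code: it uses no Arrow or Parquet library, the function name is hypothetical, and it assumes every row carries the same keys, including a `time` key.

```python
from typing import Any


def to_time_sorted_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Sort a batch of row dicts by 'time' and pivot it into columns.

    Assumes every row has the same keys, including 'time'.
    """
    if not rows:
        return {}
    ordered = sorted(rows, key=lambda r: r["time"])
    # One list per column, all in the same time order.
    return {name: [r[name] for r in ordered] for name in ordered[0]}


batch = [
    {"time": 3, "host": "a", "usage": 0.3},
    {"time": 1, "host": "b", "usage": 0.1},
    {"time": 2, "host": "a", "usage": 0.2},
]
columns = to_time_sorted_columns(batch)
print(columns["time"])  # [1, 2, 3]
```

In a real pipeline the resulting columns would be handed to a columnar writer and appended as a new file, so existing files are never rewritten at ingest time.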

The Data Portability Problem

This is the part that doesn't show up in any benchmark but matters more than any number.

InfluxDB 2.x stores data in TSM format — a proprietary, InfluxDB-specific binary format. You can't read TSM files with any other tool. You can't query them with DuckDB, Spark, Snowflake, or Pandas. If you want your data out, you export it through InfluxDB's API, one query at a time.

InfluxDB 3.0 stores data in Parquet — which is progress. The Core and Enterprise editions support this. Whether the managed cloud offering exposes Parquet files directly, we haven't verified.

Arc stores everything in standard Parquet files on storage you control — local disk, S3, MinIO. Query them with Arc, DuckDB, Spark, Polars, or any tool that reads Parquet. If Arc disappears tomorrow, your data is still there, in a format every analytical tool understands.

The Flux Problem

InfluxDB 2.x's query language is Flux. InfluxData deprecated it. Their own documentation now recommends migrating away from Flux to InfluxDB 3.0's SQL interface — which is powered by DataFusion.

If you're writing Flux queries today, you're building on a deprecated foundation. The migration path leads to DataFusion SQL — the same engine Arc already beats on ClickBench.

Arc uses DuckDB SQL from day one. Full analytical SQL — window functions, CTEs, joins, subqueries, LIKE patterns. PostgreSQL-compatible syntax. No proprietary query language, no deprecation risk.

The Honest Caveat

InfluxDB has a massive ecosystem. Telegraf, client libraries in every language, years of documentation, a large community. If you're already running InfluxDB in production with Telegraf collectors everywhere, the operational inertia is real.

That said, Arc supports InfluxDB Line Protocol natively. Point your Telegraf agents at Arc's write endpoint and they work. The migration path is a URL change.

Arc also supports continuous queries, retention policies, deletes, and compaction, the core features most InfluxDB users depend on.
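For a sense of what a Line Protocol write looks like on the wire, here is a minimal sketch using only the Python standard library. The URL path below is a placeholder we made up for illustration (only port 8000 comes from the Docker command in this post), and authentication is omitted. Check Arc's documentation for the real write endpoint and any required token.

```python
import urllib.request

# Hypothetical endpoint: the path is an assumption, not Arc's documented API.
ARC_WRITE_URL = "http://localhost:8000/write"


def build_write_request(lines: list[str], url: str = ARC_WRITE_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying Line Protocol lines."""
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


req = build_write_request(["cpu,host=edge01 usage=42.5 1700000000000000000"])
# To actually send it against a running instance:
# urllib.request.urlopen(req)
```

Telegraf does the equivalent for you. The point of the sketch is that the payload format stays the same and only the target URL changes.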

Operational Complexity

InfluxDB 2.x: Install the binary, configure the TSM engine, manage shard groups, tune cache sizes, monitor compaction backlog, handle shard cold starts, deal with cardinality limits.

InfluxDB 3 Core: Self-hosted, simpler than 2.x. But WAL is always on and can't be disabled. Enterprise adds clustering overhead that halves throughput even in single-node mode.

Arc:

docker run -d -p 8000:8000 \
  -e STORAGE_BACKEND=local \
  -v arc-data:/app/data \
  ghcr.io/basekick-labs/arc:latest

Point it at S3 for production. That's it.

Reproduce It Yourself

Every claim in this post is reproducible. That's the whole point.

ClickBench results: benchmark.clickhouse.com — filter for Arc and DataFusion (Parquet, single), same hardware
Arc's run script: https://github.com/ClickHouse/ClickBench/tree/main/arc
Cold run issue: https://github.com/ClickHouse/ClickBench/issues/793
Ingestion benchmark: https://github.com/Basekick-Labs/arc/tree/main/benchmarks/sustained_bench
Arc on GitHub: https://github.com/Basekick-Labs/arc

The Bottom Line

InfluxDB isn't on ClickBench — but its query engine, DataFusion, is. On the same hardware and same storage format, Arc is 1.6x faster on combined score and 1.8x faster on hot runs. DataFusion's results are also on the lukewarm run list. When that gets fixed, the gap widens.

On ingestion: 17.3M rec/sec vs 3.45M (Core) vs 1.71M (Enterprise). That's 5x faster than Core and 10x faster than Enterprise, with 100x smaller batches and a single-digit millisecond p50 latency (4.4ms).

Your data stays in standard Parquet on storage you control. No TSM lock-in, no deprecated query languages.

Arc is open source, AGPL-3.0, single binary, and supports InfluxDB Line Protocol for zero-friction migration. Already running InfluxDB? Point Telegraf at Arc and you're done.

This is the third in our ClickBench series. We're comparing Arc against every major analytical database — not because we have anything against them, but because Arc deserves to be measured fairly, and you deserve to see the numbers. Next up: Elasticsearch.

Get started with Arc: https://github.com/Basekick-Labs/arc

Questions or challenges to the methodology? Open an issue on GitHub or find us on Discord. We want the numbers to be right.
