Arc on ClickBench: We Ran True Cold Runs. Arc Won.

#Arc #ClickBench #benchmark #ClickHouse #Parquet #performance #cold-runs #analytical-database

ClickBench is the industry-standard benchmark for analytical databases. It's run by the ClickHouse team, it's open source, and every result is publicly reproducible. Over 60 databases are listed. Anyone can verify.

We submitted Arc to ClickBench. We ran it on three AWS instance types. We compared it against ClickHouse — the database the benchmark is named after.

Arc won on every machine. Let me walk you through what happened.

Same Format, Same Hardware — The Only Fair Way

ClickBench lets you compare dozens of systems, but most comparisons aren't apples-to-apples. ClickHouse using its own native columnar format has a built-in advantage over systems storing data in standard Parquet. That's expected — ClickHouse is optimized for ClickHouse's format.

Arc stores everything in standard Parquet. That's not a limitation — it's a design choice. Your data stays portable. You can query Arc's files with DuckDB, Snowflake, or any tool that reads Parquet. No lock-in, ever.

So the only comparison that makes sense is: Arc vs ClickHouse (Parquet, single node), same AWS hardware.

The Results

Combined score — relative time, lower is better:

System                       | c8g.metal-48xl | c7a.metal-48xl | c6a.4xlarge
Arc                          | ×1.04          | ×1.14          | ×1.94
ClickHouse (Parquet, single) | ×1.98          | ×2.25          | ×3.00

Cold run — relative time, lower is better:

System                       | c8g.metal-48xl | c7a.metal-48xl | c6a.4xlarge
Arc                          | ×1.16          | ×1.23          | ×1.22
ClickHouse (Parquet, single) | ×1.48          | ×1.72          | ×1.80

Hot run — relative time, lower is better:

System                       | c8g.metal-48xl | c7a.metal-48xl | c6a.4xlarge
Arc                          | ×1.02          | ×1.16          | ×2.82
ClickHouse (Parquet, single) | ×2.73          | ×3.22          | ×5.13

On the high-end hardware (c8g.metal-48xl), Arc completes the full combined benchmark in roughly half the time of ClickHouse Parquet. On the mid-range instance (c6a.4xlarge), Arc is 1.5x faster.

On cold runs, Arc is 1.3–1.5x faster across all machines. On hot runs the gap widens even more — up to 2.8x faster on the high-end instances.

Arc wins on every machine, across every metric.

You can check this yourself at benchmark.clickhouse.com — filter by Arc and ClickHouse (Parquet, single), same hardware.

Why This Matters: True Cold vs Lukewarm

This is the part most benchmark posts skip. And it changes everything.

ClickBench defines three run types per query:

  • Run 1 (cold): Database and OS cache are fresh — nothing pre-loaded
  • Run 2 (warm): Database cache warm, OS cache warm
  • Run 3 (hot): Fully cached

The combined score is computed across all three runs. Here's the thing — if your "cold" run isn't actually cold, if the database process stayed alive and the OS page cache wasn't cleared, your first run benefits from residual cache. That's a lukewarm run, and it makes your cold numbers look better than they really are.
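ClickBench's relative-time scores are, roughly, geometric means of per-query time ratios against the fastest system, with a small constant added so that a near-zero fully cached query can't dominate the mean. A simplified sketch of that idea (the exact constant and aggregation here are our assumptions, not the official scoring code):

```python
import math

def relative_time(system_times, fastest_times, eps=0.01):
    """Geometric mean of per-query time ratios (seconds).

    Simplified model of ClickBench-style relative scoring; eps keeps
    a fully cached, near-zero timing from skewing the mean.
    """
    ratios = [(s + eps) / (f + eps) for s, f in zip(system_times, fastest_times)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# A system exactly matching the fastest times scores ×1.00
print(round(relative_time([0.5, 1.2, 3.0], [0.5, 1.2, 3.0]), 2))  # → 1.0
```

This is also why a lukewarm "cold" run matters so much: an artificially fast run 1 pulls the whole geometric mean down.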

The ClickHouse team knows about this. There's an open issue to fix it: https://github.com/ClickHouse/ClickBench/issues/793, opened February 17, 2026. It lists 80+ submissions that need to be updated — including clickhouse-parquet.

Arc is not on that list.

Here's the relevant part of Arc's run.sh (https://github.com/ClickHouse/ClickBench/tree/main/arc):

# TRUE COLD RUN: Restart Arc and clear OS cache ONCE per query
restart_arc() {
    sudo systemctl stop arc
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
    sudo systemctl start arc
}

Before every single query: stop the service, flush the OS page cache, restart. Maximum cold. No residual state. No warm cache bleeding into run 1.
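One way to sanity-check that drop_caches actually did its job is to compare the kernel's Cached counter from /proc/meminfo before and after the flush. A small sketch (this parser is our illustration, not part of Arc's harness):

```python
def cached_kib(meminfo_text: str) -> int:
    # Extract the page-cache size (kB) from /proc/meminfo-format text.
    for line in meminfo_text.splitlines():
        if line.startswith("Cached:"):
            return int(line.split()[1])
    raise ValueError("no Cached: line found")

# Usage on a live Linux box:
#   before = cached_kib(open("/proc/meminfo").read())
#   ... stop service, sync, drop caches ...
#   after = cached_kib(open("/proc/meminfo").read())
# A large 'before' and a tiny 'after' confirms a genuinely cold run.
```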

The ClickHouse (Parquet) submission uses the lukewarm approach — the database process stays alive between queries. When issue #793 gets resolved and they switch to true cold runs, their numbers will get worse. Arc's won't move.

The published results already understate Arc's advantage.

What ClickBench Doesn't Measure: Ingestion

ClickBench measures query performance. It doesn't measure ingestion throughput, which is often the bottleneck in real analytical workloads — product analytics, IoT telemetry, observability pipelines.

We ran sustained ingestion benchmarks on a MacBook Pro (M3 Pro Max, 14 cores, 36 GB RAM, 1 TB NVMe) using IoT metrics data. Both systems ran with WAL disabled — Arc has WAL off by default, and we disabled ClickHouse's fsync_after_insert to keep it fair. We gave ClickHouse every advantage — native TCP protocol, LZ4 compression, 100K-row batches, columnar inserts:

Metric      | Arc          | ClickHouse (optimized)
Throughput  | 17.3M rec/s  | 7.0M rec/s
Batch size  | 1,000 rows   | 100,000 rows
p50 latency | 4.4 ms       | 136 ms
p99 latency | 26 ms        | 279 ms
30 s total  | 1.03 billion | ~211 million

Arc ingests 2.5x faster — with 100x smaller batches and 31x lower latency.
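The p50/p99 figures above come from per-batch latency samples. Percentiles like these can be computed with a simple nearest-rank rule; a minimal stdlib sketch (our own helper, not the benchmark harness):

```python
def percentile(samples_ms, pct):
    # Nearest-rank percentile over recorded per-batch latencies (ms).
    ordered = sorted(samples_ms)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies = [3.9, 4.1, 4.4, 4.6, 5.0, 7.2, 12.8, 19.5, 24.0, 26.0]
print(percentile(latencies, 50), percentile(latencies, 99))  # → 5.0 26.0
```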

The difference is architectural. ClickHouse's MergeTree engine sorts data on every write, creating sorted parts that need background merging. Larger batches amortize that cost, which is why ClickHouse needs 100K-row inserts to reach peak throughput. With the same 1,000-row batches Arc uses, ClickHouse drops to ~400K rec/s.

Arc sorts by time at ingestion and appends Parquet files via Apache Arrow. No merge tree, no background compaction pressure. That's why it sustains 17M+ rec/s with tiny batches and single-digit millisecond latency.
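The contrast can be sketched in a few lines: an append-only writer sorts only within each incoming batch, while a merge-tree-style table must maintain a global order (and later merge parts in the background). This is an illustrative model, not either engine's real code:

```python
import bisect

def append_only_write(batches):
    # Arc-style (illustrative): each batch is sorted by timestamp and
    # appended as its own immutable file; no global re-sort.
    return [sorted(batch) for batch in batches]

def merge_tree_write(batches):
    # MergeTree-style (illustrative): rows must end up globally
    # ordered, so every insert pays sorting/merging cost.
    table = []
    for batch in batches:
        for ts in batch:
            bisect.insort(table, ts)
    return table

batches = [[5, 1, 3], [2, 9, 4]]
print(append_only_write(batches))  # → [[1, 3, 5], [2, 4, 9]]
print(merge_tree_write(batches))   # → [1, 2, 3, 4, 5, 9]
```

The append-only path's cost stays proportional to the batch, which is why tiny batches don't hurt it; the globally ordered path's cost grows with the whole table, which is what large batches exist to amortize.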

If you're building a system that needs to ingest billions of events per day and query them in seconds, ingestion throughput matters as much as query speed. Arc handles both.

The Honest Caveat

If you compare Arc against ClickHouse using ClickHouse's native MergeTree format (not Parquet), ClickHouse is faster on queries. That format is tightly co-designed with their query engine.

But that comes with a cost: vendor lock-in. Your data lives in ClickHouse's format. You can't query it with another tool. You can't move it without an export step. You're committed.

Our position is simple — you shouldn't have to choose between performance and data portability. The ClickBench results show you don't have to. At least when comparing the same storage format.

Operational Complexity

This doesn't show up in any benchmark, but it matters at 2am when something breaks.

ClickHouse (single node): One binary, manageable. Add a cluster for HA and you're dealing with ClickHouse Keeper or ZooKeeper, replica coordination, shard configuration, and distributed query routing.

Arc: One Go binary. No dependencies. Deploy with a single Docker command:

docker run -d -p 8000:8000 \
  -e STORAGE_BACKEND=local \
  -v arc-data:/app/data \
  ghcr.io/basekick-labs/arc:latest

Point it at S3 for production. That's it.

If you don't have a dedicated database infrastructure engineer, this difference matters a lot. If you do, the query performance and ingestion throughput still stand on their own.

Check It Yourself

Every claim in this post is reproducible. That's the whole point.

Spin up a c6a.4xlarge on AWS, follow the run script, check the results. That's what we did.

The Bottom Line

Same storage format, same hardware, same benchmark — Arc outperforms ClickHouse on ClickBench across every machine we tested. 1.5–2x on combined score, 1.3–1.5x on cold runs, and up to 2.8x on hot runs.

Arc runs true cold benchmarks. ClickHouse's Parquet submissions currently use lukewarm runs — a known issue the ClickBench team is actively working to fix. When they do, the gap widens.

On ingestion, Arc does 17.3M records/sec with 1,000-row batches. ClickHouse peaks at 7.0M with 100K-row batches — 2.5x slower, with 31x higher latency. Both with WAL disabled.

Arc is open source, AGPL-3.0, single binary, and stores your data in standard Parquet. No vendor lock-in.

This is the first in a series. We're going to run this comparison against every major analytical database on ClickBench — not because we have anything against them, but because Arc deserves to be measured fairly, and you deserve to see the numbers. We're here to stay, and we're here to prove that you don't have to choose between simplicity, performance, and owning your data.

Get started with Arc →

https://github.com/Basekick-Labs/arc


Questions or challenges to the methodology? Open an issue on GitHub or find us on Discord. We want the numbers to be right.

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in Parquet. Use for analytics, observability, AI, IoT, or data warehousing.

Get Started →