
Arc on ClickBench: Elasticsearch. This One Isn't Close Either.

#Arc #ClickBench #benchmark #Elasticsearch #performance #cold-runs #analytical-database #logs #observability

Elasticsearch is everywhere in observability. Logs, APM traces, security events, audit trails — if a team is searching through unstructured data, there's a good chance they're running Elasticsearch or its OpenSearch fork. It's also one of the few systems that actually shows up on ClickBench, the benchmark that runs 43 analytical SQL queries against 99.9M rows of web analytics data.

So we ran Arc against it.

What Elasticsearch Is

Elasticsearch is a distributed search and analytics engine built on top of Lucene. Its core data structure is an inverted index — optimized for full-text search, document retrieval, and relevance scoring. It's genuinely excellent at what it was designed for.

ClickBench is a stress test for a very different workload: analytical aggregations over wide columnar data. COUNT(*), GROUP BY, percentiles, window functions. The kind of queries that power dashboards and reports, not text search.
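These query shapes are easy to sketch. A toy stand-in using Python's built-in sqlite3 (illustrating the shapes only, not ClickBench's actual queries, and saying nothing about performance):

```python
import sqlite3

# Toy schema standing in for ClickBench's web-analytics "hits" table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hits (user_id INT, url TEXT, duration INT)")
con.executemany("INSERT INTO hits VALUES (?, ?, ?)", [
    (1, "/home", 120), (1, "/about", 80),
    (2, "/home", 200), (3, "/home", 50),
])

# COUNT(*) with GROUP BY: the bread and butter of dashboard queries.
top_urls = con.execute(
    "SELECT url, COUNT(*) AS hits FROM hits GROUP BY url ORDER BY hits DESC"
).fetchall()
print(top_urls)  # [('/home', 3), ('/about', 1)]

# A window function: running total of time spent, per user.
running = con.execute(
    "SELECT user_id, url, SUM(duration) OVER "
    "(PARTITION BY user_id ORDER BY url) AS cum FROM hits"
).fetchall()
```

Every one of these is a full scan plus aggregation, which is exactly the access pattern an inverted index was not built for.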

Elasticsearch can run them. It has an SQL interface and an aggregations framework. Teams do use it for analytics. But that's not the home court.

This is a benchmark of what actually happens when you run it there.

Methodology

ClickBench runs 43 queries across a 99.9M row dataset on a single instance. Scores are relative times: lower is better, and ×1.0 would mean matching the fastest recorded result on every query.
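Roughly, the combined score is a geometric mean of per-query time ratios against the fastest recorded result, with a small additive constant to damp noise on sub-millisecond queries. A simplified sketch (the `eps` value here is illustrative, not ClickBench's exact constant):

```python
import math

def relative_score(times, baseline, eps=0.01):
    """Geometric mean of per-query time ratios vs. the fastest system.

    `eps` (seconds) keeps very fast queries from dominating the ratio --
    a simplification of how the leaderboard aggregates 43 query timings.
    """
    ratios = [(t + eps) / (b + eps) for t, b in zip(times, baseline)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Matching the fastest result on every query scores exactly x1.0.
print(relative_score([0.5, 1.2, 3.0], [0.5, 1.2, 3.0]))  # 1.0
```

Because the mean is geometric, a system that is 10x slower on a handful of queries can't hide behind being competitive on the rest.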

One important note on cold runs: Arc runs true cold runs, with the OS page cache dropped between benchmark runs. Most systems on the ClickBench leaderboard run "lukewarm": the cache isn't cleared, which significantly inflates their cold numbers. This is tracked in https://github.com/ClickHouse/ClickBench/issues/793. Arc's cold run numbers are real cold runs.
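Dropping the page cache between runs is a one-liner on Linux. A minimal harness sketch, assuming a Linux host with root; `query_fn` is a placeholder for whatever issues the benchmark query:

```python
import subprocess
import time

def drop_page_cache():
    """Flush dirty pages, then drop the OS page cache (Linux, needs root).

    Shell equivalent: sync && echo 3 > /proc/sys/vm/drop_caches
    """
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

def timed_run(query_fn, cold=False):
    """Time one query execution; drop the cache first for a true cold run."""
    if cold:
        drop_page_cache()
    start = time.perf_counter()
    query_fn()
    return time.perf_counter() - start
```

Skipping the `drop_page_cache()` step is exactly what produces the "lukewarm" numbers discussed above.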

Unlike DataFusion and InfluxDB in our previous post, Elasticsearch is on the official ClickBench leaderboard. These aren't extrapolated numbers — they're the published results on the same benchmark, same hardware classes.

The Results

Combined Score

System         Machine          Score
Arc            c8g.metal-48xl   ×1.08
Arc            c7a.metal-48xl   ×1.17
Arc            c6a.metal        ×1.29
Arc            c6a.4xlarge      ×2.00
Arc            c6a.2xlarge      ×2.61
Elasticsearch  c6a.metal        ×11.97
Elasticsearch  c6a.4xlarge      ×12.43

On the same c6a.4xlarge: Arc ×2.00 vs Elasticsearch ×12.43. Arc is 6.2x faster combined.

On c6a.metal: Arc ×1.29 vs Elasticsearch ×11.97. Arc is 9.3x faster combined.

Cold Run

System         Machine          Score
Arc            c8g.metal-48xl   ×1.32
Arc            c6a.4xlarge      ×1.38
Arc            c7a.metal-48xl   ×1.39
Arc            c6a.metal        ×1.42
Arc            c6a.2xlarge      ×1.54
Elasticsearch  c6a.metal        ×4.46
Elasticsearch  c6a.4xlarge      ×5.82

Cold runs narrow the gap somewhat — Elasticsearch is 4.2x behind on c6a.4xlarge, 3.1x on c6a.metal. Still significant. But this is where it gets interesting.

Hot Run

System         Machine          Score
Arc            c8g.metal-48xl   ×1.03
Arc            c7a.metal-48xl   ×1.17
Arc            c6a.metal        ×1.35
Arc            c6a.4xlarge      ×2.84
Arc            c6a.2xlarge      ×4.30
Elasticsearch  c6a.4xlarge      ×27.67
Elasticsearch  c6a.metal        ×28.41

On c6a.4xlarge: Arc ×2.84 vs Elasticsearch ×27.67. Arc is 9.7x faster on hot runs.

On c6a.metal: Arc ×1.35 vs Elasticsearch ×28.41. Arc is 21x faster on hot runs.

Why the Hot Run Gap Explodes

The cold-to-hot gap tells you something about architecture.

For Arc (DuckDB under the hood), hot runs are significantly faster in absolute terms: the data stays in columnar format in memory, and repeated analytical queries over the same dataset benefit from vectorized execution on warm data. Arc's relative score barely moves, from ×1.42 cold to ×1.35 hot on c6a.metal, because the fastest systems also speed up on warm data and Arc keeps pace with them.

For Elasticsearch, the opposite happens at scale. It goes from ×4.46 cold to ×28.41 hot on c6a.metal. The inverted index structure — designed for fast lookups of specific documents — isn't well-suited for the kind of full-table analytical aggregations ClickBench runs. Under repeated load it doesn't get faster; it gets worse relative to baseline.

This is the architecture gap. It's not tuning. It's what these systems were built to do.
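A toy illustration of that gap (nothing here is Arc or Elasticsearch internals, just the row-vs-column access pattern):

```python
# The same aggregation over a document store vs. a column store.
N = 100_000

# Document-oriented: one record per event; an aggregation must visit
# every document and extract one field from each.
docs = [{"url": "/home", "duration": d} for d in range(N)]
row_total = sum(doc["duration"] for doc in docs)

# Column-oriented: the "duration" column is one dense, contiguous array;
# this is the layout vectorized engines like DuckDB scan in batches.
durations = list(range(N))
col_total = sum(durations)

assert row_total == col_total  # same answer, very different work per row
```

In a real engine the columnar side is also compressed and processed in SIMD-friendly batches, which is why the hot-run numbers diverge so sharply.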

Ingestion

We already ran a detailed log ingestion benchmark comparing Arc against Elasticsearch and several other systems. The short version:

System         Logs/sec    p50 Latency
Arc            4,587,906   1.00 ms
Elasticsearch  101,087     37.93 ms

Arc ingests 45x more logs per second with 38x lower median latency. Elasticsearch's p99 was 807ms. Arc's was 6.27ms.
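For context on how p50/p99 figures like these are derived from raw per-request timings, a nearest-rank percentile sketch (the sample latencies are made up):

```python
def percentile(samples, p):
    """Nearest-rank percentile of raw latency samples (ms)."""
    ordered = sorted(samples)
    idx = round(p / 100 * (len(ordered) - 1))
    return ordered[min(idx, len(ordered) - 1)]

# Made-up per-request latencies: one slow outlier dominates the tail.
latencies_ms = [0.9, 1.0, 1.05, 1.1, 1.2, 37.9]
p50 = percentile(latencies_ms, 50)  # median, unaffected by the outlier
p99 = percentile(latencies_ms, 99)  # tail latency, tracks the outlier
```

The median barely moves when outliers appear, while the p99 tracks them directly, which is why the 807ms vs 6.27ms p99 gap matters more than the medians alone suggest.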

If you're running a high-volume observability pipeline through Elasticsearch, these numbers aren't hypothetical — they're what your pipeline is living with.

The Honest Caveat

Elasticsearch is genuinely excellent at what it was built for.

If your primary workload is full-text search — fuzzy matching, relevance scoring, semantic search across unstructured documents — Elasticsearch is purpose-built for that. The inverted index is exactly the right data structure. Nothing in this benchmark changes that.

The ELK stack (Elasticsearch + Logstash + Kibana) and its OpenSearch equivalent have deep integrations across the observability ecosystem. Kibana dashboards, APM integrations, Fleet management, SIEM use cases — there's a real ecosystem here that Arc doesn't replicate.

What this benchmark shows is the cost of using Elasticsearch as a columnar analytical database. If you're using it primarily for SQL aggregations over time-series or event data — dashboards, trend analysis, query-heavy workloads — you're paying a significant performance tax.

That's not a bug in Elasticsearch. It's a use case mismatch.

What This Means for Observability Teams

A lot of teams end up running Elasticsearch for logs because it's the path of least resistance. It works. It has tooling. It integrates with everything. These numbers quantify what "works" costs:

  • 45x slower ingestion means either throttling your pipeline, losing data under load, or running significantly more hardware to keep up.
  • 10–20x slower queries means Grafana dashboards that load in seconds instead of milliseconds, or aggregations you stop running because they're too slow.
  • No portable storage — Elasticsearch's index format is proprietary. Your data lives in shards you can't read with DuckDB, Spark, or anything else outside the Elasticsearch ecosystem.

If you're hitting these walls and your primary workload is analytics rather than full-text search, Arc is worth a look.

Reproducibility

The ClickBench benchmark is public. You can run it yourself: the dataset, queries, and per-system scripts are all in the ClickBench repository.

If you run your own benchmarks and get different numbers, open an issue at https://github.com/Basekick-Labs/arc/issues or find us on Discord. We want the numbers to be right.

Bottom Line

Metric                        Arc vs Elasticsearch
Combined score (c6a.4xlarge)  6.2x faster
Combined score (c6a.metal)    9.3x faster
Hot runs (c6a.4xlarge)        9.7x faster
Hot runs (c6a.metal)          21x faster
Log ingestion                 45x faster
Storage format                Parquet (portable) vs shards (proprietary)

This is the fourth post in the Arc ClickBench series. Previous comparisons: ClickHouse, TimescaleDB, InfluxDB/DataFusion.

Next up: CrateDB.



Questions or challenges? Find us on Discord or open an issue on https://github.com/Basekick-Labs/arc/issues.

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in Parquet. Use for analytics, observability, AI, IoT, or data warehousing.

Get Started ->