ClickBench Verified

Arc vs Elasticsearch

Elasticsearch is a search engine. Arc is a columnar analytical database. When you run analytical workloads on Elasticsearch, you pay the price of a Lucene inverted index doing work it was never designed for.

45x faster log ingestion

6–21x faster analytical queries

1ms p50 ingestion latency vs 38ms for Elasticsearch


ClickBench Results

99.9M rows, 43 analytical queries. Arc runs true cold runs: the service is restarted and the OS cache is flushed before every query. Results can be verified on benchmark.clickhouse.com.

Combined Score (lower is better)

System	Machine	Score
Arc	c8g.metal-48xl	×1.29
Arc	c6a.4xlarge	×2.00
Elasticsearch	c8g.metal-48xl	×11.97
Elasticsearch	c6a.4xlarge	×12.43

Cold Run (lower is better)

System	Machine	Score
Arc	c8g.metal-48xl	×1.32
Arc	c6a.4xlarge	×1.54
Elasticsearch	c8g.metal-48xl	×4.46
Elasticsearch	c6a.4xlarge	×5.82

Hot Run (lower is better)

System	Machine	Score
Arc	c8g.metal-48xl	×1.03
Arc	c6a.4xlarge	×2.84
Elasticsearch	c8g.metal-48xl	×27.67
Elasticsearch	c6a.4xlarge	×28.41
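
One way to read the three tables together, assuming each × score is a multiple of a common baseline: dividing the Elasticsearch score by the Arc score on the same machine gives an approximate relative slowdown. The sketch below uses only the numbers from the tables above; the dictionary layout and the function name are illustrative.

```python
# Scores copied from the tables above, as (combined, cold, hot).
SCORES = {
    ("Arc", "c8g.metal-48xl"): (1.29, 1.32, 1.03),
    ("Arc", "c6a.4xlarge"): (2.00, 1.54, 2.84),
    ("Elasticsearch", "c8g.metal-48xl"): (11.97, 4.46, 27.67),
    ("Elasticsearch", "c6a.4xlarge"): (12.43, 5.82, 28.41),
}


def ratios(machine):
    """Return Elasticsearch/Arc score ratios (combined, cold, hot) for one machine."""
    arc = SCORES[("Arc", machine)]
    es = SCORES[("Elasticsearch", machine)]
    return tuple(round(e / a, 1) for e, a in zip(es, arc))


for machine in ("c8g.metal-48xl", "c6a.4xlarge"):
    print(machine, ratios(machine))
```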

Log Ingestion Benchmark

Sustained 60-second log ingestion load. Same log schema, same machine.

System	Throughput	p50 latency
Arc	4.58M logs/sec	1ms
Elasticsearch	101K logs/sec	38ms

Arc achieves ~45x higher throughput with 38x lower p50 latency.
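
The ratios follow directly from the table. A short check in Python, using only the figures reported above (the variable names are illustrative):

```python
# Figures from the log ingestion table above.
arc_logs_per_sec = 4_580_000
es_logs_per_sec = 101_000
arc_p50_ms = 1
es_p50_ms = 38

throughput_ratio = arc_logs_per_sec / es_logs_per_sec
latency_ratio = es_p50_ms / arc_p50_ms
print(f"throughput: {throughput_ratio:.1f}x, p50 latency: {latency_ratio:.0f}x")

# Total logs each system would ingest over the 60-second run at its sustained rate.
run_seconds = 60
print(arc_logs_per_sec * run_seconds, es_logs_per_sec * run_seconds)
```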

Why Arc Is Different: Under the Hood

Elasticsearch was designed for full-text search. Arc was designed for analytical queries on structured data. These are different problems with different optimal data structures.

Storage Format

Columnar Parquet vs. Lucene inverted index

Arc stores data as Apache Parquet files in time-partitioned paths (db/measurement/YYYY/MM/DD/HH/). Parquet is columnar: aggregating one field across 100M rows reads only that column from disk. Elasticsearch stores data in Lucene segments, an inverted index structure optimized for finding documents by term. For GROUP BY aggregations over numeric fields, Lucene must traverse posting lists that were never designed for that access pattern.
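
To make the partition layout concrete, here is a minimal Python sketch that maps a timestamp to an hourly directory and lists the directories a time-range query would need to touch. It assumes partitions are keyed on UTC; the function names and the example database and measurement names are hypothetical, and Arc's actual file naming inside each directory is not shown here.

```python
from datetime import datetime, timedelta, timezone


def partition_path(db, measurement, ts):
    """Build the hourly partition directory for a timezone-aware timestamp."""
    ts = ts.astimezone(timezone.utc)  # assumption: partitions are keyed on UTC
    return f"{db}/{measurement}/{ts:%Y/%m/%d/%H}/"


def partitions_for_range(db, measurement, start, end):
    """List hourly partition directories overlapping the interval [start, end)."""
    hour = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    paths = []
    while hour < end:
        paths.append(partition_path(db, measurement, hour))
        hour += timedelta(hours=1)
    return paths


start = datetime(2024, 3, 5, 22, 30, tzinfo=timezone.utc)
end = datetime(2024, 3, 6, 1, 0, tzinfo=timezone.utc)
for path in partitions_for_range("logs", "nginx", start, end):
    print(path)
```

A query over this 2.5-hour window touches three directories, regardless of how much data the other partitions hold.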

Query Engine

Vectorized SIMD aggregation vs. bucket trees

Arc embeds DuckDB, a vectorized OLAP engine that processes data in batches of 2,048 rows, runs SIMD-friendly operations over each batch, and executes aggregations directly on columnar Arrow arrays. Arc also rewrites SQL before execution: regex calls become string functions, and time bucketing becomes epoch arithmetic. Elasticsearch implements GROUP BY as nested bucket trees over Lucene data, which works well for term faceting but is not competitive for analytical aggregations over high-cardinality numeric columns.
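
The idea behind turning time bucketing into epoch arithmetic can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not Arc's rewrite rules, which are not shown here; the function names and sample timestamps are hypothetical.

```python
from collections import Counter


def bucket_start(epoch_seconds, width_seconds):
    """Floor a Unix timestamp to the start of its fixed-width bucket."""
    return epoch_seconds - (epoch_seconds % width_seconds)


def count_per_bucket(timestamps, width_seconds):
    """Count events per bucket using integer arithmetic only."""
    return Counter(bucket_start(ts, width_seconds) for ts in timestamps)


events = [1_700_000_040, 1_700_000_055, 1_700_000_100, 1_700_000_159]
print(dict(count_per_bucket(events, 60)))
```

Because the bucket key is a single subtraction and modulo on an integer column, it maps well onto batch-at-a-time execution with no date parsing per row.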

Ingestion Protocol

4.58M logs/sec vs. 101K logs/sec

Arc accepts MessagePack binary columnar batches (19.9M records/s), InfluxDB Line Protocol for Telegraf compatibility, and bulk CSV/Parquet import with efficient batches starting at ~1,000 rows. Elasticsearch uses the Bulk API, which requires JSON-encoded documents with per-document metadata lines, index analysis, and Lucene segment writes. The per-document indexing overhead (tokenization, inverted index updates, field data structures) limits Elasticsearch to ~101K logs/sec on the same hardware where Arc reaches 4.58M/sec.
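
The difference in payload shape can be illustrated with a small Python sketch. The first function builds a body in the Elasticsearch Bulk API format, with an action line before every document. The second pivots the same records into one list per field. The columnar layout is an assumption for illustration: Arc's actual MessagePack schema is not described here, and JSON stands in for MessagePack so the example needs only the standard library.

```python
import json


def to_bulk_ndjson(index, docs):
    """Encode docs in the Bulk API shape: one action line, then one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # a Bulk API body must end with a newline


def to_columnar(docs):
    """Pivot row-oriented docs into one list per field (hypothetical batch layout)."""
    fields = list(docs[0])
    return {field: [doc[field] for doc in docs] for field in fields}


logs = [
    {"ts": 1, "level": "info", "msg": "started"},
    {"ts": 2, "level": "warn", "msg": "slow request"},
]
print(to_bulk_ndjson("logs", logs))
print(json.dumps(to_columnar(logs)))
```

In the row-oriented body, field names and the action metadata repeat for every document; in the columnar batch, each field name appears once per batch.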

Deployment Model

Single Go binary vs. JVM cluster

Arc ships as a single Go binary with no external dependencies and no JVM. Optional clustering uses embedded Raft consensus. A 3-node Arc cluster is 3 processes. Elasticsearch requires a JVM on each node, the Elasticsearch server process, and Kibana for visualization. A production HA cluster also needs dedicated master-eligible nodes separate from data nodes, with heap tuning, GC pauses, shard rebalancing, and split-brain prevention adding to the operational surface area.

Feature Comparison

Feature	Arc	Elasticsearch
Standard SQL analytics	✓	✗
Portable Parquet storage	✓	✗
Open source	✓	✓
Edge / single-binary deployment	✓	✗
Columnar storage for analytics	✓	✗
InfluxDB Line Protocol ingestion	✓	✗
Retention policies	✓	✓

Frequently Asked Questions

Why is Elasticsearch so much slower on analytical queries?

Elasticsearch is built on Apache Lucene, an inverted index optimized for full-text search, not columnar aggregations. When you run GROUP BY or range aggregations over billions of rows, Elasticsearch must traverse the inverted index structure, which is fundamentally inefficient for that access pattern. Arc uses a vectorized columnar engine (DuckDB) that processes analytical queries orders of magnitude faster.

Can Arc replace Elasticsearch for log analytics?

For structured log analytics (aggregations, filtering, dashboards, and alerting on log fields): yes. Arc is purpose-built for that workload. If you need full-text search or fuzzy matching across unstructured text bodies, Elasticsearch remains the right tool. The two workloads are different.

How do I migrate log ingestion from Elasticsearch to Arc?

Most logging pipelines (Fluent Bit, Logstash, Vector, OpenTelemetry Collector) support HTTP output. Point them at Arc's HTTP ingestion endpoint. Arc accepts JSON arrays or MessagePack. Migration is typically a configuration change, not a re-engineering effort.
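
For a quick test outside a logging pipeline, a JSON-array request can be built with the Python standard library. This is a sketch under stated assumptions: the URL, port, path, bearer-token header, and record field names below are all hypothetical placeholders, so take the real values from your Arc deployment and its documentation.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the ingestion URL of your Arc deployment.
ARC_URL = "http://localhost:8000/ingest"


def build_ingest_request(records, url=ARC_URL, token=None):
    """Build a POST request carrying a JSON array of log records."""
    body = json.dumps(records).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: token-based auth via a bearer header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


records = [
    {"time": "2024-03-05T22:30:00Z", "level": "error", "service": "api",
     "message": "upstream timeout"},
]
req = build_ingest_request(records)
print(req.get_method(), req.full_url, len(req.data), "bytes")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```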

What about full-text search?

Arc does not have an inverted index for full-text search. DuckDB supports LIKE, regex, and ILIKE patterns on string columns, which covers most structured log filtering. For unstructured document search, Elasticsearch is still the right choice.
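
To show what pattern filtering covers, the Python sketch below mimics LIKE and ILIKE semantics by translating a pattern into a regular expression. It illustrates the matching rules only (% for any run of characters, _ for exactly one) and is not how DuckDB implements them; the function names are hypothetical.

```python
import re


def like_to_regex(pattern, case_insensitive=False):
    """Translate a SQL LIKE pattern (% and _ wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)


def like(value, pattern):
    """Case-sensitive match of the whole value, as with LIKE."""
    return like_to_regex(pattern).fullmatch(value) is not None


def ilike(value, pattern):
    """Case-insensitive match of the whole value, as with ILIKE."""
    return like_to_regex(pattern, case_insensitive=True).fullmatch(value) is not None


print(like("connection timeout after 30s", "%timeout%"))
print(like("Connection Timeout", "%timeout%"))
print(ilike("Connection Timeout", "%timeout%"))
```

Patterns like these match substrings of a field value. They do not tokenize, stem, or rank results, which is the part an inverted index provides.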

Pricing

Start free with open source. Scale with enterprise features when you need them.

Open Source

Free forever

AGPL-3.0 licensed

19.9M records/sec ingestion
Full SQL analytical engine
Continuous queries + auto-compaction
Open file format on S3, Azure, GCS, MinIO, local
Docker and Kubernetes ready
Community support (Discord)


Arc Enterprise Managed

Custom

Managed hosting, sized to your workload.

Everything in Arc Enterprise
Managed and operated by Basekick Labs
Dedicated physical servers, sized to spec
Daily backups to S3, monitoring, upgrades
Migration support included

Now Available

Enterprise

$5,000/year

Starting price for up to 8 cores. Clustering, RBAC, and dedicated support.

Everything in Open Source
Horizontal clustering and HA
Role-based access control (RBAC)
Tiered storage automation
Audit logging and query governance
Dedicated support and SLAs


Enterprise Features

Clustering

Horizontal scaling with automatic data distribution. Query routing and load balancing across nodes.

Security

Fine-grained RBAC with database and table-level permissions. LDAP/SAML integration available.

Data Management

Automated retention policies, continuous queries for aggregation, and tiered storage for cost optimization.

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in open files on your storage. Use for analytics, observability, AI, IoT, or data warehousing.
