Arc vs Elasticsearch
Elasticsearch is a search engine. Arc is a columnar analytical database. When you run analytical workloads on Elasticsearch, you pay the price of a Lucene inverted index doing work it was never designed for.
ClickBench Results
99.9M rows, 43 analytical queries. Arc's cold runs are true cold runs: the service is restarted and the OS cache is flushed before every query. Results can be verified on benchmark.clickhouse.com.
Combined Score (lower is better)
| System | Machine | Score |
|---|---|---|
| Arc | c8g.metal-48xl | ×1.29 |
| Arc | c6a.4xlarge | ×2.00 |
| Elasticsearch | c8g.metal-48xl | ×11.97 |
| Elasticsearch | c6a.4xlarge | ×12.43 |
Cold Run (lower is better)
| System | Machine | Score |
|---|---|---|
| Arc | c8g.metal-48xl | ×1.32 |
| Arc | c6a.4xlarge | ×1.54 |
| Elasticsearch | c8g.metal-48xl | ×4.46 |
| Elasticsearch | c6a.4xlarge | ×5.82 |
Hot Run (lower is better)
| System | Machine | Score |
|---|---|---|
| Arc | c8g.metal-48xl | ×1.03 |
| Arc | c6a.4xlarge | ×2.84 |
| Elasticsearch | c8g.metal-48xl | ×27.67 |
| Elasticsearch | c6a.4xlarge | ×28.41 |
Log Ingestion Benchmark
Sustained 60-second log ingestion load. Same log schema, same machine.
| System | Throughput | p50 latency |
|---|---|---|
| Arc | 4.58M logs/sec | 1ms |
| Elasticsearch | 101K logs/sec | 38ms |
Arc achieves ~45x higher throughput with 38x lower p50 latency.
Why Arc Is Different: Under the Hood
Elasticsearch was designed for full-text search. Arc was designed for analytical queries on structured data. These are different problems with different optimal data structures.
Storage Format
Columnar Parquet vs. Lucene inverted index
Arc stores data as Apache Parquet files in time-partitioned paths (db/measurement/YYYY/MM/DD/HH/). Parquet is columnar: aggregating one field across 100M rows reads only that column from disk. Elasticsearch stores data in Lucene segments, an inverted index structure optimized for finding documents by term. For GROUP BY aggregations over numeric fields, Lucene must traverse posting lists that were never designed for that access pattern.
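As a rough illustration of the time-partitioned layout, the sketch below derives an hourly partition directory from an event timestamp. Only the db/measurement/YYYY/MM/DD/HH/ pattern comes from the description above; the helper name `partition_path`, the example database and measurement names, and the use of UTC are assumptions.

```python
from datetime import datetime, timezone


def partition_path(db: str, measurement: str, ts: datetime) -> str:
    """Return the hourly partition directory for a timestamp.

    Hypothetical helper: UTC is assumed here, which may not match
    how Arc itself resolves partition boundaries.
    """
    ts = ts.astimezone(timezone.utc)
    return f"{db}/{measurement}/{ts:%Y/%m/%d/%H}/"


example = partition_path(
    "prod", "nginx_logs", datetime(2025, 3, 7, 14, 30, tzinfo=timezone.utc)
)
print(example)  # prod/nginx_logs/2025/03/07/14/
```

A query restricted to a time range only needs to open the directories whose hour falls inside that range, which is what makes this kind of layout useful for pruning.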
Query Engine
Vectorized SIMD aggregation vs. bucket trees
Arc embeds DuckDB, a vectorized OLAP engine that processes data in batches (vectors) of 2,048 rows, a layout that lets the CPU apply SIMD instructions across each batch, and executes aggregations directly on columnar Arrow arrays. Arc also rewrites SQL before execution: regex calls become string functions, time bucketing becomes epoch arithmetic. Elasticsearch implements GROUP BY as nested bucket trees over Lucene data, which works well for term faceting but is not competitive for analytical aggregations over high-cardinality numeric columns.
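To make "time bucketing becomes epoch arithmetic" concrete, here is a minimal sketch of the arithmetic itself: flooring an epoch timestamp to the start of a fixed-width bucket with integer division. It illustrates the general technique, not Arc's actual query rewriter; `bucket_start` is a hypothetical helper.

```python
from collections import Counter


def bucket_start(epoch_seconds: int, width_seconds: int) -> int:
    """Floor an epoch timestamp to the start of its fixed-width bucket."""
    return (epoch_seconds // width_seconds) * width_seconds


# Count events per 60-second bucket using only integer arithmetic.
events = [3, 59, 60, 61, 125, 179, 180]
counts = Counter(bucket_start(t, 60) for t in events)
print(dict(counts))  # {0: 2, 60: 2, 120: 2, 180: 1}
```

Because the bucket key is plain integer math on a numeric column, it can be evaluated over a whole batch of rows without any per-row date parsing.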
Ingestion Protocol
4.58M logs/sec vs. 101K logs/sec
Arc accepts MessagePack binary columnar batches (18M+ records/s), InfluxDB Line Protocol for Telegraf compatibility, and bulk CSV/Parquet import. Batches become efficient from roughly 1,000 rows upward. Elasticsearch uses the Bulk API, which requires JSON-encoded documents with per-document metadata lines, index analysis, and Lucene segment writes. The per-document indexing overhead (tokenization, inverted index updates, field data structures) limits Elasticsearch to ~101K logs/sec on the same hardware where Arc reaches 4.58M/sec.
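The difference in payload shape can be sketched as follows. The first function builds a body in the NDJSON layout that the Elasticsearch Bulk API expects: an action line followed by a source line for each document. The second pivots the same records into a column-oriented batch. Its field names (`m`, `columns`) are assumptions for illustration and not Arc's documented wire format, and JSON stands in for MessagePack to keep the example free of third-party dependencies.

```python
import json


def to_bulk_ndjson(index: str, records: list[dict]) -> str:
    """Row-oriented: one action line plus one source line per document."""
    lines = []
    for rec in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(rec))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def to_columnar_batch(measurement: str, records: list[dict]) -> dict:
    """Column-oriented: one array per field (hypothetical layout)."""
    fields = sorted({key for rec in records for key in rec})
    return {
        "m": measurement,
        "columns": {f: [rec.get(f) for rec in records] for f in fields},
    }


logs = [
    {"time": 1700000000, "level": "info", "status": 200},
    {"time": 1700000001, "level": "error", "status": 500},
]
bulk_body = to_bulk_ndjson("logs", logs)
batch = to_columnar_batch("logs", logs)
print(batch["columns"]["status"])  # [200, 500]
```

In the row-oriented body, field names and metadata repeat for every document; in the columnar batch, each field name appears once per batch regardless of how many rows it carries.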
Deployment Model
Single Go binary vs. JVM cluster
Arc ships as a single Go binary with no external dependencies and no JVM. Optional clustering uses embedded Raft consensus. A 3-node Arc cluster is 3 processes. Elasticsearch requires a JVM on each node, the Elasticsearch server process, and typically Kibana for visualization. A production HA cluster also needs dedicated master-eligible nodes separate from data nodes, with heap tuning, GC pauses, shard rebalancing, and split-brain prevention adding to the operational surface area.
Feature Comparison
| Feature | Arc | Elasticsearch |
|---|---|---|
| Standard SQL analytics | ✓ | ✗ |
| Portable Parquet storage | ✓ | ✗ |
| Open source | ✓ | ✓ |
| Edge / single-binary deployment | ✓ | ✗ |
| Columnar storage for analytics | ✓ | ✗ |
| InfluxDB Line Protocol ingestion | ✓ | ✗ |
| Retention policies | ✓ | ✓ |
Frequently Asked Questions
Why is Elasticsearch so much slower on analytical queries?
Elasticsearch is built on Apache Lucene, an inverted index optimized for full-text search, not columnar aggregations. When you run GROUP BY or range aggregations over billions of rows, Elasticsearch must traverse the inverted index structure, which is fundamentally inefficient for that access pattern. Arc uses a vectorized columnar engine (DuckDB) that processes analytical queries orders of magnitude faster.
Can Arc replace Elasticsearch for log analytics?
For structured log analytics (aggregations, filtering, dashboards, and alerting on log fields): yes. Arc is purpose-built for that workload. If you need full-text search or fuzzy matching across unstructured text bodies, Elasticsearch remains the right tool. The two workloads are different.
How do I migrate log ingestion from Elasticsearch to Arc?
Most logging pipelines (Fluent Bit, Logstash, Vector, OpenTelemetry Collector) support HTTP output. Point them at Arc's HTTP ingestion endpoint. Arc accepts JSON arrays or MessagePack. Migration is typically a configuration change, not a re-engineering effort.
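As a sketch of what such a pipeline output does under the hood, the snippet below builds (but does not send) an HTTP POST carrying a JSON array of log records. The URL, path, and bearer-token header are placeholders rather than Arc's documented API; check the Arc documentation for the real endpoint and authentication scheme.

```python
import json
import urllib.request

# Hypothetical endpoint: host, port, and path are assumptions for illustration.
ARC_URL = "http://localhost:8000/ingest"


def build_ingest_request(records: list[dict], token: str = "YOUR_TOKEN") -> urllib.request.Request:
    """Build a POST request whose body is a JSON array of log records."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        ARC_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder auth scheme
        },
    )


req = build_ingest_request(
    [{"time": 1700000000, "level": "info", "msg": "service started"}]
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
print(req.get_method(), req.full_url)
```

In a real pipeline the same change is made declaratively, by editing the HTTP output section of the shipper's configuration rather than writing code.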
What about full-text search?
Arc does not have an inverted index for full-text search. DuckDB supports LIKE, regex, and ILIKE patterns on string columns, which covers most structured log filtering. For unstructured document search, Elasticsearch is still the right choice.
Pricing
Start free with open source. Scale with enterprise features when you need them.
Open Source
- 18M records/sec ingestion
- Full SQL query engine (DuckDB)
- Parquet storage (S3, GCS, local)
- Docker and Kubernetes ready
- Community support (Discord)
Arc Cloud
Managed hosting. No infrastructure. Free 30-day trial.
- Deploy in 30 seconds
- Dedicated physical servers
- Daily backups to S3
- Arc Enterprise included
- No credit card required
Enterprise
Starting price for up to 8 cores. Clustering, RBAC, and dedicated support.
- Everything in Open Source
- Horizontal clustering and HA
- Role-based access control (RBAC)
- Tiered storage and auto-aggregation
- Dedicated support and SLAs
Enterprise Features
Clustering
Horizontal scaling with automatic data distribution. Query routing and load balancing across nodes.
Security
Fine-grained RBAC with database and table-level permissions. LDAP/SAML integration available.
Data Management
Automated retention policies, continuous queries for aggregation, and tiered storage for cost optimization.
Ready to handle billion-record workloads?
Deploy Arc in minutes. Own your data in Parquet. Use it for analytics, observability, AI, IoT, or data warehousing.