
Arc 26.04.1: Native Arrow Query Path, Security Audit, and a Lot of Things That Quietly Got Better

#Arc #release #v26.04.1 #performance #security #Arrow #DuckDB #compaction #deduplication #Decimal128 #S3

This one took longer to write than most release posts. Not because there isn't much to say — there's too much.

26.04.1 is the kind of release that happens when you spend a month not chasing features but going deep: auditing the security model, fixing the things that could hurt you in production, and reworking the query path from the ground up. The result is a release that's faster, safer, and handles a lot of edge cases that were quietly waiting to bite you.

Let me walk through what actually matters.

Query Performance: Native Arrow Path

The biggest performance change in this release is how Arc reads data out of DuckDB.

Previously, the query path used database/sql — the standard Go database interface. It works, but it has a cost: every row gets scanned into interface{}, then type-switched, then serialized. For analytical workloads with millions of rows, that per-cell overhead adds up.

26.04.1 bypasses database/sql entirely and uses DuckDB's native Arrow API. Query results come out as Arrow record batches — columnar, typed, zero-copy — and go straight to the serializer without boxing.

Results on a 1.88B row dataset:

| Endpoint | Before | After | Improvement |
|---|---|---|---|
| JSON (/api/v1/query) | 1.43M rows/sec | 2.28M rows/sec | +59% |
| Arrow IPC (/api/v1/query/arrow) | 2.45M rows/sec | 6.29M rows/sec | +157% |

The Arrow IPC endpoint especially benefits — batches go straight from DuckDB to the IPC writer with no intermediate conversion. If you're using Pandas, Polars, or any Arrow-native client, this is a significant win.

No API changes. No config. It's just faster.

Typed JSON Streaming

On top of the Arrow path, we also replaced the JSON serialization with a streaming writer. Instead of accumulating all rows and calling json.Marshal (which uses reflection), the new path maps column types once and streams directly to the HTTP response using strconv.AppendInt, strconv.AppendFloat, time.AppendFormat — zero allocation per cell.

The practical impact: constant ~8KB memory usage regardless of result set size, instead of holding the full result in memory. On very large result sets, this eliminates OOM risk entirely. The micro-benchmark shows 2.3x faster serialization and 99.9% fewer allocations (5 vs 30,016 per 10K rows).
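The shape of the streaming writer is easy to show. This is a Python sketch of the idea only — Arc's actual implementation is Go, using strconv.AppendInt and friends — with hypothetical column names:

```python
import json

def stream_rows(columns, rows):
    # Pick a formatter per column ONCE (by declared type), instead of
    # reflecting on every cell the way a generic marshal would.
    fmt = [repr if t is float else (str if t is int else json.dumps)
           for t in columns.values()]
    names = list(columns)
    yield "["
    first = True
    for row in rows:
        if not first:
            yield ","
        first = False
        cells = (f'"{n}":{f(v)}' for n, f, v in zip(names, fmt, row))
        yield "{" + ",".join(cells) + "}"
    yield "]"

chunks = stream_rows({"time": int, "value": float}, [(1, 0.5), (2, 0.75)])
payload = "".join(chunks)  # in Arc, each chunk goes straight to the HTTP response
```

Because each chunk is written out as it's produced, memory stays flat no matter how many rows the query returns.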

Ingestion: 18.23M Records/Sec

We migrated from vmihailenco/msgpack/v5 to our own fork, Basekick-Labs/msgpack/v6, which has a lower-allocation decode path. Less GC pressure means more consistent throughput under sustained load.

| Metric | Before | After |
|---|---|---|
| Avg throughput | 16.78M rec/s | 18.23M rec/s |
| p50 latency | 0.52ms | 0.47ms |
| 60s degradation | 22% | 13% |

The flatter degradation curve matters more than the peak number. 13% drop over 60 seconds vs 22% means your ingestion pipeline stays consistent under load instead of degrading as GC pressure builds.

Security: Full Audit, 9 Critical Fixes

We ran a comprehensive security audit across all components. Nine critical issues found and fixed:

The one you need to know about: RBAC write permission bypass. CheckWritePermissions was using the wrong context key — "token" instead of "token_info" — which silently bypassed all RBAC write restrictions. If you're running Arc with RBAC enabled, update immediately.

The other fixes:

  • Token privilege escalation — Token create/update API accepted arbitrary permission strings without validation
  • DuckDB profiling connection race — Profiling PRAGMAs could execute on random pooled connections
  • MessagePack decoder data race — Non-atomic counters on concurrent goroutines
  • Ingestion buffer Close() race — Iterator/lock race during shutdown
  • WAL reader OOM — Corrupt WAL could trigger ~4GB allocation; now validated before allocation
  • Tiering memory exhaustion — copyFile loaded entire Parquet files into memory; replaced with streaming
  • memory_limit SQL injection — ARC_DATABASE_MEMORY_LIMIT is now validated against an allowlist before being passed to DuckDB

We also added RequireAdmin middleware to all mutating endpoints (continuous queries, delete, retention policies, compaction, scheduler) that previously accepted any valid token. Read-only endpoints are unchanged.

We also expanded the forbidden-keyword list for delete WHERE clauses to block UNION, SELECT, CREATE, COPY, ATTACH, LOAD, PRAGMA, CALL, and SET.
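The guard is conceptually a whole-word check against that list. A hypothetical sketch (Arc's actual validation may differ in details):

```python
import re

# Forbidden-keyword list from the release notes; the validation logic
# here is illustrative, not Arc's actual implementation.
FORBIDDEN = {"UNION", "SELECT", "CREATE", "COPY", "ATTACH",
             "LOAD", "PRAGMA", "CALL", "SET"}

def validate_where_clause(clause: str) -> None:
    # Match whole words case-insensitively, so e.g. "offset" in a
    # legitimate clause does not trip the "SET" rule.
    for word in re.findall(r"[A-Za-z_]+", clause):
        if word.upper() in FORBIDDEN:
            raise ValueError(f"forbidden keyword in WHERE clause: {word}")

validate_where_clause("host = 'web-1' AND offset > 10")  # passes
```

Whole-word matching is the important part — a naive substring check would either miss `union` in mixed case or reject harmless identifiers that happen to contain a keyword.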

Compaction: Automatic Deduplication

Compaction now automatically deduplicates rows with identical tag values and timestamps, using last-write-wins semantics. This is the same model as InfluxDB's series key — if the same combination of tag values and timestamp is written multiple times, only the most recent survives compaction.

Zero configuration. Tag column names are stored as Parquet metadata (arc:tags) at ingestion time for Line Protocol and MessagePack row-format data. The compactor reads that metadata and deduplicates automatically.

Zero overhead when no duplicates exist — the window function runs, finds nothing to remove, and compaction proceeds normally. Files written before this release have no arc:tags metadata and compact as before.
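The last-write-wins semantics boil down to keeping one row per (tag values, timestamp) key. A small sketch with hypothetical column names — the compactor itself does this with a window function over Parquet, not in-memory Python:

```python
def deduplicate(rows, tag_columns):
    """Last-write-wins dedup: for rows sharing identical tag values and
    timestamp, keep only the most recent write. Rows are assumed to be
    in write order, so a later write simply overwrites the earlier entry."""
    latest = {}
    for row in rows:
        key = (tuple(row[c] for c in tag_columns), row["time"])
        latest[key] = row  # later write wins
    return list(latest.values())

rows = [
    {"time": 100, "host": "web-1", "cpu": 0.40},
    {"time": 100, "host": "web-2", "cpu": 0.55},
    {"time": 100, "host": "web-1", "cpu": 0.42},  # rewrite of the first row
]
deduped = deduplicate(rows, tag_columns=["host"])  # 2 rows survive
```

Note that only the tag columns and timestamp form the key — field values (cpu here) don't, which is exactly the InfluxDB series-key model described above.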

New: Decimal128 Type Support

Arc now supports native Decimal128 columns for precision-sensitive workloads — financial data, scientific measurements, anything where float64's ~15 significant digits aren't enough.

Configure per-measurement:

[ingest]
decimal_columns = ["trades:price=18,8;amount=18,8"]

For highest precision, send values as strings over MessagePack ("123.456789012345678") — the string-to-decimal conversion is exact with no float64 intermediate.
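You can see why the string path matters with Python's decimal module — the same idea applies to any client:

```python
from decimal import Decimal

s = "123.456789012345678"       # 18 significant digits

exact = Decimal(s)              # string -> decimal: exact
via_float = Decimal(float(s))   # float64 intermediate: precision lost

print(exact == via_float)       # False
print(str(exact) == s)          # True
```

float64 can only represent the nearest binary approximation of that value, so once the number passes through a float the trailing digits are already gone; the string path never loses them.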

New: S3 Path Prefix

Added ARC_STORAGE_S3_PREFIX for shared-bucket multi-tenant deployments. All storage operations — compaction, tiering, queries, backups — use the prefix transparently.

ARC_STORAGE_S3_PREFIX=instances/abc123/

This is how Arc Cloud isolates tenants on shared storage infrastructure.

New: Bootstrap Token + Auth Recovery

Two new environment variables for deployment and recovery:

ARC_AUTH_BOOTSTRAP_TOKEN — Set a known admin token at deploy time instead of catching a random one from startup logs. Stored as bcrypt hash, never persisted as plaintext.

ARC_AUTH_FORCE_BOOTSTRAP — Recovery path when the admin token is lost. Adds a new arc-recovery admin token without removing existing tokens — legitimate admins keep access.

ARC_AUTH_BOOTSTRAP_TOKEN=your-recovery-token-min-32-chars
ARC_AUTH_FORCE_BOOTSTRAP=true

After recovery, revoke the recovery token via the API and remove the env var.
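Any sufficiently random string of 32+ characters works as the bootstrap token. One convenient way to generate one (our suggestion, not a requirement):

```python
import secrets

# token_urlsafe(32) encodes 32 random bytes as a ~43-character
# URL-safe string, comfortably over the 32-character minimum.
token = secrets.token_urlsafe(32)
print(f"ARC_AUTH_BOOTSTRAP_TOKEN={token}")
```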

Observability: WAL Drops + Slow Query Logging

WAL drops are now a Prometheus metric (arc_wal_dropped_entries_total). Previously, drop counts were only available at WAL close time via Stats(). Now you can alert on drops in real time. The WAL buffer size is also configurable:

ARC_WAL_BUFFER_SIZE=10000  # default

Slow query logging is now configurable with WARN-level output and a Prometheus counter (arc_slow_queries_total):

ARC_QUERY_SLOW_QUERY_THRESHOLD_MS=1000  # 0 = disabled (default)

When a query exceeds the threshold, Arc logs the SQL, execution time, row count, and token name. Covers all query paths including Arrow IPC.

Bug Fixes

The 26.03.2 release shipped the critical fix for a compaction filename collision that could lose up to 84% of data. That fix is included here. Additional compaction fixes in 26.04.1:

  • Hourly compaction race with active ingestion — Compaction could delete source files while the ingestion pipeline was still writing to the same partition. Now enforces a minimum 1-hour file age before compacting. The default config is corrected from hourly_min_age_hours = 0 to 1.
  • CQ scheduler reload on update — Continuous query updates now immediately reload the scheduler. Previously required a restart to pick up changes.
  • Atomic CQ execution recording — CQ execution state updates are now wrapped in a SQLite transaction to prevent duplicate or missing windows on failure.
  • S3 delete-rewrite streaming — Was loading entire Parquet files into memory; now streams through a temp file.
  • Backup restore streaming — Same fix: large Parquet files no longer OOM on restore.
  • Token expiration display — Non-expiring tokens were showing as "Expired" in the UI. Fixed by using *time.Time so null serializes correctly.
  • Auth bootstrap race — TOCTOU race on initial admin token creation fixed with INSERT ... WHERE NOT EXISTS.
  • Helm deployment strategy — RollingUpdate deadlocks with a single replica + ReadWriteOnce PVC; Recreate is now the default.

Dependencies

  • DuckDB 1.4.3 → 1.4.4 — Parquet UTF-8 string stats tolerance, Arrow string view pushdown correctness fix, date_trunc stat propagation, mode() use-after-free fix, S3 credential secure clear
  • Arrow Go v18.4.1 → v18.5.2 — Fixed large string Parquet writes (data corruption on large log messages), decompression regression, reduced GC pressure
  • gRPC 1.79.1 → 1.79.3 — Authorization bypass fix: malformed :path headers missing leading slash could bypass path-based deny rules

How to Update

docker pull ghcr.io/basekick-labs/arc:latest

For binary installations, download 26.04.1 from the GitHub releases page: https://github.com/Basekick-Labs/arc/releases.



Questions? Discord or https://github.com/Basekick-Labs/arc/issues.

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in Parquet. Use for analytics, observability, AI, IoT, or data warehousing.
