Object Storage

Object storage keeps data as discrete objects in a flat namespace and exposes them through an HTTP API, rather than as files in a directory hierarchy or as blocks on a disk. Each object is addressed by a key within a bucket. Amazon S3 is the best-known example; Google Cloud Storage, MinIO, and Ceph are other widely used systems.
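A toy model makes the flat namespace easier to see. The Go sketch below is a hypothetical in-memory store, not the API of S3 or any other product. Objects live in one map keyed by strings, and what looks like a folder path is only a shared key prefix. Real services expose the same idea through prefix-filtered listing; S3's ListObjectsV2, for example, accepts a Prefix parameter.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FlatStore is a hypothetical, in-memory model of an object store: every
// object lives under a single string key in one flat map. There are no
// real directories, only keys.
type FlatStore struct {
	objects map[string][]byte
}

func NewFlatStore() *FlatStore {
	return &FlatStore{objects: make(map[string][]byte)}
}

// Put stores (or overwrites) a whole object under key.
func (s *FlatStore) Put(key string, data []byte) {
	s.objects[key] = append([]byte(nil), data...)
}

// Get returns the whole object, or false if no object has exactly this key.
func (s *FlatStore) Get(key string) ([]byte, bool) {
	data, ok := s.objects[key]
	return data, ok
}

// List returns every key that starts with prefix, sorted. The "folders"
// shown in storage consoles are usually just shared prefixes like this.
func (s *FlatStore) List(prefix string) []string {
	var keys []string
	for k := range s.objects {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	s := NewFlatStore()
	s.Put("metrics/2024/01/part-0.parquet", []byte("..."))
	s.Put("metrics/2024/02/part-0.parquet", []byte("..."))
	s.Put("logs/2024/01/part-0.parquet", []byte("..."))
	fmt.Println(s.List("metrics/2024/"))
}
```

Running it prints the two metrics keys and skips the logs key. Note that "metrics/2024/" is not itself an object; Get on that prefix would report false.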

Why analytics runs on object storage

Object storage became the backbone of analytics for three reasons: it is cheap, it scales almost without limit, and it separates storage from compute. You can store petabytes for a low per-gigabyte price and spin up query engines against it on demand.

Compared with block storage such as Amazon EBS, object storage costs far less per gigabyte. The tradeoff is higher latency on every request. Analytical workloads tolerate this because they read large chunks at a time, so the fixed per-request cost is spread over many bytes. Transactional workloads, which need many small, fast reads and writes, are a poor fit.

This is the model behind the data lake and lakehouse: keep data as open files in object storage, and point whatever engine you want at it.

How Arc handles object storage

Arc reads and writes directly to object storage you control: S3, MinIO, GCS, or Ceph. Your data sits in your bucket as open Parquet. Compute and storage stay separate, so you pay object-storage prices for historical data and can scale compute and storage independently.

Arc is a high-performance columnar database. Open Parquet on storage you own, single Go binary, production-ready in 30 seconds.