A data lake is a central repository that stores large volumes of raw data in its native format, structured or unstructured, usually as files on cheap object storage. You apply structure when you read the data, not when you store it.

What a data lake is for

The data lake idea is to keep everything in one affordable place rather than forcing it into a rigid warehouse schema up front. You land raw data, often as open files like Parquet, on object storage, then point whatever processing and query engines you want at it.
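As a minimal sketch of "land raw, apply structure on read", the example below writes heterogeneous records to a directory and lets two consumers impose different schemas when they read. Everything here is illustrative and assumed: a local temp directory stands in for object storage, JSON Lines stands in for Parquet so the sketch needs only the standard library, and `read_with_schema` is a hypothetical helper, not any engine's API.

```python
import json
import tempfile
from pathlib import Path

# Assumption: a local temp directory stands in for an object-storage bucket.
lake = Path(tempfile.mkdtemp()) / "events"
lake.mkdir(parents=True)

# Land raw records as-is. No schema is enforced at write time,
# so records with different fields can sit side by side.
raw_records = [
    {"user": "ana", "amount": "19.90", "ts": "2024-01-05"},
    {"user": "bo", "amount": "5", "ts": "2024-01-06", "coupon": "NEW10"},
    {"user": "cy", "ts": "2024-01-07"},  # no amount field at all
]
(lake / "part-0001.jsonl").write_text(
    "\n".join(json.dumps(r) for r in raw_records) + "\n"
)


def read_with_schema(directory, schema):
    """Hypothetical reader: select fields, cast types, and fill defaults at read time."""
    rows = []
    for path in sorted(directory.glob("*.jsonl")):
        for line in path.read_text().splitlines():
            record = json.loads(line)
            row = {}
            for name, (cast, default) in schema.items():
                row[name] = cast(record[name]) if name in record else default
            rows.append(row)
    return rows


# Two consumers impose different structure on the same raw files.
billing_schema = {"user": (str, ""), "amount": (float, 0.0)}
marketing_schema = {"user": (str, ""), "coupon": (str, None)}

billing_rows = read_with_schema(lake, billing_schema)
marketing_rows = read_with_schema(lake, marketing_schema)
total_amount = sum(row["amount"] for row in billing_rows)
```

The files are written once and never rewritten; each reader decides which fields matter and how to type them. A real lake would more likely hold Parquet files, which carry their own column types, but the division of labor is the same.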

The strength is flexibility and cost: store anything, scale cheaply, and avoid early lock-in to one schema or one engine. The historical weakness was performance and governance: raw lakes could become slow, messy "data swamps". That weakness is what the lakehouse set out to fix.

The modern lake is built on open formats so that storage and compute stay separate and portable.

How Arc handles data lakes

Arc fits the data lake model: it stores data as open Parquet on object storage you own, separating storage from the query engine. Your data lives in the lake in an open format, and Arc is one of the engines that can query it fast.

Arc is a high-performance columnar database. Open Parquet on storage you own, single Go binary, production-ready in 30 seconds.
