Glossary

Plain-English definitions of the data and database terms that come up around columnar analytics, time-series, observability, and infrastructure.

A

ACID Transactions

ACID stands for Atomicity, Consistency, Isolation, and Durability, the four guarantees that make database transactions reliable.

Anomaly Detection

Anomaly detection finds data points that deviate from normal patterns. It is used for fraud detection, equipment monitoring, and security.
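One simple way to flag deviations is a z-score test: measure how many standard deviations each point sits from the mean. The sketch below is only an illustration under that assumption; the function name, the sample readings, and the threshold of 2.0 are made up for the example, and real systems often use more robust methods.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical temperature readings with one obvious spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1]
anomalies = zscore_anomalies(readings, threshold=2.0)
print(anomalies)
```

With very small samples a single outlier inflates the standard deviation, which is why the example uses a threshold lower than the commonly quoted 3.0.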

Apache Parquet

Apache Parquet is an open columnar file format for analytics. It compresses well, reads fast, and is supported by almost every data tool.

C

Cardinality

Cardinality is the number of unique values in a dataset or column. High cardinality can strain time-series and observability systems, which often index every unique combination of label values as a separate series.
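As a rough illustration, cardinality can be counted by collecting a column's values into a set. The rows and column names below are invented for the example; the last line counts distinct combinations of two columns, which is how a series count can grow much faster than any single column.

```python
# Hypothetical request records; column names are assumptions for illustration.
rows = [
    {"host": "web-1", "region": "eu", "status": 200},
    {"host": "web-2", "region": "eu", "status": 200},
    {"host": "web-3", "region": "us", "status": 500},
    {"host": "web-1", "region": "us", "status": 200},
]

def cardinality(rows, column):
    """Number of distinct values found in one column."""
    return len({row[column] for row in rows})

per_column = {col: cardinality(rows, col) for col in rows[0]}

# Distinct (host, region) pairs: each pair would be its own series.
series_count = len({(r["host"], r["region"]) for r in rows})
print(per_column, series_count)
```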

Columnar Database

A columnar database stores data by column instead of by row, making large-scale analytics and aggregations dramatically faster.

Columnar vs Row Storage

Row storage keeps each record together and suits transactions. Columnar storage keeps each column together and suits analytics.
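The two layouts can be sketched with plain Python containers: a list of records for row storage, and a dictionary of per-column lists for columnar storage. The data and the `to_columns` helper are hypothetical; real engines add encoding, compression, and on-disk structure on top of this idea.

```python
# Row layout: each record is kept together.
rows = [
    {"id": 1, "city": "Oslo", "temp": 4.0},
    {"id": 2, "city": "Lima", "temp": 19.0},
    {"id": 3, "city": "Pune", "temp": 31.0},
]

def to_columns(rows):
    """Pivot a list of records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}

# Column layout: each column is kept together.
columns = to_columns(rows)

# Transaction-style access touches one whole record.
record = rows[1]

# Analytics-style access touches one column and ignores the rest.
avg_temp = sum(columns["temp"]) / len(columns["temp"])
print(record, avg_temp)
```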

Compaction

Compaction merges many small data files into fewer large ones, cutting storage cost and making queries faster.
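A minimal sketch of the idea, assuming each small file is already sorted by key and is modeled here as an in-memory list of `(key, value)` rows. The `compact` function and `max_rows` parameter are hypothetical names; actual compaction strategies vary by system.

```python
import heapq

# Four small "files", each sorted by key (an assumption of this sketch).
small_files = [
    [(1, "a"), (4, "d")],
    [(2, "b")],
    [(3, "c"), (5, "e")],
    [(6, "f")],
]

def compact(files, max_rows):
    """Merge sorted files into fewer files of up to max_rows rows each."""
    output, current = [], []
    for row in heapq.merge(*files):  # streams rows in global key order
        current.append(row)
        if len(current) == max_rows:
            output.append(current)
            current = []
    if current:
        output.append(current)
    return output

compacted = compact(small_files, max_rows=3)
print(len(small_files), "files ->", len(compacted), "files")
```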

D

Data Compression

Data compression shrinks stored data to save space and speed up queries by reducing how much must be read from disk. Columnar data compresses especially well.

Data Historian

A data historian is a system that records time-series data from industrial equipment and processes. Traditional historians are proprietary and costly.

Data Ingestion

Data ingestion is the process of importing data from sources into a storage or analytics system. It can be batch or streaming.

Data Lake

A data lake is a central repository that stores raw data of any type at scale, usually as files on cheap object storage.

Data Retention Policy

A data retention policy defines how long data is kept before it is deleted or archived. It balances cost, compliance, and usefulness.

Downsampling

Downsampling reduces the resolution of time-series data by aggregating it into larger time buckets, saving storage at the cost of detail.
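For illustration, the sketch below averages points sampled every 20 seconds into one-minute buckets. Timestamps are plain integer seconds and the `downsample` helper is a made-up name; the choice of the mean as the aggregate is also an assumption, since min, max, or last are equally common.

```python
from collections import defaultdict

def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - (ts % bucket_seconds)  # floor to bucket boundary
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

# Six raw points, one every 20 seconds.
points = [(0, 10.0), (20, 12.0), (40, 14.0), (60, 20.0), (80, 22.0), (100, 24.0)]
per_minute = downsample(points, 60)
print(per_minute)
```

Six stored points become two, and the variation inside each minute is no longer recoverable.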

E

Edge Computing

Edge computing processes data close to where it is generated, instead of sending everything to a central cloud. It cuts latency and bandwidth.

Event Sourcing

Event sourcing stores every change to application state as an immutable sequence of events, rather than just the current state.
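A small sketch of the pattern, using a hypothetical account with `deposited` and `withdrawn` events. The current balance is never stored; it is derived by replaying the event list, which also makes it possible to reconstruct the state at any earlier point.

```python
# Append-only event log; event names and shape are assumptions for illustration.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def apply_event(balance, event):
    """Return the new balance after applying one event."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")

def replay(events, upto=None):
    """Rebuild state from the first `upto` events (all events if None)."""
    balance = 0
    for event in events[:upto]:
        balance = apply_event(balance, event)
    return balance

current_balance = replay(events)
balance_after_two = replay(events, upto=2)
print(current_balance, balance_after_two)
```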

Eventual Consistency

Eventual consistency means all copies of data become consistent over time, but not instantly. It trades immediate consistency for availability.

H

High Cardinality

High cardinality means a very large number of unique values. It is a common cause of slow queries and runaway costs in monitoring systems.

Hot vs Cold Storage

Hot storage is fast and expensive for frequently used data. Cold storage is slow and cheap for rarely used data. Tiering balances the two.

I

Ingestion Rate

Ingestion rate is how fast a database can accept incoming data, usually measured in records or rows per second. It is critical for telemetry.

L

Lakehouse Architecture

A lakehouse combines the low cost and openness of a data lake with the performance and structure of a data warehouse.

Line Protocol

Line protocol is a simple text format for writing time-series data points, popularized by InfluxDB. Each line is one data point: a measurement name, optional tags, one or more fields, and an optional timestamp.
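The general shape of a line is `measurement,tag=value field=value timestamp`. The formatter below is a simplified sketch, not a complete implementation: it assumes names and values contain no spaces, commas, equals signs, or quotes, so it skips the escaping rules entirely. The function names are hypothetical.

```python
def format_field(value):
    """Render one field value; integers carry an 'i' suffix, strings are quoted."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return f'"{value}"'

def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line; escaping is deliberately omitted in this sketch."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"

line = to_line(
    "cpu",
    {"region": "eu", "host": "web-1"},
    {"usage": 0.64, "cores": 8},
    1700000000000000000,
)
print(line)
```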

M

Materialized View

A materialized view stores the precomputed result of a query, so repeated reads are fast. It trades storage and freshness for query speed.

MELT (Metrics, Events, Logs, Traces)

MELT stands for Metrics, Events, Logs, and Traces, the four core data types of observability. Unifying them is a major challenge.

O

Object Storage

Object storage keeps data as objects in flat buckets, like Amazon S3. It is cheap, scalable, and the foundation of modern data lakes.

Observability

Observability is the ability to understand a system's internal state from its outputs, like metrics, logs, and traces. It is key to running reliable software.

OLAP (Online Analytical Processing)

OLAP is a class of database workload focused on fast analytical queries over large datasets, like aggregations, rollups, and reporting.

OLAP vs OLTP

OLAP handles analytical queries over large datasets. OLTP handles many small transactions.

Open Format Databases

An open format database stores your data in a non-proprietary format like Parquet, so you can read it with other tools and avoid lock-in.

OpenTelemetry

OpenTelemetry is an open standard for collecting metrics, logs, and traces from software. It frees telemetry from any single vendor.

Order Book

An order book is the real-time list of buy and sell orders for an asset, organized by price. Reconstructing it from history is a hard data problem.

P

Partition Pruning

Partition pruning skips entire partitions of data that a query does not need, dramatically reducing how much data must be scanned.
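A toy sketch of the idea, with data partitioned by day in a dictionary. The `query` helper and the sample rows are hypothetical; the point is that the date range is checked against partition keys first, so rows in non-matching partitions are never touched.

```python
from datetime import date

# One partition per day; each holds that day's rows (invented sample data).
partitions = {
    date(2024, 1, 1): [("sensor-a", 1.0), ("sensor-b", 2.0)],
    date(2024, 1, 2): [("sensor-a", 3.0)],
    date(2024, 1, 3): [("sensor-a", 4.0), ("sensor-b", 5.0)],
    date(2024, 1, 4): [("sensor-b", 6.0)],
}

def query(partitions, start, end):
    """Return rows in [start, end] plus the list of partitions actually scanned."""
    scanned = [day for day in sorted(partitions) if start <= day <= end]
    rows = [row for day in scanned for row in partitions[day]]
    return rows, scanned

rows, scanned = query(partitions, date(2024, 1, 2), date(2024, 1, 3))
print(f"scanned {len(scanned)} of {len(partitions)} partitions")
```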

Predicate Pushdown

Predicate pushdown moves query filters as close to the data as possible, so less data is read and processed. It is key to fast analytics.

Predictive Maintenance

Predictive maintenance uses sensor data to predict equipment failure before it happens, reducing downtime and avoiding unnecessary repairs.

Q

Query Latency

Query latency is the time between submitting a query and getting results. Lower latency means faster dashboards, analysis, and decisions.

R

Real-Time Analytics

Real-time analytics is analyzing data the moment it arrives, so insights are available in seconds rather than hours. It needs fast ingest and query.

Relational Database

A relational database organizes data into tables of rows and columns with defined relationships, queried using SQL. It is the classic OLTP design.

S

Schema on Read

Schema on read applies structure to data when you query it, not when you store it. It offers flexibility for evolving and varied data.

Sensor Data

Sensor data is the stream of measurements produced by physical sensors, like temperature, vibration, and pressure. At fleet scale it is high volume.

SQL Query Engine

A SQL query engine parses, plans, and executes SQL queries against data. Modern engines can run directly on open files like Parquet.

Storage Tiering

Storage tiering automatically moves data between fast expensive storage and cheap slow storage based on how often it is accessed.

Stream Processing

Stream processing handles data continuously as it arrives, rather than in scheduled batches. It powers real-time pipelines and analytics.

T

Telemetry Data

Telemetry data is automatically collected measurements sent from remote sources like sensors, servers, and devices for monitoring and analysis.

Tick Data

Tick data is the record of every individual trade and quote in a market, typically timestamped to the microsecond or finer. It is the rawest form of market data.

Time Bucketing

Time bucketing groups time-series data into fixed intervals, like per minute or per hour, to summarize and chart trends over time.

Time-Based Partitioning

Time-based partitioning splits a table into segments by time, so time-range queries read only the relevant segments and run faster.

Time-Series Data

Time-series data is a sequence of data points indexed by time. Examples include metrics, sensor readings, stock prices, and application events.

Time-Series Database

A time-series database is built to store and query data points indexed by time, like metrics, sensor readings, events, and financial ticks.

V

Vectorized Execution

Vectorized execution processes data in batches of values at once instead of row by row, making analytical queries much faster.

Vendor Lock-In

Vendor lock-in is when switching away from a product is so costly or hard that you stay against your will. Open formats are the main defense.

W

Window Function

A window function performs a calculation across a set of rows related to the current row, such as a running total or moving average, without collapsing those rows into a single result.
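The behavior can be imitated in plain Python to show the key property: the output has one value per input row. This is only a sketch of the concept with made-up helper names, not how a SQL engine evaluates a window.

```python
from itertools import accumulate

def running_total(values):
    """Cumulative sum: one output per input row."""
    return list(accumulate(values))

def moving_average(values, window):
    """Average over the current row and up to window-1 preceding rows."""
    result = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]
        result.append(sum(frame) / len(frame))
    return result

sales = [10, 20, 30, 40]
totals = running_total(sales)
averages = moving_average(sales, window=2)
print(totals, averages)
```

A plain aggregate over the same input would return a single number; here every row keeps its place and gains a computed value.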

Write-Ahead Log (WAL)

A write-ahead log records changes before they are applied, so a database can recover without data loss after a crash.
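A minimal sketch of the mechanism, assuming a key-value store whose log is a file of JSON lines. The `TinyStore` class is hypothetical and leaves out everything a real WAL needs beyond the core ordering (checksums, log truncation, partial-write handling): each change is appended and flushed to the log before the in-memory state is updated, and recovery replays the log.

```python
import json
import os
import tempfile

class TinyStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._recover()

    def _recover(self):
        # Rebuild in-memory state by replaying the log from the start.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path) as wal:
            for line in wal:
                key, value = json.loads(line)
                self.data[key] = value

    def put(self, key, value):
        # 1. Record the change durably in the log.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps([key, value]) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only then apply it to the in-memory state.
        self.data[key] = value

wal_path = os.path.join(tempfile.mkdtemp(), "store.wal")
store = TinyStore(wal_path)
store.put("a", 1)
store.put("b", 2)

# Simulate a crash and restart: a fresh instance recovers from the log alone.
recovered = TinyStore(wal_path)
print(recovered.data)
```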
