Liftbridge Joins Basekick Labs: Building the IoT Data Platform

A few days ago, we sent Tyler Treat an email.
Tyler created Liftbridge back in 2017—a lightweight, durable message streaming system built on top of NATS. Think Kafka, but without the JVM, without ZooKeeper, and packaged in a single 16MB binary.
The project had been quiet since 2022. No releases. No commits. 2,600+ stars on GitHub, but no one at the wheel.
We asked Tyler if Basekick Labs could take over maintenance. His response:
"I would be absolutely thrilled if someone were to continue to maintain, extend, and improve it rather than have it simply die."
So here we are. Liftbridge is now part of Basekick Labs.
Thank You, Tyler
Before anything else: thank you, Tyler.
Liftbridge isn't just another streaming system. The architecture decisions—using NATS as the transport layer, separating metadata consensus (Raft) from data replication (ISR), the commit log design borrowed from Kafka—these aren't obvious choices. They come from someone who deeply understood the tradeoffs.
Tyler was the #2 contributor to NATS Streaming before building Liftbridge. He saw the limitations of storing messages redundantly in both a Raft log and a message log, and designed something better. His blog posts on distributed systems are still some of the best technical writing in the space.
Taking over a project like this is a privilege. We don't take it lightly.
Why Liftbridge
We've been building Arc—a time-series database on DuckDB and Parquet—for the past year. Arc is great at storing and querying time-series data. But storage is only part of the problem.
In IoT, data doesn't arrive neatly. Sensors send data in bursts. Networks go down. Devices reconnect and dump hours of buffered readings. You need something between your sensors and your database that can:
- Buffer data durably — If Arc is down for maintenance, messages don't disappear
- Handle backpressure — Traffic spikes don't overwhelm your database
- Enable replay — Re-process data from a specific point in time
- Decouple producers from consumers — Sensors don't need to know about Arc
This is what message streaming systems do. Kafka does it. Redpanda does it. Confluent Cloud does it.
But Kafka is heavy. It needs the JVM, ZooKeeper (or KRaft), and careful tuning. For IoT deployments—especially edge deployments—that's overkill.
Liftbridge is Kafka's semantics in a Go binary. Durable. Replicated. Lightweight.
Why Now
On December 8th, IBM announced they're acquiring Confluent for $11 billion.
We've seen this movie before.
IBM acquires a company. Prices go up. Innovation slows down. Enterprise sales cycles get longer. The product becomes part of "IBM's hybrid cloud and AI strategy" instead of being the best version of itself.
Some teams will stick with Confluent. But others—especially smaller teams, IoT deployments, edge use cases—will look for alternatives.
NATS with JetStream is one option. But JetStream is newer, and some teams want the Kafka-style log semantics that Liftbridge provides.
An abandoned project with solid architecture is now an opportunity.
How Liftbridge Fits Into Arc
Here's the architecture we're building toward:
┌─────────────────────────────────────────────────────────────────┐
│ IoT Data Platform │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌─────────────┐ ┌──────────┐ ┌───────┐ │
│ │ Sensors │───▶│ Telegraf │───▶│Liftbridge│───▶│ Arc │ │
│ │ Devices │ │ (collect) │ │ (buffer) │ │(store)│ │
│ │ Edge │ └─────────────┘ └──────────┘ └───────┘ │
│ └──────────┘ │ │ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────┐ │
│ │ Grafana │ │
│ │ (visualize/alert) │ │
│ └─────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Each component has a clear job:
| Component | Role | Why this choice |
|---|---|---|
| Telegraf | Collect metrics from sensors, protocols, APIs | Industry standard, 300+ plugins, Arc already has an output plugin |
| Liftbridge | Durable message buffer with replay capability | Lightweight, NATS-based, Kafka semantics without Kafka complexity |
| Arc | Time-series storage and query | DuckDB performance, Parquet portability, no vendor lock-in |
| Grafana | Visualization and alerting | Industry standard, Arc already has a data source plugin |
The key insight: we don't need to build everything. Telegraf and Grafana are best-in-class and open source. We focus on the storage and streaming layers where we can differentiate.
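To make that concrete, here is the shape of the glue between the buffer and the store: a small consumer that tails a Liftbridge stream and forwards each message to Arc. This is a sketch, not shipped code. The Liftbridge calls use the Go client (go-liftbridge); the Arc endpoint and payload format below are hypothetical placeholders, not Arc's actual ingest API.

```go
package example

import (
	"bytes"
	"context"
	"log"
	"net/http"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// forwardToArc tails the "sensors" stream and POSTs each message to Arc.
// arcURL is a hypothetical ingest endpoint, used here only for illustration.
func forwardToArc(ctx context.Context, client lift.Client, arcURL string) error {
	return client.Subscribe(ctx, "sensors", func(msg *lift.Message, err error) {
		if err != nil {
			log.Println("subscribe error:", err)
			return
		}
		resp, postErr := http.Post(arcURL, "application/json", bytes.NewReader(msg.Value()))
		if postErr != nil {
			// Arc is down or unreachable. The message stays in the Liftbridge
			// log, so a real bridge would record the offset and retry later.
			log.Println("arc write failed:", postErr)
			return
		}
		resp.Body.Close()
	}, lift.StartAtEarliestReceived())
}
```

Because the log is durable, a bridge like this can crash, restart, and resume from a recorded offset without losing data. That is the "buffer" role in the table above.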
Liftbridge Architecture
For those who want to go deeper, here's how Liftbridge works under the hood.
The Core Idea
Liftbridge adds durability to NATS. NATS is a high-performance pub/sub messaging system—fast, simple, but ephemeral. Messages that aren't consumed immediately are gone.
Liftbridge sits alongside NATS and subscribes to subjects you want to persist. It writes them to a durable, replicated commit log. Consumers can read from any point in that log—not just "what's happening now" but "everything since yesterday" or "everything since offset 1,847,293."
┌─────────────────────────────┐
│ NATS Cluster │
│ (fast, ephemeral pub/sub) │
└──────────────┬──────────────┘
│
┌──────────────▼──────────────┐
│ Liftbridge Cluster │
│ (durable, replicated log) │
│ │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Node1│ │Node2│ │Node3│ │
│ │(L) │ │(F) │ │(F) │ │
│ └─────┘ └─────┘ └─────┘ │
└─────────────────────────────┘
L = Leader, F = Follower
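In code, "reading from any point" is just a subscription option. Here is a minimal sketch with the Go client, assuming a local single-node setup and a stream named sensors:

```go
package main

import (
	"context"
	"fmt"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

func main() {
	// Connect to a Liftbridge node (9292 is the default client port).
	client, err := lift.Connect([]string{"localhost:9292"})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Replay the "sensors" stream from the start of the log. Swap in
	// lift.StartAtOffset(1847293) or lift.StartAtLatestReceived() to
	// start somewhere else.
	ctx := context.Background()
	err = client.Subscribe(ctx, "sensors", func(msg *lift.Message, err error) {
		if err != nil {
			panic(err)
		}
		fmt.Println(msg.Offset(), string(msg.Value()))
	}, lift.StartAtEarliestReceived())
	if err != nil {
		panic(err)
	}

	select {} // keep the process alive while the handler runs
}
```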
Dual Consensus Model
This is where Liftbridge gets clever. It uses two different consensus mechanisms for different purposes:
Raft for Metadata
Cluster membership, stream assignments, consumer group state—this is metadata. It changes infrequently and needs strong consistency. Liftbridge uses Raft (via HashiCorp's implementation) with a single elected controller.
ISR for Data Replication
Actual message data uses Kafka's ISR (In-Sync Replicas) protocol. This is faster than Raft for high-throughput data because:
- No redundant logging (messages are only written to the commit log, not a separate Raft log)
- Leader handles writes, followers pull and acknowledge
- Flexible durability: wait for leader only, wait for all replicas, or fire-and-forget
┌─────────────────────────────────────────────────────────────────┐
│ Liftbridge Consensus Model │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────┐ ┌───────────────────────────────┐ │
│ │ Metadata Plane │ │ Data Plane │ │
│ │ (Raft Consensus) │ │ (ISR Replication) │ │
│ ├──────────────────────┤ ├───────────────────────────────┤ │
│ │ │ │ │ │
│ │ • Cluster membership │ │ • Message replication │ │
│ │ • Stream assignments │ │ • Commit log writes │ │
│ │ • Consumer groups │ │ • High watermark tracking │ │
│ │ • Leader election │ │ • Follower catch-up │ │
│ │ │ │ │ │
│ │ Single Controller │ │ Per-Partition Leader │ │
│ │ Strong consistency │ │ Tunable durability │ │
│ │ Low write volume │ │ High throughput │ │
│ │ │ │ │ │
│ └──────────────────────┘ └───────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Streams and Partitions
Liftbridge organizes data into streams. Each stream maps to a NATS subject and can have multiple partitions for parallelism.
┌─────────────────────────────────────────────────────────────────┐
│ Stream: "sensors" │
│ Subject: "sensors.*" │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Partition 0 │ │ Partition 1 │ │ Partition 2 │ │
│ │ Leader: Node1 │ │ Leader: Node2 │ │ Leader: Node3 │ │
│ ├─────────────────┤ ├─────────────────┤ ├─────────────────┤ │
│ │ Offset 0: msg │ │ Offset 0: msg │ │ Offset 0: msg │ │
│ │ Offset 1: msg │ │ Offset 1: msg │ │ Offset 1: msg │ │
│ │ Offset 2: msg │ │ Offset 2: msg │ │ Offset 2: msg │ │
│ │ ... │ │ ... │ │ ... │ │
│ │ Offset N: msg │ │ Offset N: msg │ │ Offset N: msg │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ Replication Factor: 2 │
│ Each partition replicated to 2 nodes (leader + 1 follower) │
│ │
└─────────────────────────────────────────────────────────────────┘
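Creating a stream shaped like the one above is a single client call. A sketch with the Go client, with sizing that mirrors the diagram:

```go
package example

import (
	"context"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// createSensorsStream attaches a stream named "sensors" to the NATS subject
// "sensors.*", split into 3 partitions with each partition kept on 2 nodes.
func createSensorsStream(client lift.Client) error {
	err := client.CreateStream(context.Background(), "sensors.*", "sensors",
		lift.Partitions(3),
		lift.ReplicationFactor(2),
	)
	if err == lift.ErrStreamExists {
		return nil // idempotent: fine if a previous run already created it
	}
	return err
}
```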
Commit Log Structure
Each partition is stored as a commit log—an append-only sequence of segments:
Partition Directory: /data/streams/sensors/0/
├── 00000000000000000000.log # Segment file (messages)
├── 00000000000000000000.index # Offset index (offset → position)
├── 00000000000000000000.timestamp # Time index (timestamp → offset)
├── 00000000000000001000.log # Next segment (starts at offset 1000)
├── 00000000000000001000.index
├── 00000000000000001000.timestamp
└── ...
- Segment files contain the actual messages. When a segment reaches a size limit, a new one is created.
- Index files are memory-mapped for fast lookups. Given an offset, find the file position in O(1).
- Timestamp indexes enable time-based queries: "give me everything since 2pm yesterday."
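That time-based replay is exposed directly through the client. A sketch, assuming the sensors stream from earlier and the option names as we understand the v2 Go client:

```go
package example

import (
	"context"
	"fmt"
	"time"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// replaySince re-reads the "sensors" stream from a point in time. The
// timestamp index is what lets the server find the starting offset quickly.
func replaySince(ctx context.Context, client lift.Client, since time.Time) error {
	return client.Subscribe(ctx, "sensors",
		func(msg *lift.Message, err error) {
			if err != nil {
				return
			}
			fmt.Println(msg.Timestamp(), string(msg.Value()))
		},
		lift.StartAtTime(since), // or lift.StartAtTimeDelta(24*time.Hour)
	)
}
```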
Message Format
Every message in Liftbridge has an envelope:
┌─────────────────────────────────────────────────────────────────┐
│ Message Envelope │
├─────────────────────────────────────────────────────────────────┤
│ Magic Bytes │ 4 bytes │ 0xB9 0x0E 0x43 0xB4 │
│ CRC-32C │ 4 bytes │ Checksum of payload │
│ Payload Length │ 4 bytes │ Size of protobuf message │
│ Payload │ N bytes │ Protobuf-encoded message │
└─────────────────────────────────────────────────────────────────┘
Payload (Protobuf):
├── Offset (int64) - Position in partition
├── Key (bytes) - Optional routing key
├── Value (bytes) - Actual message content
├── Timestamp (int64) - Unix nanos
├── Headers (map) - Key-value metadata
├── Subject (string) - Original NATS subject
├── ReplySubject (string) - For request-reply patterns
├── CorrelationID (string) - For tracing
└── AckInbox (string) - For publisher acks
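You rarely touch the wire format yourself; client libraries decode the envelope and hand the fields to your handler. A sketch of what that looks like in the Go client, with accessor names as we understand the v2 API:

```go
package example

import (
	"fmt"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// printEnvelope is a subscription handler that prints the decoded fields of
// each message: offset, originating subject, key, timestamp, headers, value.
func printEnvelope(msg *lift.Message, err error) {
	if err != nil {
		return
	}
	fmt.Printf("offset=%d subject=%s key=%s ts=%s\n",
		msg.Offset(), msg.Subject(), msg.Key(), msg.Timestamp())
	for name, value := range msg.Headers() {
		fmt.Printf("  header %s=%s\n", name, value)
	}
	fmt.Println("  value:", string(msg.Value()))
}
```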
Two Ways to Publish
Via NATS (zero code changes)
If you're already publishing to NATS, Liftbridge can subscribe to those subjects and persist the messages. Your publishers don't change at all.
Publisher → NATS subject "sensors.temperature" → Liftbridge subscribes → Durable log
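For example, a publisher that already uses plain NATS stays exactly as it is; a sketch with the standard nats.go client:

```go
package main

import (
	nats "github.com/nats-io/nats.go"
)

func main() {
	// This publisher knows nothing about Liftbridge. If a stream is attached
	// to "sensors.temperature", the message also lands in the durable log.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	if err := nc.Publish("sensors.temperature", []byte(`{"sensor":"t1","temp_c":21.4}`)); err != nil {
		panic(err)
	}
	nc.Flush() // make sure the message is on the wire before exiting
}
```

The tradeoff: the publisher gets no acknowledgement that the message actually made it into the log. That is what the gRPC path below adds.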
Via gRPC (full control)
For durability guarantees, use the Liftbridge API directly:
Publisher → Liftbridge gRPC → Write to log → Wait for ack → Confirm
The gRPC path gives you:
- AckPolicy: LEADER (fast), ALL (durable), NONE (fire-and-forget)
- Exactly-once delivery with idempotent producers
- Transactional writes across partitions
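A sketch of that path with the Go client; the ack policy and the timeout are the knobs that matter (option names as we understand the v2 client):

```go
package example

import (
	"context"
	"time"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// publishDurably writes one reading and blocks until the full ISR has
// acknowledged it or the deadline expires. Use lift.AckPolicyLeader() for
// lower latency, or lift.AckPolicyNone() for fire-and-forget.
func publishDurably(client lift.Client, reading []byte) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	_, err := client.Publish(ctx, "sensors", reading,
		lift.Key([]byte("device-42")), // optional key carried in the envelope
		lift.AckPolicyAll(),
	)
	return err
}
```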
Metadata: What the Controller Tracks
The Raft-based controller maintains cluster state:
┌─────────────────────────────────────────────────────────────────┐
│ Controller Metadata │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Cluster State │
│ ├── Brokers: [node1:9292, node2:9292, node3:9292] │
│ ├── Controller: node1 │
│ └── Controller Epoch: 7 │
│ │
│ Streams │
│ ├── sensors │
│ │ ├── Subject: sensors.* │
│ │ ├── Partitions: 3 │
│ │ ├── ReplicationFactor: 2 │
│ │ ├── Retention: 7 days │
│ │ └── Partition Assignments: │
│ │ ├── P0: Leader=node1, ISR=[node1, node2] │
│ │ ├── P1: Leader=node2, ISR=[node2, node3] │
│ │ └── P2: Leader=node3, ISR=[node3, node1] │
│ └── events │
│ └── ... │
│ │
│ Consumer Groups │
│ ├── arc-ingest │
│ │ ├── Coordinator: node2 │
│ │ ├── Members: [consumer-1, consumer-2] │
│ │ └── Offsets: │
│ │ ├── sensors/0: 1847293 │
│ │ ├── sensors/1: 1632841 │
│ │ └── sensors/2: 1923847 │
│ └── ... │
│ │
└─────────────────────────────────────────────────────────────────┘
Failure Handling
| Scenario | What Happens | Recovery |
|---|---|---|
| Follower crashes | ISR shrinks, writes continue | Follower rejoins, catches up from leader |
| Leader crashes | Controller elects new leader from ISR | Seconds (Raft election) |
| Controller crashes | Raft elects new controller | Seconds (Raft election) |
| Network partition | Depends on which side has quorum | Automatic when partition heals |
The key invariant: messages acknowledged with AckPolicy.ALL are never lost as long as one ISR member survives.
What's Next
Immediate (Q1 2026):
- Modernize codebase (Go 1.25+, updated dependencies)
- Security audit
- Fix CI/CD
- First release under Basekick Labs: Liftbridge 26.01.1
Near-term:
- Documentation refresh
- Integration guide: Liftbridge + Telegraf + Arc
- Docker Compose for the full stack
- Performance benchmarks
Future:
- Tiered storage (offload old segments to S3/object storage)
- Enhanced observability (Prometheus metrics, distributed tracing)
- Arc-native integration (direct ingest from Liftbridge without Telegraf hop)
The Bigger Picture
Basekick Labs is building a data platform for IoT. Not a single product—a stack.
Arc handles storage. Liftbridge handles streaming. Telegraf handles collection. Grafana handles visualization.
Each piece is open source. Each piece can be used independently. But together, they're a complete solution for industrial IoT, sensor networks, and edge computing.
We're not trying to be Confluent. We're not trying to be InfluxData. We're building something lighter, more portable, and honest about its licensing from day one.
If that resonates with you, check out https://github.com/liftbridge-io/liftbridge or https://github.com/basekick-labs/arc.
And if you're using Liftbridge in production—or want to contribute—join our Discord. The project has been quiet for years. Let's change that.
Resources: