Kafka

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

At PostHog we mainly use it to stream events from our ingestion pipeline to ClickHouse.

Dictionary

  • broker: a Kafka cluster is made up of one or more servers. The servers that form the storage layer are called brokers
  • event: an event records the fact that "something happened" in the world or in your business. It is also called a record or a message in the documentation. When you read or write data to Kafka, you do this in the form of events. Conceptually, an event has a key, a value, a timestamp, and optional metadata headers
  • producer: a client application that publishes (writes) events to Kafka
  • consumer: a client application that subscribes to (reads and processes) events from Kafka
  • topic: a named stream of events. Producers write events to a topic and consumers read events from it
  • partition: topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokers. Events with the same key are written to the same partition
  • replication: to make your data fault-tolerant and highly available, every topic can be replicated, so that there are always multiple brokers that have a copy of the data in case things go wrong
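
To make the terms above concrete, here is a minimal in-memory sketch of how events, topics, partitions, producers, and consumers relate. It is a toy model, not a Kafka client: the `Event` and `Topic` classes, the topic name, and the sample payloads are all hypothetical, and CRC32 is used only as a stand-in for whatever partitioner a real client applies. Replication is not modeled.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Event:
    """Toy model of an event (record): key, value, timestamp, optional headers."""
    key: bytes
    value: bytes
    timestamp_ms: int
    headers: Dict[str, bytes] = field(default_factory=dict)


class Topic:
    """Toy in-memory topic: a fixed number of partitions, each an append-only list."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Event]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumption: CRC32 is a stand-in hash, not Kafka's own partitioner.
        # The point is only that equal keys always map to the same partition.
        return zlib.crc32(key) % len(self.partitions)

    def produce(self, event: Event) -> Tuple[int, int]:
        """Append an event (the producer side) and return (partition, offset)."""
        partition = self.partition_for(event.key)
        self.partitions[partition].append(event)
        return partition, len(self.partitions[partition]) - 1

    def consume(self, partition: int, offset: int) -> List[Event]:
        """Read events from one partition (the consumer side), starting at an offset."""
        return self.partitions[partition][offset:]


# Two events with the same key land in the same partition, in order.
topic = Topic("events", num_partitions=3)
p1, o1 = topic.produce(Event(b"user-1", b"pageview", 1700000000000))
p2, o2 = topic.produce(Event(b"user-1", b"click", 1700000000500))
assert p1 == p2 and (o1, o2) == (0, 1)
```

Because the partition is derived from the key, a consumer reading a single partition sees all events for a given key in the order they were written, which is why the choice of key matters when ordering is important.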
