Apache Pulsar is built around several core components that together provide a flexible, scalable, and resilient platform for messaging and event streaming. This guide walks through those components (topics, producers, consumers, brokers, and bookies) and explains how each one contributes to Pulsar's architecture and capabilities.
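A topic is a named channel: producers publish messages to it, and consumers receive messages from it. Topics are the primary way Pulsar categorizes data and streams it between producers and consumers. For higher throughput and scalability, a topic can be partitioned. A partitioned topic is backed by several internal topics, one per partition, and different brokers can serve those partitions.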
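Key-based partitioning can be made concrete with a toy Python sketch; this is not the Pulsar client API. It follows the `<topic>-partition-<n>` naming that Pulsar uses for the internal topics behind a partitioned topic. It routes a keyed message by hashing the key modulo the partition count. The helper names `partition_names` and `route` and the use of CRC32 are illustrative assumptions. Real Pulsar clients use their own hash functions and routing modes.

```python
import zlib


def partition_names(topic: str, num_partitions: int) -> list[str]:
    """Names of the internal topics backing a partitioned topic."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return [f"{topic}-partition-{i}" for i in range(num_partitions)]


def route(key: str, num_partitions: int) -> int:
    """Hypothetical key-based routing: a stable hash of the key modulo N.

    CRC32 is an assumption chosen for determinism; it is not Pulsar's hash.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    topic = "persistent://public/default/orders"
    partitions = partition_names(topic, 3)
    for key in ("user-1", "user-2", "user-1"):
        # The same key always maps to the same partition, preserving per-key order.
        print(key, "->", partitions[route(key, 3)])
```

Because the mapping depends only on the key, messages that share a key land in one partition and keep their relative order. Messages with different keys are spread across partitions and brokers.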
Producers are the clients that publish messages to topics. The Pulsar client library can group messages into batches and encode them (for example, according to a schema) before sending. This reduces per-message network overhead and improves throughput. A producer can send synchronously, waiting for the broker to acknowledge each message. It can also send asynchronously, keeping many messages in flight for maximum throughput.
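An in-memory model can show what batching saves and how synchronous and asynchronous sends differ. `ToyBatchingProducer` is hypothetical: it buffers messages and turns each full batch into one simulated network request. It reports a made-up `(batch_id, index)` message ID through a callback. It performs no I/O and does not reflect how the Pulsar client is implemented.

```python
from typing import Callable

# Hypothetical stand-in for a message ID: (batch number, position in batch).
MessageId = tuple[int, int]


class ToyBatchingProducer:
    """Toy model of client-side batching; not the Pulsar client API."""

    def __init__(self, max_messages: int) -> None:
        self.max_messages = max_messages
        self._pending: list[tuple[bytes, Callable[[MessageId], None]]] = []
        # Each entry represents one simulated network request carrying a batch.
        self.requests: list[list[bytes]] = []

    def send_async(self, payload: bytes, callback: Callable[[MessageId], None]) -> None:
        """Queue a message; the callback fires once its batch is 'acknowledged'."""
        self._pending.append((payload, callback))
        if len(self._pending) >= self.max_messages:
            self.flush()

    def send(self, payload: bytes) -> MessageId:
        """Blocking-style send: flush right away instead of waiting for a full batch."""
        result: list[MessageId] = []
        self.send_async(payload, result.append)
        self.flush()
        return result[0]

    def flush(self) -> None:
        """Ship all pending messages as one request and fire their callbacks."""
        if not self._pending:
            return
        batch_id = len(self.requests)
        pending, self._pending = self._pending, []
        self.requests.append([payload for payload, _ in pending])
        for index, (_, callback) in enumerate(pending):
            callback((batch_id, index))


if __name__ == "__main__":
    producer = ToyBatchingProducer(max_messages=3)
    acked: list[MessageId] = []
    for i in range(7):
        producer.send_async(f"msg-{i}".encode(), acked.append)
    producer.flush()
    print([len(r) for r in producer.requests], acked)
```

With `max_messages=3`, seven asynchronous sends followed by a flush produce three requests (3 + 3 + 1 messages) instead of seven. The blocking `send` gives up that saving in exchange for an immediate result.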
Consumers subscribe to topics to receive messages. Pulsar supports four subscription types (Exclusive, Failover, Shared, and Key_Shared). These types control how messages are distributed when several consumers attach to the same subscription. Consumers acknowledge each message after processing it, so unacknowledged messages can be redelivered. Flow control through each consumer's receiver queue applies back-pressure, so a slow consumer is not overwhelmed.
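A toy dispatcher can sketch how the four subscription types distribute messages among consumers. It simplifies under stated assumptions. The failover "active" consumer is simply the first one in the list; Pulsar's own selection rules are more involved. Key_Shared maps a key to a consumer with CRC32 modulo the consumer count, which differs from Pulsar's actual key-to-consumer assignment. The `dispatch` function is hypothetical.

```python
import zlib
from itertools import cycle


def dispatch(mode: str, consumers: list[str],
             messages: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Toy model: decide which consumer receives each (key, value) message."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    if mode == "exclusive" and len(consumers) > 1:
        raise ValueError("an exclusive subscription allows only one consumer")
    received: dict[str, list[str]] = {c: [] for c in consumers}
    round_robin = cycle(consumers)
    for key, value in messages:
        if mode in ("exclusive", "failover"):
            # One active consumer gets everything; in failover, others are standbys.
            target = consumers[0]
        elif mode == "shared":
            # Messages are spread round-robin; per-key ordering is not preserved.
            target = next(round_robin)
        elif mode == "key_shared":
            # Every message with the same key goes to the same consumer.
            target = consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
        else:
            raise ValueError(f"unknown subscription mode: {mode}")
        received[target].append(value)
    return received


if __name__ == "__main__":
    msgs = [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]
    for mode in ("failover", "shared", "key_shared"):
        print(mode, dispatch(mode, ["c1", "c2"], msgs))
```

The trade-off is visible in the output. Shared spreads load evenly but may split a key across consumers. Key_Shared keeps each key on one consumer. Failover preserves total order at the cost of idle standbys.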
Brokers receive messages from producers, dispatch them to consumers, and track subscription state for the topics they serve. Brokers are stateless. Message data is persisted in bookies, and cluster metadata lives in a metadata store such as Apache ZooKeeper. As a result, another broker can take over a topic when one fails, which is central to Pulsar's scalability and fault tolerance. Brokers also enforce authentication and authorization to secure communication with clients.
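Much of tracking a subscription comes down to recording what it has acknowledged. The sketch below is loosely modeled on Pulsar's cursor concept. A cursor keeps a "mark-delete" position at or below which every message is acknowledged. It also keeps a set of individually acknowledged messages beyond that position. `ToyCursor` and its plain integer entry IDs are simplifying assumptions; real message IDs and cursor persistence are more complex.

```python
class ToyCursor:
    """Hypothetical model of the acknowledgment state kept per subscription."""

    def __init__(self) -> None:
        self.mark_delete = -1              # every entry <= this ID is acknowledged
        self.acked_ahead: set[int] = set() # acknowledged IDs beyond mark_delete

    def ack(self, entry_id: int) -> None:
        """Record an acknowledgment, which may arrive out of order."""
        if entry_id <= self.mark_delete:
            return  # already covered by the mark-delete position
        self.acked_ahead.add(entry_id)
        # Advance the mark-delete position over any contiguous run of acks.
        while self.mark_delete + 1 in self.acked_ahead:
            self.mark_delete += 1
            self.acked_ahead.remove(self.mark_delete)

    def redeliverable(self, last_entry: int) -> list[int]:
        """Entries up to last_entry that are still unacknowledged."""
        return [i for i in range(self.mark_delete + 1, last_entry + 1)
                if i not in self.acked_ahead]


if __name__ == "__main__":
    cursor = ToyCursor()
    for entry in (0, 2, 3):
        cursor.ack(entry)
    print(cursor.mark_delete, sorted(cursor.acked_ahead), cursor.redeliverable(4))
```

Keeping only a position plus a small set of out-of-order acknowledgments lets the state stay compact even for long backlogs. Entries below the mark-delete position never need to be tracked individually.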
Bookies are Pulsar's storage nodes, provided by the Apache BookKeeper project. They persist messages in distributed logs, ensuring durability and fault tolerance. Each log segment (a BookKeeper ledger) is replicated across an ensemble of bookies. This protects data against hardware failures while still providing low-latency reads.
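Three numbers govern BookKeeper replication. The ensemble size E is the number of bookies a ledger is spread across. The write quorum Qw is the number of copies written for each entry. The ack quorum Qa is the number of acknowledgments required before a write is confirmed. The sketch below assumes round-robin striping, in which entry e is stored on ensemble members e, e+1, ..., e+Qw-1 modulo E. `LedgerConfig` is a hypothetical helper, not a BookKeeper class. In Pulsar, the defaults for these values come from broker settings such as `managedLedgerDefaultEnsembleSize`, `managedLedgerDefaultWriteQuorum`, and `managedLedgerDefaultAckQuorum`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerConfig:
    """Hypothetical model of a ledger's replication parameters."""

    ensemble_size: int  # E: bookies the ledger is striped across
    write_quorum: int   # Qw: copies written for each entry
    ack_quorum: int     # Qa: acknowledgments needed to confirm a write

    def __post_init__(self) -> None:
        if not (self.ensemble_size >= self.write_quorum >= self.ack_quorum >= 1):
            raise ValueError("require ensemble_size >= write_quorum >= ack_quorum >= 1")

    def write_set(self, entry_id: int) -> list[int]:
        """Ensemble positions that store this entry (round-robin striping)."""
        return [(entry_id + i) % self.ensemble_size for i in range(self.write_quorum)]

    def is_confirmed(self, acks_received: int) -> bool:
        """A write is confirmed once ack_quorum bookies have acknowledged it."""
        return acks_received >= self.ack_quorum


if __name__ == "__main__":
    cfg = LedgerConfig(ensemble_size=3, write_quorum=2, ack_quorum=2)
    for entry in range(4):
        print(entry, cfg.write_set(entry))
```

With E = 3 and Qw = Qa = 2, each entry lives on two of the three bookies, so storage and read load are spread across the ensemble. An acknowledged entry also survives the loss of any single bookie.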