Alexander Preuß

Tutorial

How to create a Data Catalog for Apache Pulsar

Introduction

As your Pulsar deployment grows, keeping track of all your topics and understanding what data they carry can become a challenge. The Streamvisor data catalog helps you document, organize, and search your topics, so you always know what is flowing through your pipelines and who owns it.

In this quick guide, we’ll show you how to use the data catalog to register metadata, filter by owner and application, and jump straight to a topic's details.

Navigating the Explorer

To get started, open the Explorer page. This is your main starting point for interacting with your Pulsar environment.

Once you are on the Explorer page, you'll see a list of tenants, the logical groups that help organize your messaging infrastructure. Each tenant contains one or more namespaces, which in turn contain the actual topics where your messages live.

Here’s a quick path through the structure:

  • Select a Tenant → reveals available Namespaces inside
  • Select a Namespace → shows all the Topics inside
  • Click on a Topic → opens the Topic Overview

This clear hierarchy ensures you always know exactly where you are in the system.
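The same tenant → namespace → topic hierarchy is visible in a fully qualified Pulsar topic name, which has the form `persistent://tenant/namespace/topic`. As a minimal sketch, the helper below splits such a name into its parts. The `TopicName` type and `parse_topic_name` function are illustrative names, not part of Pulsar or Streamvisor, and the sketch only handles the common three-part form.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    """Parts of a fully qualified Pulsar topic name (illustrative type)."""
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(full_name: str) -> TopicName:
    """Split 'domain://tenant/namespace/topic' into its components."""
    domain, sep, rest = full_name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"not a fully qualified topic name: {full_name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in: {full_name!r}")
    tenant, namespace, topic = parts
    return TopicName(domain, tenant, namespace, topic)


if __name__ == "__main__":
    name = parse_topic_name("persistent://public/default/orders")
    print(name.tenant, name.namespace, name.topic)
```

Reading a topic name this way mirrors the click path in the Explorer: first the tenant, then the namespace, then the topic itself.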

Adding documentation to a Topic

Once you’ve selected a topic, you’ll land on its overview page. This page gives you general information about the topic, such as its configuration and metrics.

If you have not previously added documentation, you will notice that no catalog information is registered yet.

Click the Edit button to update the documentation for this topic. In the dialog that opens, you can define:

  • Owner → the person or team responsible for the data
  • Application → the application producing or consuming the data
  • Description → a long-form explanation of the purpose or nature of the data
  • Labels → the tags for easier filtering and grouping of the data

This metadata forms the entry in your data catalog. You can also leave fields empty if you do not need them.
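To make the shape of a catalog entry concrete, here is a rough sketch of the four fields as a Python data class. This is only a mental model: the `CatalogEntry` class, its field names, and the `is_documented` rule are assumptions for illustration and do not describe how Streamvisor stores metadata internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical model of the documentation attached to one topic."""
    topic: str                         # fully qualified topic name
    owner: Optional[str] = None        # person or team responsible for the data
    application: Optional[str] = None  # application producing or consuming the data
    description: Optional[str] = None  # long-form explanation of the data
    labels: List[str] = field(default_factory=list)  # tags for filtering and grouping

    def is_documented(self) -> bool:
        # Assumption: a topic counts as documented once any field is filled in.
        return any([self.owner, self.application, self.description, self.labels])


if __name__ == "__main__":
    entry = CatalogEntry(
        topic="persistent://public/default/orders",
        owner="team-checkout",
        labels=["orders", "pii"],
    )
    print(entry.is_documented())
```

Every field except the topic name defaults to empty, which matches the fact that you can leave any of them blank in the dialog.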

Browsing the Data Catalog

Once you have added documentation, head to the Catalog section in the sidebar. Here, you'll see a different view than in the Explorer. The catalog shows all documented topics in one place, along with their descriptions, owners, and labels.

From here, you can:

  • Search
  • Filter by application
  • Filter by data owner
  • Filter by labels

This makes it easy to find the topics you care about.
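Conceptually, the search and filter options above narrow a list of entries step by step. The sketch below illustrates that idea on plain dictionaries. It is not Streamvisor's implementation: the `filter_catalog` function, the sample entries, and the matching rules (a case-insensitive substring search over topic name and description, and labels that must all be present) are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical sample entries, for illustration only.
SAMPLE_ENTRIES: List[Dict] = [
    {"topic": "persistent://public/default/orders", "owner": "team-checkout",
     "application": "shop", "description": "Order events", "labels": ["pii", "orders"]},
    {"topic": "persistent://public/default/clicks", "owner": "team-web",
     "application": "shop", "description": "Clickstream", "labels": ["analytics"]},
]


def filter_catalog(entries: Iterable[Dict],
                   query: Optional[str] = None,
                   application: Optional[str] = None,
                   owner: Optional[str] = None,
                   labels: Optional[Iterable[str]] = None) -> List[Dict]:
    """Return the entries that match every filter that was provided."""
    wanted = set(labels or [])
    result = []
    for entry in entries:
        if application is not None and entry.get("application") != application:
            continue
        if owner is not None and entry.get("owner") != owner:
            continue
        # Assumption: an entry must carry all requested labels.
        if not wanted.issubset(entry.get("labels") or []):
            continue
        if query:
            # Assumption: search looks at the topic name and the description.
            haystack = f"{entry.get('topic', '')} {entry.get('description') or ''}".lower()
            if query.lower() not in haystack:
                continue
        result.append(entry)
    return result


if __name__ == "__main__":
    for match in filter_catalog(SAMPLE_ENTRIES, application="shop", labels=["pii"]):
        print(match["topic"])
```

Combining filters this way keeps each one independent, so you can narrow by owner first and then refine by label or search term.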

Navigating to Topic details

The Data Catalog is not just a static list. When you find an interesting entry, click on it to open a dialog that shows its full metadata. From the context menu at the top of the dialog, click the Explore button to jump right into the topic details.

Wrapping Up

The Data Catalog makes Pulsar more transparent and easier to manage. By documenting topics with owners, descriptions and labels, you can create a shared understanding across your different teams and roles. Try it out next time you want to bring clarity to your topics, improve collaboration, or just make your Pulsar streams more discoverable!

About the Author
Alexander Preuß is a seasoned expert in the data streaming field with extensive experience as a software engineer at both startups and large enterprises. Specializing in distributed systems, he has contributed to various open source projects, including Apache Flink, Apache Kafka, and Apache Pulsar, along with their ecosystems. Prior to founding Streamvisor, Alexander worked at Ververica (acquired by Alibaba) and StreamNative.
