Streamvisor blog

Streamvisor Wrapped - Queryable Pulsar Topics using TableView and Spring

Alexander Preuß
September 19, 2024
TL;DR the code for our example application can be found in this GitHub repository.

In today's data-driven landscape, businesses increasingly rely on real-time data processing to gain insights and drive decision-making.

Beyond just being a robust, scalable, low-latency solution for messaging and streaming use cases, Apache Pulsar offers a range of advanced features to help businesses unlock the value of their data.

In this blogpost we are going to create an example streaming application that creates Spotify Wrapped-like analytics for a user's listening behavior.

The example data is based on real Spotify streaming history files.

Developing Pulsar Applications with Spring

With Spring for Apache Pulsar having achieved GA status, setting up Pulsar applications has become a breeze.

We simply add it as a dependency in the Spring Initializr to generate the project scaffolding for us and we are ready to go.

Furthermore, adding the Docker Compose Support dependency, Spring can automatically start a Pulsar container together with the application and magically connect against it.

This makes the development experience incredibly smooth.

Creating Statistics from Music Streaming Data

Imagine a music streaming service, where every time a user listens to a track, a message is sent to a Pulsar topic. The event payload follows the structure as seen below:

Based on these stream events, we want to create statistics about a user's listening preferences, such as how many times a user has listened to a specific track, or which artists he likes the most.

Therefore we need a service that will process the incoming streaming events and create aggregations from them. Luckily, Apache Pulsar offers abstractions that will help us solve this problem:

The TableView interface  provides a continuously updated key-value map view of the compacted topic data.

It allows us to query the value for a given key, as well as register listeners that perform actions on any new incoming events.

Computing the number of plays is rather straightforward.

Whenever a new stream event arrives, we use a TableView to query a topic containing the counts and send a message containing the incremented count to that same topic.

The messages sent to the count topic need the identifier set as the message key (e.g. for the total play count the user id, or for counting the track plays a composite key of user id and track id).

With our counting TableViews in place, we can now go on to create statistics about the top artists or top tracks for a user. 

Again, we create a TableView but this time, for our top charts, on a topic keyed by the user id where the message payload contains a key-value map of the number of plays for a track/artist id.

On our TableViews for artist and track counts, we register a listener that will be executed whenever a new count event arrives.

The listener will then fetch the current key-value map from the new chart TableView and compare if the chart items need to be updated or replaced.

If yes, a new message with the updated state is sent to the chart topic.

With this code in place, the events emitted into our track-charts topic look like this:

Exposing aggregated Topic Data through a REST-ful API

With our TableViews providing continuously updated aggregations, we just need to build a service that simply retrieves the current value from the TableView when a REST endpoint is queried.

Having all the individual parts in place, we can now combine our different statistics to create our S̶p̶o̶t̶i̶f̶y̶ Streamvisor Wrapped endpoint, that when queried returns a response like this one:

Conclusion

Apache Pulsar makes it easy to build a wide range of messaging or streaming applications.

With the right integrations, powerful abstractions and an evolving ecosystem around the technology, we showed how quickly and easily even complex use cases can be solved.

If you want to try out the application yourself, you can find the code here: https://github.com/streamvisor/queryable-topics

Managing Pulsar does not need to be difficult - if you haven't, try out the free Streamvisor Community version.