
Clickstream analysis

In this tutorial you learn about a project template that analyzes a clickstream in real time and generates events based on user behavior. You use Python, rather than KSQL, to perform the aggregations and implement the event logic.

This tutorial uses the Quix Clickstream analysis template project.

Clickstream analysis pipeline

You'll fork the complete project from GitHub, and then create a Quix project from the forked repo, so you have a copy of the full application code running in your Quix account. You then examine the data flow through the project's pipeline, using tools provided by Quix.

Technologies used

Some of the technologies used by this template project are listed here. Each of them appears in the services described later in this tutorial.

Infrastructure:

  • Quix (hosts and runs the pipeline services)
  • Kafka (the topics that connect the services)
  • Redis Cloud (stores product, user, and aggregated data)

Backend:

  • Python (the stream processing services)
  • RocksDB (holds service state)

Frontend:

  • Streamlit (the real-time dashboard)
  • The webshop frontend (the online store UI)

GitHub repository

The complete code for this project can be found in the Quix GitHub repository.

Getting help

If you need any assistance while following the tutorial, we're here to help in the Quix Community.

Prerequisites

To get started, make sure you have a free Quix account.

Redis Cloud

If you want to run the project in your own Quix account, so that you can customize it, you'll need a free Redis Cloud account and the following credentials (used in the connection sketch after this list):

  • Hostname and port (found in the General section for the database)
  • Username and password (found in the Security section for the database)
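Once you have these, connecting from Python takes only a few lines with the redis client. This is a minimal sketch; the hostname, port, and credentials below are placeholders for the values from your Redis Cloud console.

```python
import redis

# Placeholder connection details -- substitute the values from the General
# and Security sections of your Redis Cloud database.
r = redis.Redis(
    host="redis-12345.c1.example.redns.redis-cloud.com",  # Hostname
    port=12345,                                           # Port
    username="default",                                   # Username
    password="<your-password>",                           # Password
    decode_responses=True,  # return str values instead of bytes
)

r.ping()  # raises an exception if the connection or credentials are wrong
```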

Git provider

You also need a Git account. This could be GitHub, Bitbucket, GitLab, or any other Git provider you are familiar with that supports SSH keys. The simplest option is to create a free GitHub account.

Tip

While this tutorial uses an external Git account, Quix can also provide a Quix-hosted Git solution using Gitea for your own projects. You can watch a video on how to create a project using Quix-hosted Git.

The pipeline

There are several main stages in the pipeline:

  1. Clickstream producer - loads clickstream data from a CSV file representing user interactions with a shopping website over a period of 15 days. This runs as a service, and the CSV file is read repeatedly on a loop (a producer sketch follows this list).
  2. Data aggregation - this service reads enriched data, performs various aggregations, and writes the results to Redis Cloud. These aggregations are then consumed by a Streamlit dashboard for visualization and analysis.
  3. Data enrichment - this service enriches the click data with additional data read from Redis Cloud, including the product category and the visitor's gender, birthday, and age (an enrichment sketch follows shortly after this list).
  4. Data ingestion - loads details of products in the web store and users from JSON files, and writes to Redis Cloud.
  5. Event detection - a simple state machine that triggers a product offer event when conditions have been met. The offer is tailored to the demographic of the user.
  6. Webshop frontend - this implements the online store.
  7. Real-time dashboard - A Streamlit dashboard service displaying real-time data about the clickstream. It reads its data from Redis Cloud.
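To make stage 1 concrete, here is a minimal sketch of a clickstream producer written with Quix Streams. The broker address, CSV file name, and userId column are assumptions for illustration; the template's actual service reads its configuration from environment variables.

```python
import csv
import json
import time

from quixstreams import Application

# Assumed broker address; in Quix this normally comes from the environment.
app = Application(broker_address="localhost:9092")

with app.get_producer() as producer:
    while True:  # replay the 15-day CSV repeatedly, as the service does
        with open("clickstream.csv") as f:  # placeholder file name
            for row in csv.DictReader(f):
                producer.produce(
                    topic="click-data",
                    key=row["userId"],  # hypothetical column name
                    value=json.dumps(row),
                )
                time.sleep(0.1)  # crude pacing; the real service paces by timestamp
```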

More details are provided on all these services later in the tutorial.
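As an illustration of how stage 3 might look, the sketch below consumes click-data, looks up product and user records in Redis, and produces to enriched-click-data. The Redis key layout (product:<id>, user:<id>) and the field names are assumptions, not necessarily the template's actual schema.

```python
import redis
from quixstreams import Application

# Placeholder Redis Cloud connection (see the Redis Cloud section above).
r = redis.Redis(host="<redis-cloud-hostname>", port=12345,
                username="default", password="<password>",
                decode_responses=True)

app = Application(broker_address="localhost:9092", consumer_group="enrichment")
clicks = app.topic("click-data", value_deserializer="json")
enriched = app.topic("enriched-click-data", value_serializer="json")

def enrich(click):
    # Hypothetical key layout: one Redis hash per product and per user.
    product = r.hgetall(f"product:{click['productId']}")
    user = r.hgetall(f"user:{click['userId']}")
    return {
        **click,
        "category": product.get("category"),
        "gender": user.get("gender"),
        "birthday": user.get("birthday"),
        "age": user.get("age"),
    }

sdf = app.dataframe(clicks)
sdf = sdf.apply(enrich)
sdf = sdf.to_topic(enriched)

app.run(sdf)
```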

Topics

The following Kafka topics are present in the project:

| Topic | Description | Producer service | Consumer service(s) |
| --- | --- | --- | --- |
| click-data | The raw clickstream data loaded from a CSV file | Clickstream producer | Data enrichment, Webshop frontend |
| enriched-click-data | Clickstream data with product category and user data added | Data enrichment | Data aggregation, Event detection |
| special-offers | Offers generated by Event detection, customized to the user demographic | Event detection | Webshop frontend |
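The Event detection service sits between enriched-click-data and special-offers. A minimal version of its state machine could look like the sketch below; the trigger condition (three clothing clicks in a row) and the offer payload are invented for illustration, and are simpler than the template's actual logic.

```python
from quixstreams import Application

app = Application(broker_address="localhost:9092", consumer_group="event-detection")
enriched = app.topic("enriched-click-data", value_deserializer="json")
offers = app.topic("special-offers", value_serializer="json")

def detect(click, state):
    # State is scoped per message key, so this assumes clicks are keyed by user.
    streak = state.get("streak", 0)
    streak = streak + 1 if click.get("category") == "clothing" else 0
    state.set("streak", streak)
    if streak == 3:
        state.set("streak", 0)  # reset after firing
        return {"userId": click.get("userId"), "offer": "clothing-discount"}
    return None  # no offer triggered by this click

sdf = app.dataframe(enriched)
sdf = sdf.apply(detect, stateful=True)
sdf = sdf.filter(lambda offer: offer is not None)
sdf = sdf.to_topic(offers)

app.run(sdf)
```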

The parts of the tutorial

This tutorial is divided into several parts to make it a more manageable learning experience. The parts are summarized here:

  1. Get the project - you get the project up and running in your Quix account.
  2. Clickstream producer - take a look at the clickstream producer service.
  3. Data ingestion - a job that loads product and user details from JSON files into Redis Cloud.
  4. Data enrichment - this service enriches the clickstream data with product category, and additional user information.
  5. Data aggregation - performs various aggregations on the data and writes the results to Redis Cloud. RocksDB is used to hold state.
  6. Event detection - implements a simple state machine that detects when conditions for an offer have been met.
  7. Webshop frontend - the UI for the online shop.
  8. Real-time dashboard - a real-time dashboard built with Streamlit (a minimal sketch follows this list).
  9. Lab: change offer - in this part you customize the event detection service.
  10. Summary - in this concluding part you review the work you have completed, along with some next steps for more advanced learning about Quix.
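Because the aggregations land in Redis Cloud, the dashboard side stays small. As a taste of part 8, here is a minimal Streamlit sketch; the agg:category_counts hash name is hypothetical, standing in for whatever keys the aggregation service actually writes.

```python
import redis
import streamlit as st

# Placeholder Redis Cloud connection and a hypothetical hash written by the
# Data aggregation service: {category: click count}.
r = redis.Redis(host="<redis-cloud-hostname>", port=12345,
                username="default", password="<password>",
                decode_responses=True)

st.title("Clickstream dashboard")

counts = r.hgetall("agg:category_counts")
for category, count in sorted(counts.items()):
    st.metric(label=category, value=count)
```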

🏃‍♀️ Next step

Part 1 - Get the project