Dynamic Data Model Definition in Cube using Python and Jinja

At Rittman Analytics we’re big fans of Cube, a pioneer of the “Headless BI” movement that we use in many of our client projects, including the subject of one of our most recent case studies, Osprey Charging in the UK.

Cube now supports the use of Python and Jinja for defining dynamic data models. The feature was introduced as part of Cube’s improved integration with dbt Core via the new cube_dbt package, which simplifies defining the Cube semantic layer on top of dbt models by letting developers load metadata from a dbt project’s manifest.json file.
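
As a rough illustration of what that integration looks like, here’s a minimal sketch of a model/globals.py and a companion YAML model; the manifest path and the cube contents are placeholders, and the calls broadly follow the pattern in the cube_dbt documentation:

```python
# model/globals.py - load dbt model metadata from manifest.json (path is illustrative)
from cube import TemplateContext
from cube_dbt import Dbt

template = TemplateContext()

# Parse the dbt manifest once, at data-model compile time
dbt = Dbt.from_file('/cube/conf/model/manifest.json')

@template.function('dbt_models')
def dbt_models():
    # Expose the parsed dbt models to Jinja templates
    return dbt.models
```

```yaml
# model/cubes/dbt.yml - generate one cube per dbt model
cubes:
  {% for model in dbt_models() %}
  - {{ model.as_cube() }}

    dimensions:
      {{ model.as_dimensions() }}

    measures:
      - name: count
        type: count
  {% endfor %}
```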

Even more interestingly, this new ability to declare and invoke Python functions from within a Jinja template gives us a way to de-duplicate common patterns in our data models and to generate data models dynamically from a remote data source or conditional logic.

Dynamic Generation of Data Models

Consider a multi-tenant Cube setup in which a number of tenants have variations on a base Cube data model, extended or amended for each tenant based on their particular reporting needs.

Up until now we would have needed to handle these variations in each tenant’s Cube data model by storing the base version in version control as a template and then writing our own custom devops process to inject additional Cube YAML or JS dimension and measure definitions into those base models, hoping at the end that it would all still work and compile correctly.

Now we can simplify this tenant-level customisation by putting the customisable part of the cube model into a Python function, for example, and then referencing that via Jinja in the cube data model definition, like this:

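In the sketch below the per-tenant extras live in a Python function; the CUBE_TENANT environment variable, tenant names and extra dimensions are all illustrative assumptions rather than code from a real project:

```python
# model/globals.py - per-tenant customisations kept in one Python function
# (CUBE_TENANT and the tenant names below are illustrative assumptions)
import os
from cube import TemplateContext

template = TemplateContext()

# Extra dimensions each tenant has asked for on top of the base model
TENANT_EXTRA_DIMENSIONS = {
    'tenant_a': [
        {'name': 'region', 'sql': 'region', 'type': 'string'},
        {'name': 'contract_type', 'sql': 'contract_type', 'type': 'string'},
    ],
    'tenant_b': [
        {'name': 'loyalty_tier', 'sql': 'loyalty_tier', 'type': 'string'},
    ],
}

@template.function('extra_dimensions')
def extra_dimensions():
    tenant = os.environ.get('CUBE_TENANT', '')
    return TENANT_EXTRA_DIMENSIONS.get(tenant, [])
```

```yaml
# model/cubes/orders.yml - the base model plus whatever extra_dimensions() returns
cubes:
  - name: orders
    sql_table: orders

    dimensions:
      - name: status
        sql: status
        type: string
      {% for dim in extra_dimensions() %}
      - name: {{ dim.name }}
        sql: {{ dim.sql }}
        type: {{ dim.type }}
      {% endfor %}

    measures:
      - name: count
        type: count
```
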
We could take this down to the individual dimension column level and tie this to user input in the front-end application, like this:
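
Again this is only a sketch: how the front-end application passes the user’s choice through to Cube (here a hypothetical USER_GROUPING_COLUMN environment variable) would depend on your deployment:

```python
# model/globals.py - map a single dimension onto whichever column the user picked
# (USER_GROUPING_COLUMN is a hypothetical way of passing that choice through)
import os
from cube import TemplateContext

template = TemplateContext()

@template.function('user_grouping_column')
def user_grouping_column():
    return os.environ.get('USER_GROUPING_COLUMN', 'status')
```

```yaml
# model/cubes/orders.yml - the dimension's underlying column is chosen at compile time
cubes:
  - name: orders
    sql_table: orders

    dimensions:
      - name: user_grouping
        sql: {{ user_grouping_column() }}
        type: string
```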

We could also change the aggregation method for a measure dynamically in the same way:

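Something along these lines, with the REVENUE_AGG_TYPE environment variable standing in for however the aggregation choice actually reaches Cube:

```python
# model/globals.py - choose the aggregation type for a measure at compile time
# (REVENUE_AGG_TYPE is an illustrative stand-in for a real configuration source)
import os
from cube import TemplateContext

template = TemplateContext()

@template.function('revenue_agg_type')
def revenue_agg_type():
    agg = os.environ.get('REVENUE_AGG_TYPE', 'sum')
    # Fall back to sum if something unexpected comes through
    return agg if agg in ('sum', 'avg', 'min', 'max') else 'sum'
```

```yaml
# model/cubes/orders.yml - the measure's type is supplied by the Python function
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: revenue
        sql: amount
        type: {{ revenue_agg_type() }}
```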

Note, however, that this doesn’t mean you can redefine a Cube data model on the fly: you still need to restart the Cube server to pick up the new data model definition and rebuild any pre-aggregations that have changed from those defined in the old data model.

In projects we deliver using Cube we would typically handle ad-hoc analysis and on-the-fly groupings in the BI tool, for example Preset, rather than requiring amendments to the cube configuration. This approach allows for greater flexibility and ease of use for end-users, while also maintaining the integrity and consistency of the cube data model.

Environment Separation & Data Validation

Perhaps a more appropriate use of this new feature is generating different cube definitions based on the current environment; for example, you might want to use different SQL tables for development and production environments.

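A sketch of how that could look, assuming a CUBE_ENVIRONMENT environment variable and illustrative schema names:

```python
# model/globals.py - expose the current environment to Jinja templates
# (CUBE_ENVIRONMENT and the schema names in the YAML are illustrative)
import os
from cube import TemplateContext

template = TemplateContext()

@template.function('is_production')
def is_production():
    return os.environ.get('CUBE_ENVIRONMENT', 'production') == 'production'
```

```yaml
# model/cubes/orders.yml - point at a different table in development
cubes:
  - name: orders
    {% if is_production() %}
    sql_table: analytics.orders
    {% else %}
    sql_table: analytics_dev.orders
    {% endif %}

    measures:
      - name: count
        type: count
```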

Or you might want to add data validation to your cube definitions, defining a Python function that checks the validity of user input or the results of a query, and a Jinja template that applies that check as the cube definition is compiled.

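For example, the sketch below validates a user-supplied column name against an allow-list before it is used in a dimension, failing at compile time rather than generating a broken model; the allow-list and the USER_GROUPING_COLUMN environment variable are illustrative:

```python
# model/globals.py - validate user input before it reaches the data model
# (the allow-list and USER_GROUPING_COLUMN are illustrative assumptions)
import os
from cube import TemplateContext

template = TemplateContext()

ALLOWED_GROUPING_COLUMNS = {'status', 'city', 'channel'}

@template.function('validated_grouping_column')
def validated_grouping_column():
    column = os.environ.get('USER_GROUPING_COLUMN', 'status')
    if column not in ALLOWED_GROUPING_COLUMNS:
        # Fail loudly at compile time rather than generating a broken cube
        raise ValueError(f'Invalid grouping column: {column}')
    return column
```

```yaml
# model/cubes/orders.yml - the dimension only compiles if the column passes validation
cubes:
  - name: orders
    sql_table: orders

    dimensions:
      - name: user_grouping
        sql: {{ validated_grouping_column() }}
        type: string
```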

Generating Cube Data Models from an API Call

Finally, let’s look at what’s probably the most relevant use of Python and Jinja support in Cube: the ability to generate data models from a remote data source, for example an API.

This new ability could be very useful, for example, in situations where you want to dynamically serve up Cube data model definitions for a data source that evolves frequently, or where you want to programmatically generate model definitions for new or temporary data sources.

In the code example below we’ve mocked up an API to provide us with the necessary data to generate cube data models, along with a helper function to load data from the mocked-up remote API, both of which we’d then store in the model/globals.py file:

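A sketch of that globals.py file, along the lines of the example in the Cube documentation; the cube names, measures and dimensions in the mocked response are placeholders:

```python
# model/globals.py - a mocked-up API client plus a helper to load data from it
from cube import TemplateContext

template = TemplateContext()

class MyApiClient:
    def __init__(self, api_url):
        self.api_url = api_url

    # A real implementation would make an HTTP call to self.api_url;
    # here we return a hard-coded response in the shape the template expects
    def load_data(self):
        api_response = {
            'cubes': [
                {
                    'name': 'cube_from_api',
                    'measures': [
                        {'name': 'count', 'type': 'count'},
                        {'name': 'total', 'type': 'sum', 'sql': 'amount'},
                    ],
                    'dimensions': [],
                },
                {
                    'name': 'cube_from_api_with_dimensions',
                    'measures': [
                        {'name': 'active_users', 'type': 'count_distinct', 'sql': 'user_id'},
                    ],
                    'dimensions': [
                        {'name': 'city', 'sql': 'city_column', 'type': 'string'},
                    ],
                },
            ]
        }
        return api_response['cubes']

@template.function('load_data')
def load_data():
    client = MyApiClient('https://api.example.com')
    return client.load_data()
```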

The code example defines a load_data function that uses a MyApiClient class to load data from a remote API. Note that the load_data function is decorated with the @template.function decorator, allowing it to be called from within a Jinja template.

Then we call the load_data function from within a Jinja template to generate the Cube data model, with a loop around the cube-creation logic that generates a cube for each item returned by the API.

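Something like the following YAML model, where the sql_table naming convention is an assumption for illustration:

```yaml
# model/cubes/from_api.yml - one cube per item returned by load_data()
cubes:
  {% for cube in load_data() %}
  - name: {{ cube.name }}
    sql_table: '{{ cube.name }}'

    measures:
      {% for measure in cube.measures %}
      - name: {{ measure.name }}
        type: {{ measure.type }}
        {% if measure.sql %}
        sql: {{ measure.sql }}
        {% endif %}
      {% endfor %}

    dimensions:
      {% for dimension in cube.dimensions %}
      - name: {{ dimension.name }}
        sql: {{ dimension.sql }}
        type: {{ dimension.type }}
      {% endfor %}
  {% endfor %}
```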

Interested? Find Out More!

Rittman Analytics is a boutique data analytics consultancy that works with growth-stage, mid-market and enterprise businesses in the UK, EU and North America who want to build a world-class data capability based around a modular, scalable modern data stack.

We’re authorised delivery partners for Cube along with Dagster, Preset, dbt Labs, Fivetran, Rudderstack and Google Cloud, and we’re experts at helping you choose and implement the right ones for your organisation’s needs, use-cases and budget.

If you’re looking for some help and assistance with Cube or would just like to talk shop and share ideas and thoughts on what’s going on in your organisation and the wider data analytics world, contact us now to organise a 100%-free, no-obligation call — we’d love to hear from you!