Open Source Terraform Modules for Data Products on the Google Cloud Platform

We've developed two open-source Terraform modules, Confluent Kafka to GCP BigQuery and GCP Bigquery Transform, to make it easier to provision data products on GCP. The figure below shows a working example that uses both modules.

Terraform modules in action

Terraform Module "Confluent Kafka to GCP BigQuery"

The module uses Confluent Kafka Connect to transfer data from an existing Confluent Kafka cluster to GCP BigQuery.

To use this module, users provide the Kafka cluster and its topics as input parameters, together with the credentials needed to access these resources. Once deployed, the module creates a GCP BigQuery table for every topic provided.
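To make the one-table-per-topic behavior concrete, here is a small Python sketch of how topic names could map to table names. The naming rule (characters outside `[A-Za-z0-9_]` replaced with underscores, plus a leading underscore for names that start with a digit) and the helper names `table_name_for_topic` and `plan_tables` are our own assumptions for illustration, not the module's documented behavior; the actual table naming is defined by the module and the connector configuration.

```python
import re


def table_name_for_topic(topic: str) -> str:
    """Derive a BigQuery table name from a Kafka topic name.

    Hypothetical rule for illustration: every character outside [A-Za-z0-9_]
    becomes an underscore, and names starting with a digit get a leading "_".
    """
    if not topic:
        raise ValueError("topic name must not be empty")
    name = re.sub(r"[^A-Za-z0-9_]", "_", topic)
    if name[0].isdigit():
        name = "_" + name
    return name


def plan_tables(topics: list[str]) -> dict[str, str]:
    """Map each topic to the one table created for it (one table per topic)."""
    tables: dict[str, str] = {}
    for topic in topics:
        if topic in tables:
            continue  # the same topic listed twice still gets only one table
        table = table_name_for_topic(topic)
        # Distinct topics such as "a.b" and "a-b" would map to the same table.
        if table in tables.values():
            raise ValueError(f"topic {topic!r} collides on table name {table!r}")
        tables[topic] = table
    return tables
```

For example, `plan_tables(["orders.v1", "customer-events"])` returns `{"orders.v1": "orders_v1", "customer-events": "customer_events"}`. Rejecting collisions up front matters because two differently named topics could otherwise end up writing into the same table.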

The module also creates a RESTful endpoint via Google Cloud Functions that publishes metadata about the data product: its name, its domain, and the BigQuery tables created. This endpoint can serve as input for another data product or be used to retrieve information about this data product.
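Such an endpoint can be very small. The following is a minimal sketch of a Python HTTP function in the Cloud Functions style that serves the metadata as JSON. The environment variable names (`DATA_PRODUCT_NAME`, `DATA_PRODUCT_DOMAIN`, `DATA_PRODUCT_TABLES`) and the JSON field names are hypothetical; we assume Terraform hands the values to the function as environment variables, which may differ from how the module actually does it.

```python
import json
import os


def load_metadata() -> dict:
    """Collect the data product's metadata from environment variables.

    The variable names are hypothetical; we assume Terraform sets them when it
    deploys the function.
    """
    tables = os.environ.get("DATA_PRODUCT_TABLES", "")
    return {
        "name": os.environ.get("DATA_PRODUCT_NAME", ""),
        "domain": os.environ.get("DATA_PRODUCT_DOMAIN", ""),
        # Comma-separated table IDs; empty entries are dropped.
        "tables": [t.strip() for t in tables.split(",") if t.strip()],
    }


def data_product_info(request):
    """HTTP entry point in the style of a Python Cloud Function.

    Cloud Functions passes a Flask request object, which this sketch ignores.
    Flask accepts a (body, status, headers) tuple as the response.
    """
    body = json.dumps(load_metadata())
    return body, 200, {"Content-Type": "application/json"}
```

Keeping the metadata in configuration rather than in code means the same function source can be reused for every data product; only the values Terraform passes in change.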

In short, this Terraform module offers a straightforward way to move data from Confluent Kafka into GCP BigQuery, so that it is collected and stored in one central location for further processing and analysis.

The module is open source and available on GitHub and the Terraform Registry.

Terraform Module "GCP Bigquery Transform"

This Terraform module simplifies creating and configuring a data product based on GCP BigQuery views, which contain aggregated data from a GCP BigQuery table.

The module creates a BigQuery dataset named "aggregations" and one or more views based on a source table and SQL scripts provided by the user. Each view is named after its SQL script. Besides the SQL files and the source table, users must provide the domain and the name of the data product to be created.
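The naming scheme for the views can be sketched in a few lines of Python. This sketch assumes that each view is named after the file name of its SQL script without the `.sql` extension (for example, `daily_revenue.sql` becomes the view `daily_revenue`) and that all scripts sit in one directory; the function `plan_views` and the project ID used with it are hypothetical. Only the dataset name `aggregations` comes from the module description.

```python
from pathlib import Path

# Dataset name taken from the module description.
AGGREGATIONS_DATASET = "aggregations"


def plan_views(project: str, sql_dir: str) -> dict[str, str]:
    """Return {view ID: SQL query} for every *.sql script in sql_dir.

    Assumption: each view is named after its script's file name without the
    .sql extension, e.g. daily_revenue.sql -> view "daily_revenue".
    """
    views: dict[str, str] = {}
    for script in sorted(Path(sql_dir).glob("*.sql")):
        view_id = f"{project}.{AGGREGATIONS_DATASET}.{script.stem}"
        views[view_id] = script.read_text(encoding="utf-8")
    return views
```

Because file names within one directory are unique and all scripts share the `.sql` suffix, the derived view names are unique as well, so this sketch needs no extra collision check.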

The module also creates a RESTful endpoint via Google Cloud Functions that publishes metadata about the data product: its name, its domain, and a list of the BigQuery views. This endpoint can serve as input for another data product or be used to retrieve information about this data product.
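A downstream data product could consume this endpoint as its input. The sketch below fetches the metadata with Python's standard library and picks one of the published views. The endpoint URL, the `views` field name, and the helpers `fetch_metadata` and `select_view` are assumptions for illustration; the actual response format is defined by the module.

```python
import json
import urllib.request


def fetch_metadata(endpoint_url: str, timeout: float = 10.0) -> dict:
    """Download the data product's metadata document from its HTTP endpoint."""
    with urllib.request.urlopen(endpoint_url, timeout=timeout) as response:
        # json.load accepts the binary response body directly.
        return json.load(response)


def select_view(metadata: dict, view_name: str) -> str:
    """Return the published view that equals view_name or ends in ".view_name".

    Assumes the metadata lists views under a hypothetical "views" key as
    strings such as "project.aggregations.daily_revenue".
    """
    for view in metadata.get("views", []):
        if view == view_name or view.endswith("." + view_name):
            return view
    raise KeyError(f"view {view_name!r} not published by {metadata.get('name')!r}")
```

A consumer would call `select_view(fetch_metadata(endpoint_url), "daily_revenue")`, where `endpoint_url` is the trigger URL of the Cloud Function. Failing with a `KeyError` when a view is missing makes a broken dependency between data products visible immediately.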

The module is open source and available on GitHub and the Terraform Registry.

Example usage

A working example that shows both modules in action is available in a separate GitHub repository.

Join us!

Our Terraform modules don't implement all the features Zhamak Dehghani envisions in her Data Mesh book. Instead, they are a starting point for collaborative, open-source development driven by the needs of data product developers. Join us in improving these modules or in contributing new ones.

References