Azure Synapse Analytics

Data mesh is primarily an organizational approach, which is why you can't buy a data mesh from a vendor. Technology still matters, though, because it acts as an enabler for data mesh, and domain teams will only accept solutions that are useful and easy to use. Cloud providers already offer enough good self-serve data services to let you assemble a data platform for your data mesh. Here we show which services you can use to get started.

Microsoft offers Azure Synapse Analytics, along with both Data Lake Storage Gen2 and SQL database, as the central components for implementing a data mesh architecture.

Data Mesh Architecture with Azure Synapse Analytics

We evaluated the Azure tech stack, especially the services around Azure Synapse Analytics. We like the idea of an integrated environment to manage and analyze data, and Microsoft even provides data-mesh-specific guidance in its Cloud Adoption Framework.

To be honest, we struggled a bit during this evaluation because we expected a more supportive developer experience. We couldn't figure out how to set up a data mesh architecture MVP that made sense for us. The services are convoluted: Which database should we use to store analytical data? The SQL database or the lake database integrated into the Synapse workspace, or an external object storage with Parquet files that we link into Synapse Analytics? When should we use Azure Blob Storage, and when Azure Data Lake Storage Gen2? Some features are supported by the built-in serverless SQL pool, while others require a dedicated SQL pool. We were even surprised by how difficult it is to ingest messages from our operational Kafka in Confluent Cloud. Currently, there is no managed Kafka connector for Azure sinks available, so we had to run our own microservice to forward messages (not directly Azure's fault, but it was part of our overall experience). We had similar experiences with implementing transformations, data sharing, and visualization: there is no straightforward way to accomplish these tasks. In addition, because Azure services have recently been rebranded, many tutorials on the internet are outdated.
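To make the message-forwarding workaround more concrete, here is a minimal sketch of what the batching core of such a microservice could look like. It is not the code we ran. The class `BatchForwarder`, the `upload` callable, the date-partitioned path layout, and the JSON Lines format are all assumptions made for illustration. In a real service, messages would arrive from a Kafka consumer, and `upload` would wrap a call to an Azure storage client.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional

# Hypothetical sink signature: (path, payload) -> None.
# In practice this would wrap an Azure storage client upload call.
Upload = Callable[[str, bytes], None]


@dataclass
class BatchForwarder:
    """Buffers messages and writes them to a sink as JSON Lines batches."""

    upload: Upload
    topic: str
    batch_size: int = 500
    _buffer: List[dict] = field(default_factory=list)
    _batches_written: int = 0

    def handle(self, message: dict) -> None:
        """Accept one decoded message; flush once the batch is full."""
        self._buffer.append(message)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self, now: Optional[datetime] = None) -> None:
        """Write the buffered messages as one file under a date-partitioned path."""
        if not self._buffer:
            return
        now = now or datetime.now(timezone.utc)
        # Assumed layout: <topic>/<yyyy>/<mm>/<dd>/batch-<n>.jsonl
        path = f"{self.topic}/{now:%Y/%m/%d}/batch-{self._batches_written:06d}.jsonl"
        body = "\n".join(json.dumps(m, sort_keys=True) for m in self._buffer) + "\n"
        self.upload(path, body.encode("utf-8"))
        self._buffer.clear()
        self._batches_written += 1


if __name__ == "__main__":
    # Stand-in sink that only prints; replace with a real upload function.
    forwarder = BatchForwarder(
        upload=lambda path, data: print(path, len(data), "bytes"),
        topic="orders",
        batch_size=2,
    )
    for i in range(5):
        forwarder.handle({"order_id": i})
    forwarder.flush()  # write the remaining partial batch
```

Keeping the sink behind a plain callable means the batching logic can be tested without any cloud access. Delivery guarantees, such as committing Kafka offsets only after a successful upload, are deliberately left out of this sketch.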

Our takeaway is that you should not underestimate the effort required to upskill domain teams to use Azure Synapse Analytics as a data platform. Invest in an enabling team that guides and supports domain teams on their journey toward implementing data mesh.

We assume that many of our difficulties stem from our lack of extensive experience with the platform. We are happy to talk with Azure experts who know more than we do, so feel free to reach out to us.

If you're on Azure, we would currently recommend considering Snowflake deployed on Azure or, if many of your domains do machine learning, taking a look at Azure Databricks.

References