Real World Learnings

Implementing data mesh is nontrivial and highly dependent on the organization. Here, we capture real-world learnings based on observations and experiences that may help you on your own data mesh journey.

We want this collection of real-world learnings to be a community effort. Please share your observations and learnings with us so we can put them up here for you, or add them yourself via a pull request.

Conflicting Goals

We are already more than busy implementing the user stories in our ever-growing backlog to meet the roadmap goals. We can't take on this additional responsibility for analytical data and data products.

Engineering team
  • Team takes protective stance on their roadmap-driven goals
  • Team's current goals don't align with data mesh
  • Structural and cultural change in the organization necessary

Cognitive Load

We are responsible for frontend, backend, databases, operations, and security. And now we have to perform data analysis and build data products with tools we have no experience with.

Engineering team
  • Respect the cognitive load limit that is individual for every team
  • Reduce their cognitive load with a supportive enabling team and a developer-friendly self-service data platform
  • Let them grow at their own pace

It is an Investment

It took us 3 sprints to build our first data product. And after a while, we had to invest even more because of hiccups and things we hadn't thought about before.

Engineering team
  • Building a data product is a significant investment
  • It takes time to learn
  • Hiccups are to be expected as it's something new/different
  • Maintaining a data product requires continuous care
  • Enabling team and community of practice can help

Choose the Right Time

We are in the middle of a huge transformation from on-premise monolith to cloud-native microservices. And you come to us with this new hype topic, now?

Engineering team
  • Not the right time for this team
  • Come back when the current transformation comes to an end

Profit Right Away

We previously were roadmap-driven by management. By using the self-service data platform, we learned a lot about our product that sometimes even surprised us. Based on hypotheses by our product owner and the team, we were able to craft and defend our own roadmap with user stories that had high business potential according to our data. It feels great to measure the business value you contribute.

Engineering team
  • Starting the data mesh journey can quickly result in business value
  • The availability of a simple-to-use self-service data platform already brings high benefits
  • Qualifying user stories before implementing them allows focus
  • All this is already possible without even consuming or providing data products by the team

Strong Reject

We get a lot of feature requests by our business and stakeholders. After we started to validate those feature requests with data, we could reject at least half of them that weren't worthwhile from a business perspective.

Product owner
  • Increased capacity as they reject feature requests
  • Line of argument is easy with data

Teams Are Different

In an ideal world, each team could develop a data product and profit right away through analytics. In practice, this is not true for all teams. Teams that are on level 1 and get good-enough results by running queries on operational databases may see no immediate benefit in continuing the journey to level 4 to make their data available to other teams. Other teams may not profit from analytics at all, as their business is driven by fulfilling standardized processes, or they use standard software that already has good reporting for their domain.

Engineering team
  • Teams have different starting points
  • The return on investment may be unbalanced across the organization
  • Standard products, commercial off-the-shelf software, and cloud services may already provide good-enough data analytics capabilities

SQL is the Common Ground

We knew SQL pretty well from our operational PostgreSQL database. Our self-service data platform is based on Google BigQuery. So, after some initial effort getting data into BigQuery, we felt quite at home defining analytical queries and building data products. We needed to learn some SQL features that we had not used in the past, such as common table expressions and window functions, but this was not really an issue.

Engineering team
  • Engineers know SQL
  • SQL-based data platforms lower the barriers
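
The two SQL features mentioned in the quote, common table expressions and window functions, can be tried out without any data platform. The following is a minimal sketch that uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) rather than BigQuery. The orders table, its columns, and the sample rows are hypothetical and only serve as an illustration.

```python
import sqlite3

# Hypothetical order events; table and column names are illustrative only.
ROWS = [
    ("alice", "2024-01-05", 30.0),
    ("alice", "2024-01-20", 50.0),
    ("bob", "2024-01-07", 20.0),
    ("bob", "2024-02-01", 70.0),
    ("bob", "2024-02-15", 10.0),
]

QUERY = """
WITH monthly AS (               -- common table expression
    SELECT customer,
           substr(ordered_at, 1, 7) AS month,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY customer, month
)
SELECT customer,
       month,
       revenue,
       SUM(revenue) OVER (      -- window function: running total per customer
           PARTITION BY customer ORDER BY month
       ) AS running_revenue
FROM monthly
ORDER BY customer, month
"""


def running_revenue(rows):
    """Load the rows into an in-memory database and run the analytical query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (customer TEXT, ordered_at TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_revenue(ROWS):
        print(row)
```

The CTE first aggregates orders per customer and month; the window function then adds a running total on top without a self-join. The same query shape should carry over to most SQL-based analytical platforms, although date handling differs between dialects.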

Developer Experience Matters

dbt is awesome.

Engineering team
  • Optimize the data platform for developer experience
  • The enabling team can provide best practices and tools that work well

Pricing Drives Architecture

We use Athena with S3 for our data mesh. We were surprised by how much additional know-how we had to acquire, particularly for performance and cost optimization: How do we efficiently transfer data from Kafka to S3? Storing the single events that we have in production got expensive, and they took too long to process. Building batches increased latency and led to inconsistencies. JSON as a storage format took up too much space and cost too much, so we had to learn Parquet. With Athena, every query is billed and needs to be considered, e.g., when defining regular data quality checks.

Engineering team
  • Substantial efforts needed for data platform specific optimizations
  • Learning of additional unexpected technologies required
  • Cloud pricing models have significant impact on data architecture
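
The trade-off between single events and batches described in the quote can be sketched as a small micro-batcher: events are buffered and written as one object once a size or age limit is reached. This is only an illustration under stated assumptions. The class name MicroBatcher, the limits, and the sink callable (standing in for an upload to object storage) are hypothetical, and the sketch writes newline-delimited JSON using only the standard library instead of Parquet.

```python
import json
import time


class MicroBatcher:
    """Buffers events and flushes them as one payload.

    A flush happens when the buffer holds max_events events or when the
    oldest buffered event is at least max_age_seconds old. The age is only
    checked when a new event arrives; a production version would also need
    a timer so that a quiet stream does not delay the last events.
    """

    def __init__(self, sink, max_events=500, max_age_seconds=60.0,
                 clock=time.monotonic):
        self._sink = sink              # hypothetical callable, e.g. an upload
        self._max_events = max_events
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer = []
        self._opened_at = None

    def add(self, event):
        if not self._buffer:
            self._opened_at = self._clock()
        self._buffer.append(event)
        too_many = len(self._buffer) >= self._max_events
        too_old = self._clock() - self._opened_at >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        # One object per batch instead of one object per event.
        payload = "\n".join(json.dumps(e, sort_keys=True) for e in self._buffer)
        self._sink(payload)
        self._buffer = []
        self._opened_at = None


if __name__ == "__main__":
    batcher = MicroBatcher(print, max_events=2)
    for i in range(5):
        batcher.add({"id": i})
    batcher.flush()  # write the remaining partial batch
```

Larger limits mean fewer, bigger objects and usually lower storage and query cost, but they also increase the delay before data becomes visible, which is the latency issue the team ran into.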

Chicken or the Egg

We need data from an upstream team. They don't get around to providing it. So we helped ourselves and simply imported operational data from a feed they already provide.

Engineering team
  • Dependencies can hurt and slow down
  • Workarounds using operational data can unblock
  • One can switch from the workaround to the data product once it's available

Trust Your Data

We are annoyed. We need data from an upstream team. When they finally provided the data product, its quality was so low that it was incomplete every other day. We inform them, and they fix it after a few hours. But it still happens again and again.

Engineering team
  • Teams might struggle to provide a high-quality data product
  • Not enough know-how and skills
  • Data without quality doesn't really help
  • Data quality monitoring is crucial
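
A completeness check is one simple form of data quality monitoring that would catch the "incomplete every other day" situation from the quote. The following is a minimal sketch, assuming the consumer can obtain the number of delivered rows per day. The function name find_incomplete_days and the min_rows threshold are hypothetical, and a real check would take its expectations from the data product's agreed quality guarantees.

```python
from datetime import date, timedelta


def find_incomplete_days(row_counts, start, end, min_rows):
    """Return all days in [start, end] with fewer than min_rows rows.

    row_counts maps a date to the number of rows delivered for that day.
    Days that are missing from the mapping count as zero rows.
    """
    days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [d for d in days if row_counts.get(d, 0) < min_rows]


if __name__ == "__main__":
    # Hypothetical daily row counts of an upstream data product.
    counts = {
        date(2024, 3, 1): 1000,
        date(2024, 3, 2): 120,
        date(2024, 3, 4): 990,
    }
    for day in find_incomplete_days(counts, date(2024, 3, 1), date(2024, 3, 4), 900):
        print("incomplete:", day.isoformat())
```

Running such a check on a schedule, ideally on the provider side before consumers notice, turns repeated complaints into an alert that the owning team can act on.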

Wished but Unused

We don't see any return on investment. No one uses our data products, although they were requested by stakeholders. For us, providing them was a waste of time.

Engineering team
  • Data as a product might fail just like any other product
  • Own data analysis neglected
