In the vast (not to say infinite) field of data, you have probably come across the concepts of modern data architecture and Business Intelligence over the past few years. Numerous articles, images, and other publications equate the "modernity" of data architecture with the use of cloud resources such as orchestrators, data lakes, and distributed processing systems, along with lifecycle management trends like DataOps and MLOps. Although these elements represent progress and help resolve computing, storage, and development issues, true modernity doesn't reside there. We've merely transferred the same old problems to a more robust infrastructure that allows us to work with data on a larger scale.
The same issues persist: a lack of genuine governance beyond the implementation of a cataloging system, data silos, poor quality, immature data management policies in large organizations, a lack of real accountability (responsibility usually defaults to IT or the CDO office), slow responsiveness to change, and a false sense of data self-service. These problems remain unsolved despite the technologies offered by major vendors. And herein lies the point: the responsibility for changing the paradigm and addressing these problems doesn't rest with the major vendors or their products. The true responsibility lies in the type of data strategy that organizations manage to deploy.
Thanks to one of our team members who shared an article criticizing the weaknesses of monolithic data lakes, I embarked on a journey towards a new paradigm that, from my perspective, is genuinely disruptive and challenges our traditional ideas about data architecture and management. Those traditional ideas, although they leverage modern technology platforms, haven't progressed beyond the dimensions of computing capacity and the storage of large, varied data volumes.
Today, we're facing a new data management paradigm called Data Mesh. This approach genuinely disrupts how data management tackles the challenges I've broadly described above. Data Mesh finally clarifies the path towards what it truly means to be a data-driven organization, aiming to improve various aspects of the functional areas, or business areas as we commonly call them.
In essence, Data Mesh goes beyond technical components and infrastructure. It encourages us to talk about Modern Distributed Data Architecture, where data and its handling are no longer the responsibility of IT or the CDO but of the business areas that generate the data. This turns data into a product of real and vital importance, and engineering and data science capabilities are no longer exclusive to digital or technology offices that have little to no business knowledge. This new paradigm drives the idea of moving away from data centralization (monolithic data lakes) and bringing the ownership and responsibility of data to the business areas. This implies that instead of sending domain data (a domain being a business unit or process) to a centrally owned data platform or lake, the domains themselves should store and expose their datasets in a way that's easy to query and exploit.
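To make the idea of a domain storing and exposing its own dataset more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not an implementation taken from the Data Mesh literature: the names `DataProduct`, `MeshCatalog`, `publish`, `query`, and `discover`, as well as the "sales" domain and its "orders" dataset, are all hypothetical. The owning domain validates and publishes its data, while a lightweight shared catalog only helps others find it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class DataProduct:
    """A dataset owned and served by one business domain (hypothetical model)."""
    name: str
    owner_domain: str
    schema: Dict[str, type]
    _rows: List[Row] = field(default_factory=list, repr=False)

    def publish(self, rows: Iterable[Row]) -> None:
        """The owning domain checks quality before exposing any row."""
        checked: List[Row] = []
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match the schema")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise TypeError(f"{column!r} must be {expected.__name__}")
            checked.append(dict(row))
        # Only extend once the whole batch is valid, so a bad batch exposes nothing.
        self._rows.extend(checked)

    def query(self, predicate: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        """Consumers read copies of the rows; they never touch the domain's storage."""
        return [dict(row) for row in self._rows if predicate(row)]


class MeshCatalog:
    """Shared registry for discovery only; the data stays with each domain."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        key = f"{product.owner_domain}.{product.name}"
        if key in self._products:
            raise ValueError(f"{key} is already registered")
        self._products[key] = product

    def discover(self, key: str) -> DataProduct:
        return self._products[key]

    def list_products(self) -> List[str]:
        return sorted(self._products)
```

In this sketch, a sales team would create `DataProduct("orders", "sales", {"order_id": int, "amount": float})`, register it in the catalog, and publish its rows; another team could then call `catalog.discover("sales.orders").query(...)` without anything being copied into a central lake.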
From this point, one can envision a whole generation of decoupled data microservices, pipelines, data as a product, and independent cross-functional teams in the business areas, with embedded data engineers and data scientists who generate real value and ensure the quality of the data they now own. All of this operates under central governance and on top of a real, distributed self-service data infrastructure. Fascinating, isn't it?
I invite you to delve deeper into this new and modern approach by reading O'Reilly's recent publication titled "Data Mesh," written by Zhamak Dehghani. She is undoubtedly the driver or custodian of this new and intriguing data management paradigm. This is a paradigm that Igerencia and its team of professionals are already adopting and deepening to assist you, dear reader, in managing data within your organization.
By Felipe Moreno, Director UEN-BI & BA