Content improvement and diagrams for data mesh #5

Open
JayGhiya opened this issue Apr 22, 2023 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@JayGhiya
Member

Please review the content below and improve it by adding a data product description.

  • Uno Data Mesh

What is Uno Data Mesh?

Uno Data Mesh aims to manage data in a decentralized, domain-driven manner. It should empower cross-functional teams to own and manage their data products, enabling faster time-to-market, better data quality, and increased scalability.

Why is it needed?

Now let us try to understand the traditional data warehousing and data lake approaches and the issues they often face.

Let us first dissect the term data warehouse. Data warehousing is a method of centralizing data from various sources into a single repository, typically in a structured format, for reporting and analysis purposes. Data is extracted from different source systems, transformed into a common format, and loaded into the warehouse for analysis. A good example of a data warehouse is Amazon Redshift.
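
The extract, transform, and load steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two source systems, their field names, the `Order` type, and the in-memory list standing in for the warehouse are all hypothetical and are not part of Uno Data Mesh or any real warehouse product.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Common format that every source record is converted into."""
    order_id: str
    amount_usd: float


# Hypothetical source systems, each with its own record shape.
CRM_ROWS = [{"id": "A1", "total": "19.99"}]
SHOP_ROWS = [{"order": "B7", "cents": 500}]


def extract():
    """Pull raw records from each source system."""
    return CRM_ROWS, SHOP_ROWS


def transform(crm_rows, shop_rows):
    """Convert both record shapes into the common Order format."""
    orders = [Order(r["id"], float(r["total"])) for r in crm_rows]
    orders += [Order(r["order"], r["cents"] / 100) for r in shop_rows]
    return orders


def load(warehouse, orders):
    """Append the transformed records to the central repository."""
    warehouse.extend(orders)
    return warehouse


# A plain list stands in for the warehouse table.
warehouse = load([], transform(*extract()))
```

Note that the schema (`Order`) has to be agreed on before anything is loaded, which is the upfront transformation cost that the data lake approach tries to avoid.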

However, data warehousing has several limitations. Data warehouses are typically expensive to set up and maintain. Reviews of Amazon Redshift also indicate that it needs careful management and optimization to achieve the best results. Data warehouses were also typically designed to handle structured data, making it challenging to integrate unstructured data such as text and images. In addition, data warehouses are typically optimized for read-intensive workloads. Example - Teradata.

Now let us understand the term data lake. Data lakes are designed to store vast amounts of raw, unstructured, and semi-structured data in its native format, without the need for upfront transformation or schema definition. Data lakes were developed to address some of the limitations of data warehousing, such as the high cost and complexity of integrating data from various sources and the restriction to structured data.

However, data lakes also have significant drawbacks. In particular, data lakes can easily become data swamps due to poor data quality, unclear data lineage, and weak data governance.

To tackle the issues caused by these traditional approaches, we believe that a centralized metadata registry that enables data lineage tracking, data quality, auto-discovery of data, and collaboration is the need of the hour. The centralized metadata registry needs to integrate with a wide range of data tools.
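
To make the idea of a metadata registry with lineage tracking and discovery more concrete, here is a minimal in-memory sketch. The `MetadataRegistry` class, its method names, and the asset names are assumptions made up for illustration; a real registry would persist this metadata and collect it from data tools through integrations.

```python
from collections import defaultdict


class MetadataRegistry:
    """Toy registry: stores asset metadata and upstream dependencies."""

    def __init__(self):
        self.assets = {}                  # asset name -> metadata dict
        self.upstream = defaultdict(set)  # asset name -> direct parents

    def register(self, name, owner, description="", upstream=()):
        """Record an asset, its owning team, and what it is derived from."""
        self.assets[name] = {"owner": owner, "description": description}
        self.upstream[name].update(upstream)

    def lineage(self, name):
        """Return every asset that `name` depends on, directly or not."""
        seen, stack = set(), [name]
        while stack:
            for parent in self.upstream[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def search(self, keyword):
        """Discover assets whose name or description mentions the keyword."""
        kw = keyword.lower()
        return sorted(
            name for name, meta in self.assets.items()
            if kw in name.lower() or kw in meta["description"].lower()
        )


registry = MetadataRegistry()
registry.register("raw_orders", owner="sales")
registry.register("clean_orders", owner="sales", upstream=["raw_orders"])
registry.register("revenue_report", owner="finance",
                  description="Monthly revenue", upstream=["clean_orders"])

report_lineage = registry.lineage("revenue_report")
order_assets = registry.search("orders")
```

The registry itself is central, but each asset keeps an `owner`, so ownership of the data stays with the domain teams.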

  • How does Uno Data Mesh improve the data journey of an organization?

Uno Data Mesh uses OpenMetadata (https://open-metadata.org/) to meet its data mesh objectives for data discovery, data lineage, data quality, and collaboration across the organization.

@JayGhiya JayGhiya added the documentation Improvements or additions to documentation label Apr 22, 2023
@JayGhiya JayGhiya added this to UnoPlat Apr 22, 2023
@JayGhiya JayGhiya moved this to Done in UnoPlat May 1, 2023