Automate your data quality checks

Description:
Automate regular data quality (DQ) checks with automation software. Add the tool to your pipeline and consume the DQ reports it produces. The data team can then find weaknesses in its data and make changes based on these reports.
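
As a rough illustration of what such an automated check could look like, the sketch below runs two simple checks (completeness and value range) over a batch of records and produces a plain-text DQ report. It uses only the Python standard library and is not the API of either tool linked below. The column names (region, new_cases), the 0.95 threshold, and the function names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_completeness(rows, column, min_ratio=0.95):
    """Pass if at least min_ratio of rows have a non-empty value in column."""
    total = len(rows)
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    ratio = filled / total if total else 0.0
    return CheckResult(
        name=f"completeness:{column}",
        passed=ratio >= min_ratio,
        detail=f"{filled}/{total} non-empty",
    )


def check_range(rows, column, low, high):
    """Pass if every non-missing value in column lies within [low, high]."""
    bad = [
        r[column]
        for r in rows
        if r.get(column) is not None and not (low <= r[column] <= high)
    ]
    return CheckResult(
        name=f"range:{column}",
        passed=not bad,
        detail=f"{len(bad)} out-of-range values",
    )


def run_checks(rows):
    """Run all checks; return a plain-text DQ report and an overall status."""
    results = [
        check_completeness(rows, "region"),           # hypothetical column
        check_range(rows, "new_cases", 0, 1_000_000),  # hypothetical bounds
    ]
    lines = [
        f"{'PASS' if r.passed else 'FAIL'} {r.name} ({r.detail})"
        for r in results
    ]
    return "\n".join(lines), all(r.passed for r in results)


# Hypothetical sample batch with one empty region and one negative count.
sample = [
    {"region": "Brno", "new_cases": 12},
    {"region": "", "new_cases": 7},
    {"region": "Praha", "new_cases": -3},
]
report, ok = run_checks(sample)
print(report)
```

In a pipeline, a step like this would typically run after each load, with the report stored or sent to the data team and the overall status used to decide whether the batch may proceed.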

Links:
https://github.com/lisehr/dq-meerkat
https://github.com/great-expectations/great_expectations

Keywords:
data quality, data completeness, data reliability

Motivation:
Clean and accurate data is crucial for a business to operate reliably. The DQ team should therefore measure data quality and change pipelines based on its findings.

Requirements/Prerequisites:
databases, knowledge of data

Level:
generic: high level abstract best practice, metalevel category (e.g. manage architectures)

Application domain:
Data science (analysis & visualisation)

Main phase:
Data Science: Preparation/Integration, Data Science: Modeling/Training/Evaluation

Related literature:
MOSES, Barr. Data Quality Fundamentals. O'Reilly Media, 2022. ISBN 1098112040.

In which projects do/did you use this practice?
Covid Dashboard, OpenData API, OpenData

Software Engineer, Data Analyst

0–2 years of experience
Masaryk University

1. How do you rate the potential benefit for your projects? 5
2. How often are you using that practice? 2
3. What is the effort to introduce the practice in your project upfront? 4
4. What is the effort to apply the best practice in your project on a daily basis? 3

Questions 1, 3 and 4 (1 = Low, 5 = High)
Question 2 (1 = Never, 5 = Always)
