This work proposes TruData, a platform that helps reduce the time and effort spent on data cleaning tasks and increases satisfaction in performing them through reuse and collaboration. To this end, a solution model was developed that supports sharing data quality requirements and data cleaning operators from multiple platforms (e.g., Python and R), and that collects usage data from people with different roles and from different domains who experience similar data quality issues. In addition, an application based on the solution components related to the reuse of data cleaning operators was implemented and validated.
The main contributions of this work are:
- A solution model that enables the reuse of data cleaning elements and broad collaboration among professionals who experience similar data quality issues;
- A Proof of Concept (PoC) of this model focused on operator reuse, which validates the scenario of obtaining and applying data cleaning operators;
- A comparative analysis of data cleaning tools available on the market, highlighting their reuse and collaboration characteristics;
- A comparative analysis of related work from academia and industry on reuse and collaboration in data cleaning tasks.
Author: Rogério Carminé
Type: MSc thesis
Partner: FEUP – Faculdade de Engenharia da Universidade do Porto