Data Preprocessing

Description:
Data cleaning, handling of missing values, feature selection, correlation analysis, removal of constant and duplicated values, normalization, outlier removal
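The techniques listed above can be chained into a small pipeline. The following is a minimal sketch in plain Python (standard library only, Python 3.8+). It assumes a hypothetical table layout of column name -> list of floats, with None marking a missing value. All function names and the concrete choices are illustrative assumptions, not the methods used in the projects named in this entry: median imputation, exact-copy detection for duplicated rows and columns, a 0.95 Pearson correlation threshold, the 1.5 x IQR outlier rule, and min-max scaling. Real projects would typically use pandas or scikit-learn instead.

```python
from statistics import mean, median, quantiles

# Hypothetical table layout for this sketch: column name -> list of floats,
# where None marks a missing value.


def drop_duplicate_rows(table):
    """Data cleaning: keep only the first occurrence of each identical row."""
    seen, keep = set(), []
    for row in zip(*table.values()):
        keep.append(row not in seen)
        seen.add(row)
    return {name: [v for v, ok in zip(col, keep) if ok] for name, col in table.items()}


def impute_median(table):
    """Missing values: replace None with the column median; drop all-missing columns."""
    out = {}
    for name, col in table.items():
        observed = [v for v in col if v is not None]
        if observed:
            fill = median(observed)
            out[name] = [fill if v is None else v for v in col]
    return out


def drop_constant_and_duplicate_columns(table):
    """Remove features with a single distinct value and exact copies of earlier features."""
    out = {}
    for name, col in table.items():
        if len(set(col)) > 1 and col not in out.values():
            out[name] = col
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def drop_correlated(table, threshold=0.95):
    """Correlation analysis / feature selection: drop a feature if |r| >= threshold with a kept one."""
    kept = {}
    for name, col in table.items():
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept


def remove_outlier_rows(table, k=1.5):
    """Outlier removal (IQR rule): drop rows with any value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    n_rows = len(next(iter(table.values())))
    keep = [True] * n_rows
    for col in table.values():
        q1, _, q3 = quantiles(col, n=4)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        for i, v in enumerate(col):
            if not low <= v <= high:
                keep[i] = False
    return {name: [v for v, ok in zip(col, keep) if ok] for name, col in table.items()}


def min_max_normalize(table):
    """Normalization: rescale every column to [0, 1]; constant columns become 0.0."""
    out = {}
    for name, col in table.items():
        lo, hi = min(col), max(col)
        out[name] = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]
    return out


def preprocess(table):
    """Chain the steps; outliers are removed before normalization."""
    table = drop_duplicate_rows(table)
    table = impute_median(table)
    table = drop_constant_and_duplicate_columns(table)
    table = drop_correlated(table)
    table = remove_outlier_rows(table)
    return min_max_normalize(table)


# Made-up example data: a sensor column with a gap and a spike, an exact copy of it,
# a constant column, and one duplicated row (the last one repeats the first).
raw = {
    "temp":      [20.0, 21.0, None, 22.0, 23.0, 21.5, 95.0, 22.5, 20.0],
    "temp_copy": [20.0, 21.0, None, 22.0, 23.0, 21.5, 95.0, 22.5, 20.0],
    "machine":   [1.0] * 9,
    "pressure":  [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 1.1, 1.2, 1.0],
}
clean = preprocess(raw)
print(sorted(clean), len(clean["temp"]))  # → ['pressure', 'temp'] 7
```

In this sketch, outlier removal deliberately runs before normalization: otherwise a single extreme reading such as the 95.0 spike would compress all other values of that column into a narrow band near 0.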

Links:
https://en.wikipedia.org/wiki/Data_pre-processing

Keywords:
Data cleaning, Data handling, Feature selection

Motivation:
Data preprocessing is required to transform raw data into informative data for further use in modeling, analysis, etc. Proper preprocessing reduces the risk of overfitting, bias and low-quality information.

Requirements/Prerequisites:
Domain knowledge and methodological knowledge

Level:
activity: description of what you have to do at your specific level (e.g. define interface)

Application domain:
Data science (analysis & visualisation)

Main phase:
Data Science: Preparation/Integration

Related literature:
GARCÍA, Salvador; LUENGO, Julián; HERRERA, Francisco. Data preprocessing in data mining. Cham, Switzerland: Springer International Publishing, 2015.

In which projects do/did you use this practice?
FDI, COGNIPLANT, SmartDD, DeepRed

Data analyst

3–5 years of experience
Software Competence Center Hagenberg

1. How do you rate the potential benefit for your projects? 5
2. How often are you using that practice? 5
3. What is the effort to introduce the practice in your project upfront? 4
4. What is the effort to apply the best practice in your project on a daily basis? 3

Questions 1, 3 and 4 (1 = Low, 5 = High)
Question 2 (1 = Never, 5 = Always)
