Data Preprocessing

Data cleaning, handling of missing values, feature selection, correlation analysis, removal of constant and duplicated values, normalization, outlier removal


Data cleaning, Data handling, Feature selection

Data preprocessing transforms raw data into informative data for further use in modeling, analysis, and similar downstream tasks. Proper preprocessing helps avoid overfitting, bias, and poor-quality information.
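As an illustration, several of the steps named above (handling of missing values, removal of constant and duplicated values, outlier removal, normalization) could be chained roughly as follows. This is a minimal sketch using only the Python standard library, not a description of how the practice is applied in any particular project. The toy table, the column names, the function names, the choice of median imputation, the 1.5 × IQR outlier rule, and the order of the steps are all assumptions made for this example.

```python
from statistics import median, quantiles

# Hypothetical toy table: each key is a feature, None marks a missing value.
raw = {
    "temperature": [20.0, 21.0, None, 22.0, 21.0, 95.0],
    "pressure": [1.0, 1.2, 1.1, None, 1.2, 1.3],
    "sensor_id": [7.0, 7.0, 7.0, 7.0, 7.0, 7.0],  # constant feature
}


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def drop_constant_columns(table):
    """Remove features that take a single value in every row."""
    return {name: col for name, col in table.items() if len(set(col)) > 1}


def drop_duplicate_rows(table):
    """Keep the first occurrence of each row, preserving row order."""
    names = list(table)
    unique_rows = list(dict.fromkeys(zip(*(table[n] for n in names))))
    return {n: [row[i] for row in unique_rows] for i, n in enumerate(names)}


def remove_outlier_rows(table, column, k=1.5):
    """Drop rows whose value in `column` lies outside the k * IQR fences."""
    q1, _, q3 = quantiles(table[column], n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    keep = [low <= v <= high for v in table[column]]
    return {n: [v for v, ok in zip(col, keep) if ok] for n, col in table.items()}


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(table):
    """Apply the steps in one possible order (the order is an assumption)."""
    table = {name: impute_median(col) for name, col in table.items()}
    table = drop_constant_columns(table)
    table = drop_duplicate_rows(table)
    table = remove_outlier_rows(table, "temperature")
    return {name: min_max_normalize(col) for name, col in table.items()}


clean = preprocess(raw)
```

In this toy run the constant `sensor_id` feature is dropped, one duplicated row is removed, and the row with the temperature reading of 95.0 is discarded as an outlier before the remaining values are rescaled. The median is used for imputation here because it is less sensitive to such extreme values than the mean.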

Domain Knowledge & Methodical Knowledge

Activity: a description of what you have to do at your specific level (e.g. define an interface)

Application domain:
Data science (analysis & visualisation)

Main phase:
Data Science: Preparation/Integration

Related literature:
GARCÍA, Salvador; LUENGO, Julián; HERRERA, Francisco. Data preprocessing in data mining. Cham, Switzerland: Springer International Publishing, 2015.

In which projects do/did you use this practice?

Data analyst

3–5 years of experience
Software Competence Center Hagenberg

1. How do you rate the potential benefit for your projects? 5
2. How often are you using that practice? 5
3. What is the effort to introduce the practice in your project upfront? 4
4. What is the effort to apply the best practice in your project on a daily basis? 3

Questions 1, 3 and 4 (1 = Low, 5 = High)
Question 2 (1 = Never, 5 = Always)
