Features, Combined: Normalization, Discretization and Outliers

Pablo Duboue

doi:10.1017/9781108671682.004

2 - Features, Combined: Normalization, Discretization and Outliers

from Part One - Fundamentals

Published online by Cambridge University Press: 29 May 2020

Author affiliation: Textualization Software Ltd.

Summary

This chapter discusses feature engineering techniques that look holistically at the feature set, replacing or enhancing features based on their relation to the whole set of instances and features. It covers normalization, scaling, dealing with outliers and generating descriptive features. Scaling and normalization are the most common of these techniques: they involve finding the maximum and minimum of a feature and changing its values to ensure they lie in a given interval (e.g., [0, 1] or [−1, 1]). Discretization and binning involve, for example, analyzing a feature that is an integer (any number from −1 trillion to +1 trillion), realizing that it only takes the values 0, 1 and 10, and simplifying it into a symbolic feature with three values (value0, value1 and value10). Descriptive features gather information about the shape of the data; the discussion centres on tables of counts (histograms) and general descriptive features such as maximum, minimum and averages. Outlier detection and treatment refers to looking at the feature values across many instances and realizing that some values lie very far from the rest.
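The four operations named in the summary can be illustrated with a minimal Python sketch, assuming a feature is simply a list of numbers. The function names (`min_max_scale`, `to_symbolic`, `describe`, `iqr_outliers`) and the `max_distinct` cut-off are hypothetical illustrations, not the book's code. The outlier rule uses Tukey's interquartile-range fences, which is one common choice; the summary itself does not prescribe a specific test.

```python
import statistics
from collections import Counter


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values into [low, high] using the observed min and max."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant feature has no spread; map every value to the lower bound.
        return [low for _ in values]
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


def to_symbolic(values, max_distinct=10):
    """Turn a numeric feature with few distinct values into symbolic labels.

    max_distinct is a hypothetical cut-off: above it the feature is returned
    unchanged (a real pipeline might bin it into ranges instead).
    """
    if len(set(values)) > max_distinct:
        return list(values)
    return [f"value{v}" for v in values]


def describe(values):
    """Histogram (table of counts) plus simple descriptive statistics."""
    return {
        "histogram": Counter(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    feature = [0, 1, 10, 1, 0, 10, 1, 0]
    print(min_max_scale(feature))               # values now lie in [0, 1]
    print(to_symbolic(feature))                 # value0, value1, value10, ...
    print(describe(feature)["histogram"])       # Counter({0: 3, 1: 3, 10: 2})
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
```

Because `min_max_scale` uses the minimum and maximum observed in the list it receives, values seen later can fall outside [low, high]; a common practice is to compute the bounds once on training data and reuse them on new instances.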

Keywords

normalization, binning, outliers, outlier detection, histogram, descriptive statistics, whitening, ZCA whitening, scaling, standardization

Type: Chapter
Information: The Art of Feature Engineering: Essentials for Machine Learning, pp. 34–58

DOI: https://doi.org/10.1017/9781108671682.004

Publisher: Cambridge University Press

Print publication year: 2020
