Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- Part One Fundamentals
- 1 Introduction
- 2 Features, Combined: Normalization, Discretization and Outliers
- 3 Features, Expanded: Computable Features, Imputation and Kernels
- 4 Features, Reduced: Feature Selection, Dimensionality Reduction and Embeddings
- 5 Advanced Topics: Variable-Length Data and Automated Feature Engineering
- Part II Case Studies
- Bibliography
- Index
2 - Features, Combined: Normalization, Discretization and Outliers
from Part One - Fundamentals
Published online by Cambridge University Press: 29 May 2020
Summary
This chapter discusses feature engineering techniques that look at the feature set holistically, replacing or enhancing features based on their relation to the whole set of instances and features. It covers normalization, scaling, dealing with outliers and generating descriptive features. Scaling and normalization are the most common: they involve finding the maximum and minimum of a feature and changing its values so that they lie in a given interval (e.g., [0, 1] or [−1, 1]). Discretization and binning involve, for example, analyzing an integer feature (which could in principle be any number from −1 trillion to +1 trillion) and realizing that it only takes the values 0, 1 and 10, so it can be simplified into a symbolic feature with three values (value0, value1 and value10). Descriptive features gather information about the shape of the data; the discussion centres on tables of counts (histograms) and general descriptive features such as maximum, minimum and averages. Outlier detection and treatment refers to looking at the feature values across many instances and realizing that some values lie very far from the rest.
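As a rough illustration of the scaling, discretization and descriptive-feature ideas in the summary, the following minimal Python sketch may help. It is not taken from the chapter: the function names (`min_max_scale`, `to_symbolic`, `describe`) and the `max_distinct` threshold are hypothetical choices made only for this example, and it uses the standard library alone.

```python
from collections import Counter


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values so the minimum lands on `low` and the maximum on `high`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: there is no spread to rescale (assumed convention).
        return [low for _ in values]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


def to_symbolic(values, max_distinct=10):
    """Turn a numeric feature into symbolic labels if it takes few distinct values.

    `max_distinct` is an illustrative threshold, not a value from the chapter.
    """
    if len(set(values)) > max_distinct:
        return list(values)  # too many distinct values: leave the feature as-is
    return [f"value{v}" for v in values]


def describe(values):
    """Collect simple descriptive features: min, max, mean and a table of counts."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "counts": dict(Counter(values)),
    }


feature = [0, 1, 10, 1, 0]
print(min_max_scale(feature))             # [0.0, 0.1, 1.0, 0.1, 0.0]
print(min_max_scale(feature, -1.0, 1.0))  # same values mapped into [-1, 1]
print(to_symbolic(feature))               # ['value0', 'value1', 'value10', 'value1', 'value0']
print(describe(feature))
```

Note that the minimum and maximum here are taken from the values passed in; in practice they would typically be computed on training data and reused on new data, which this sketch does not attempt to show.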
- The Art of Feature Engineering: Essentials for Machine Learning, pp. 34-58. Publisher: Cambridge University Press. Print publication year: 2020