Graph Data

Pablo Duboue

doi:10.1017/9781108671682.009

6 - Graph Data

from Part II - Case Studies

Published online by Cambridge University Press: 29 May 2020


Affiliation: Pablo Duboue, Textualization Software Ltd.


Summary

This chapter opens Part II (domain-dependent feature engineering) by describing the creation of the base WikiCities dataset used here and in the next three chapters. The dataset is used to predict the population of cities from a semantic graph built from Wikipedia infoboxes. Semantic graphs were chosen as an example of handling and representing variable-length raw data as fixed-length feature vectors, particularly using the techniques discussed in Chapter 5. The intention is to provide a task that can be attacked with regular features, time series features, textual features and image features. The chapter discusses how the dataset came to be and presents an Exploratory Data Analysis over it, which results in a base, incomplete featurization. From there, a first featurization is produced and subjected to an error analysis that includes feature ablation and mutual information feature utility. From this error analysis, a second featurization is proposed, and an error analysis using feature stability concludes the exercise. All insights are captured in the two final feature sets, one conservative and the other expected to have higher performance.
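
As a rough illustration of what representing variable-length graph data as fixed-length feature vectors can look like, the sketch below encodes each city's set of (relation, value) pairs against a small vocabulary of the most frequent pairs, plus one overflow slot. The toy data, the function names `build_vocabulary` and `featurize`, and the `top_k` parameter are all hypothetical; none of this is taken from the chapter or from the WikiCities dataset.

```python
from collections import Counter

# Hypothetical toy data: each city has a variable-length list of
# (relation, value) pairs, as one might read from an infobox-derived graph.
cities = {
    "city_a": [("country", "X"), ("timezone", "T1"), ("twin_city", "city_b")],
    "city_b": [("country", "X"), ("timezone", "T2")],
    "city_c": [("country", "Y"), ("timezone", "T1"),
               ("twin_city", "city_a"), ("twin_city", "city_b")],
}

def build_vocabulary(rows, top_k):
    """Keep the top_k most frequent (relation, value) pairs as columns."""
    counts = Counter(pair for pairs in rows.values() for pair in set(pairs))
    # Sort by descending frequency, then by pair, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked[:top_k]]

def featurize(pairs, vocabulary):
    """Encode a variable-length list of pairs into len(vocabulary) + 1 slots.

    Each vocabulary slot is 1 if the pair is present, 0 otherwise.
    The final slot counts distinct pairs that fall outside the vocabulary.
    """
    present = set(pairs)
    vector = [1 if col in present else 0 for col in vocabulary]
    vector.append(sum(1 for p in present if p not in vocabulary))
    return vector

vocabulary = build_vocabulary(cities, top_k=3)
features = {name: featurize(pairs, vocabulary) for name, pairs in cities.items()}
```

Every city ends up with a vector of the same length regardless of how many relations it has. Capping the vocabulary and pooling the remainder into one slot is only one possible design choice for keeping the vector length fixed.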

Keywords

graph data; machine learning on graphs; variable-length feature vector; one hot encoding examples; error analysis examples; exploratory data analysis examples; dbpedia; population prediction

Type: Chapter
Information: The Art of Feature Engineering: Essentials for Machine Learning, pp. 139-162

DOI: https://doi.org/10.1017/9781108671682.009

Publisher: Cambridge University Press

Print publication year: 2020
