From Bytes to Bites; Advancing Data Collection Methodologies for Enhanced Branded Food Insights

L.B. Kirwan; E. O’Sullivan; S. Hogan; F. Douglas; D’O Kelly

doi:10.1017/S0029665124007249

Published online by Cambridge University Press: 16 December 2024

Affiliation (all authors): Nutritics Ltd, 22C Town centre mall, Main Street, Swords, Dublin, K67 FY88

Abstract

Nutrition research relies on food databases, which are used extensively in dietary surveys, clinical practice, research and policy development (1). The volume of online data is forecast to grow to 180 zettabytes by 2025, driven by the proliferation of internet-connected devices, the growth of social media platforms and the digital transformation of industries (2). Web scraping, a method of extracting data from websites, has previously been used in Ireland to evaluate online retailer information as a potential source for monitoring food reformulation in the Irish retail market (3). This study aims to outline a web scraping process for online supermarket websites, and to evaluate its use in increasing the data available to researchers.

An online supermarket website was selected to trial the new process, and Octoparse software (version 8) was downloaded. Twelve data fields of interest were identified: cost, lifestyle, net weight, directions for use, storage instructions, nutrition information, front-of-pack information, legal name, brand name, manufacturer, ingredients and allergy advice. A four-step web scraping process was defined: 1) collection of category-level URLs, 2) collection of product-level URLs, 3) collection of product-level data within the defined fields and 4) data cleaning and restructuring. A workflow was created in Octoparse for steps 1 to 3, and step 4 was completed in Excel (version 16.69.1).
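The four steps above were carried out with Octoparse and Excel rather than with code. For readers who prefer a scripted route, the following is a minimal Python sketch of the same pipeline using only the standard library, under stated assumptions: the page markup (a `product-link` CSS class on product anchors and a `data-field` attribute on each labelled element), the `shop.example` URLs and all class and function names are hypothetical, and pages are passed in as HTML strings so that fetching is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# The 12 data fields named in the study, lower-cased for matching.
FIELDS = [
    "cost", "lifestyle", "net weight", "directions for use",
    "storage instructions", "nutrition information",
    "front of pack information", "legal name", "brand name",
    "manufacturer", "ingredients", "allergy advice",
]


class LinkCollector(HTMLParser):
    """Steps 1 and 2: collect absolute URLs from <a> tags with a given class.
    The class names passed in are hypothetical; real sites differ."""

    def __init__(self, page_url, css_class):
        super().__init__()
        self.page_url = page_url
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attrs.get("href"):
            # Resolve relative hrefs against the page they were found on.
            self.links.append(urljoin(self.page_url, attrs["href"]))


class FieldExtractor(HTMLParser):
    """Step 3: read the text of elements carrying a hypothetical
    data-field attribute. Assumes each such element holds plain text only
    (any closing tag ends the current field)."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("data-field")
        if name:
            self.current = name.strip().lower()

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            previous = self.fields.get(self.current, "")
            self.fields[self.current] = (previous + " " + data.strip()).strip()


def clean_record(url, raw):
    """Step 4: keep only the 12 study fields, collapse whitespace,
    and store missing fields as None."""
    record = {"url": url}
    for field in FIELDS:
        value = raw.get(field)
        record[field] = " ".join(value.split()) if value else None
    return record


# Usage on in-memory HTML (hypothetical markup and URLs).
collector = LinkCollector("https://shop.example/category/snacks", "product-link")
collector.feed('<a class="product-link" href="/p/123">Oat bar</a>')
product_url = collector.links[0]

extractor = FieldExtractor()
extractor.feed('<div data-field="Brand name">Acme</div>'
               '<div data-field="Ingredients">Oats,  honey</div>')
record = clean_record(product_url, extractor.fields)
print(record["url"], record["brand name"], record["ingredients"], record["cost"])
```

Steps 1 and 2 reuse the same collector with different (hypothetical) CSS classes, mirroring the category-to-product page structure; a real site would need its own selectors.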

In total, 83 category-level page links were generated and entered into Octoparse, and web scraping was completed on 3,095 product-level URLs. Data on 1,450 products (47%) were successfully scraped, i.e. these products had data within the 12 defined fields. A new dataset was created for these 1,450 products, with fields covering nutrition (energy, fat, of which saturates, carbohydrate, of which sugars, fibre, protein and salt), cost per serving and per kg, lifestyle factors (e.g. whether a product was vegetarian or vegan), ingredient lists and allergy advice. Of these products, 637 (44%) carried vegetarian or vegan claims. Micronutrient data were limited.
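The reported proportions follow directly from the counts, and the completeness and claim checks behind them can be sketched as below. This is a minimal Python sketch under stated assumptions: it treats a product as successfully scraped only if all 12 fields are non-empty (the abstract does not say whether every field was required), and it flags a vegetarian or vegan claim by a simple keyword match on the lifestyle field; both rules and all names are hypothetical.

```python
FIELDS = [
    "cost", "lifestyle", "net weight", "directions for use",
    "storage instructions", "nutrition information",
    "front of pack information", "legal name", "brand name",
    "manufacturer", "ingredients", "allergy advice",
]


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


def is_complete(record):
    """Assumed rule: every one of the 12 fields holds a non-empty value."""
    return all(record.get(field) for field in FIELDS)


def has_veg_claim(record):
    """Assumed rule: case-insensitive keyword match on the lifestyle field."""
    text = (record.get("lifestyle") or "").lower()
    return "vegetarian" in text or "vegan" in text


# Counts reported in the abstract.
print(share(1450, 3095))  # 47
print(share(637, 1450))   # 44

# Toy records (hypothetical) showing how the two filters combine.
full = {field: "x" for field in FIELDS}
full["lifestyle"] = "Suitable for Vegans"
partial = {"brand name": "Acme", "lifestyle": "Vegetarian"}
products = [full, partial]
complete = [p for p in products if is_complete(p)]
print(len(complete), sum(has_veg_claim(p) for p in complete))
```

A plain keyword match would also fire on phrases such as "not suitable for vegans", so a production rule would need more care.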

The increasing availability of online data presents an opportunity to develop new, more systematically updated datasets, and may increase the availability of information on branded products. Web scraping enables researchers to create new databases, and to update datasets systematically, with fewer resources. This study enhances the availability of data and may enable researchers to explore new avenues for understanding food environments. Future research should test the process on additional websites to increase coverage of the Irish retail market and of other regions, identify sources with more in-depth nutritional data, and evaluate use cases in mobile applications. Web scraping is a promising tool for advancing research in food science and nutrition, providing access to diverse datasets for research and innovation that can be kept up to date as the market changes.
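One way to put systematic updating into practice, not described in the abstract and offered here only as an assumption, is to re-run the scrape and compare the new snapshot with the previous one, keyed by product URL, so that new, delisted and changed products (for example, altered nutrition values after reformulation) are flagged for review. A minimal Python sketch with hypothetical URLs, field names and values:

```python
def diff_snapshots(old, new):
    """Compare two scrapes, each a dict mapping product URL -> field dict.

    Returns (added URLs, removed URLs, {url: {field: (old, new)}})."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for url in sorted(old.keys() & new.keys()):
        before, after = old[url], new[url]
        # Record every field whose value differs between the two scrapes.
        diffs = {
            field: (before.get(field), after.get(field))
            for field in sorted(before.keys() | after.keys())
            if before.get(field) != after.get(field)
        }
        if diffs:
            changed[url] = diffs
    return added, removed, changed


# Hypothetical snapshots; sugars are g per 100 g.
march = {
    "https://shop.example/p/1": {"brand name": "Acme", "sugars": 12.0},
    "https://shop.example/p/2": {"brand name": "Acme", "sugars": 3.1},
}
june = {
    "https://shop.example/p/1": {"brand name": "Acme", "sugars": 9.5},
    "https://shop.example/p/3": {"brand name": "Bran Co", "sugars": 1.0},
}
added, removed, changed = diff_snapshots(march, june)
print(added, removed, changed)
```

Comparing parsed numbers rather than raw label strings avoids flagging purely cosmetic changes, such as altered spacing, as reformulations.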

Type: Abstract
Information: Proceedings of the Nutrition Society, Volume 83, Issue OCE4: Nutrition Society Congress 2024, 2–5 July 2024, November 2024, E486

DOI: https://doi.org/10.1017/S0029665124007249

References

1. Yeung, A.W.K. (2023) Nutrients 15 (16), 3548. https://doi.org/10.3390/nu15163548

2. Taylor et al. (2023) Amount of data created, consumed, and stored 2010-2020, with forecasts to 2025. Statista. https://www.statista.com/statistics/871513/worldwide-data-created/

3. O’Neill, M. et al. (2022) Proceedings of the Nutrition Society 82 (OCE4), E241.
