Evidence suggests that the local food environment has implications for diet and physical activity behaviours( Reference Caspi, Sorensen and Subramanian 1 – Reference Gustafson, Hankins and Jilcott 6 ), but a lack of accurate environmental data remains problematic for much of this research( Reference Rossen, Pollack and Curriero 7 – Reference Liese, Colabianchi and Lamichhane 10 ). Most studies have relied on two secondary commercial data sources, Dun & Bradstreet (D&B) and InfoUSA, to characterize the retail food environment( Reference Fleischhacker, Rodriguez and Evenson 9 – Reference Gustafson, Lewis and Wilson 14 ). Results from field validation studies demonstrate moderate levels of agreement between these data sources and ground-level observations( Reference Gustafson, Hankins and Jilcott 6 , Reference Fleischhacker, Rodriguez and Evenson 9 – Reference Han, Powell and Zenk 11 , Reference Powell, Han and Zenk 13 , Reference Longacre, Primack and Owens 15 ), suggesting that these data are best used in combination when characterizing the retail food environment( Reference Auchincloss, Moore and Moore 12 , Reference Bader, Ailshire and Morenoff 16 – Reference Svastisalee, Holstein and Due 18 ).
However, no studies have assessed the validity of Nielsen TDLinx (TDLinx), a commercial database known for its rigorous data collection and research-based outlet type classification( Reference Auchincloss, Moore and Moore 12 , 19 , Reference Hoehner and Schootman 20 ). Unlike other commercial databases that update listings on a quarterly basis (e.g. D&B), TDLinx updates its listings on a monthly basis( 21 ), providing an advantage in areas with rapid food outlet turnover.
Few studies have evaluated the accuracy of commercial listings to characterize the food environment in areas with a high proportion of Latino residents, a fast-growing segment of the US population with high risk for diet-related chronic diseases( Reference Perez-Escamilla 22 ). While Latinos, particularly less acculturated Latinos, tend to shop at tiendas ( Reference Ayala, Mueller and Lopez-Madurga 23 , Reference Kaufman and Karpati 24 ), it is unknown how well represented tiendas and other small specialty stores are in commercial data sources. Obtaining valid, reliable measures of food environments in Latino communities is important for understanding barriers to healthy eating in this at-risk population.
Our primary aim was to examine agreement between retail food outlet data from two commercial databases, TDLinx (food stores only) and the D&B Duns Market Identifiers File (food stores and restaurants), relative to a field-based census of food stores and restaurants in thirty-one census tracts in Durham County, NC, USA of varying Hispanic population composition. We also tested whether agreement differed by Hispanic composition of the census tract and by field-based classification of ‘Latino’ stores.
Methods
Geographic area
Direct field observations were conducted in thirty-one of the sixty census tracts in Durham County, NC, USA, an area experiencing rapid population growth and increase in its Hispanic population( 25 ). Census tracts were selected to obtain a balanced representation of neighbourhoods with predominantly Hispanic, Black and White populations. Census tracts with the highest proportions each for non-Hispanic White (n 10), non-Hispanic Black (n 10) and Hispanic (n 10) were visited. Given its population and food outlet density, we also included the census tract containing the Central Business District (CBD). The observed tracts represented 49·9 % of the Durham County population.
Data sources
We obtained data for Durham County from two commercial databases for 2012: Nielsen TDLinx (referenced May 2012; Nielsen, New York, NY, USA)( 19 ) and the D&B Duns Market Identifiers File (referenced July 2012; Dun & Bradstreet, Inc., Short Hills, NJ, USA)( 26 ). TDLinx uses official industry-standard definitions for food store categories when available or its own rigorously developed definitions supported by trade associations (e.g. Food Marketing Institute) and trade publications (e.g. Progressive Grocer), classified with a standard trade channel and sub-channel code (see online supplementary material, Supplemental Table 1). D&B uses eight-digit US Census standard industry classification (SIC) codes to categorize food outlets. TDLinx only captures food stores with ≥$US 1 million in sales, while D&B does not have a criterion for sales volume and collects both food store and restaurant data.
N/A, not applicable.
* Match defined as a food store or restaurant observed in the field and listed in a secondary data source.
† Food outlets that were classified as both a food store and a restaurant by the field auditors (n 10) were included in both the food store and restaurant counts, regardless of their classification in secondary data sources.
‡ One food store was not given a category during the field audit.
§ One food store in the grocery and supermarket category was not given a sub-category during the field audit.
|| Two restaurants were not given a store sub-category during the field audit.
¶ Three matches were categorized as ‘both food store and restaurant’ by the field team.
** Eight matches were categorized as ‘both food store and restaurant’ by the field team.
†† Eight matches were categorized as ‘both food store and restaurant’ by the field team.
Field census
We developed an iPad data collection program, adapted from a web-based Counter Tobacco Audit Tool( Reference Ribisl and Myers 27 ), that was preloaded with harmonized categories of food stores (from TDLinx and D&B) and restaurants (D&B); categories not found in the TDLinx and D&B databases for Durham County were classified as ‘Other’ (Supplemental Table 1). Between July and August 2012 (4 weeks), two teams of two trained data collectors each conducted a driving census of all food stores and restaurants in the thirty-one census tracts, recording and classifying all food outlets and collecting the latitude and longitude of each location using the iPad data collection tool.
The pairs of field data collectors (one driver, one data collector) drove all roads and streets in each census tract except private, unpaved or residential roads. All food outlets open for business and selling publicly accessible food were included and the following data were collected: name, address, latitude/longitude, currently open/closed, outlet type and whether it was a primarily Latino outlet. Conjoined outlets (e.g. KFC/Taco Bell) were separately classified as two outlets. All field censuses took place between 09·00 and 17·00 hours, with data collection from the car except in the CBD where, due to store density, data were collected on foot.
Outlets were classified as a food store, restaurant or both using categories and type sub-categories (Supplemental Table 1) based on characteristics observed from the outside and at the entrance of each establishment. Size of the facility, items sold, type of service provided and posted menus (restaurants only) guided the selection of outlet type. Stores and restaurants were classified as Latino/non-Latino on the basis of store name and language of signage on windows and doors (English, mostly Spanish, both languages equally)( Reference Emond, Madanat and Ayala 28 ).
Reliability analysis
Inter-rater reliability for identifying food outlets was assessed in two census tracts: the tract that contained the CBD and a second tract containing the largest number of food outlets. Each of these tracts was visited twice, and the observed proportion of agreement between the two census visits (i.e. number of agreements divided by the total observations) was calculated for food stores, restaurants and total food outlets.
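The proportion of agreement described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that 'total observations' means every outlet recorded on at least one of the two visits and that an 'agreement' is an outlet recorded on both. The function name and the outlet identifiers are hypothetical.

```python
def proportion_agreement(visit_a, visit_b):
    """Share of outlets recorded on both visits among all outlets recorded on either.

    Assumption: 'total observations' is the union of the two visits.
    """
    a, b = set(visit_a), set(visit_b)
    total = a | b
    if not total:
        raise ValueError("no outlets were observed on either visit")
    return len(a & b) / len(total)


# Hypothetical outlet identifiers recorded on each visit to the same tract
visit_1 = {"outlet_01", "outlet_02", "outlet_03", "outlet_04"}
visit_2 = {"outlet_02", "outlet_03", "outlet_04", "outlet_05"}

agreement = proportion_agreement(visit_1, visit_2)
print(agreement)  # 3 outlets in common out of 5 observed in total
```

Under a different definition of the denominator (e.g. the outlets recorded on the first visit only), the same counts would give a different proportion, so the definition should be stated alongside any reported value.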
Statistical analysis
Sensitivity (proportion of outlets observed on the ground that were listed in the commercial databases) was calculated to assess the level of agreement between field census and secondary data sources of food stores (TDLinx and D&B) and restaurants (D&B), with the field census considered as the ‘gold standard’. Food outlets from the field census and commercial databases were matched based on food outlet name and address. Sensitivity was calculated by Latino/non-Latino classification and by Hispanic composition of the census tract (defined as ≥23·4 % Hispanic population (upper quartile of distribution)). Food outlets present in TDLinx or D&B and absent from the field census were investigated using the databases’ latitude and longitude coordinates and ArcGIS and Google Earth.
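As a rough sketch of the sensitivity calculation, the example below matches field records to database listings on name and address and divides the number of matches by the number of field-observed outlets. The normalisation rule (lower-casing, dropping punctuation, collapsing spaces) is an assumption made for illustration; the matching rules actually used are not described beyond 'name and address'. All records and function names are hypothetical.

```python
import re


def normalise(text):
    """Lower-case, replace punctuation with spaces, collapse whitespace (assumed rule)."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(cleaned.split())


def outlet_key(name, address):
    """Build the matching key from an outlet's name and address."""
    return normalise(name), normalise(address)


def sensitivity(field_outlets, listed_outlets):
    """Proportion of field-observed outlets (the 'gold standard') found in a listing."""
    if not field_outlets:
        raise ValueError("field census is empty")
    listed_keys = {outlet_key(n, a) for n, a in listed_outlets}
    matches = sum(1 for n, a in field_outlets if outlet_key(n, a) in listed_keys)
    return matches / len(field_outlets)


# Hypothetical (name, address) records
field = [
    ("La Tienda #2", "100 Main St."),
    ("Quick Mart", "5 Oak Ave"),
    ("Corner Grocery", "77 Elm Rd"),
    ("Super Foods", "9 Pine St"),
]
database = [
    ("LA TIENDA 2", "100 main st"),
    ("Quick-Mart", "5 Oak Ave."),
    ("Gone Deli", "1 Old Rd"),  # listed but not seen in the field
]

print(sensitivity(field, database))  # 2 of 4 field outlets are listed
```

Exact matching on normalised strings will miss listings with abbreviated or misspelled names, so in practice a manual review of near-matches would likely be needed. Stratified estimates (e.g. Latino v. non-Latino outlets) follow by filtering the field records before calling the function.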
Results
Inter-rater reliability was 91 % for all food outlets in one census tract and 79 % in the census tract containing the CBD, a tract with a relatively high number of outlets. The data collectors identified 174 food stores on the ground across the thirty-one census tracts (Table 1). One hundred and eleven (64 %) and ninety-five (55 %) of these food stores were listed in TDLinx and D&B, respectively. For TDLinx and D&B combined, 131 (75 %) food stores observed on the ground were listed in either source. For TDLinx, sensitivity was highest for convenience stores (76 %), whereas agreement in D&B was highest for grocery stores and supermarkets (65 %); levels of agreement in TDLinx and D&B were lowest for small specialty stores (6 % and 29 %, respectively).
The field data collectors identified 337 restaurants (Table 1). Among these, 228 (68 %) were listed in D&B. A moderately high proportion of counter-service restaurants and sit-down restaurants were missing from D&B (40 % and 32 %, respectively).
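For reference, the overall food store sensitivities reported above follow directly from the counts in the text, with sensitivity taken as matched outlets divided by field-observed outlets (the notation below is ours, not the authors'):

```latex
\begin{align*}
\text{Sensitivity}_{\text{TDLinx}} &= \frac{111}{174} \approx 0.638 \;(64\,\%) \\
\text{Sensitivity}_{\text{D\&B}} &= \frac{95}{174} \approx 0.546 \;(55\,\%) \\
\text{Sensitivity}_{\text{TDLinx or D\&B}} &= \frac{131}{174} \approx 0.753 \;(75\,\%)
\end{align*}
```

The combined figure is less than the sum of the two individual figures because many stores appear in both databases and are counted only once.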
Twenty-five food stores were characterized by data collectors as Latino food stores, with 20 % identified in TDLinx, 52 % in D&B and 56 % in either D&B or TDLinx (Table 2). The data collectors identified twenty-six Latino restaurants, 38 % of which were listed in D&B. Agreement between the databases and the field census of food stores and restaurants did not differ substantially by Hispanic composition of census tracts (Table 2).
N/A, not applicable.
* Match defined as a food store observed in the field and listed in a secondary data source.
† Food outlets that were classified as both a food store and a restaurant by the field auditors (n 10) were included in both the food store and restaurant counts, regardless of their classification in secondary data sources.
‡ One food store was not given a store sub-category during the field audit.
§ Hispanic census tract defined as ≥23·4 % Hispanic population (upper quartile).
Discussion
Studies investigating associations between neighbourhood food environments and health outcomes commonly use commercial data sources to characterize the food environment. These data are usually less expensive and more time-efficient than direct field observations, albeit of lesser quality and validity. Secondary data sources often underestimate total food outlets, resulting in inaccuracies that may bias study findings( Reference Liese, Colabianchi and Lamichhane 10 ). Furthermore, the quality and validity of these data may differ by racial/ethnic composition of the population( Reference Rossen, Pollack and Curriero 7 , Reference Fleischhacker, Rodriguez and Evenson 9 , Reference Han, Powell and Zenk 11 , Reference Powell, Han and Zenk 13 ). While others have investigated the validity of food store data in rural( Reference Sharkey, Dean and Nalty 29 , Reference Pitts, Bringolf and Lawton 30 ) and Native American communities( Reference Fleischhacker, Rodriguez and Evenson 9 ), there has been little research in Latino communities or on Latino food outlets, despite the fact that Latinos are at high risk for diet-related chronic diseases.
No research to date has investigated the validity of TDLinx, a comprehensive and time-varying database of retail food stores. We found that overall agreement between field census and TDLinx data in Durham, NC, USA was higher than that for D&B, suggesting that TDLinx may be more useful for characterizing total food stores. Additionally, we found that combining both secondary data sources improved overall sensitivity by 11 percentage points (75 % for both databases combined minus 64 % for TDLinx alone). On the other hand, the comparatively low levels of agreement in TDLinx and D&B for small specialty stores (6 % and 29 %, respectively) suggest that smaller stores were poorly identified by both databases. Our reliability assessment in the CBD indicated agreement for thirty-seven of forty-seven food stores and restaurants. We speculate that reasons for the relatively poor reliability included: stores were closed (n 2); lack of signage or poor signage (n 5); and human error (n 3), potentially due to the high density of stores in the CBD (n 47).
In our study, the accuracy of food outlet listings in both databases did not differ considerably between Hispanic and non-Hispanic census tracts. However, accuracy for Latino food stores was poorer than in the total sample, markedly so for TDLinx. TDLinx captured only 20 % of Latino stores (compared with 64 % of overall stores), while D&B performed better, capturing 52 % of Latino stores (compared with 55 % of overall stores). Furthermore, the added value of using both databases for this purpose was minimal (56 %), suggesting that both secondary data sources may be inadequate for characterizing local Latino food stores. It is possible, however, that such food stores in Durham County, NC, an area with a new and growing Latino population( 25 ), may not have yet become part of these commercial food listings.
A potential limitation of TDLinx is that the database only captures food outlets with ≥$US 1 million in sales. Latino food stores, such as tiendas and bodegas, tend to be smaller than non-Latino food stores( 31 ) and thus more likely to be missed in the commercial databases. Latino food stores captured in D&B and absent from TDLinx had sales volumes in the hundreds of thousands of dollars, and thus did not meet the sales volume criterion of TDLinx. Tiendas and bodegas are an important food resource in Latino communities and immigrant neighbourhoods, but it is unclear to what extent these stores are supportive of healthy eating.
Although data collectors were extensively trained before collecting data, the field team may have under-counted food outlets. The commercial data were obtained approximately a month prior to field data collection, during which time food outlets may have opened, closed or moved, resulting in additional variation in food outlet counts. These results may also not be generalizable to other areas with different neighbourhood characteristics (e.g. areas with more long-standing Latino communities or a higher percentage of Hispanic residents). Nevertheless, ours is the first study to assess the validity of a novel commercial database, TDLinx, in Latino and non-Latino food outlets. In addition, we compare findings using D&B, a more commonly used database, which had relatively similar sensitivity compared with other studies( Reference Fleischhacker, Evenson and Sharkey 32 ).
Because of the comparatively higher agreement between TDLinx and the field census for total food stores, our study provides support for using TDLinx, alone or combined with other commercial databases such as D&B, to characterize neighbourhood food stores. However, both secondary data sources poorly identified small and independent food stores, with D&B performing better than TDLinx for Latino food stores. Investigators should be cautious about using these data to characterize neighbourhoods with small and ethnic food stores, and should consider supplementing secondary data sources with primary data collection if resources are available.
Acknowledgements
Acknowledgments: The authors would like to thank Marc Peterson, of the University of North Carolina at Chapel Hill (UNC-CH) Carolina Population Center (CPC), and the CPC Spatial Analysis Unit for creation of the environmental variables; Nicole Wilkes, Matthew Lewis, Andrew Bousquet, Molly O’Dwyer and Antony Wambui for the audit; Dr Kurt Ribisl for the iPad program; and Ms Erica Brody for her helpful administrative assistance. Financial support: This study was supported by the National Institutes of Health (NIH; grant numbers R01-HL104580 and R01-HL114091). Additional support came from the NIH, the UNC-CH Clinic Nutrition Research Center (grant number NIH DK56350) and the UNC-CH CPC (grant number R24 HD050924); and from contracts with the University of Alabama at Birmingham, Coordinating Center (contract number N01-HC-95095); the University of Alabama at Birmingham, Field Center (contract number N01-HC-48047); the University of Minnesota, Field Center (contract number N01-HC-48048); Northwestern University, Field Center (contract number N01-HC-48049); and the Kaiser Foundation Research Institute (contract number N01-HC-48050 from the National Heart, Lung, and Blood Institute). Support for S.S.A. (PhD, MPH) was from the Postdoctoral Ruth L. Kirschstein National Research Service Award (award number T32 HD07168-33) through the UNC-CH CPC. The NIH and related contracts had no role in the design, analysis or writing of this article. Conflict of interest: None. Authorship: P.G.-L. and S.S.A. designed the study; P.G.-L. and S.S.A. coordinated data collection; P.E.R. carried out data analysis and interpretation and drafted the manuscript; P.G.-L. and S.S.A. made major revisions to the manuscript; and all authors approved it for submission. Ethics of human subject participation: Ethical approval was not required.
Supplementary material
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S1368980014001281