INTRODUCTION
Timely recruitment of population controls is an established challenge in case-control studies. Controls are selected independently of exposure from the population from which cases arise, do not have the health event of interest and should be eligible to become cases were they to develop the health event of interest. Established methods of control recruitment – including case- or physician-nominated controls and random or sequential digit dialling – can be time consuming, delaying identification of the source or vehicle of infection, and might also introduce selection biases. Alternative methods that have been proposed include case-crossover [Reference Maclure1], case-case [Reference McCarthy and Giesecke2] and case-cohort [Reference Prentice3, Reference Le Polain de Waroux, Maguire and Moren4] study designs. As part of a review undertaken by Public Health England (PHE) to strengthen the investigation of national outbreaks of infectious disease, we identified the need to evaluate the potential use of commercial market research panels as a source of controls for epidemiological studies conducted as part of outbreak investigations.
Commercial market research panels are pre-recruited groups of individuals who have opted in to receive online questionnaires in return for a reward. Review of the literature indicates that the use of market research panels as a sampling frame for prospectively identifying controls and collecting data using a web-survey has not yet been explored. Commercial market research databases that hold residential telephone numbers have previously been used as a sampling frame for recruiting controls by phone for epidemiological studies [Reference Connally, Yousey-Hindes and Meek5–Reference Valery7]. Pre-collected market research data have also been used as control data in a case-control study [Reference Gillespie8] and as denominator data [Reference Gillespie9]. However, the use of a market research panel for recruiting controls and collecting data with a web-survey has the potential for time and cost savings compared to traditional methods.
We evaluated the timeliness and cost of using a market research panel as a sampling frame for recruiting controls and capturing data against a control recruitment strategy employed by an outbreak control team (OCT) as part of a case-control study conducted in response to a national outbreak of Salmonella Mikawasima in the UK in 2013. The distribution of exposures between the two control groups and the analytical findings of the case-control studies using each control group were also compared.
METHODS
Of those cases of S. Mikawasima identified as part of the outbreak, 61 were eligible for inclusion in a national case-control study. The OCT agreed that two controls per case, frequency-matched on sex and area of residence and restricted to those aged between 18 and 65 years, were required (n = 122) [Reference Freeman10].
Recruitment of controls from public health staff
The OCT (including authors G.K.A., G.D., R.F., A.C.) recruited controls using public health staff as a sampling frame (henceforth referred to as ‘staff controls’). Eligible controls were recruited from randomly sampled public health employees based in regions which corresponded to the residence of cases, using random number generation in conjunction with lists of staff. These controls were interviewed by telephone by four investigating teams using a paper-based questionnaire. Double data entry from the paper questionnaire into a bespoke database was conducted to identify and address transcription errors prior to analyses; one team conducted all double data entry on behalf of the four investigating teams.
Recruitment of controls from the market research panel
In parallel, the evaluation team (authors P.M., S.K., H.M., I.O.) recruited a second control population from a market research panel (henceforth referred to as ‘panel controls’). The evaluation team had no influence on the choice of controls by the OCT. The control definition was identical to that defined by the OCT with the substitution of public health staff with a market research panel member. The recruitment target, age restriction and the target gender and area of residence strata also mirrored those of staff controls.
The evaluation team developed a web-survey version of the staff control paper-based questionnaire using the web-survey software Select Survey (ClassApps, USA) and created a copy for each frequency-matching stratum (i.e. each gender and geographical area combination). The web-survey, the required number of responses and the age, gender and geographical distribution of our target population were shared with market research company A, which maintains a volunteer panel with more than 600 000 respondents in the UK. A quote for the service of forwarding the web-survey link by email to a selection of panel members who met our criteria and rewarding those who participated was provided within 1 day. This quote was based on the estimated time to complete the web-survey and the size and demographic distribution of the target population. The estimated time to complete the web-survey determines the point value received by participating panel members, and collected points can be exchanged for rewards. Two links provided by company A were included on the completion and screen-out pages so that the company could monitor completion and screen-out rates.
Company A stratified panel members into homogeneous, mutually exclusive subgroups – based on frequency-matching criteria – and sent survey invitations by email to randomly selected panel members within each defined stratum. The number of panel members randomly selected and to whom emails were sent was greater than the required number of responses and was determined by the number of complete responses required, the anticipated response rate of the panel (or stratum) and the expected proportion of panel members who met the inclusion criteria (85% for this survey). Web-survey data from recruited individuals were captured via Select Survey and strata-specific web-surveys closed once strata-specific targets were achieved. It was the policy of company A that name, telephone number and postcode (items to be collected by telephone interview of staff controls) could not be collected from panel controls using the web-survey.
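The oversampling logic described above can be sketched as a back-of-the-envelope calculation. In the sketch below, only the 85% eligibility proportion comes from the text; the 10% response rate is an illustrative assumption, not a figure supplied by company A.

```python
from math import ceil

def invitations_needed(required_responses, response_rate, eligible_prop):
    """Number of panel members to invite so that, in expectation,
    enough eligible members complete the survey."""
    return ceil(required_responses / (response_rate * eligible_prop))

# Illustrative figures: 123 completed responses required, an assumed
# 10% panel response rate, and the 85% eligibility proportion quoted
# by company A for this survey.
n_invite = invitations_needed(123, 0.10, 0.85)
```

In practice each frequency-matching stratum would have its own target and its own anticipated response rate, so this calculation would be repeated per stratum.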
Comparison of exposures between control groups
The distribution of age, gender and place of residence among panel controls, staff controls and cases were summarized and compared. A ‘control-control’ study employing univariable analysis and multivariable logistic regression was conducted to compare the distribution of exposures among panel controls against that of staff controls. Odds ratios, 95% confidence intervals and P values were estimated.
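A minimal sketch of the univariable odds-ratio calculation underlying the ‘control-control’ comparison, using only the standard library. The counts below are hypothetical, reconstructed loosely from the PPI exposure proportions reported in the Discussion (17% of 123 panel controls vs. 5% of 82 staff controls); they are not the study's actual 2 × 2 table.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) 95% CI for a 2x2 table:
    a = exposed panel controls,   b = unexposed panel controls,
    c = exposed staff controls,   d = unexposed staff controls."""
    or_ = (a * d) / (b * c)
    se = sqrt(1/a + 1/b + 1/c + 1/d)       # SE of log odds ratio
    lo, hi = exp(log(or_) - z * se), exp(log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts reconstructed from the reported proportions.
or_, lo, hi = odds_ratio_ci(21, 102, 4, 78)
```

The study itself used Stata v. 12 and OpenEpi for these estimates; the sketch simply illustrates the form of the calculation.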
Timeliness of control recruitment
The time required to recruit and capture data for panel and staff controls was summarized and compared. For panel controls, this period was calculated from the point at which company A launched the web-survey until the total required number of controls was obtained. For staff controls this period was calculated from when recruitment interviews by each investigating team began until the regional target for controls was obtained. The time required to develop the web-survey and paper-based questionnaire for panel and staff controls, respectively, was not considered.
Efficiency of control recruitment
The percentage of target panel controls recruited, together with the numbers of panel members who did not open the web-survey, opened but did not complete it, or were not eligible, was captured and summarized. These metrics were compared against the percentage of target staff controls recruited and the numbers of public health staff who could not be contacted, declined to be interviewed or were not eligible.
Resource required for control recruitment
The direct resource required to recruit panel and staff controls was summarized and compared as an average cost per control recruited. The resource required to recruit panel controls was based on procurement costs for the services provided by company A. The resource required to recruit staff controls was estimated using information captured on a recruitment log: time per call to potential staff controls, outcome of the attempted recruitment and pay band of the staff member recruiting staff controls (assuming the minimum point of progression within any given pay band unless otherwise stated). The additional cost of entering all hard-copy questionnaires onto a bespoke database was also estimated.
Case-control analyses using each control group
Univariable analysis and multivariable logistic regression were conducted using the case data collected by the OCT and panel controls as the comparator to investigate the effect of exposures on outcome. Odds ratios, 95% confidence intervals and P values were estimated. The findings of these analyses were compared against those derived using staff controls by the OCT. The approach to refining models and other decision making in the analysis using panel controls was harmonized with that used by the OCT for analyses using staff controls [Reference Freeman10] and was validated by a party independent of both the evaluation team and the OCT (author A.W.). Analyses and data manipulation were performed using Stata v. 12 (Stata Corporation, USA), OpenEpi (http://www.openepi.com) and Excel (Microsoft, USA).
RESULTS
Response rates
Of 61 cases initially identified as eligible for inclusion in the study as per a case definition developed by the OCT, 21 (34%) were lost to follow-up or could not be followed up due to limited resources in some investigating teams. In addition, on further review one case did not meet the inclusion criteria [Reference Freeman10]. The number of responses from panel controls (n = 123) surpassed the initial target (n = 122) and met the target for each gender and investigating region strata. The number of responses from staff controls (n = 82) was 67% of the initial target.
Of 1329 market research panel members sent a link, 123 (9%) completed the web-survey, 262 (20%) started but did not complete, 20 (2%) were screened out based on eligibility criteria [Reference Freeman10] and 924 (70%) did not follow the link while the web-survey was live. Where data on the outcome of recruitment attempts were available for staff controls (n = 69), 64% of staff whom investigators attempted to contact were ultimately interviewed, 26% could not be contacted, 9% did not meet the eligibility criteria and 1% declined.
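As a minimal consistency check, the reported recruitment funnel for panel controls can be reproduced from the figures given above (all counts are those reported in the text):

```python
# Recruitment funnel for panel controls, as reported: of 1329
# invitations, each member falls into exactly one outcome.
invited = 1329
outcomes = {"completed": 123, "started_not_completed": 262,
            "screened_out": 20, "did_not_open": 924}

assert sum(outcomes.values()) == invited  # outcomes partition the invitees
shares = {k: round(100 * v / invited) for k, v in outcomes.items()}
```

The rounded shares reproduce the 9%, 20%, 2% and 70% reported above.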
Characteristics of control groups
The distribution of demographic characteristics among cases and both control sets was similar for gender, investigating region and age group but there was evidence of a difference in ethnicity (P = 0·006, Table 1). The proportion of staff controls collected by the Scotland investigating team compared to elsewhere was less than that for panel controls (Fisher's exact test, P = 0·010). The proportion of both staff controls and panel controls in the 18–25 years age group compared to other ages was less than that for cases (Fisher's exact test, P = 0·027 and P = 0·048, respectively). In addition, the proportion of panel controls that were white was greater than that for cases (Fisher's exact test, P = 0·007).
* Target only set for frequency-matching criteria (region and gender).
† χ² test, cases vs. staff controls vs. panel controls.
The distribution of demographic characteristics (ethnicity, gender, age and investigating region) was similar between control sets (see Supplementary Appendix). When the distributions of clinical, travel and recreational, food consumption and food purchasing exposures were compared between control sets, panel controls more often reported having a headache, receipt of proton pump inhibitors (PPIs), eating pomegranate, chicken sandwiches, chicken pasties, chicken pies and chicken nuggets and eating in Chinese restaurants than staff controls (differences at the 5% level presented in Table 2; full details in the Supplementary Appendix). Compared to panel controls, staff controls more often reported recent travel in the UK, any outdoor activity, walks, other outdoor activities, using commercial bird food, buying raw chicken, eating or handling pumpkins, eating whole chicken or chicken portions, salad garnish, side dishes, herbs/spices, apples, dips, turmeric, cumin seed, mint, coriander, buying chicken from three national supermarket chains, eating out and eating in sandwich bars, fast-food restaurants, British restaurants or other locations (Table 2).
OR, Odds ratio; CI, confidence interval.
* In the panel control web-survey the question on commercial bird food was not conditional on feeding birds.
In total, there were differences at the 5% level for 29% of the 115 exposure and demographic items compared between control sets. Multivariable analysis identified eight exposures (7% of all items compared) with independent associations with either set of controls. Buying raw chicken, recent travel in the UK, eating or handling pumpkins, eating apples and participating in outdoor activities were positively associated with staff controls. Experiencing a headache in the last 7 days, receiving PPIs and eating any other chicken products were positively associated with being a panel control (Table 3).
aOR, Adjusted odds ratio; CI, confidence interval.
* Retained as a priori confounders.
Timeliness of control recruitment
Panel controls (n = 123) were recruited within 14 h of deployment of emails (Fig. 1) and staff controls (n = 82) were recruited over a 15-day period (Table 4), a period that included the day on which panel controls were recruited. Double data entry of staff control and case paper-based questionnaires onto a database took 76·5 h and was conducted intermittently over a 37-day period, after which staff control data were available for analysis (Table 4). Transfer of paper-based questionnaires from other investigating teams to the investigating team responsible for double data entry was a rate-limiting step.
* Double data entry of cases and staff controls. On review of the data one case and four staff controls met the exclusion criteria; the remaining 121 questionnaires comprised 39 cases and 82 staff controls.
Resource required for control recruitment
Where data on the time required to recruit and interview staff controls were available (n = 56, 68% of staff controls recruited), the average staff cost per collected questionnaire was estimated to be £3·88. These data were available for three of the four investigating teams; the total number of staff involved in follow-up in these three teams was 13 (range 2–6) and data were available for 11 of these staff members. The average staff cost to double-enter case or staff control questionnaires was £9·25; the total cost to recruit, interview and enter data for each staff control was £13·13. The average direct, invoiced cost per completed web-survey for panel controls was £3·60 (Table 5).
n.a., Not available.
* Based on available staff cost data.
† Double data entry of cases and staff controls; five controls were entered which were later excluded.
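The per-control cost figures above combine as follows (a simple arithmetic check of the reported totals; all figures are those given in the text):

```python
# Per-control cost components for staff controls, as reported.
recruit_and_interview = 3.88   # average staff cost per collected questionnaire (GBP)
double_data_entry = 9.25       # average staff cost to double-enter a questionnaire (GBP)
staff_total = recruit_and_interview + double_data_entry   # reported as GBP 13.13

panel_cost = 3.60              # average invoiced cost per completed web-survey (GBP)
ratio = staff_total / panel_cost   # roughly 3.6x more expensive per staff control
```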
Findings of case-control studies
The findings of the OCT analyses, which used staff controls, indicated that receipt of PPIs, having visited a fast-food outlet regardless of chicken consumption history, having visited a restaurant and eaten chicken and having bought chicken from a local butcher were factors independently associated with illness [Reference Freeman10] (Table 6a). Analyses using panel controls indicated that having visited a fast-food outlet regardless of chicken consumption history, visited a restaurant and eaten chicken, bought chicken from a local butcher and consumed coriander or garnishes were factors independently associated with illness (Table 6b). When the model derived using staff controls was applied to the analytical study using panel control data, those exposures independently associated with illness remained so with the exception of receipt of PPIs, which was not selected to be included in the panel control model (Table 6c).
aOR, Adjusted odds ratio; CI, confidence interval.
* Retained in model as a priori confounder.
DISCUSSION
We have demonstrated that using market research panel members as a sampling frame for recruiting population controls, with data collection through a web-survey, offers value for money and substantial time savings compared to telephone interviews of public health staff. At £3·60, the average cost per control recruited through the market research panel was less than a third of the estimated average staff cost to interview and double-enter questionnaire data per control recruited from public health staff (£13·13). The initial target of frequency-matched controls was surpassed using panel controls, with 123 controls collected in 14 h, at which point data were available to be cleaned and analysed. Eighty-two staff controls were recruited over 15 days and subsequent double data entry was required before data were available to be cleaned and analysed. While there were differences in the distribution of exposures in the two control groups, both analytical studies found that illness was associated with consumption of chicken outside of the home and purchasing chicken from local butchers. There was no microbiological evidence to support or refute these findings.
An ideal evaluation of a novel approach for recruiting controls and collecting data would be made against best current practice, but all existing methods of control selection have limitations. In this instance, we have compared the use of a market research panel as a sampling frame for recruiting controls against a quasi-novel method using public health staff – a mechanism that has been used once before (PHE, unpublished data). It is not possible to assess which control set is most representative of the at-risk population but it is likely that both have inherent biases. It is feasible that in addition to a ‘healthy worker’ effect, public health staff might have healthier ‘lifestyle’ behaviours and more disposable income than the general population. Members of the public who volunteer to participate as a panel member might have systematic differences from those that do not, e.g. access to the internet, age, gender or lifestyle behaviours, and previous studies would suggest that volunteer panels do provide biased estimates [Reference Erens11, 12]. It is therefore difficult to interpret the findings of the comparison of the distribution of exposures between recruited public health staff and market research panel members reported here. Testing panel controls against other methods of control selection may help understand any inherent biases with such controls.
Systematic differences between the distribution of demographic characteristics among the panel and that of the general population should be addressed using frequency matching where possible (e.g. for age and sex) to ensure that data are collected from controls that most closely resemble the at-risk population. Age of the panel member is known to affect promptness to respond and therefore might impact on the age distribution of responders if age strata targets are not used. Company A reported that younger age groups in its panel are slower to respond to surveys than older age groups and this may have resulted in the difference in age distributions observed between panel controls and cases even though the overall age distribution of panel members is younger than that for the general population (personal communication with company A). In addition, time of year might also affect how quickly panel members respond to a survey.
The process by which panel controls were recruited will, by design, lead to only a small proportion of those that receive an invitation to participate being recruited. Approximately 9% of panel members invited to participate were recruited. A low response rate has been reported elsewhere for volunteer panels [12, Reference Craig13], but more panel members might have responded had the survey remained open once the target was achieved. To better assess ascertainment bias in future studies using panel controls, a comparison of the breakdown by demographic characteristics of those that did respond with those that did not (at least while the survey was open) is recommended, and these data should be available from market research companies. We were unable to conduct this assessment because 12 survey completers were misclassified as non-completers, having not selected the completion link provided by company A. However, we were able to address this for the totals presented in the results by reviewing the collected data.
We observed a difference in the ethnicity distribution between cases and panel controls with the latter containing fewer individuals from a non-white ethnic group. Such a distribution has been observed previously in volunteer panels compared to the general population [Reference Erens11] but it cannot be addressed by frequency matching on ethnicity because this is not a data item collected when a panel member first volunteers. Ethnicity was not found to improve the fit of the presented multivariable models or be independently associated with outcome in this study.
The inherent differences in the distribution of exposures between control sets resulted in differences in the final multivariable models using each control set. PPIs (5% exposure in staff controls vs. 17% exposure in panel controls) were independently associated with illness only when staff controls were used, most likely as a predisposing factor [Reference Bavishi and DuPont14–Reference Bowen16] rather than a causal exposure, while coriander (7% vs. 2%) and salad garnishes (15% vs. 2%) were independently associated with illness only when panel controls were used. However, it is not possible to assess which control set provides the most accurate measures of association. It is possible that for a given exposure biases exist in opposite directions for the two control sets and that the distributions of these exposures in the population from which cases arose might be between that for the two control sets. In addition, the ultimate interpretation of findings from any outbreak analysis would take into account not just the significance test but also the size of the effect, the proportion of cases that can be accounted for by a given exposure and biological plausibility.
Both investigations identified the consumption of chicken outside of the home and the purchase of chicken from local butchers as being independently associated with illness. This demonstrates that the two control sets were similar enough to identify the same risk exposures when there is a strong causal association as appears to be the situation here.
Data on the timeliness of, and resource required for, staff control recruitment were incomplete. We have assumed that there was no systematic difference in the time and resource required to recruit staff controls where data were not available, but ascertainment bias could not be assessed. In addition, the opportunity cost for those staff recruiting and interviewing staff controls, and for the staff controls themselves while being interviewed, was not estimated and therefore the total cost to PHE might be underestimated.
The exclusion of cases resulted in a change of gender and regional distributions which staff controls mirrored, maintaining a 2:1 ratio of controls to cases. Panel controls ultimately had more than a 3:1 ratio because they were recruited on the presumption of no case drop-out and ensured frequency matching as per the protocol. This increased ratio might have resulted in additional power to detect effects. As a result of the revised target, the investigating teams needed to collect only two-thirds of the number of staff controls compared to panel controls, which makes the difference in timeliness more apparent. The pricing policy of company A for panel controls was dependent on the required number of completed surveys, subject and the expected exclusion rate. Had the target number and geography been revised as per that for staff controls, the overall cost for collecting panel control data would have been less.
As with all outbreak investigations, the scale and severity of the outbreak and competing priorities probably influenced availability of staff resource which in turn might have impacted on the timeliness of recruiting staff controls and dispatch of questionnaires to the investigating team responsible for double data entry. The Christmas break might also have had some impact on how quickly staff control questionnaires were sent to the investigating team responsible for double data entry and emphasizes the impact of contextual considerations on any outbreak investigation. Similarly, deployment of the web-survey to panel members on 23 December might have had some effect on recruitment of panel controls.
Public health staff are likely to be more available during working hours and, as a more engaged population, more likely to respond to such a survey than members of the general public. Recruiting staff controls was therefore likely to represent the ‘best case’ in terms of accessibility, compliance and cost compared to other methods of population control recruitment through telephone interviews, e.g. random digit dialling. As a result, the use of panel controls might be even more cost-effective and timely when compared to other such methods. Panel controls receive a real time payment for a response and are therefore likely to be more compliant than the general public.
With the exception of name, telephone number and postcode, all questions in the paper-based questionnaire used to interview staff controls were replicated in the web-survey. It was important that those designing the web-survey were clear how questions were applied to staff controls and the two data collection instruments were as harmonized as possible. Responses to most questions in the web-survey were mandatory unless they were conditional on a previous response, ensuring high levels of data completeness.
There was a systematic difference in the mechanism of data collection for cases and staff controls (telephone interview) vs. that for panel controls (self-reporting by web-survey). Telephone interviews allow for further explanation of questions where the respondent is unclear or has misunderstood and are likely to improve the quality of responses. This might have resulted in the introduction of information bias, specifically differential misclassification between panel controls and staff controls or cases. In future, cases should ideally be interviewed by the same mechanism, i.e. by web-survey, to minimize potential differential misclassification.
A more direct comparison of the speed and resources required to recruit panel controls might have been made against contacting public health staff or members of the public by phone (e.g. by random digit dialling) and requesting an email address to which an online survey could be sent. This would have helped to assess whether the shorter times for recruitment and data collection for panel controls was as a result of recruiting from a market research panel or the use of a web-survey.
Company A audits data provided by panel members and removes individuals as necessary to maintain a high-quality panel. This should minimize instances where, given the incentive to respond, panel members provide quick but low-quality responses to surveys. The company asserts that, given the size of the panel, panel members do not suffer from survey fatigue, which also benefits data quality. As per market research industry standards in confidentiality, full postcode details were not available and consequently it was not possible to determine an index of multiple deprivation (IMD) score for each recruited panel member, although it might have been useful to consider as an independent variable in the analyses. Alternative suppliers of panels have indicated that an IMD score could be provided for each completed web-survey respondent if they were provided with an appropriate postcode to IMD score look-up table and the email address with which the respondent signed up to the panel was captured on the web-survey.
The use of a market research company to recruit controls for a case-control study using random digit dialling as part of the response to an outbreak of hepatitis A in Ireland was described as effective and efficient and required on average 1 h of calls per recruited control (n = 42), although the process was reported to have been delayed as a result of having to match to county [Reference Fitzgerald17]. The cost of using a commercial database of telephone numbers as a sampling frame to recruit controls by phone interview for a Lyme disease case-control study in the United States was not insignificant and the median time required to reach the person who ultimately completed the study questionnaire was approximately 19 min (n = 353) [Reference Connally, Yousey-Hindes and Meek5]. In both instances, these were lengthier processes per recruited control than the method we present using panel controls and a web-survey. However, it is difficult to fairly compare the time and resource required with our findings as previous prospective studies using market research companies had different inclusion criteria, survey lengths and control targets as well as differences in recruitment approach. Using retrospective market research panel data in a listeriosis case-control study in England was not considered inexpensive, was limited to those items that had been pre-collected and could not examine confounding because only grouped data were available (individual-level data were prohibitively expensive) [Reference Gillespie8].
A review of published outbreaks of a similar impact in Europe between 2003 and 2013 and for which a case-control study was conducted determined a median time interval of 16 days between hypothesis generation and availability of analytical results, with a range of 7–59 days (n = 16) [Reference van de Venter, Oliver and Stuart18]. Control recruitment and data collection within a few days, if not a day, using panel controls might contribute to reducing this interval in the future. The potential for substantial time savings has important consequences for public health benefit as more rapid investigations would enable more prompt implementation of control measures and might mitigate the burden of disease associated with an outbreak. Applying this method does not restrict subsequent use of a traditional recruitment mechanism if a clear answer is not apparent, or the exposure distribution of the panel controls appears unusual.
This approach could be used to recruit controls at a finer geographical granularity – at least to local authority level (personal communication with company A). Such panels are available in many high- and middle-income countries and, by approaching companies that hold or collaborate with panels elsewhere, this approach could be used to recruit population controls as part of an international outbreak investigation. Data are not directly available from such panels for individuals aged <18 years, but consenting panel members might report on their children's exposures if an outbreak included children, or consent for their child to complete a survey under their supervision. Rather than using self-selected volunteers, some panels in the United States and Europe have employed probability-based sampling methods, which might address any sampling bias [Reference Erens11].
While all methods of recruiting population controls are likely to have some inherent bias, the additional benefits of this method, particularly in terms of timeliness, make this a potentially valuable addition to the range of existing methods for use in outbreak investigations where self-reporting via a web-survey is acceptable. Panel controls should be considered as a possible method of control selection in future outbreaks even though further evaluation work, particularly against a range of control selection methods, would be useful to better understand the value and limitations of this approach.
SUPPLEMENTARY MATERIAL
For supplementary material accompanying this paper visit http://dx.doi.org/10.1017/S0950268815002290.
ACKNOWLEDGEMENTS
This research was funded by the National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Evaluation of Interventions at the University of Bristol in partnership with Public Health England. The views expressed are those of the authors and not necessarily those of the NHS, the Department of Health or Public Health England.
We thank staff in the public health investigating teams within Public Health England and Health Protection Scotland who contributed to this evaluation and Drs James Stuart and Sam Bracebridge for their assistance in the preparation of the manuscript.
DECLARATION OF INTEREST
None.