INTRODUCTION
Public health agencies and epidemiologists use spatially referenced health data to identify clusters of infection. Timely identification of disease clusters can guide disease prevention efforts and stem current outbreaks, but the accuracy of location data is crucial to the outcome of such epidemiological analyses. Here, we use a 21-year dataset of South Australian Ross River virus (RRV) infections to address how the accuracy and scale of location data can impact the conclusions of spatial analyses.
RRV is a zoonotic mosquito-borne pathogen. Although it can amplify in at least 18 vertebrate reservoir species (predominantly mammals) and infect over 30 species of mosquito, the most important vectors in South Australia (SA) appear to be Aedes camptorhynchus and Ae. vigilax in coastal regions, and Culex annulirostris inland [1]. It is well known that mosquito-borne infections can be acquired anywhere within a person's activity space [2]. However, in RRV epidemiology, the residential address of infected patients is almost exclusively used as a proxy for the place of infection [3–5], although some studies acknowledge the limitations of this assumption [6] or the benefits of recording the presumed source of infection [7]. The accuracy of using the patient's address as the location of infection depends largely on the transmission dynamics of the infecting agent. For example, urban-adapted mosquito-borne infections like dengue and malaria are more likely to infect individuals at home, particularly if those homes lack screens and other mosquito-exclusion designs [8–11]. In Australia, where most houses have screens and air conditioning, people are more likely to contract a mosquito-borne infection (e.g. RRV) when working or recreating outside the home [12].
To address the question of location accuracy, some public health agencies record the patient-reported ‘source’ of the infection (i.e. the place the patient believes they were infected). While this field can correctly identify infections acquired while travelling or at work, it can also introduce errors by ascribing infections to the wrong location. Both fields (place of residence and source of infection) therefore hold potential biases that are difficult to identify or control.
Compounding the location accuracy problem is the issue of spatial resolution; disease cases are typically aggregated to geographical units. If the aggregation units are larger than the scale at which disease dynamics occur, epidemiological relationships may be masked or altered; in general, the larger the scale of aggregation, the more bias is introduced [13]. Therefore, disease data should be assessed at the finest scale possible to improve the accuracy and utility of research outcomes.
Previous epidemiological studies of RRV in SA have been conducted only at a regional scale, using units that are too large to be useful in implementing mitigation strategies [5, 7, 14–17]. Those previous studies suggest that RRV is mainly a problem of endemic regions along the Murray River, where the virus circulates annually at low rates. Metropolitan Adelaide has been considered at low risk of RRV due to overall low RRV case and seroprevalence rates [14, 18], but recent mosquito surveillance has identified both RRV and Barmah Forest virus in metropolitan Adelaide [19], prompting a reassessment of these conclusions.
To advance the effectiveness of contemporary public health interventions, we analysed 21 years of South Australian RRV case data from 1992 to 2012. We constructed risk maps using patients’ place of residence and patient-reported source of infection at the suburb level. We compared the conclusions from these two types of location data and interpreted them through the lens of known RRV ecology. We also compared conclusions from the suburb-level spatial analysis to those of prior analyses using regional data. Finally, we present how a binomial representation of LandScan ambient population data can improve the interpretation of risk maps and how this can lead to management actions at a finer scale than previously possible.
METHODS
Data acquisition
Arbovirus infection data were acquired from the South Australian Department of Health's Notifiable Infectious Disease Surveillance System database. Data from confirmed cases of RRV infection notified between 1 January 1992 and 31 December 2012 (n = 5261) were extracted on 24 April 2013, de-identified, and provided to the authors (human ethics approval from UniSA: 0000030917 and SA Health: HREC/13/SAH/05). RRV cases are usually confirmed by detecting RRV-specific IgM or a fourfold rise in RRV-specific IgG titres. Of the confirmed cases, 113 had no geolocating information and five had no recorded patient age; these cases were removed prior to analysis, leaving 5143 cases.
Organization of spatial data
State suburbs (SSCs) cover the majority of SA but do not cover Indigenous locations (ILOCs). Therefore, the suburb and ILOC shapefiles were merged using the ‘identity’ function in ArcGIS [20] and then dissolved by unique code. This method identified the 12 ILOC areas in rural SA that are not covered by a suburb; these were added to the suburb list. Hereafter, this list of SSCs plus ILOCs (n = 858) is referred to as suburbs. The resulting suburbs vary greatly in size, population and disease occurrence; relevant suburb metrics are reported in Table 1. The statistical computing platform R [21] was used for all data manipulation and statistical computations herein.
Table 1 note: SMR, standardized morbidity ratio.
During the 21 years spanned by this dataset, the Australian Bureau of Statistics (ABS) made some changes to their geographical classification system. One common implication of these changes was that many ‘localities’ were recorded in early case records instead of a suburb; these were corrected to the corresponding suburb using ArcGIS. If a locality corresponded with more than one suburb, the most populous suburb was selected as its replacement to produce a more conservative standardized morbidity ratio (SMR). When a source of infection was not listed or was not identifiable to suburb level (e.g. ‘South Australia’, ‘outback’), the suburb of residence was used [the ‘Riverlands (indeterminate)’ designation was reported separately before being reassigned].
Once the above corrections were made to the residence and source data, the two fields were compared; if they did not match, the case was identified as having been ‘acquired away from home’.
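The fallback and comparison rules described above can be sketched in a few lines of Python. This is only an illustration (the analysis itself was carried out in R and ArcGIS); the record layout, the `NON_SPECIFIC` set and the sample cases are hypothetical.

```python
# Hypothetical examples of source entries that cannot be resolved to a suburb.
NON_SPECIFIC = {"", "south australia", "outback"}

def resolve_source(residence, source):
    """Return the suburb to use as the source of infection.

    Falls back to the suburb of residence when the source is missing
    or is not identifiable to suburb level.
    """
    if source is None or source.strip().lower() in NON_SPECIFIC:
        return residence
    return source.strip()

def acquired_away_from_home(residence, source):
    """True when the resolved source differs from the suburb of residence."""
    resolved = resolve_source(residence, source)
    return resolved.lower() != residence.strip().lower()

# Invented records, for illustration only.
cases = [
    {"residence": "Adelaide", "source": "Morgan"},   # differs: away from home
    {"residence": "Renmark", "source": None},        # missing: use residence
    {"residence": "Whyalla", "source": "Outback"},   # too vague: use residence
    {"residence": "Berri", "source": "berri"},       # same suburb, different case
]
flags = [acquired_away_from_home(c["residence"], c["source"]) for c in cases]
```

Only the first record is flagged as acquired away from home; the other three resolve to the suburb of residence.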
Calculating SMRs and SMR differentials
SMRs were age-standardized; census population data, using place of usual residence binned by 5-year age groups (up to age 84 years, followed by an ⩾85 years category, resulting in 18 bins), were acquired from the ABS 2011 census for each suburb through the Table Builder program [22]. Populations listed as ‘unclassified’, ‘no usual address’ and ‘migratory’ were removed from the census data because they cannot be mapped and since any corresponding cases of infection would have been removed due to lack of location data.
Case rates were age-standardized using state averages as follows:
$$ r_a = \frac{\sum_i o_{ia}}{y \sum_i n_{ia}} $$
where $r_a$ is the per-person case rate for age group $a$, $o_{ia}$ is the observed number of cases for age group $a$ in suburb $i$, $y$ is the number of years in the dataset (21 years) and $n_{ia}$ is the total number of residents in age group $a$ in suburb $i$. The expected number of cases ($e_i$) for suburb $i$ was then calculated according to the following formula:
$$ e_i = \sum_a r_a \, n_{ia} $$
Finally, the SMR was calculated for suburb $i$:
$$ \mathrm{SMR}_i = \frac{o_i}{e_i} $$
where $o_i$ is the average annual number of observed cases for suburb $i$ and $e_i$ is the expected number of cases for suburb $i$. Source vs. residence differentials were calculated by subtracting residence SMR from source SMR. The standard deviation (s.d.) of the resulting differentials was used to symbolize the maps.
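As an illustration of the standardization described above, the following Python sketch computes state-wide age-specific annual rates, expected annual cases and SMRs from toy counts. The authors performed these calculations in R; the function names, the dictionary layout and all numbers here are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from collections import defaultdict

YEARS = 21  # length of the study period (y)

def state_age_rates(cases, pop, years=YEARS):
    """Annual per-person case rate for each age group, pooled over all suburbs.

    cases[suburb][age] = observed cases over the whole period
    pop[suburb][age]   = resident population
    """
    total_cases = defaultdict(float)
    total_pop = defaultdict(float)
    for suburb, by_age in pop.items():
        for age, n in by_age.items():
            total_pop[age] += n
            total_cases[age] += cases.get(suburb, {}).get(age, 0)
    return {a: (total_cases[a] / years) / total_pop[a]
            for a in total_pop if total_pop[a] > 0}

def expected_cases(pop_i, rates):
    """Expected annual cases for one suburb given state-wide age-specific rates."""
    return sum(rates.get(age, 0.0) * n for age, n in pop_i.items())

def smr(cases_i, pop_i, rates, years=YEARS):
    """Observed average annual cases divided by expected annual cases."""
    observed = sum(cases_i.values()) / years
    expected = expected_cases(pop_i, rates)
    return observed / expected if expected > 0 else float("nan")

# Toy data: two suburbs and two age groups (invented numbers).
pop = {"A": {"0-4": 100, "5-9": 100}, "B": {"0-4": 100, "5-9": 300}}
cases = {"A": {"0-4": 21, "5-9": 42}, "B": {"0-4": 21, "5-9": 42}}

rates = state_age_rates(cases, pop)
smr_a = smr(cases["A"], pop["A"], rates)  # 3 observed / 2 expected per year
smr_b = smr(cases["B"], pop["B"], rates)  # 3 observed / 4 expected per year
```

In this toy example the expected cases summed over suburbs equal the observed cases (2 + 4 = 3 + 3 per year), as they must when the reference rates are derived from the same population.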
Accounting for small populations and stochasticity
As with many arboviruses, RRV is expected to be highly under-reported, and the awareness and practices of medical doctors or clinics can affect whether cases are confirmed and reported. For suburbs with small populations, a difference of a few cases can have a large impact on the SMR, and we found this to occur in initial calculations. To account for these issues, we identified all suburbs that were expected to have <1 case over the 21-year study period ($e_i \times 21 < 1$). These suburbs had their SMR adjusted to 1 (n = 189) unless the suburb averaged more than one observed case per year during the study period ($o_i > 1$).
The SA population grew 13% during the study period, so using the 2011 populations would usually create a conservative SMR. However, in some suburbs, the population declined. Therefore, for all suburbs originally identified as highest risk according to both source and residence (n = 13), we calculated new SMRs using the 1991 population. This recalculation identified two suburbs (Leigh Creek and Woomera) with large population declines (63% and 87%, respectively) that had inflated their SMRs. These two suburbs were reassigned the SMR based on the 1991 population and removed from Table 3.
One suburb, Port Pirie (population 244), had an extremely high SMR for both source and residence (316 and 320, respectively). Port Pirie suburb is one of six suburbs within the city of the same name (Port Pirie, population 14 059). We presumed that patients reporting their residence/source as Port Pirie were referring to the city, not the suburb, which led to the extremely high SMRs. We recalculated new SMRs using the summed cases and populations for the six Port Pirie suburbs, resulting in SMRs of 3·81 for residence and 3·66 for source. These adjusted SMRs were then applied to all six suburbs within Port Pirie city.
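The small-population rule above amounts to a simple conditional. A minimal Python sketch follows, assuming the expected and observed values are both expressed as annual averages; the function name and the example values are hypothetical.

```python
def adjust_smr(smr, expected_annual, observed_annual, years=21):
    """Apply the small-population rule to one suburb.

    If fewer than one case is expected over the whole study period, the SMR
    is set to 1 (no evidence of departure from the state average), unless the
    suburb averaged more than one observed case per year.
    """
    too_small = expected_annual * years < 1
    if too_small and not observed_annual > 1:
        return 1.0
    return smr

# Invented values, for illustration only.
tiny_quiet = adjust_smr(smr=8.0, expected_annual=0.02, observed_annual=0.1)
tiny_busy = adjust_smr(smr=60.0, expected_annual=0.02, observed_annual=1.2)
large = adjust_smr(smr=2.5, expected_annual=0.5, observed_annual=0.3)
```

Here `tiny_quiet` is reset to 1, `tiny_busy` keeps its SMR because it averaged more than one case per year, and `large` is unaffected because it expects well over one case in 21 years.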
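Pooling several suburbs, as done for Port Pirie, can be sketched as follows. Because expected cases are a sum of rate × population terms, summing populations by age group before computing expected cases is equivalent to summing the per-suburb expected cases, which is what this sketch does. The function name and the six pairs of values are invented for illustration and are not the Port Pirie data.

```python
def pooled_smr(observed_annual, expected_annual):
    """SMR for a group of suburbs: summed observed over summed expected cases."""
    total_expected = sum(expected_annual)
    if total_expected <= 0:
        raise ValueError("expected cases must sum to a positive number")
    return sum(observed_annual) / total_expected

# Invented annual averages for six suburbs of one city. Most cases are
# attributed to the first suburb, which shares its name with the city.
observed = [5.0, 0.0, 0.0, 0.0, 0.0, 1.0]
expected = [0.25, 0.25, 0.5, 0.5, 0.25, 0.25]

single_suburb_smr = observed[0] / expected[0]  # inflated by misattributed cases
city_smr = pooled_smr(observed, expected)      # applied to all six suburbs
```

In this invented case the single-suburb SMR of 20 drops to a pooled value of 3 once cases and expected counts are summed across the city.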
Mapping symbology and methods
SMR maps were colour-coded according to thresholds of 0·1, 0·5, 2 and 10 (Figs 1 and 4). The source vs. residence differential map was centred at zero and symbolized based on standard deviations (s.d. = 4·12). Areas with differentials at least 2 s.d. above or below zero were considered significantly different (Fig. 2).
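The map classes described above can be expressed as a small lookup. The Python sketch below is illustrative only: the thresholds and s.d. come from the text, but whether a value exactly equal to a threshold falls in the higher or lower class is an assumption, as are the function names and labels.

```python
from bisect import bisect_right

SMR_BREAKS = [0.1, 0.5, 2, 10]  # thresholds used for the SMR maps
DIFFERENTIAL_SD = 4.12          # s.d. of the source minus residence differentials

def smr_class(smr):
    """Return a class index from 0 (SMR < 0.1) to 4 (SMR >= 10).

    Assumption: a value equal to a threshold is placed in the higher class.
    """
    return bisect_right(SMR_BREAKS, smr)

def differential_flag(source_smr, residence_smr, sd=DIFFERENTIAL_SD):
    """Flag suburbs whose source and residence SMRs differ by at least 2 s.d."""
    diff = source_smr - residence_smr
    if diff >= 2 * sd:
        return "source higher"
    if diff <= -2 * sd:
        return "residence higher"
    return "not different"

classes = [smr_class(v) for v in (0.05, 0.3, 1.0, 5.0, 12.0)]
flag_high = differential_flag(15.0, 3.0)  # differential of 12 exceeds 8.24
flag_none = differential_flag(5.0, 3.0)   # differential of 2 is within 2 s.d.
```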
LandScan
We used the LandScan™ high-resolution global population dataset copyrighted by UT-Battelle, LLC, operator of Oak Ridge National Laboratory under contract no. DE-AC05-00OR22725 with the US Department of Energy [23]. LandScan shows the average ambient population count per km². To calculate this, LandScan uses local census data together with road proximity, slope, land cover, etc. to determine where humans are distributed over a 24-h period [24]. In contrast to census data, distributing population this way shows where people are active and thus includes outdoor recreation areas and work areas for which no permanent residence exists. We re-projected the LandScan data to GDA94/SA Lambert and re-sampled at a 1-km resolution before using it to visualize our data. We present the use of LandScan data [24, 25] as a binomial mask: if a LandScan cell (or pixel) has a population of ⩾1, it is set to no colour so the underlying SMR value is visible; if the LandScan cell has zero population, it is greyed out and the underlying calculated risk is not visible.
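The masking rule can be illustrated on a small grid. This Python sketch assumes the suburb SMRs have already been rasterized onto the same 1-km grid as the ambient population counts, which may differ from how the layers were actually overlaid in GIS software; the function name and grid values are hypothetical.

```python
def apply_population_mask(smr_grid, population_grid, masked=None):
    """Hide SMR values in cells whose ambient population is zero.

    Cells with a population of at least 1 keep their SMR value (transparent
    mask); all other cells are replaced by `masked` (greyed out on the map).
    """
    if len(smr_grid) != len(population_grid):
        raise ValueError("grids must have the same number of rows")
    result = []
    for smr_row, pop_row in zip(smr_grid, population_grid):
        if len(smr_row) != len(pop_row):
            raise ValueError("grid rows must have the same length")
        result.append([smr if pop >= 1 else masked
                       for smr, pop in zip(smr_row, pop_row)])
    return result

# Invented 2 x 3 grids: a large high-SMR suburb (12.0) that is mostly
# unoccupied, next to a low-SMR suburb (0.4).
smr_grid = [[12.0, 12.0, 0.4],
            [12.0, 0.4, 0.4]]
ambient_population = [[0, 3, 0],
                      [0, 25, 1]]

visible = apply_population_mask(smr_grid, ambient_population)
```

Only one of the three high-SMR cells remains visible, which mirrors how the mask reduces the visual dominance of large, sparsely populated areas.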
RESULTS
Of the 5143 cases of RRV infection during this time period, 1928 (37%) were reported to have been acquired away from home. Of these 1928 cases, the top three reported sources were ‘Riverlands (indeterminate)’ (284, 15%), Northern Territory (105, 5%), and Queensland (103, 5%). New South Wales was the sixth (77, 4%), Victoria was the seventh (76, 4%), and Western Australia was the fifteenth (31, 2%) most commonly reported non-residence source of infection. The rest of the analysis focused on the South Australian suburbs. Table 2 shows the 15 South Australian suburbs most commonly reported as non-residence sources of infection.
Table 2 notes:
Rank includes ‘Riverlands (indeterminate)’ and interstate sources. Total source cases excludes resident cases ascribed to a different source.
* Borders the Murray River.
† Has at least one caravan park.
Adelaide, Whyalla and Coffin Bay are the only frequently reported sources of infection that do not border the Murray River. Adelaide and Whyalla are urban sources; Coffin Bay is a small coastal tourist town bordered by a national park and samphire saltmarshes.
Comparing residence and source SMR maps
Regardless of which location field was used (source or residence), most of metropolitan Adelaide had below-average SMRs, and most of the suburbs with high SMRs either bordered the Murray River or the coast or were in the northeast outback of the state (Fig. 1).
It was evident from our mapping that case rates can vary distinctly among adjoining suburbs, suggesting that RRV risk varies at the level of suburb or smaller.
Figure 2 highlights the areas where residence SMR and source SMR are significantly different. No suburbs were reported significantly more often as patient residence than as an infection source. Six suburbs were reported significantly more often as the source of infection than as patient residence (clockwise from northeast): Marree, Blanchetown, Morgan, Punyerloo, Coffin Bay and Elliston (indicated by the red arrow in Fig. 2).
Areas of highest risk
Figure 3 and Table 3 show the suburbs where both source and residence SMRs are >10. There are 14 and 22 suburbs with an SMR >10 when using residence and source, respectively, and 25 suburbs where either the residence or the source SMR is >10. There are 11 suburbs that are reported to have >10 times higher risk of RRV infection than expected given the population, regardless of which georeferencing data are used. We consider these to be the locations of highest risk in the state; most (73%) of these suburbs border the Murray River (Table 3).
Table 3 notes:
* Borders the Murray River.
† Has at least one caravan park.
‡ Indulkana and Indulkana Homelands.
LandScan binomial mask addition
A dominant feature of the statewide SMR map without LandScan (Fig. 1) is the large areas of elevated risk in the sparsely populated northern regions of the state. When our binomial LandScan layer is overlaid on this map (Fig. 4), the unpopulated areas of the north are no longer visible.
DISCUSSION
Patient-identified source of RRV infection has been collected since 1992 in SA but an analysis explicitly comparing risk using source vs. patient residence has never been published for any disease data, to the best of our knowledge.
We have shown here the importance of having reliable location data for epidemiological mapping of mosquito-borne disease. Neighbouring suburbs can vary greatly in their SMRs, suggesting that RRV disease risk varies at the scale of suburbs or smaller. Furthermore, different suburbs were identified as highest risk when using source vs. residence locations; recording the patient-presumed source could be important for public health interventions, risk mapping and understanding the transmission ecology of this and other mosquito-borne infections.
Highest-risk suburbs
Eleven suburbs had an RRV disease risk more than 10 times higher than expected, according to both source and residence (Table 3, Fig. 3). To sustain high levels of RRV transmission, an area must have animal reservoirs, mosquito vectors and humans, all of which rely heavily on water to survive. Therefore, it is not surprising that all of the highest-risk areas border the Murray River or the coast, except Indulkana and Indulkana Homelands.
In Indulkana and Indulkana Homelands, 82% of residence cases were contracted in the summer of 1997, when substantial February rainfall (www.bom.gov.au) resulted in ephemeral water bodies and mosquito outbreaks in otherwise arid regions. Therefore, risk in this and other arid regions appears to be episodic and highly dependent on rainfall events.
Cowell and Melrose are towns on the Spencer Gulf coast, both with substantial mangrove and saltmarsh habitat for saltmarsh-breeding mosquitoes. One study trapped adult mosquitoes along the Spencer Gulf coast and found RRV vectors Ae. camptorhynchus and Ae. vigilax to form the majority of the mosquito community and Cowell to have the highest density of these species out of the four towns sampled [26].
The rest of the highest-risk suburbs border the Murray River, reinforcing prior evidence that the Murray ‘Riverlands’ region has elevated RRV infection prevalence [14–17]. The Murray River region has an abundance of vector mosquitoes, particularly Ae. camptorhynchus and Cx. annulirostris [27], which feed on a variety of potential reservoir mammals (although not macropods) [28], and has a higher prevalence of mosquito-borne pathogens per trap night and per mosquito, compared to metropolitan Adelaide [19]. Camping is a popular activity along the Murray River and a known risk factor for RRV infection [12]. Most of these highest-risk river towns have at least one caravan park (Table 3) and those that do not (Moorook South and Cadell) border conservation parks with camping options. Outdoor activities generally pose a greater risk of exposure to vector-borne disease, particularly in areas or at times of high mosquito abundance. These highest-risk regions seem to represent areas where an abundance of vector mosquitoes combines with prominent outdoor activities to produce higher disease risk for residents and tourists.
Source vs. residence
The differences between source and residence SMRs (Fig. 2) revealed some areas that may represent heightened risk but would be missed if risk were quantified using patient residence alone. However, the question remains as to whether patient-identified source is a reliable indicator of the true source of the infection. The two states most frequently identified as the infection source (Northern Territory and Queensland) also have the highest rates of RRV infection [29]. Furthermore, of the 14 suburbs identified as common sources (Table 2), 11 border the Murray River and five are identified as highest-risk areas (Table 3). All but Adelaide have caravan parks, despite some being small towns with a population <1000. These factors all suggest to us that patients’ reports may accurately identify infection source.
The locations with significantly higher source SMR than residence SMR (Fig. 2) are generally areas that attract tourists, which raises the question of whether these areas present a high RRV risk or only appear to do so because the population exposed is actually the resident population (used for SMR calculations) plus the tourist population.
In SA, Adelaide has by far the greatest share of visitation nights and day trips (37% and 35%, respectively), followed by the Fleurieu Peninsula (10% and 20%, respectively) [30]. In our analysis, suburbs in both of these regions generally have average or below-average SMRs, suggesting that the tourist areas identified as having significantly high source SMRs do represent areas of elevated infection risk. Furthermore, in contrast to tourism in Adelaide, visitors to the Murray River Valley often engage in outdoor activities (e.g. camping, fishing, boating) which expose people to the abundant vector mosquitoes found there [19, 28]. Elliston and Coffin Bay both have beaches and conservation areas as well as saltwater wetlands that are prime breeding habitat for two of the main RRV vectors, Ae. camptorhynchus and Ae. vigilax, again suggesting the validity of patient-reported sources of infection.
Residents living in these highly reported source areas are also likely exposed to RRV. However, children infected with RRV rarely develop disease [31] but do develop antibodies which are presumed to be protective despite some evidence for antibody-dependent enhancement [32, 33]. Therefore, the holiday locations identified frequently as the source of infection may represent areas where non-immune people (who were not exposed as children) engage in risk behaviours (camping, outdoor activities, etc.) in areas with an abundance of vector mosquitoes, which are true areas of high RRV infection risk. However, without a statewide dataset of RRV-infected mosquito densities with which to compare source and residence suburbs, our conclusions remain subject to further research.
Using LandScan to improve map interpretation
When mapping disease in sparsely inhabited areas, choropleth maps can be misleading. Although we mapped using the finest geographical division available, the large, sparsely populated suburbs in the north of the state dominate the map. Although it is statistically accurate to apply an SMR to the entire area for which it is calculated, it might be ecologically inaccurate to do so, especially when large portions of the area are unoccupied. For example, the suburb that comprises the large, high-risk area in the northeast of the state in Figure 1b is 20 s.d. above the mean area (km²) for suburbs in the state but has a below-average population (636; state average population is 1857, Table 1). Such large, sparsely populated areas present two problems: (a) disease risk can artificially appear high due to the small population size and (b) the large portion of the map occupied by these areas can overshadow interpretation. We tried to correct for the small population size with our SMR calculations. The addition of LandScan (Fig. 4) helps refocus our attention on the areas that are both high risk and inhabited. This approach, while not altering the basic epidemiological calculations used when creating the SMR maps, corrects for risk values assigned to locations where there are no humans to be infected, thereby providing a more realistic spatial presentation of disease risk.
In addition to improving map interpretation, the LandScan layer further fine-tunes the scale from suburb to the 1 km² level at which LandScan is calculated. The scale at which RRV risk operates in SA is unknown but different RRV ecologies exist across the state [28] and vector and virus prevalence appears to vary at the scale of kilometres [34]. In Western Australia, for example, RRV risk was found to be highly localized, with the greatest risk of infection occurring within 1–2 km of mosquito breeding sites [35]. The LandScan layer can be used by public health officials to focus disease abatement efforts on areas where high infection risk intersects with human activity.
In this analysis we saw significant variation occurring at the level of suburbs, suggesting that RRV infection risk varies at a smaller scale than was previously believed in SA. This finding is supported by studies across Australia which have found that RRV infection rates (human and mosquito) vary at a relatively small scale [4, 35–37]. However, although some studies have sought to explain the temporal patterns of RRV outbreaks in SA [5, 7], very little work has been conducted to explain the spatial distribution in SA as has been done elsewhere [4, 35–38]. The fine-scale variation in disease rates we see here could be due to variation in reporting/testing by medical practitioners, behavioural differences of the patients or could represent variation in transmission ecology [19, 34]; potentially all three factors are influencing disease rates to some degree. Long-term, statewide surveillance of mosquitoes and the viruses they are carrying, combined with spatial analysis of how social and ecological variables correlate with RRV cases, would allow us to tease apart the influence of these factors on RRV infection risk.
Our analysis agrees with previous studies which identified the Murray ‘Riverlands’ as presenting high and metropolitan Adelaide as presenting low RRV infection risk [7, 14–16, 39]; we extend those findings to identify the suburb-level risk variation (Fig. 1). We identify the suburbs of highest risk (Table 3, Fig. 3) which (excluding the inland region of Indulkana and Indulkana Homelands, which had episodic infections) should be the focus of regular vector control, public education, disease warnings and future RRV vaccination campaigns, should a vaccine become available [40]. We enlisted LandScan data to improve map interpretation and identify, on a 1 km² scale, where infection risk and human activity intersect (Fig. 4); being able to identify areas of highest risk at this fine scale means improved implementation of public health measures and a greater ability to prevent RRV infections across the state. Finally, we recommend public health agencies collect infection source data, as this seems to be a reliable indicator of where infection occurs.
Overall, we have highlighted the value of high-resolution mapping techniques to better target public health interventions, and illustrated a novel application of ambient population data (LandScan) to improve the interpretation of these risk maps. These combined approaches are particularly pertinent to vector-borne disease management in regions of high variation in population density, and can allow disease abatement efforts to be implemented at a finer scale than that at which disease data are available.
ACKNOWLEDGEMENTS
We are very grateful to the South Australian Department for Health and Ageing, the hundreds of public health workers who collected these data over 21 years and particularly Dr Ann Koehler for providing us with the case data used herein. We also thank Dr Andrew Flies and Professor Paul Sutton for providing helpful feedback which enhanced this manuscript and Dr David Slaney and Dr Johannes Foufopoulos for early comments on this project.
This research was supported by the University of South Australia President's Scholarship and the School of Pharmacy and Medical Sciences Scholarship (both to E.J.F.).
DECLARATION OF INTEREST
None.