Introduction
The rapid spread of coronavirus disease 2019 (COVID-19) has created a worldwide pandemic with high morbidity and mortality [1]. Nationally, rural populations have been disproportionately affected in terms of deaths and hospitalizations [2–5]. While rural areas initially experienced lower testing and case rates [6,7], the trend reversed in mid-August 2020, when COVID-19 cases per capita in small/medium metro and non-metro areas began to exceed those in large central and fringe metro areas [8]. Death rates in non-metro areas also exceeded those in metro areas from late August to mid-December 2020 [9]. Compared with urban residents, people living in rural areas report less willingness to be vaccinated against COVID-19 [10,11] and, as our community-based survey indicated, less engagement in COVID-19 preventive behaviors such as masking [12].
Minnesota’s Governor issued a Shelter-in-Place order that was in effect from March 27, 2020, to May 13, 2020. The first COVID-19 case in the rural parts of the four-county Southeast Minnesota study area (Dodge, Goodhue, Olmsted, and Wabasha Counties) was reported on March 17, 2020. From March through October 2020, there were 142,311 COVID-19 cases in Minnesota and 4880 in the four-county study area [13], with rural areas accounting for an estimated 41% of area cases based on Rochester Epidemiology Project (REP) data (the REP is an NIH-funded data linkage system for study populations).
We performed a geospatial and temporal trend analysis of COVID-19 in the rural parts of four Southeast Minnesota counties (50% rural, i.e., with a RUCA code other than 1.1) [14] to examine the influence of geographic factors on COVID-19 epidemiology in a Midwest region. We also examined whether identified hotspots were associated with social determinants of health (SDOH), focusing on socioeconomic status (SES) and housing characteristics. Understanding how place of residence within counties and SDOH relate to COVID-19 risk could help target outreach efforts and public health interventions (e.g., COVID-19 testing and vaccination) for rural populations more precisely.
Methods
Study Setting
Medical records-based research on the area population was performed using COVID-19 laboratory test data from the REP database. The REP database includes a majority of residents in the study area, with their inpatient and outpatient clinical diagnoses and address information [5]. Compared with population estimates from the 2018 American Community Survey (ACS) 5-year data, geocoded REP records for persons with research authorization covered 75.0% of rural residents in the four-county study area (90,975 of 121,241), ranging from 63.7% in Goodhue County to 93.2% in Olmsted County [5]. Because REP has an overall research authorization level of 90.1% for the 27-county area that includes the study area [15], the variation in coverage reflects residents receiving health care from providers not covered by REP.
Coverage was similar for residents with positive COVID-19 tests in the region (urban and rural parts combined): geocoded REP records represented 76.4% of cases (3728 of 4880) during the study period. The rural population of these four counties is 95.3% White (93.3% non-Hispanic White [NHW]), with 3.1% Hispanic of any race, 0.7% Black, 0.5% American Indian, 0.9% Asian, and 2.6% Other/Mixed [14]. Rural portions of the region had a lower proportion of households in poverty (3843 of 48,401 households, 7.9%) than urban portions (9.6%) [14].
Study Design and Cohort
This is a population-based retrospective cohort study assessing the temporal (semi-monthly) and geospatial distribution of test-confirmed COVID-19 cases in the rural population from March 17, 2020, to October 31, 2020. We used the geocoded portion of the REP population living in rural areas (see Rural Classification below) as the denominator (N = 90,975) and used the REP database to identify people who had COVID-19 tests and their test results. For people tested multiple times, the date of the first negative test was retained for temporal analysis unless superseded by a positive test, in which case the date of the first positive test was used. The unit of analysis is thus persons tested (n = 24,243), not tests. SARS-CoV-2 testing was performed according to the manufacturer’s instructions for the real-time reverse transcription polymerase chain reaction (RT-PCR)-based cobas SARS-CoV-2 assay (Roche Molecular Systems, Inc., Branchburg, NJ), which received emergency use authorization from the US Food and Drug Administration. This assay detects the SARS-CoV-2 ORF1ab and E gene sequences; test results were reported as target detected, target not detected, presumptive positive (when only the E gene sequence was detected), or inconclusive (when PCR inhibition was present).
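The person-level rule for choosing each person's analysis date can be sketched in Python. This is a minimal illustration under stated assumptions: the record layout `(person_id, test_date, result)` and the function name `index_dates` are hypothetical, and results are assumed to be already reduced to "positive" or "negative". How presumptive positive or inconclusive results were handled is not specified here, so the sketch simply ignores any other value.

```python
from datetime import date

def index_dates(tests):
    """Collapse test-level records to one record per person.

    tests: iterable of (person_id, test_date, result) tuples (hypothetical layout).
    Returns {person_id: (index_date, ever_positive)}, where index_date is the
    first positive test date if the person ever tested positive, otherwise the
    first negative test date.
    """
    positives, negatives = {}, {}
    for pid, test_date, result in tests:
        if result == "positive":
            positives.setdefault(pid, []).append(test_date)
        elif result == "negative":
            negatives.setdefault(pid, []).append(test_date)
        # Any other result value is ignored in this sketch (assumption).
    persons = {}
    for pid in set(positives) | set(negatives):
        if pid in positives:
            # A positive test supersedes any negative tests.
            persons[pid] = (min(positives[pid]), True)
        else:
            persons[pid] = (min(negatives[pid]), False)
    return persons

# Hypothetical example: person A tested three times, person B twice.
tests = [
    ("A", date(2020, 4, 2), "negative"),
    ("A", date(2020, 7, 10), "positive"),
    ("A", date(2020, 7, 20), "positive"),
    ("B", date(2020, 5, 1), "negative"),
    ("B", date(2020, 3, 30), "negative"),
]
result = index_dates(tests)
print(len(result), result["A"], result["B"])
```

Because the unit of analysis is the person, the five test records in the example collapse to two persons.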
The study was approved by the Mayo Clinic and Olmsted Medical Center Institutional Review Boards.
Rural Classification
We classified as “urban” the populations residing inside the City of Rochester or in block groups (BGs) with RUCA code 1.1 (see map in Supplement). All other areas, including smaller cities and townships, were considered rural and included in the analysis [16,17]. By this two-way classification, the four-county region is 50.4% rural and 49.6% urban [14]. The rural population of the 66 townships (unincorporated jurisdictions) in the study area ranged from 170 to 2873 [14], and township REP population density ranged from 4.0 to 115.0 (mean 14.5) persons per square mile. The population of the 31 rural cities (incorporated jurisdictions) in the study area ranged from 91 to 16,338 (median 1268); 11 of these cities have populations over 2500 (urban clusters by the Census definition) [18]. City REP population density averaged 553 persons per square mile.
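The two-way rural/urban rule can be expressed as a small function. This is a sketch assuming each block group record carries a flag for location inside the City of Rochester, a RUCA code stored as a string, and a population count; the record layout, function names, and numbers below are hypothetical.

```python
def classify(in_rochester: bool, ruca_code: str) -> str:
    """Urban if inside the City of Rochester or RUCA code 1.1; otherwise rural."""
    return "urban" if in_rochester or ruca_code == "1.1" else "rural"

def rural_share(block_groups):
    """block_groups: list of (in_rochester, ruca_code, population) tuples."""
    total = sum(pop for _, _, pop in block_groups)
    rural = sum(pop for inside, code, pop in block_groups
                if classify(inside, code) == "rural")
    return rural / total

# Hypothetical block groups: Rochester core, RUCA 1.1 fringe, small city, township.
bgs = [
    (True, "1.0", 1500),
    (False, "1.1", 1200),
    (False, "4.1", 900),
    (False, "10.0", 400),
]
print(round(rural_share(bgs), 3))  # 0.325
```

Storing RUCA codes as strings avoids relying on floating-point equality when matching the 1.1 code.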
Geospatial Analysis
1. Geocoding: The addresses of persons in the REP were geocoded using parcel-based methods, yielding precise household locations and housing characteristics (e.g., apartment, mobile home community [MHC], or single-family house) that could then be related to the epidemiology of COVID-19.
2. Weighting: Case density was weighted as in related studies [15]. Because COVID-19 test positivity depends in part on the level of testing, we accounted for the proportion of persons tested within each Census block group relative to the overall rural population of each county by applying the weight W = (BGpop/Rurpop)/(BGTP/RurTP), where BGpop is the Census block group population, Rurpop is the rural county population, BGTP is the number of tested persons in the block group, and RurTP is the number of tested persons in the rural portion of the county. A block group whose share of the county's tested persons is smaller than its share of the rural population therefore receives W > 1, so its cases are up-weighted. The resulting weights were applied to each positive test in subsequent analysis steps.
3. Trend analysis: To examine temporal trends in the spatial locations of hotspots, we collected COVID-19 case and testing data for semi-monthly periods, March 2020 (3/17–3/31), early April (4/1–4/15), late April (4/16–4/30), early May, late May, and so on, mapping concentrations of cases for each period. Because rural case counts were low, we grouped these periods for analysis into March–June (148 city cases, 63 township cases), July–August (194 city, 129 township), September (194, 130), October 1–15 (130, 98), and October 16–31 (264, 147).
4. Determining hotspots: We applied a geospatial analysis approach similar to that used in our previous studies [19,20]. For areas within cities, we mapped the kernel density of weighted positive cases using a half-mile bandwidth. This bandwidth made it possible to detect the influence of individual apartment complexes, mobile home parks, subdivisions, and similar features. Larger bandwidths obscured these geographic characteristics, while smaller bandwidths increased the number of areas with high weighted case density but no cases of their own, because of the influence of positive cases located nearby but outside the “hotspot.” For each period and for the combined March–October 2020 analysis, we defined “hotspots” as areas with case density at or above the 90th percentile across rural cities in the four-county area AND with a relative difference of at least 0.33 (i.e., weighted observed case density at least 33% above expected). Relative difference was calculated as RD = (wOCD - ECD)/ECD, where wOCD is the weighted observed case density and ECD is the expected case density, obtained by applying the average incidence to the REP population.
In all but 2 of the 66 townships, population density is so low that the expected case density is less than 1 per square mile, so the relative difference approach described above yielded distorted results. Instead, we identified households with positive cases and applied kernel density methods with a one-mile bandwidth to identify positive-case households within one mile of another positive-case household in the same grouped time period. We assigned township hotspot status to these concentrations.
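The township rule can be sketched as a direct pairwise distance check. Note that this swaps in a simpler technique than the kernel density method described above: it flags any positive-case household lying within one mile (great-circle distance) of another positive-case household in the same grouped period. The tuple layout, household IDs, and coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def township_hotspot_households(households, radius_miles=1.0):
    """households: list of (household_id, period, lat, lon) with a positive case.

    Returns the IDs of households within `radius_miles` of another
    positive-case household in the same grouped period.
    """
    flagged = set()
    for a, b in combinations(households, 2):
        same_period = a[1] == b[1]
        if same_period and a[0] != b[0]:
            if haversine_miles(a[2], a[3], b[2], b[3]) <= radius_miles:
                flagged.update({a[0], b[0]})
    return flagged

# Hypothetical households: H1 and H2 are about 0.35 miles apart in September;
# H3 is about 7 miles away; H4 shares H1's location but falls in October.
households = [
    ("H1", "September", 44.100, -92.50),
    ("H2", "September", 44.105, -92.50),
    ("H3", "September", 44.200, -92.50),
    ("H4", "October 1-15", 44.100, -92.50),
]
flagged = township_hotspot_households(households)
print(sorted(flagged))
```

Only H1 and H2 are flagged: H3 is too far away, and H4 has no other positive-case household nearby in its own period.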
Other Pertinent Variables
Basic demographic characteristics (age, sex, race/ethnicity) were extracted from medical records available through the REP. For SES, we used the HOUSES index, a validated individual-level socioeconomic measure linked to addresses at the time of testing (or at the end of the study period for those not tested) [21]. Since its original validation, the HOUSES index has been widely used in clinical and epidemiological studies of 38 different health outcomes and behaviors, as well as health care delivery, in both children and adults [22–46].
Data Analysis
In addition to the geospatial and temporal trend analyses of COVID-19 cases, we compared the sociodemographic characteristics of study subjects within hotspots with those outside hotspots using logistic regression models, with separate analyses for small cities and townships. We also described patterns of COVID-19 laboratory testing and positivity during the study period, stratified by hotspot status (within vs outside hotspots) and location (small cities vs townships). Geospatial analysis was performed using ArcMap 10.4.1 (Esri).
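For a single binary characteristic (e.g., belonging to a minority group) and the binary outcome of living in a hotspot, a univariable logistic regression is saturated, so its coefficient equals the log odds ratio of the 2x2 table and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d). The sketch below uses that identity to avoid a statistics library. The counts are hypothetical, and the study's own models may have included other covariates or multi-level categories.

```python
import math

def logistic_2x2(a, b, c, d):
    """Univariable logistic regression of hotspot status on a binary characteristic.

    a: characteristic present, in hotspot    b: present, outside hotspot
    c: characteristic absent, in hotspot     d: absent, outside hotspot
    Returns (odds_ratio, two_sided_wald_p).
    """
    log_or = math.log((a * d) / (b * c))           # logistic regression coefficient
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald standard error
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal p value
    return math.exp(log_or), p

# Hypothetical counts for one characteristic among city residents.
odds_ratio, p_value = logistic_2x2(a=60, b=140, c=40, d=160)
print(f"OR = {odds_ratio:.2f}, p = {p_value:.3f}")
```

For characteristics with more than two levels, or with adjustment for other variables, a full logistic regression fit would be needed instead of this closed form.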
Results
Characteristics of Study Subjects
Of the 90,975 rural residents included in the analysis, 51.7% were female; 92.9% were White (90.1% NHW), 1.0% African American, 0.8% Asian, 0.4% American Indian, and 3.6% other race or two or more races (Other/Mixed); and 4.7% reported Hispanic ethnicity. The median age was 42.7 years (interquartile range 21.1–62.1).
Prevalence of COVID-19, Temporal Trends, and Characteristics of COVID-19 Cases
A total of 24,243 geocoded rural subjects (26.6%) were tested at least once, of whom 1498 (6.2% of those tested and 1.6% of the rural population) tested positive. After the first COVID-19 case was confirmed on 3/17/2020, new cases per semi-monthly period peaked in early July, declined in late July and early August, and then rose to a higher peak in late October, four times the July peak (see Fig. 1). Positivity rates followed similar trends.
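The three percentages in the paragraph above use different denominators. A small Python calculation with the reported counts makes the distinction explicit; the variable and function names are illustrative.

```python
rural_population = 90_975  # geocoded rural REP residents
persons_tested = 24_243    # tested at least once
persons_positive = 1_498   # tested positive

def pct(numerator, denominator):
    return round(100 * numerator / denominator, 1)

testing_rate = pct(persons_tested, rural_population)            # 26.6
positivity = pct(persons_positive, persons_tested)              # 6.2
cumulative_incidence = pct(persons_positive, rural_population)  # 1.6
print(testing_rate, positivity, cumulative_incidence)
```

Positivity (6.2%) is conditional on being tested, whereas the 1.6% figure is a cumulative incidence over the whole rural population.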
Figs. 2a–d show temporal (monthly) trends of COVID-19 cases in relation to demographics (2a for age, 2b for sex, 2c for race/ethnicity, and 2d for SES).
Although COVID-19 prevalence fluctuated over time, temporal trends were similar across sex and age subgroups. Prevalence was highest among persons aged 20–44 years (2.3%), followed by those aged 45–65 years (1.9%). Racial and ethnic minorities made up 12.2% of the rural population (excluding unknown) but 33.3% of COVID-19 cases in April and 22.8% in July; their share then fell sharply from August through October, to 7.0%. Hispanic persons (4.7% of the rural population) accounted for as much as 17.6% of cases in April and 14.8% in July, falling to 3.9% in October; overall, they accounted for 7.8% of cases. The White share of cases exceeded the White share of the population (94.3%) only in October (94.6%). Despite these disparities in cases, testing rates differed little by race: 29.1% for African American, 26.5% for Hispanic, 26.1% for Asian, 26.6% for White, 27.5% for American Indian, and 28.0% for Other/Mixed persons (24.7% for unknown/refused). Positivity rates were higher among several minority groups than among NHW persons: 5.9% for NHW versus 9.7% for African American, 8.8% for Other/Mixed race, and 10.2% for Hispanic persons of any race.
The share of cases in HOUSES quartile Q1 (lowest SES) exceeded its share of the population during the March–June period. In all remaining periods, the lowest quartile's share of cases was lower, ranging from 68% to 96% of the overall average, whereas the share of cases in the highest quartile (Q4) exceeded the average of all SES levels in all periods. Hotspots in the March–June period (see Supplement) included 3 large MHCs (2 in cities, 1 in a township) and 12 apartment complexes in cities, identified by aggregating cases by address and verifying the structure type at addresses with more than 11 REP records.
The proportion tested varied with city population: 23.2% of residents were tested in cities under 1000 versus 30.2% in cities over 5000. Positivity rates in small cities ranged from 5.2% in cities over 5000 to 6.1% in cities under 1000. Township population size was related to testing but not to positivity: the testing rate was 27.6% in townships over 1200 versus 21.8% in townships under 500. Cities had more cases per capita than townships in March–June; thereafter, townships had more cases per capita until late October.
Geospatial and Temporal Trends of COVID-19 in Rural Areas
Temporal geospatial analysis county maps are provided in the Supplement. Geospatial analysis results based on the entire study period are summarized in Fig. 3. Note that because hotspots are based on case density, overall March to October hotspots are somewhat influenced by high case numbers in September and October, when White, single-family, and higher SES households experienced high numbers of cases per capita. See maps in the Supplement for more temporal detail.
Hotspots occurring in cities tended to recur from month to month, while township hotspots were consistent only in concentrations of housing outside but adjacent to cities.
Temporal trends differed between minority and NHW persons: cases per capita among the total racial minority population exceeded the NHW rate from March through September (e.g., by 16% in the July–August period) but were 1% lower in October. Per capita case rates among Hispanic persons of any race (the largest minority group, 4242 persons) were 2.2 times the NHW rate in March–June, 3.4 times in July–August, 1.9 times in September, and 0.8 times in October.
Finally, while all racial, ethnic, and SES groups experienced an increase in cases in September and October, the large absolute increase in October was driven chiefly by cases among White and higher-SES residents and in single-family residential areas of rural communities.
Comparison of Population Characteristics Inside and Outside Hotspots in Rural Areas
Rural city hotspots accounted for 33.3% (325/975) of city cases and 14.1% of the rural city population. As shown in Table 1, compared with other city residents, people living in city hotspots tended to be younger, included a similar or higher proportion of minorities (except African Americans), and were of higher SES. Through August, hotspots and non-hotspots had the same proportion of HOUSES Q1 (lowest SES) residents (23.2%), indicating that the higher overall SES of hotspot residents reflects the high proportion of higher-SES cases in September and October.
* P values for testing association between variables and hotspot status (in hotspots), using logistic regression.
** Other/Mixed category includes other, mixed (2+ races), American Indian, and Hawaiian/Pacific Islander.
Rural township hotspots accounted for 48.8% (277/568) of township cases and 27.1% of the township population. As shown in Table 2, compared with other township residents, people living in township hotspots tended to be younger, included a higher proportion of minorities, and were of higher SES.
* P values for testing association between variables and hotspot status (in hotspots), using logistic regression.
** Other/Mixed category includes other, mixed (2+ races), American Indian, and Hawaiian/Pacific Islander.
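One way to quantify how disproportionate these concentrations are is the ratio of the hotspots' share of cases to their share of the population. This ratio is an illustrative derivation from the figures reported above, not a measure reported in the study; the function name is hypothetical.

```python
def concentration_ratio(hotspot_cases, total_cases, population_share):
    """Share of cases in hotspots divided by share of population in hotspots."""
    return (hotspot_cases / total_cases) / population_share

city = concentration_ratio(325, 975, 0.141)      # rural city hotspots
township = concentration_ratio(277, 568, 0.271)  # rural township hotspots
print(f"city {city:.2f}, township {township:.2f}")
```

Values above 1 indicate that hotspots carry more than their population share of cases: roughly 2.4 times for city hotspots and 1.8 times for township hotspots.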
Table 3 compares city and township hotspots with city and township areas outside hotspots in terms of test positivity over time and cases (positive tests) per 100,000 population per day for the five aggregated time periods. In both cities and townships, and in all periods, hotspots had higher positivity and more cases per 100,000 population. The rate of cases per capita increased more than 15-fold from March–June to late October.
* Note: Positive tests (cases) are not weighted by level of testing.
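The per-day rate in Table 3 normalizes for the unequal lengths of the five aggregated periods. A minimal Python sketch, assuming inclusive start and end dates for each period, with March–June beginning at the first case on March 17, 2020; the population and case count in the example are hypothetical, and cases are unweighted, as in Table 3.

```python
from datetime import date

# Five aggregated analysis periods (inclusive start and end dates).
PERIODS = {
    "March-June": (date(2020, 3, 17), date(2020, 6, 30)),
    "July-August": (date(2020, 7, 1), date(2020, 8, 31)),
    "September": (date(2020, 9, 1), date(2020, 9, 30)),
    "October 1-15": (date(2020, 10, 1), date(2020, 10, 15)),
    "October 16-31": (date(2020, 10, 16), date(2020, 10, 31)),
}

def days_in(period):
    start, end = PERIODS[period]
    return (end - start).days + 1  # count both endpoints

def cases_per_100k_per_day(cases, population, period):
    return cases / population / days_in(period) * 100_000

for name in PERIODS:
    print(name, days_in(name))

# Hypothetical area of 10,000 residents with 53 cases in March-June.
rate = cases_per_100k_per_day(53, 10_000, "March-June")
print(round(rate, 2))
```

Because late October spans 16 days against 106 for March–June, comparing raw case counts across periods would understate the late-October surge.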
Discussion
The rural burden of COVID-19 resembles that borne by other populations experiencing health disparities. Our longitudinal geospatial analysis adds new information on geographic risk factors significantly related to the overall burden of COVID-19, and on the racial/ethnic and SES groups affected within rural communities at different stages of the pandemic. Geospatial analyses showed consistent hotspots in several cities and in a few areas of townships, even after adjusting for underlying population density. While other studies have reported county-level geographic clusters of COVID-19 cases [47,48], this study demonstrates the importance and utility of identifying geographic hotspots within counties. Hotspots accounted for a disproportionate share of the COVID-19 burden in the rural areas of these four midwestern counties. To our knowledge, this is the first longitudinal geospatial analysis of COVID-19 epidemiology at the neighborhood level in rural counties in the Midwest region of the USA. As a practical matter, the approach used here to identify hotspots offers a method for focusing preventive measures and interventions on neighborhoods at higher risk. This may be especially important during pandemic surges, when contact tracing is difficult; testing and tracing efforts could then be guided by hotspots identified within counties.
The areas identified as hotspots spanned a broad range of neighborhoods in terms of SES depending on the stage of the pandemic: lower-SES and minority households were especially affected early (March through June 2020), and higher-SES households later (July through October). This observation is novel and has important implications for understanding and designing public health interventions. In these four rural counties, the relationship of SES and race/ethnicity with COVID-19 risk depended on geographic location and on the stage (early or later) of the pandemic. Results mirrored national trends in some respects, as well as trends in urban settings [49]. Significant disparities in the burden of COVID-19 occurred despite community factors that mitigate health disparities in this region, such as a median family income higher than the national average. For most months, racial and ethnic minority populations were disproportionately affected by COVID-19, especially at the beginning of the pandemic and among those residing in townships, even in rural areas where their share of the population is low. These findings have implications for interventions regarding preventive measures and vaccination: during the early phases of a pandemic, community health interventions should focus on under-resourced populations, whereas during later phases, broader segments of the population need to be engaged.
Our findings indicate that community health interventions and allocation of resources (e.g., public health education, testing/tracing, and vaccine rollout) could benefit from data on the geographic distribution and neighborhood characteristics of patients and populations at each stage of the pandemic, given the well-recognized health effects of the places in which people live [50] and of other SDOH [47,48,51]. For example, given a sufficiently rapid turnaround from testing to geographic analysis, such findings could guide precise outreach and other interventions targeted at geographic hotspots, such as neighborhoods with housing types (e.g., apartments) associated with hotspots. In this sense, the study represents a proof of concept: prompt, repeated analyses would enable a response that evolves with changing conditions. As an application in our study setting, a geospatial analysis-guided outreach team vaccinated target populations in hotspots against influenza during the COVID-19 pandemic (before COVID-19 vaccines were available) to reduce the burden of influenza on under-resourced populations already significantly affected by COVID-19. The same strategy could be applied to COVID-19 vaccination based on geographic vaccination rates (e.g., hotspots of low vaccine uptake).
Our study has important strengths. First, it is a population-based study leveraging a geographically well-defined population, a self-contained health care environment, and the REP, an electronic data repository for our region. Second, it is the first longitudinal geospatial analysis of COVID-19 epidemiology at the neighborhood level in rural counties in the Midwest region, and our estimates of COVID-19 prevalence and the hotspot analysis reflected the number of tests, population density, and household SES. Third, we believe the study methods are generalizable to other rural areas wherever address data are available for tested persons and positive tests, regardless of the proportion of cases in the population, although with lower incidence the bandwidth would need to be adjusted even in higher-density areas. If address data are not available (e.g., where only the ZIP code of tests and cases is known), it would be possible to identify “hotspot” ZIP codes, but with much less precision about the neighborhood characteristics associated with the hotspots. In our mixed urban-rural area, at least, ZIP code areas might not be suitable because they are heterogeneous in terms of SES, housing type, race, ethnicity, density, rural-urban character, and other relevant factors.
Our study also has several limitations. First, residents of smaller cities and townships were less likely to be tested; although we adjusted our analysis for testing, there may have been unreported cases in these areas. Second, our reliance on the REP as the data set means that some COVID-19 tests and cases are likely missing from our analysis; geocoded REP records for persons with research authorization covered 75.0% of rural residents in the four-county study area, ranging from 63.7% in Goodhue County to 93.2% in Olmsted County. Third, parts of our study setting have a high proportion of health care workers compared with other settings, which may limit the generalizability of our findings. Fourth, our analysis did not test for spatial autocorrelation. Fifth, non-residential disease transmission (e.g., at workplaces) was not accounted for. In addition, although we know from other sources that rural counties have lower vaccination levels, and from the masking study [12] that rural residents who affiliate with the Republican Party are more likely to resist masking, vaccination status, masking habits, and party affiliation were not available in the data for this study. Finally, in low-density areas within cities, the kernel density approach is influenced by positive cases outside the areas with high weighted case density.
In conclusion, COVID-19 cases among the rural population increased significantly during the study period. In these four Minnesota counties, geographic factors (hotspots) accounted significantly for the overall burden of COVID-19 and for associated racial/ethnic and SES disparities in rural areas, depending on the stage of the pandemic. These results could guide community outreach efforts (e.g., identifying target populations, public health education, testing/tracing, and vaccine rollout) more precisely toward those residing in hotspots.
Supplementary material
For supplementary material accompanying this paper visit https://doi.org/10.1017/cts.2021.885
Acknowledgements
This study was supported by the HOUSES Program, the National Institutes of Health (R01 HL126667), and the National Center for Advancing Translational Sciences (UL1 TR02377).
Disclosures
The authors have nothing to disclose that poses a conflict of interest.