INTRODUCTION
Mosquito-borne Sindbis virus (SINV) of the genus Alphavirus is the causative agent of rash-arthritis syndrome in Finland [Reference Kurkela1, Reference Sane2]. The emergence of Chikungunya virus, a globally important mosquito-borne alphavirus, has increased interest in the role of climatic factors [Reference Burt3]. SINV antibodies have been detected in humans and birds worldwide. Clinical SINV infection is known as Pogosta disease in Finland, Ockelbo disease in Central Sweden and Karelian fever in Russian Karelia [Reference Lvov4, Reference Lvov5]; these have similar microbiological and epidemiological characteristics [Reference Lundstrom6]. SINV circulates locally in Northern Europe, and no significant changes in its genotype have been observed during the past decades [Reference Sane7].
The laboratory diagnosis of SINV infection is based on serology [Reference Manni8]. Most cases occur in people aged 40–60 years, with a female predominance [Reference Brummer-Korvenkontio9, Reference Kurkela10]. The incubation period of SINV infection is 4 days (range 2–18 days), and the disease is characterized by fever, myalgia, rash and joint symptoms [Reference Kurkela1, Reference Sane2]. About 25–50% of cases have severe long-term articular symptoms [Reference Kurkela1, Reference Laine11, Reference Kurkela12]. The highest seroprevalence has been reported from Eastern Finland, whereas the disease is virtually absent from some parts of the country [Reference Brummer-Korvenkontio9]. Major SINV infection outbreaks have occurred mostly in 7-year cycles, in 1981, 1988, 1995 and 2002 [Reference Brummer-Korvenkontio9]. Cases occur almost exclusively within a seasonal window from the end of July until October [Reference Brummer-Korvenkontio9]. The disease was probably introduced to Finland only relatively recently (less than five decades ago) [Reference Brummer-Korvenkontio9, Reference Brummer-Korvenkontio and Saikku13], as no cases had been identified and no SINV seroprevalence had been observed in animals or humans before 1965 [Reference Brummer-Korvenkontio9, Reference Brummer-Korvenkontio and Saikku13]. The major public health burden is due to the high number of asymptomatic infections and articular symptoms [Reference Brummer-Korvenkontio9].
SINV infection is transmitted to humans exclusively by mosquito bites [Reference Sane2]. Wild birds, especially Passeriformes of the genera Turdus and Fringilla, have been suspected to act as viral reservoirs and amplifying hosts [Reference Lundstrom, Turell and Niklasson14]. SINV was probably introduced to Northern Europe by migratory birds from South Africa [Reference Kurkela10]. Forest-dwelling grouse species (Tetraonidae), including black grouse (Tetrao tetrix) and capercaillie (Tetrao urogallus), had high SINV antibody titres following outbreaks [Reference Brummer-Korvenkontio9, Reference Kurkela10]. Experimentally SINV-infected birds have developed viral titres sufficient to infect mosquitoes [Reference Lundstrom, Turell and Niklasson14]. The temperature between May and July and the depth of snow cover in the preceding winter have coincided with increasing numbers of SINV infections in the following July–September [Reference Brummer-Korvenkontio9]. Late-summer Culex and Culiseta mosquito species are considered the primary vectors of SINV [Reference Francy15], although the more human-adapted Ochlerotatus species may also play a role [Reference Lvov5, Reference Sane7]. Up-to-date information on the distribution of mosquito species in Finland is scarce, but all the species mentioned above are considered abundant in most of the country [Reference Sane7, Reference Utrio16].
A time-series regression model with monthly indicators and sinusoidal terms for annual cycles has been used with time-series count data [Reference Schwartz17]. An important feature of the present data was the excess of zeros; therefore, a two-part hurdle or zero-inflated model that models the zero counts and the positive counts separately may be necessary [Reference Hilbe18, Reference Ridout, Demétrio and Hinde19]. We have previously applied a hurdle model to Verotoxigenic Escherichia coli surveillance data with low case counts, which distinguished between significant risk factors for the occurrence (binary part) and for the incidence of disease (count part) [Reference Jalava20]. Such models have also been applied to cholera prevalence [Reference Carrel21] and bacterial counts [Reference Gonzales-Barron22].
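To make the distinction between the two model families concrete, the following Python sketch writes out both probability mass functions for a negative binomial count distribution. It is an illustration only, not the authors' R code; the function names and the (mean `mu`, size `k`) parametrization are assumptions. In the hurdle form every zero comes from the binary part and the positive counts follow a zero-truncated negative binomial, whereas in the zero-inflated form zeros can arise both from a structural-zero process and from the count process itself.

```python
import math


def nb_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial pmf with mean mu and size (dispersion) k.

    Var(Y) = mu + mu**2 / k; as k grows the pmf approaches the Poisson.
    """
    log_p = (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu))
        + y * math.log(mu / (k + mu))
    )
    return math.exp(log_p)


def hurdle_nb_pmf(y: int, p_pos: float, mu: float, k: float) -> float:
    """Hurdle model: P(Y = 0) = 1 - p_pos; positive counts follow a
    zero-truncated negative binomial scaled by p_pos = P(Y > 0)."""
    if y == 0:
        return 1.0 - p_pos
    return p_pos * nb_pmf(y, mu, k) / (1.0 - nb_pmf(0, mu, k))


def zinb_pmf(y: int, p_zero: float, mu: float, k: float) -> float:
    """Zero-inflated model: a structural zero with probability p_zero,
    otherwise an untruncated negative binomial (which can also give 0)."""
    count_part = (1.0 - p_zero) * nb_pmf(y, mu, k)
    return p_zero + count_part if y == 0 else count_part


# Both are proper distributions: probabilities sum to 1 (up to truncation).
print(sum(hurdle_nb_pmf(y, 0.3, 2.0, 0.5) for y in range(500)))
print(sum(zinb_pmf(y, 0.4, 2.0, 0.5) for y in range(500)))
```

Because the binary part alone determines P(Y = 0) in the hurdle form, covariates entering that part can be read as factors for occurrence, separately from those driving the size of positive counts.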
The aim of this study was to model SINV infections by healthcare district (HCD) in Finland between 1984 and 2010, taking into account monthly fluctuation, seasonality, long-term cycles and time lags in the observations, in order to identify risk factors and to predict the number of SINV infections. Climatic, ecological and socioeconomic data were used as explanatory variables in a hurdle model to identify factors associated with the occurrence and incidence of SINV infection. Furthermore, we used the fitted model to predict the number of cases for the years 2009, 2010 and 2011 to assess the model's applicability.
MATERIALS AND METHODS
Serology and surveillance data
The laboratory diagnosis of SINV infection is based on enzyme immunoassays and the haemagglutination inhibition test [Reference Manni8]. For the period 1984–1994, a case was defined as a person with acute SINV infection serologically confirmed by the Department of Virology, University of Helsinki (Haartman Institute); these cases represented about 70–80% of the laboratory-confirmed SINV infections in Finland. After the establishment of the National Infectious Disease Register (NIDR) in 1995, all cases from Helsinki University Hospital Laboratory (HUSLAB, or its predecessors) and other clinical microbiology laboratories were reported to the NIDR. The two datasets were combined, collating information on sex, year of birth, place of residence (or place of treatment as a proxy) and date of sampling. If paired samples were taken, the date of the second sample was used as the date of sampling.
Explanatory variables
We selected a range of explanatory variables based on known risk factors for SINV infections [Reference Sane2, Reference Brummer-Korvenkontio9]. The weather variables used in the models were precipitation, temperature and snow cover, obtained from the database of the Finnish Meteorological Institute. Two types of weather variables were used: first, monthly precipitation and mean temperature for May, June, July and August, calculated as a mean for each HCD from the gridded datasets [Reference Ylhaisi23, Reference Tietavainen, Tuomenvirta and Venalainen24], together with the depth of snow cover on 15 March and 15 April; and second, weather variables lagged by different numbers of months relative to the reporting month of the case, namely precipitation and temperature (t – 1 to t – 6 months) and depth of snow cover (t – 5 to t – 7 months). Annual wildlife data on the density of adult and juvenile (hatch-year) grouse in August were also included [Reference Lindén25]. We also included largely time-invariant variables, such as habitation, agricultural and land-use variables, as shown in Supplementary Table S1. All explanatory variables were specified by HCD (n = 21), year and month.
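The monthly lags described above can be computed with simple calendar arithmetic. A minimal sketch, assuming a hypothetical layout in which each HCD's weather series is stored as a dictionary keyed by (year, month); the function names and the snow-depth values are made up for illustration and do not come from the study data.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout: one HCD's monthly weather series keyed by (year, month).
MonthlySeries = Dict[Tuple[int, int], float]


def shift_month(year: int, month: int, lag: int) -> Tuple[int, int]:
    """Return the (year, month) lying `lag` months before (year, month)."""
    index = year * 12 + (month - 1) - lag
    return index // 12, index % 12 + 1


def lag_features(
    series: MonthlySeries, year: int, month: int, name: str, lags: Iterable[int]
) -> Dict[str, Optional[float]]:
    """Lagged covariates for one reporting month; None where data are missing."""
    return {f"{name}_lag{lag}": series.get(shift_month(year, month, lag)) for lag in lags}


# Made-up snow depths (cm). Lags t-5 to t-7 for cases reported in
# August 2002 point to March, February and January 2002.
snow_depth = {(2002, 1): 55.0, (2002, 2): 62.0, (2002, 3): 70.0}
print(lag_features(snow_depth, 2002, 8, "snow", range(5, 8)))
```

Working with a single month index (year × 12 + month) avoids special-casing lags that cross a year boundary, e.g. a 6-month lag from June lands in December of the previous year.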
Hurdle model
We applied a two-part hurdle model to the SINV infection surveillance data, using the natural logarithm of the previous month's case count (lag 1 of the outcome) in place of an autoregressive term. Covariates explained the occurrence of cases in the binary part of the model and their incidence in the negative binomial regression part. Because the data for SINV infections contained an excess of zeros and were strongly overdispersed (mean 0·52 cases per HCD, month and year; variance 18·1), we applied a negative binomial distribution-based hurdle model [Reference Ridout, Demétrio and Hinde19], as previously described in detail [Reference Jalava20].
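The reported summary statistics already show why a plain Poisson model would struggle with these data. A minimal sketch, assuming a single negative binomial distribution matched to the overall mean and variance by the method of moments; this is an illustration only, since the fitted model conditions on covariates and handles zeros in its own binary part.

```python
import math

# Summary statistics reported in the text (per HCD, month and year).
mean_cases = 0.52
var_cases = 18.1


def nb_size_from_moments(mean: float, var: float) -> float:
    """Method-of-moments NB size k from Var(Y) = mean + mean**2 / k."""
    if var <= mean:
        raise ValueError("no overdispersion: NB size is undefined")
    return mean ** 2 / (var - mean)


k = nb_size_from_moments(mean_cases, var_cases)
p0_poisson = math.exp(-mean_cases)           # P(Y = 0) under Poisson(mean)
p0_nb = (k / (k + mean_cases)) ** k          # P(Y = 0) under NB(mean, k)

print(f"variance/mean ratio: {var_cases / mean_cases:.1f}")
print(f"NB size k: {k:.4f}")
print(f"P(0) Poisson: {p0_poisson:.3f}  P(0) NB: {p0_nb:.3f}")
```

With these inputs the variance-to-mean ratio is about 35, and a Poisson distribution with mean 0·52 implies roughly 59% zeros, whereas a negative binomial with the same mean and variance implies roughly 95%.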
Serologically diagnosed acute SINV infections were included in the analysis as the outcome variable for the 26-year follow-up period, aggregated by HCD and by month and year of diagnosis based on the date of sampling. We included data from June to October only, as there were essentially no cases outside this time-frame. We performed a univariable analysis of each explanatory variable in the hurdle model, applying a complementary log-log (cloglog) link function and using the data for 1984–2009. Variables with P values <0·20 in the univariable analysis were selected for the multivariable model. Within each group of variables correlated with coefficients >0·80, only the most significant variable (P < 0·20) was included in the final model, as shown in Supplementary Table S1. To identify the explanatory variables of the final multivariable model, we used forward selection according to Akaike's Information Criterion (AIC), which rewards goodness of fit while penalizing the number of estimated parameters and thereby guards against overfitting. We also tested a linear trend with splines, estimated the sinusoidal terms, and performed collinearity diagnostics on the final model to identify possible multicollinearity. The autocorrelation and partial autocorrelation functions of the scaled residuals by HCD were inspected visually at lags 1–10 for any remaining autocorrelation; because each year contributes five months (June–October), 10 lags correspond to 2 years. We also tested for spatial autocorrelation of the scaled residuals for each year and month, using a permutation test for Moran's I statistic with Bonferroni correction. We further evaluated the spatial accuracy of the model by calculating point estimates of the predictions for each HCD for 1984–2009, using the other HCDs as learning data. Poisson and negative binomial hurdle models, and hurdle and zero-inflated models, were compared with Vuong's test [26].
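Two of the residual and model-comparison diagnostics named above can be sketched with the Python standard library alone. This is not the authors' R implementation: the spatial weights matrix (for example, HCD adjacency) is a hypothetical input, the permutation test assumes a one-sided alternative of positive autocorrelation, and the Vuong statistic shown is the plain, uncorrected form computed from per-observation log-likelihoods of two non-nested fitted models.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Weights = List[List[float]]  # hypothetical spatial weights w[i][j] between HCDs


def morans_i(x: Sequence[float], w: Weights) -> float:
    """Global Moran's I of values x (e.g. scaled residuals) under weights w."""
    n = len(x)
    mean_x = sum(x) / n
    dev = [xi - mean_x for xi in x]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def morans_i_permutation_test(
    x: Sequence[float], w: Weights, n_perm: int = 999, seed: int = 1
) -> Tuple[float, float]:
    """One-sided permutation p-value for positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    shuffled = list(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign values to areas at random
        if morans_i(shuffled, w) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag tests significant after Bonferroni correction (p < alpha / m)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def vuong_test(loglik_1: Sequence[float], loglik_2: Sequence[float]) -> Tuple[float, float]:
    """Uncorrected Vuong statistic; positive values favour model 1."""
    diffs = [a - b for a, b in zip(loglik_1, loglik_2)]
    v = math.sqrt(len(diffs)) * statistics.mean(diffs) / statistics.stdev(diffs)
    return v, math.erfc(abs(v) / math.sqrt(2.0))  # two-sided normal p-value
```

In the setting described above, `morans_i_permutation_test` would be run once per year-month on that month's scaled residuals, and `bonferroni_reject` applied to the resulting collection of p-values.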
The statistical packages included R version 2.14.1 (R Foundation, Austria), SPSS version 19 (SPSS Inc., USA) and Stata version 9.2 (StataCorp., USA).
Prediction of the occurrence of acute SINV infections
Data for the periods 1984–2008, 1984–2009 and 1984–2010 were used with the fitted model to predict cases for 2009, 2010 and 2011, respectively. As no R package was available to calculate prediction intervals (PIs) for the hurdle model, we calculated the predicted point estimates and their standard errors by simulation. To this end, the partial likelihood estimators were calculated for the parameters, and we assumed that the distribution of the prediction estimates could be approximated by a normal distribution [Reference Casella and Berger27]. The predicted estimates and PIs were obtained by sampling 100 times from normal distributions to simulate a fiducial distribution of the parameters for the years 2009, 2010 and 2011. Only the marginal normal distributions of the parameters were used, i.e. correlations between parameter estimates were ignored (R code in Supplementary Material S2, abbreviations in Table 1).
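The simulation approach can be sketched as follows. This is a hypothetical Python illustration, not the R code in Supplementary Material S2: each parameter is drawn independently from a normal distribution centred on its estimate (marginal normals only), a prediction is computed for every draw, and the interval is read off the empirical quantiles. The intercept-only predictor `toy_hurdle_mean`, its parameter names, the fixed negative binomial size of 0·5 and the example estimates are all assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def simulate_prediction_interval(
    estimates: Params,
    std_errors: Params,
    predict: Callable[[Params], float],
    n_sims: int = 100,
    level: float = 0.95,
    seed: int = 2012,
) -> Tuple[float, float, float]:
    """Draw each parameter independently from N(estimate, se**2) (marginal
    normals only, so correlations between estimates are ignored), evaluate
    `predict` on every draw and return (point, lower, upper) using
    nearest-rank empirical quantiles of the simulated predictions."""
    rng = random.Random(seed)
    draws: List[float] = []
    for _ in range(n_sims):
        theta = {name: rng.gauss(est, std_errors[name]) for name, est in estimates.items()}
        draws.append(predict(theta))
    draws.sort()
    tail = (1.0 - level) / 2.0
    lower = draws[math.floor(tail * (n_sims - 1))]
    upper = draws[math.ceil((1.0 - tail) * (n_sims - 1))]
    return predict(estimates), lower, upper


def toy_hurdle_mean(theta: Params, nb_size: float = 0.5) -> float:
    """Hypothetical intercept-only hurdle: cloglog binary part and log-link
    negative binomial count part with an assumed fixed size."""
    p_pos = 1.0 - math.exp(-math.exp(theta["b0_zero"]))
    mu = math.exp(theta["b0_count"])
    p0_count = (nb_size / (nb_size + mu)) ** nb_size
    return p_pos * mu / (1.0 - p0_count)  # E[Y] = P(Y > 0) * E[Y | Y > 0]


point, lower, upper = simulate_prediction_interval(
    {"b0_zero": -0.5, "b0_count": 1.0},
    {"b0_zero": 0.2, "b0_count": 0.3},
    toy_hurdle_mean,
)
print(f"predicted {point:.2f} (95% PI {lower:.2f}-{upper:.2f})")
```

With only 100 draws the 2·5% and 97·5% quantiles rest on a handful of extreme simulations, which is one reason such intervals can be wide.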
n.a., Not applicable.
* $\exp(0.01 \times \beta_x / 1.386) - 1$.
† $\exp(0.01 \times \beta_x) - 1$.
RESULTS
Descriptive epidemiology
There were 339 cases of acute SINV infection diagnosed by the Department of Virology, University of Helsinki (Haartman Institute) during 1984–1994, and 3042 SINV infection cases were reported to the NIDR during 1995–2010. Major outbreaks of SINV infection occurred cyclically, the largest being recorded in 1995 (1310 cases) (Fig. 1). Of all cases, 3320 (98%) occurred between the end of July and October. Each year, the first cases were detected around week 30 (end of July), peaking in week 33 (mid-August) and declining by weeks 37–47 (September–October). There is a delay of about 2 days between the onset of symptoms and first medical contact [Reference Sane2]. More than half of the cases (1965, 58%) were female, and 1135 (34%) were in the 45–54 years age group. Cases were geographically clustered at central latitudes of approximately 61°–64° N (Fig. 2).
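The percentages above can be checked against the reported counts. A minimal sketch, assuming that the denominator for "all cases" is the combined 3381 cases from both data sources (the text does not state the denominator explicitly):

```python
# Case counts as reported in the text.
cases_1984_1994 = 339   # Haartman Institute, 1984-1994
cases_1995_2010 = 3042  # NIDR, 1995-2010
total_cases = cases_1984_1994 + cases_1995_2010  # assumed denominator

subgroups = {
    "end of July to October": 3320,
    "female": 1965,
    "aged 45-54 years": 1135,
    "diagnosed 1984-1994": cases_1984_1994,
}
for label, n in subgroups.items():
    print(f"{label}: {n}/{total_cases} = {100 * n / total_cases:.1f}%")
```

Under that assumption the shares round to 98%, 58% and 34%, matching the text, and the early (1984–1994) period contributes about 10% of all cases.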
Hurdle model
The splines for the piecewise linear trend with knots at 1995 and 2002 (major outbreak years), several sine/cosine terms and the month of reporting were significant, as shown in Table 1. As some of the variables were likely to lie on the same pathway to infection, the ecological variables (grouse and water-related variables) were used mostly in the binary part and the population variables (income, working in agriculture) in the negative binomial part, as judged by AIC. In the multivariable analysis, a high monthly mean temperature in May and June, high monthly precipitation in June, thick snow cover in April (meltwater) and a high number of cases in the previous month (natural logarithm) were significantly positively associated with both the occurrence (zero part) and the incidence (negative binomial part) of SINV infection. The early-summer weather conditions and the depth of the previous winter's snow cover probably predict the number of mosquitoes in late summer. In addition, hatch-year black grouse density, regulated water shore area and the previous month's precipitation were significant for the occurrence of the disease.
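The trend and cyclic terms can be generated as ordinary regression covariates. A minimal sketch, assuming time is counted in months from January 1984, each knot is placed at January of the outbreak year, and the periods are the 12-, 72- and 84-month terms mentioned in the Discussion; the full set of sinusoidal terms in the final model is listed in Table 1, not here, and `design_terms` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def design_terms(
    year: int,
    month: int,
    knot_years: Sequence[int] = (1995, 2002),
    periods_months: Sequence[int] = (12, 72, 84),
    origin_year: int = 1984,
) -> Dict[str, float]:
    """Piecewise linear trend (linear splines) plus sine/cosine pairs."""
    t = (year - origin_year) * 12 + (month - 1)  # months since Jan of origin year
    terms = {"trend": float(t)}
    for knot in knot_years:
        t_knot = (knot - origin_year) * 12
        # Hinge term (t - knot)+ lets the slope change after each knot.
        terms[f"trend_after_{knot}"] = float(max(0, t - t_knot))
    for period in periods_months:
        terms[f"sin{period}"] = math.sin(2 * math.pi * t / period)
        terms[f"cos{period}"] = math.cos(2 * math.pi * t / period)
    return terms


print(design_terms(2003, 8))
```

Each sine/cosine pair lets the model estimate both the amplitude and the phase of a cycle of the given period, while the hinge terms keep the trend continuous at the knots.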
In addition to those mentioned above, the most significant variables for the incidence of the disease were the level of income (negative), the proportion of the population working in agriculture (positive) and the proportion of agricultural land (negative). The people most likely to be exposed are probably those working in agriculture, picking berries or mushrooms, or hunting in the forest, which may also reflect a lower level of income. The negative binomial-based hurdle model was chosen as best fitting the data based on Vuong's test; the AIC of the fully fitted model was 3866. The residuals were satisfactory with respect to autocorrelation and partial autocorrelation, as judged by visual inspection for any lags exceeding the 95% confidence limits. There was essentially no remaining spatial autocorrelation in the scaled residuals. The predictions by HCD were of the same magnitude as the true total counts and were therefore considered satisfactory (Table 2). The fit of the final model is presented in Figure 1.
Prediction of cases for 2009–2011
Using the hurdle model developed on the 1984–2009 data, the predicted number of cases for 2009 was 85 (95% PI 2–1187), compared with 106 actual cases. The prediction for 2010 was 37 cases (95% PI 5–241), with 54 actual cases, and for 2011, 44 cases (95% PI 11–392), with 63 actual cases. Note that, because the model includes the previous month's case count as a covariate, predictions can only be made one month in advance.
DISCUSSION
This study identified ecological cycles and variables explaining the occurrence and incidence of SINV infections and provided reasonably accurate predictions. SINV infection dynamics between 1984 and 2011 were characterized by regular annual cycles between late July and October, with larger outbreaks in 1988, 1995 and 2002. The disease most likely spreads between black grouse, mosquitoes and humans when climatic conditions suitable for mosquito reproduction occur, such as warm temperatures and high precipitation following thick snow cover in the previous winter. These conditions are likely to be met in Central-Eastern Finland. The development of infected mosquito populations is probably further promoted by the extent of regulated waters. Hatch-year black grouse may be one of the main amplifying hosts for SINV. People are likely to become infected when exposed to infected mosquitoes during outdoor activities. As the SINV surveillance data were characterized by an excess of zeros, we modelled SINV infections by applying a hurdle model.
The higher occurrence of SINV infections in middle-aged women was as expected [Reference Sane2]. Cases are known to cluster in Eastern Finland [Reference Brummer-Korvenkontio9, Reference Sane28], but we found them also concentrated in Central Finland, indicating the importance of local geographical and climatic factors for the incidence of the disease. Ockelbo disease also clusters at certain latitudes in Sweden [Reference Lundstrom6], as does Karelian fever in Russia [Reference Lvov5]. However, host genetic factors may also contribute to the geographical distribution of clinical SINV infections, potentially producing more clinical disease in Eastern Finland [Reference Sane29]. The study combined two datasets: one from before the initiation of infectious disease surveillance in Finland, and the NIDR data from 1995 to 2010. A χ² test showed that the distribution of cases by HCD differed between these two periods (data not shown), although the early period contributed only about 10% of all cases. This difference was mostly due to the history of the disease: it was initially discovered in only one HCD in the early 1980s, so most of the early cases stem from that area. However, when the hurdle model was applied to the NIDR data only (1995–2009), the model remained essentially the same, except that temperature in May and the 72-month sine/cosine terms were not significant (data not shown).
The seasonal timing of the cases was highly consistent from year to year, suggesting a biological explanation for the amplification of the virus. We suspect that this may be due to the hatching of black grouse chicks. Our data indicate that suitable climatic conditions enable the development of mosquito populations, allowing effective transmission between hosts, including amplification in hatch-year black grouse, and thereby increasing the number of infective mosquito vectors that subsequently infect humans. This is also supported by a recent study predicting tularaemia outbreaks from mosquito surveillance data, in which similar weather variables explained the development of mosquito populations [Reference Ryden30]. More virological and epidemiological data are needed to validate this finding.
We expected weather variables to be important for the epidemiology of SINV infection, as high temperature and increased precipitation have been shown to be important for SINV infections in South Africa [Reference Jupp31, Reference McIntosh32]. The early-summer weather variables and meltwater from the previous winter's snow cover were significant in both parts of the model, indicating their role in both the occurrence and the incidence of the infection. Mosquito larval development depends on temperature, precipitation and snowmelt water [Reference Trawinski and Mackay33]. The temperature in June is also an important factor for the survival of newly hatched grouse chicks [Reference Linden34, Reference Ludwig35]. The model supported the role of hatch-year black grouse as amplifying hosts for SINV infections in Finland [Reference Sane2, Reference Brummer-Korvenkontio9]. Hatch-year birds have been found to be important amplifying hosts for West Nile virus [Reference Hamer36]. Despite the overall decline in black grouse density in Finland in the late 1980s and early 1990s, density remains comparatively high in areas with high SINV incidence. Grouse chicks feed on insects for the first few weeks of life, and hatching occurs around mid-June in Northern Finland and about 1 week earlier in Southern Finland [Reference Linden34, Reference Ludwig35]. Hatch-year chicks are also more accessible to mosquitoes because their plumage is not yet fully developed, and they may be more susceptible to infection owing to a lack of immunity. These factors, together with the density of the mosquito population in late summer, may explain the precise timing of the annual SINV infection outbreaks.
The negative effect of agricultural land in the model possibly indicates the importance of other land areas where people spend time and may be exposed to mosquitoes. The negative effect of income perhaps indicates that people with lower incomes are more likely to pick berries and mushrooms rather than buy them, thereby becoming exposed to mosquito bites. The effect of income level may also be coincidental, as income levels are generally lower in areas with high SINV infection rates. Income level was negatively correlated with the proportion of the population working in agriculture (data not shown). Spending time in forests or swamps is an established risk factor for acquiring SINV infection [Reference Sane2]. Outdoor activities in general may be more common in the countryside, where income levels are also lower.
The significance of the proportion of regulated lake shoreline for the occurrence of the disease is a novel finding. This finding should be interpreted with caution, as this study is ecological in nature and the ecological fallacy may account for it. However, there is also strong supportive evidence that this association is valid. Water regulation probably causes ecological changes along lake banks that create environments suitable for mosquitoes or their blood-meal hosts, i.e. birds. Regulated water areas have been found to be a risk factor for West Nile virus, which has a biological cycle similar to that of SINV [Reference Liu, Weng and Gaines37]. Notably, the heavily regulated Koitere Lake is located in the hotspot of SINV infections, the Ilomantsi municipality (the disease was named Pogosta disease by the Ilomantsi Centre), which has one of the highest prevalence and incidence rates of SINV infection. A power plant was built in 1955, changing the water levels, and these changes were further reinforced by the initiation of water regulation in 1980 [Reference Tarvainen38]. The lakeshore was cleared of trees before regulation, but the regulation was conducted at a lower level than initially planned, allowing bushy vegetation to develop on the lakeshore [Reference Tarvainen38]. The first major SINV infection outbreak in Finland was diagnosed serologically in 1981, although a probable SINV infection outbreak was described clinically in 1974 [Reference Brummer-Korvenkontio9, Reference Brummer-Korvenkontio and Kuusisto39]. Given the geographical distribution of the cases and the importance of local climatic factors and regulated waters, it is tempting to speculate that the prevalence of SINV infections in Sweden and Russian Karelia may be due to similar phenomena.
We found the hurdle model suitable for characterizing the cyclic SINV infections, and distinguishing between explanatory variables for the occurrence and for the incidence of the infections was beneficial. Based on this and our previous study [Reference Jalava40], we believe that the hurdle model provides more information on the role of explanatory variables in the pathways of infection. A model with an estimated mosquito variable derived from weather variables would be useful [Reference Trawinski and Mackay33]. We speculate that the variable for cases in the previous month (ln) is a useful proxy for the size of the mosquito population. Several sinusoidal terms remained significant in the final model. We do not have biological explanations for all of these terms, except for the 12-month term, which reflects the seasonal variation of the infections, and the 84-month term, which may partly reflect the cycles in forest grouse populations detected before the mid-1980s [Reference Ranta, Helle and Lindén41]. Overall, our knowledge of SINV cycles is still rudimentary, and important factors, including entomological variables, may remain to be identified. There may be cyclic fluctuations in the density of other potential amplifying hosts, or other factors influencing SINV epidemiology. We also fitted the model without the cyclic and trend terms to assess the adequacy of the fit (data not shown). All other variables remained significant, but temperature in May and hatch-year grouse density in the zero part, and income in the negative binomial part, became non-significant. However, leaving any of these variables out of the original model increased AIC by approximately 25 units. We therefore conclude that the model with the trend terms and the external variables listed above was adequately fitted.
We identified climatic, ecological and socioeconomic determinants of both the occurrence and the incidence of human SINV infections in Finland by applying a negative binomial distribution-based hurdle model. For public health interventions, water regulation guidelines should be adapted to prevent an increase in infected mosquito populations. The study was hampered by the lack of mosquito surveillance data, although climatic factors are known estimators of mosquito abundance [Reference Ryden30, Reference Trawinski and Mackay33, Reference Roiz42].
A shortcoming of our hurdle model is that the data were not adjusted for the age and sex distribution of the HCDs, and we ignored possible changes in effects over time and between areas. Spatial correlation of the residuals was tested, within the limitations of the study material, and found to be non-significant; this is in accordance with the intuitive observation that the spread of the disease is localized.
This study clarifies interactions between ecological and biological phenomena and the occurrence of SINV infections in Finland. Furthermore, the use of a hurdle model is justified for identifying risk factors for the occurrence and incidence of an infectious disease with an excess of zero counts in the data.
SUPPLEMENTARY MATERIAL
For supplementary material accompanying this paper visit http://dx.doi.org/10.1017/S095026881200249X.
DECLARATION OF INTEREST
None.