Seventy-five years of estimating the force of infection from current status data

N. HENS; M. AERTS; C. FAES; Z. SHKEDY; O. LEJEUNE; P. VAN DAMME; P. BEUTELS

doi:10.1017/S0950268809990781

Seventy-five years of estimating the force of infection from current status data

Published online by Cambridge University Press: 21 September 2009

N. HENS ,

M. AERTS ,

C. FAES ,

Z. SHKEDY ,

O. LEJEUNE ,

P. VAN DAMME and

P. BEUTELS

Show author details

N. HENS*: Affiliation:
Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium Centre for Health Economics Research & Modelling Infectious Diseases; Centre for the Evaluation of Vaccination, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
M. AERTS: Affiliation:
Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
C. FAES: Affiliation:
Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
Z. SHKEDY: Affiliation:
Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
O. LEJEUNE: Affiliation:
Centre for Health Economics Research & Modelling Infectious Diseases; Centre for the Evaluation of Vaccination, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
P. VAN DAMME: Affiliation:
Centre for Health Economics Research & Modelling Infectious Diseases; Centre for the Evaluation of Vaccination, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
P. BEUTELS: Affiliation:
Centre for Health Economics Research & Modelling Infectious Diseases; Centre for the Evaluation of Vaccination, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
*: *Author for correspondence: Professor N. Hens, Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Agoralaan 1, Building D, B-3590 Diepenbeek, Belgium. (Email: [email protected])

Article contents

Summary
INTRODUCTION
DISCUSSION
References

Rights & Permissions

Summary

The force of infection, describing the rate at which a susceptible person acquires an infection, is a key parameter in models estimating the infectious disease burden, and the effectiveness and cost-effectiveness of infectious disease prevention. Since Muench formulated the first catalytic model to estimate the force of infection from current status data in 1934, exactly 75 years ago, several authors addressed the estimation of this parameter by more advanced statistical methods, while applying these to seroprevalence and reported incidence/case notification data. In this paper we present an historical overview, discussing the relevance of Muench's work, and we explain the wide array of newer methods with illustrations on pre-vaccination serological survey data of two airborne infections: rubella and parvovirus B19. We also provide guidance on deciding which method(s) to apply to estimate the force of infection, given a particular set of data.

Keywords

Hugo Muench incidence data infectious diseases serological data statistical modelling

Type: Original Papers
Information: Epidemiology & Infection , Volume 138 , Issue 6 , June 2010 , pp. 802 - 812

DOI: https://doi.org/10.1017/S0950268809990781 [Opens in a new window]
Copyright: Copyright © Cambridge University Press 2009

INTRODUCTION

Although epidemics were already documented by Hippocrates (458–377 b.c.), it was not until 1760 that mathematical modelling of infectious diseases was first documented in a publication by Daniel Bernoulli [Reference Bernoulli1], who used a mathematical method to study smallpox infections. These and subsequent models at the very start of the 20th century [Reference Hamer2–Reference Kermack and McKendrick5] focused on determining the infectious disease spread and the associated basic and effective reproduction number (see Appendix for terminology used). Hugo Muench first proposed the concept of estimating the force of infection (at that time called the ‘effective contact rate’) as a key parameter in such mathematical models, in a publication entitled ‘Derivation of rates from summation data by the catalytic curve’ in the March 1934 issue of the Journal of the American Statistical Association [Reference Muench6]. His work became widely known only 25 years later with the publication of his book Catalytic Models in Epidemiology [Reference Muench7].

Before Muench's seminal work, series of physicians' case-reports, i.e. incidence data, which are often affected by underreporting and misdiagnosis, were used to study the ‘effective contact rate’ of a given disease in a given population [Reference Bernoulli1, Reference Collins8]. Muench suggested using a catalytic model on summation data to obtain a measure of the rate at which a susceptible acquires infection (and not necessarily disease). The name ‘catalytic’ was inspired by the similarity to the equations used to study the processes that drive chemical reactions. In his 1959 monograph [Reference Muench7], Muench referred to this effective contact rate as the force of infection (FOI). The importance of Muench's catalytic model is the capacity to use test results on, for example, serological or saliva samples, in addition to reported incidence data, in order to estimate the FOI, especially for infections that leave permanent markers of immunity in their surviving hosts.

More explicitly, Muench stated that the simplest approach is to assume a constant effective exposure rate λ per unit of time t and that this rate applies to the entire population at all times. Nonetheless, the immune proportion π increases only to the extent that previously uninfected individuals can incur new infections. In addition, the model allowed that some fraction of the population cannot be infected at all. Muench's catalytic curve can be written as

(1)

$\pi \lpar t\rpar \equals k\lpar l \minus {\rm e}^{ \minus \lambda t} \rpar \comma \hfill$

where k is the proportion of the population that can be infected, π is the proportion of all previously infected (and immune) individuals prior to age t, and l is the proportion of the population which may show evidence of exposure. Figure 1 illustrates the behaviour of Muench's model for different choices for k and l. Note that the model gives negative estimates of the proportion immune for l<1 and thus interpretation is not straightforward in this case.

Fig. 1. Muench's catalytic model for λ=0·15 and various choices for k and l [see equation (1)].

Using the method of moments to estimate these parameters, Muench illustrated his approach on several datasets. These included intraperitoneal mouse protection test results for yellow fever in South America, a test which remains positive once an individual has been infected by yellow fever. He considered two regions; one in which the population was originally assumed entirely susceptible with frequent epidemics so that λ actually corresponded to a steady effective exposure rate. He compared these results with those of a second region where an outbreak of yellow fever occurred. He illustrated that these test results cannot show when individuals had been infected, only that it must have occurred at some time before taking the test. Indeed, by analysing summation data one estimates the probability of past infection at a certain time-point and derives the FOI using the analytical expression in equation (1). The latter is based on the untestable assumption of time homogeneity.

Testifying its importance, numerous later publications echoed this inherent limitation of not being able to use direct data on person-time incidence but rather summation data to derive the FOI under the assumption of time homogeneity. Muench already discussed other complications such as test reversion rates, differential mortality, and passive immunity through transfer of maternal antibodies.

Although applicable to cumulative incidence data too, we illustrate Muench's method on cross-sectionally collected seroprevalence data because of the above limitations of reported incidence data. We use two serological surveys, the first on rubella in the UK, collected in 1986–1987 for males only [Reference Farrington, Kanaan and Gay9]; and the second on parvovirus B19 in Belgium, collected in 2001–2003 [Reference Hens10]. We make a number of typical common assumptions, i.e. that no portion of the population is free from exposure, a perfect test is used, the time homogeneity assumption holds, disease-related mortality is negligible compared to all-cause mortality, infection confers lifelong immunity [in equation (1): l=k=1] and, specifically for rubella in the UK, that the schoolgirl programme that was implemented had negligible effect on the seroprevalence of rubella in males. Whereas Muench originally used the method of moments to estimate the FOI, we cast his model in the maximum-likelihood framework. More specifically, we use a binomial likelihood to relate the age-specific prevalence (or cumulative incidence) to the age-specific observed proportion immune:

(2)

$L\lpar \lambda \semi \ {\bf y}\comma \ {\bf n}\comma \ {\bf a}\rpar \equals \prod\limits_{i \equals \setnum{1}}^{N} {\pi \lpar a_{i} \rpar ^{y_i} \left( 1 \minus \pi \lpar a_{i} \rpar \right) ^{n_i \minus y_i} } \comma \hfill$

where a=(a ₁, …, a _N) denotes the age vector of length N; y=(y ₁, …, y _N) and n=(n ₁, …, n _N) denote the corresponding vectors of positive and total counts per age value. Assuming Muench's model, π(a _i) in equation (2) is given by equation (1). Maximizing the likelihood in equation (2) with respect to λ yields the maximum-likelihood estimate $\hat{\lambda }_{{\rm ML}}$ .

The black curves in Figure 2 show the fitted seroprevalence and FOI curves together with the observed seroprevalence for both infections. The constant FOI was estimated as 0·104 and 0·053 for rubella and parvovirus B19, respectively. The assumption of a constant FOI seems visually appropriate for the rubella data but inappropriate for the parvovirus B19 data (cf. Fig. 2). Such observations led other researchers to develop new methods allowing for an age-dependent FOI.

Fig. 2. (a) UK rubella and (b) Belgium parvovirus B19 infection by age (years): the observed seroprevalence per integer age value with size proportional to the sample taken (•), the fitted seroprevalence curve (–––, upper curve) and the FOI curve (- - -, lower curve). Four different models were used: Muench's constant FOI model (black curves); Farrington's exponentially damped model (red curve), a spline model (green curve) and its monotonized version (green-blue-green curve) as applied by Hens et al. [Reference Hens10]. Note that, by definition, the latter curve (partly) overlaps with the green curve for rubella (parvovirus B19).

Standing on the shoulders of Muench

Muench's work has led to the development of many models to estimate the FOI. These models can broadly be divided into parametric and non-parametric methods. Whereas parametric methods rely on specific functional relationships (e.g. Muench's model is parametric since it assumes a constant FOI), non-parametric models relax such assumptions. In this section we present an overview of these different approaches. Starting with Muench's work, Table 1 summarizes the key contributions with respect to the shape of the FOI, the focus of the analysis and the data used.

Table 1. Historical overview of key-contributions to the estimation of the force of infection (FOI) from summation data in the 20th century

Parametric methods

Griffiths [Reference Griffiths11] was the first author to propose the use of a linear rather than a constant FOI to model measles incidence data from England and Wales for the period 1956–1969. With the above notation, his model could be formulated as

(3)

$\pi \lpar t\rpar \equals 1 \minus {\rm e}^{ \minus \int_{\setnum{0}}^{t} {\lambda \lpar \omega \rpar {\rm d}\omega } } \comma \hfill$

where λ(t)=γ₀(t+γ₁)I _{t>τ}, with γ₀, γ₁ parameters to be estimated and τ the threshold for inherited immunity. I _{t>τ} denotes the indicator function taking value 1 if t>τ and 0 otherwise. In general, equation (3) is the solution of the differential equation

(4)

${\rm d}x\lpar t\rpar \sol {\rm d}t \equals \minus \lambda \lpar t\rpar x\lpar t\rpar \hfill$

(with initial condition x(0)=1 meaning that everyone is assumed susceptible at birth) that describes the changes in the proportion of susceptibles x(t)=1−π(t) with respect to time. Note that $\int_{\setnum{0}}^{t} {\lambda \lpar \omega \rpar {\rm d}\omega }$ denotes the cumulative FOI up to age t acting on susceptibles. Whereas Griffiths [Reference Griffiths11] described the estimation procedure outlined above as simple and straightforward to apply, he actually used an alternative method to model the measles incidence (rather than cumulative incidence) using a multinomial model and thereof estimated the attack rate.

Note that the catalytic model as presented in equation (3) is actually a survival model and that the probability of past infection is the cumulative distribution function of the time to infection or alternatively 1 – survival function. The (cumulative) FOI is the (cumulative) infection hazard.

In general the change in the susceptible portion could be both age- and time-dependent. It is under the assumption of time homogeneity that age can be used to determine the time of infection and thus age and time are identified and equation (3) is typically denoted in terms of age rather than time. Time homogeneity implies that neither the pathogen's transmissibility, nor susceptible people's receptiveness to infection (irrespective of their age), nor the frequency and intensity of interactions necessary for transmission to occur (e.g. social contact patterns for air- and saliva-borne infections, sexual intercourse for sexually transmitted infections) have changed substantially with time. In particular feco-orally transmitted infections (e.g. hepatitis A virus, cholera) are documented to be time heterogenous, due to the drastic improvements in sanitary conditions in many settings over the last century [Reference Gay12, Reference Schenzle, Dietz and Frösner13]. Note that Schenzle et al. [Reference Schenzle, Dietz and Frösner13] argued that, for the case of hepatitis A, it is more appropriate to assume age homogeneity and time heterogeneity, and thus modelled time effects. The limitation of having to choose for either time or age may be overcome by use of data which is both age and time structured (see below).

Griffiths used maximum-likelihood theory to estimate the parameters in the catalytic linear model and for the first time addressed the goodness-of-fit to the data using Pearson's χ² test. Interestingly, Griffiths justified his choice of a linear FOI by using a non-parameteric estimate for the FOI which was plotted against age and showed a linear trend. Note that Griffiths applied his model only up to age 10 years as by then most children had been infected with measles. Griffiths' work was later used by several other authors to model measles and other common childhood infections [Reference Anderson and May14]. Griffiths' 1974 contribution should be seen as the completion of the basic building block for the estimation of the FOI.

Grenfell & Anderson [Reference Grenfell and Anderson15] extended Griffiths' approach to encompass a general polynomial description of changes in the FOI with age and derived a stepwise maximum-likelihood method for parameter estimation from datasets consisting of either case notifications or serological information. This encompasses both Muench's and Griffiths' catalytic model. The appropriate polynomial degree can be found by minimizing the relative deviance. Grenfell & Anderson discussed the advantages and disadvantages of using case-notification data and serological data and stressed that availability is the key criterion for which type of data to use [Reference Grenfell and Anderson15]. For case- notification data, the quality largely depends on the biases and inaccuracies of the case-notification system. Serological databases too can suffer from the representativeness of the collected samples, as well as implicit assumptions for analysis, such as time homogeneity, lifelong immunity and the other complications listed above, duly noted by Muench [Reference Muench6]. Several of these issues gave rise to further research.

A first complication is that under the assumptions stated above, the model for the prevalence should be monotonically increasing with age. The monotonicity issue was not relevant for Muench and Griffiths since a model with constant or linear FOI always estimates a monotone prevalence. However, this is not necessarily the case with high-order polynomials. In 1990, Farrington [Reference Farrington16] placed the issue of monotonicity in the heart of the estimation problem. He noted that for measles, mumps and rubella, the FOI typically rises linearly from birth up to about 10 years of age, after which it drops off again for older ages. This qualitative form was also observed by Griffiths. Farrington considered a nonlinear model, i.e. an exponentially damped linear model, in age

(5)

$\lambda \lpar a\rpar \equals \lpar \gamma _{\setnum{0}} a \minus \gamma _{\setnum{1}} \rpar {\rm e}^{ \minus \gamma _{\setnum{2}}{a} } \plus \gamma _{\setnum{1}} \comma \hfill$

where γ₀, γ₁ and γ₂ are positive parameters that can be estimated using maximum-likelihood and constrained optimization. The FOI in equation (5) is 0 for age 0, then shows a linear increase and ends in an exponential decay. Following Griffiths, Farrington [Reference Farrington16] assessed the goodness-of-fit of his model using a non-parametric estimate of the FOI, i.e. a moving average, indicating again that a more formal non-parametric approach for the estimation of the FOI is needed, at least in the exploratory stage.

Note that for specific choices of λ(a), such as those proposed by Muench and by Grenfell & Anderson (see also [Reference Massad, Raimundo and Silveira17]), equation (3) corresponds to a generalized linear model (GLM [Reference McCullagh and Nelder18]) with binomial response and log-link. Other parametric models fitted within the framework of GLMs have been proposed since then using different link-functions such as the logit and cloglog link [Reference Keiding19–Reference Beutels29]. Of those, the most popular parametric model employed is the piecewise constant FOI where for predetermined intervals a constant FOI is assumed. The choice of these intervals is usually inspired by the intuitive relevance of the ages of mixing groups in the population (e.g. pre-school, school, high school, etc.). A drawback for the model of Grenfell & Anderson and many other parametric models is the probable occurrence of a negative FOI for some age values or an unrealistic steep increase for higher age values because of the chosen functional relationship. Farrington's model is not a member of the GLM family, and deals with these issues by constraining the model based on prior knowledge. However, if that knowledge is unavailable or questionable, non-parametric options could be explored.

Non-parametric methods

Although non-parametric techniques were used before to assess the fit of a parametric, possibly nonlinear, function [Reference Griffiths11], Niels Keiding [Reference Keiding30] was the first to explicitly use a non-parametric technique to estimate the FOI from serological data, based on the isotonic Kaplan–Meier estimator of 1−π(a) which finds its origin in survival analysis. Keiding also addressed the issues of time homogeneity, monotonicity, and censoring.

Most of the various non-parametric methods in the GLM framework developed since Keiding, involved estimating the (sero)prevalence by a non-parametric technique and subsequently deriving the FOI by $\hat{\lambda }\lpar a\rpar \equals \rmDelta \hat{\pi }\sol \lpar 1 \minus \hat{\pi }\lpar a\rpar \rpar$ . Note that this latter expression for the FOI is a discretized version of equation (4) with x(a)=1−π(a).

Of these non-parametric applications, spline-based methods have become very popular [Reference Hens10, Reference Nagelkerke31–Reference Shiboski34]. A spline can be seen as a concatenation of a rich number of local polynomials which are glued together in a ‘smooth’/continuous way. In 1996, Keiding and colleagues proposed to replace the kernel smoother in his earlier work with a smoothing spline [Reference Keiding19]. Subsequent work involved semi-parametric models [Reference Hastie and Tibshirani35–Reference Namata37], in which the age-specific prevalence is modelled non-parametrically and possible covariate effects such as gender are included in the parametric component of the model. The green (blue) curves in Figure 2 represent the (monotonized) estimated prevalence and FOI based on the spline methodology [Reference Hens10, Reference Keiding19, Reference Namata37].

Looking further at the different methods as applied to the rubella and parvovirus B19 data in Figure 2, several observations can be made. First, Muench's model seems to fit the serological profile of rubella well, whereas it does not follow the pattern of the parvovirus B19 seroprofile. Farrington's exponentially damped linear model shows an improved performance for rubella whereas the fit to the parvovirus B19 data seems reasonable except that it is not able to capture the decay in seroprevalence at about 25 years of age. The spline model shows a similar fit to the seroprevalence data as the exponentially damped linear model for the rubella data, but some quantitative differences appear on the scale of the FOI.

Moreover, the spline model is able to capture the decaying seroprevalence at around the age of 25 years whereas its monotone version is a regularization to ensure a positive FOI. Indeed, when looking at parvovirus B19 in Figure 2a, the spline fit reveals a non-monotone pattern (green curve) whereas its monotonized version, applying a smooth-then-constrain approach, shows a monotone trend (green-blue-green curve) for the prevalence and a positive FOI.

Based on the non-parametric fits, one might question which of the underlying assumptions is violated for parvovirus B19 in Belgium. This could for instance be due to antibody titres declining with time post-infection, and eventually falling below the cut-off for positivity; or a time dependence in the FOI resulting in a cohort effect, or the use of a manifestly imperfect test. It is only by contrasting different methods and the graphical exploration of the goodness-of-fit that such distortions appear [Reference Hens21].

When estimating the FOI from summation data, phenomena like these are often not taken into account because of the lack of information in the data and thus result in what is often referred to as an average profile. The methods that have been used to investigate the potential mechanisms behind such features mostly rely on contrasting mathematical models with incidence data. It seems hardly possible and beyond the scope of the current paper to list all papers that may have used Muench's catalytic model as a starting point (without necessarily referencing it) to make extensions for such mathematical modelling studies. Nonetheless we further illustrate the vast influence of Muench's pioneering work by listing some of the main examples of such extensions: (1) models assuming infection confer no or no lasting immunity. Examples of such analyses include Bordetella pertussis [Reference van Boven38] and Haemophilus influenzae type b [Reference Coen39], transmission models in which waning immunity was taken into account; (2) analyses describing the possible occurrence of seasonality [Reference Fine and Clarkson40–Reference Zaaijer, Koppelman and Farrington42] or regular epidemics [Reference Whitaker and Farrington43]; (3) analyses on chronic infections (see e.g. [Reference Mathei44–Reference Becker48]); (4) analyses taking into account vaccination programmes (see e.g. [Reference Becker48–Reference Amaku50]); (5) further extensions of the more basic catalytic models have been used (see e.g. [Reference Zhang51]).

A practical guide to estimate the FOI

The diversity of methods to estimate the FOI raises the question, given a particular dataset, which method should be used? Figure 3, presents a guiding flow chart, which starts from an exploratory stage in which a non-parametric model is fitted to the data. A graphical representation of such a fit is given in Figure 2. Given the epidemiology of the infectious disease under consideration and specific complications, such as diagnostic uncertainty, the shape of the non-parametric estimate and the goodness-of-fit to the data could show distortions with respect to monotonicity, maternal immunity and time homogeneity. For each of these distortions, different remedial techniques are available, which are listed in the flow chart (Fig. 3). These techniques enable obtaining a ‘regularized’ estimate of the FOI, which can be studied in detail, parametrized [Reference Beutels52–Reference Thiry56] and in turn can be used for the estimation of related parameters, such as the basic reproduction number [Reference Farrington, Kanaan and Gay9, Reference Farrington16].

Fig. 3. Flow chart of a practical guide to estimate the force of infection (FOI) from seroprevalence data with reference to the literature on what to do and how to do it.

Muench today: a road map to another 75 years?

Although, Muench's model was overly simplistic in assuming a constant FOI, his 1934 paper already raised concerns with respect to model assumptions and data constraints, which are still today important research topics. There are a number of areas where recent research has produced interesting results, and in this section we highlight four of these.

(1) Time homogeneity is often assumed, as a necessary condition, if one can only use a single cross-sectional serological survey for estimating the FOI. In some cases (e.g. parvovirus B19 in Fig. 2), this seems questionable and might be untenable. If so, one needs longitudinal type of data (see e.g. [Reference Whitaker and Farrington57]) or alternatively several prevalence surveys at different points in time [Reference Whitaker and Farrington57, Reference Ades and Nokes58]. Since similar distortions were observed for parvovirus B19 in four other countries [Reference Mossong59], other hypotheses such as waning of naturally acquired antibody levels with time post- infection need further investigation.
(2) Antibody levels are commonly used to classify an individual's sample as positive, negative or inconclusive based on a given test-specific threshold. This allows estimation of the proportion of susceptible people at each age, and to derive from these proportions the FOI by age. Mixture models are a natural alternative for this type of data. Gay [Reference Gay60] was the first to model age-stratified serological data using two-component mixture models with age-dependent mixing probabilities and age-dependent mixture components. Using this mixture approach, Bollaerts et al. [Reference Bollaerts61] proposed a direct estimator for an age-dependent FOI (i.e. without using a predefined threshold to distinguish susceptible from immune people).
(3) As sera are often tested for more than one antigen, joint analysis for diseases with similar transmission routes can lead to new insights. Hens et al. [Reference Hens10] introduced the age-dependent joint and conditional FOI, a framework allowing formal statistical tests such as testing the assumption of separable mixing. Earlier work on bivariate models [Reference Farrington, Kanaan and Gay9, Reference Kanaan and Farrington62] focused on the estimation of the basic reproduction number from serological survey data incorporating the effect of individual heterogeneity (see also [Reference Coutinho63]) and testing for separable mixing [Reference Farrington, Kanaan and Gay9].
(4) A fundamental concept for the estimation of the basic reproduction number is the mass action principle, relating the FOI with the transmission rate, i.e. the per capita rate at which an individual of a particular age makes an effective contact with a person of a specific age, per year. The transmission rates populate the so-called WAIFW matrix [Reference Anderson and May64], which is traditionally imposed to be of a particular structure (rather ad hoc, albeit inspired by, e.g. social and schooling systems). Wallinga et al. [Reference Wallinga, Teunis and Kretzschmar65] were the first to conceptually link seroprevalence data with data on conversational contacts per person, whilst assuming that transmission rates are proportional to rates of conversational contact. Using data from a social contact survey, Ogunjimi et al. [Reference Ogunjimi66] and Goeyvaerts et al. [Reference Goeyvaerts67] proposed the disentanglement of the WAIFW matrix into two components: the surface of conversational contacts and an age-dependent proportionality factor.

DISCUSSION

Although the basic reproduction number is a very powerful and elegant summarizing parameter at the heart of infectious disease transmission dynamics, the paucity of opportunities, and difficulties to obtain direct estimates of R ₀, make the FOI extremely relevant. Indeed, it may often be the parameter, which through its estimation allows all other parameters to be estimated.

In disease burden estimates, it determines estimates of the occurrence of infections, of clinical cases and all consequences arising from these cases (e.g. hospitalizations, deaths, life-years lost, disability adjusted life years). Moreover, in studies estimating the impact of infectious disease interventions, such as vaccination, in terms of effectiveness and cost-effectiveness, the FOI will time and again rank high if not first of the most influential parameters determining the outcomes, and hence the public health policy measures based on these outcomes. Therefore estimating the FOI as accurately as possible, is essential.

Muench's seminal works published 75 and 50 years ago, are still inspirational today for the conceptual basis he provided for the FOI, and the pitfalls and problems he identified. We have described not only the various extensions those standing on the shoulders of Muench have proposed, but also noted that we are still trying to deal with the same pitfalls and problems already identified by Muench.

There appears to have been an increased interest in developing approaches for the estimation of the FOI since the mid-1980s [Reference Hens68]. We hope the guidance we have provided will be useful for researchers to decide, after exploration with a non-parametric method, which of the parametric methods is best suited for the dataset under consideration.

ACKNOWLEDGEMENTS

We thank the editor and both referees for their valuable suggestions that have led to an improved version of the manuscript. This work was supported by research project (MSM 0021620839), funded by ‘SIMID’, a strategic basic research project funded by the institute for the Promotion of Innovation by Science and Technology in Flanders (IWT) (project number 06008); by the Fund of Scientific Research (FWO, Research Grant G039304) in Flanders, Belgium; and by the IAP research network (no. P6/03) of the Belgian Government (Belgian Science Policy). The R-code used to analyse the datasets in this manuscript is available from the authors.

APPENDIX Glossary of words in alphabetical order

DECLARATION OF INTEREST

None.

References

REFERENCES

1.Bernoulli, D. An attempt at a new analysis of the mortality caused by smallpox and of the advantages of inoculation to prevent it [in French]. Histoire et Mémoires de l'Académie des Sciences 1760; 2: 1–79.Google Scholar

2.Hamer, WH. Epidemic disease in England. Lancet 1906; i: 733–739.Google Scholar

3.Brownlee, J. Statistical studies in immunity: the theory of an epidemic. Proceedings Royal Society of Edinburgh 1906; 26: 484–521.CrossRef Google Scholar

4.Ross, R. The Prevention of Malaria. London: John Murray 1911.Google Scholar PubMed

5.Kermack, WO, McKendrick, AG. Contributions to the mathematical theory of epidemics. Proceedings of the Royal Society of London, Series A 1927; 115: 700–721.Google Scholar

6.Muench, H. Derivation of rates from summation data by the catalytic curve. Journal of the American Statistical Association 1934; 29: 25–38.CrossRef Google Scholar

7.Muench, H. Catalytic Models in Epidemiology. Harvard University Press: Boston, 1959.CrossRef Google Scholar

8.Collins, SD. Age incidence of the common communicable diseases of children. Public Health Reports 1929; 44: 14.Google Scholar

9.Farrington, CP, Kanaan, MN, Gay, NJ. Estimation of the basic reproduction number for infectious diseases from age-stratified serological survey data. Journal of the Royal Statistical Society, Series C 2001; 50: 251–292.CrossRef Google Scholar

10.Hens, N, et al. Modelling multi-sera data: the estimation of new joint and conditional epidemiological parameters. Statistics in Medicine 2008; 27: 2651–2664.CrossRef Google Scholar

11.Griffiths, D. A catalytic model of infection for measles. Journal of the Royal Statistical Society, Series C 1974; 23: 330–339.Google Scholar

12.Gay, NJ. A model of long-term decline in the transmissibility of an infectious disease: implications for the incidence of hepatitis A. International Journal of Epidemiology 1996; 25: 854–861.CrossRef Google Scholar

13.Schenzle, D, Dietz, K, Frösner, GG. Antibody against hepatitis A in seven European countries. Statistical analysis of cross-sectional surveys. American Journal of Epidemiology 1979; 110: 70–76.CrossRef Google Scholar

14.Anderson, RM, May, RM. Directly transmitted infectious diseases: control by vaccination. Science 1982; 215: 1053–1060.CrossRef Google Scholar PubMed

15.Grenfell, BT, Anderson, RM. The estimation of age-related rates of infection from case notifications and serological data. Journal of Hygiene 1985; 95: 19–36.CrossRef Google Scholar PubMed

16.Farrington, CP. Modeling forces of infection for measles, mumps and rubella. Statistics in Medicine 1990; 9: 953–967.CrossRef Google Scholar PubMed

17.Massad, E, Raimundo, SM, Silveira, ASB. A continuous function model for the age-related force of infection. Mathematical and Computer Modelling 1990; 13: 101–112.CrossRef Google Scholar

18.McCullagh, P, Nelder, JA. Generalized Linear Models. Chapman & Hall, 1989.CrossRef Google Scholar

19.Keiding, N, et al. Estimation from current status data in continuous time. Lifetime Data Analysis 1996; 2: 119–129.CrossRef Google Scholar PubMed

20.Shkedy, Z, et al. Modeling age dependent force of infection from prevalence data using fractional polynomials. Statistics in Medicine 2006; 5: 1577–1591.CrossRef Google Scholar

21.Hens, N, et al. Modelling distortions in seroprevalence data using change-point fractional polynomials. Statistical Modelling (in press).Google Scholar

22.Hens, N, et al. Handling missingness when modelling the force of infection from clustered seroprevalence data. Journal of Agricultural, Biological and Environmental Statistics 2007; 12: 1–16.CrossRef Google Scholar

23.Faes, C, et al. Population-averaged versus herd-specific force of infection. Journal of the Royal Statistical Society, Series C 2006; 55: 595–613.CrossRef Google Scholar

24.Van Effelterre, T, et al. Contact patterns and their implied basic reproductive numbers: an illustration for varicella-zoster virus. Epidemiology and Infection 2009; 137: 48–57.CrossRef Google Scholar

25.Nardone, A, et al. The comparative sero-epidemiology of varicella zoster virus in 11 countries in the European region. Vaccine 2007; 25: 7866–7872.CrossRef Google Scholar PubMed

26.Nardone, A, Miller, E. Serological surveillance of rubella in Europe: European sero-epidemiology network (ESEN2). Eurosurveillance 2004; 9: 5–7.CrossRef Google Scholar PubMed

27.Edmunds, WJ, Pebody, RG, Aggerback, HEA. The sero-epidemiology of diphtheria in Western Europe. European sero-epidemiology network (ESEN). Epidemiology and Infection 2000; 125: 113–125.CrossRef Google Scholar PubMed

28.Edmunds, WJ, et al. The prevaccination epidemiology of measles, mumps and rubella in Europe: implications for modeling studies. Epidemiology and Infection 2000; 125: 635–650.CrossRef Google Scholar

29.Beutels, P, et al. Hepatitis B in St Petersburg, Russia (1994–1999): incidence, prevalence and force of infection. Journal of Viral Hepatitis 2003; 10: 141–149.CrossRef Google Scholar

30.Keiding, N. Age-specific incidence and prevalence: a statistical perspective (with discussion). Journal of the Royal Statistical Society, Series A 1991; 154: 371–412.CrossRef Google Scholar

31.Nagelkerke, N, et al. Semi-parametric estimation of age-time specific infection incidence from serial prevalence data. Statistics in Medicine 1999; 18: 307–320.3.0.CO;2-Z>CrossRef Google Scholar PubMed

32.Shkedy, Z, et al. Modelling forces of infection by using monotone local polynomials. Journal of the Royal Statistical Society, Series C 2003; 52: 469–485.CrossRef Google Scholar

33.Greenhalgh, D, Dietz, K. Some bounds on estimates for reproductive ratios derived from the age-specific force of infection. Mathematical Biosciences 1994; 12: 49–57.Google Scholar

34.Shiboski, SC. Generalized additive models for current status data. Lifetime Data Analysis 1998; 4: 29–50.CrossRef Google Scholar PubMed

35.Hastie, TJ, Tibshirani, RJ. Generalized additive models: some applications. Journal of the American Statistical Association 1987; 82: 371–386.CrossRef Google Scholar

36.Hastie, TJ, Tibshirani, RJ. Generalized Additive Models. New York: Chapman and Hall, 1990.Google Scholar

37.Namata, H, et al. Estimation of the force of infection from current status data using generalized linear mixed models. Journal of Applied Statistics 2007; 34: 923–939.CrossRef Google Scholar

38.van Boven, M, et al. Waninig immunity and sub-clinical infection in an epidemic model: implications for pertussis in The Netherlands. Mathematical Biosciences 2000; 164: 161–182.CrossRef Google Scholar

39.Coen, PG, et al. Mathematical models of Haemophilus influenzae type B. Epidemiology and Infection 1998; 120: 281–295.CrossRef Google Scholar PubMed

40.Fine, PEM, Clarkson, JA. Measles in England and Wales – I: an analysis of factors underlying seasonal patterns. International Journal Epidemiology 1982; 11: 5–14.CrossRef Google Scholar PubMed

41.Grassly, NC, Fraser, C. Seasonal infectious disease epidemiology. Proceedings of the Royal Society of London, Series B 2006; 273, 2541–2550.Google Scholar PubMed

42.Zaaijer, HL, Koppelman, MH, Farrington, CP. Parvovirus B19 viraemia in Dutch blood donors. Epidemiology and Infection 2004; 132: 1161–1166.CrossRef Google Scholar PubMed

43.Whitaker, HJ, Farrington, CP. Estimation of infectious disease parameters from serological survey data: the impact of regular epidemics. Statistics in Medicine 2004; 23: 2429–2443.CrossRef Google Scholar PubMed

44.Mathei, C, et al. Evidence for a substantial role of sharing of injecting paraphernalia other than syringes/needles to the spread of hepatitis C among injecting drug users. Journal of Viral Hepatitis 2006; 13: 560–570.CrossRef Google Scholar

45.Sutherland, I, Svandova, E, Radhakrishna, SE. The development of clinical tuberculosis following infection with tubercle bacilli. Tubercle 1982; 62: 255–269.CrossRef Google Scholar

46.Sutherland, I. Recent studies in the epidemiology of tuberculosis, based on the risk of being infected with tubercle bacilli. Advances in Tuberculosis Research 1976; 19: 1–63.Google Scholar PubMed

47.Vynnycky, E and Fine, PEM. The natural history of tuberculosis: the implications of age-dependent risks of disease and the role of re-infection. Epidemiology and Infection 1997; 119: 183–201.CrossRef Google Scholar

48.Becker, NG. Analysis of Infectious Diseases Data. London: Chapman and Hall, 1989.Google Scholar

49.Hens, N, et al. Estimating the impact of vaccination using age–time-dependent incidence rates of hepatitis B. Epidemiology and Infection 2008; 136: 341–351.CrossRef Google Scholar PubMed

50.Amaku, M, et al. Vaccination against rubella: analysis of the temporal evolution of the age-dependent force of infection and the effects of different contact patterns. Physical Review E 2003; 67: 5.CrossRef Google Scholar PubMed

51.Zhang, YX. A compound catalytic model with both reversible and two-stage types and its applications in epidemiological study. International Journal of Epidemiology 1987; 16: 619–621.Google Scholar PubMed

52.Beutels, P, et al. Hepatitis B in St Petersburg, Russia (1994–1999): incidence, prevalence and force of infection. Journal of Viral Hepatitis 2003; 10: 141–149.CrossRef Google Scholar

53.Morris, MC, et al. Sero-epidemiological patterns of Epstein-Barr and herpes simplex (HSV-1 and HSV-2) viruses in England and Wales. Journal of Medical Virology 2002; 67: 522–527.CrossRef Google Scholar PubMed

54.Nardone, A, et al. The comparative sero-epidemiology of varicella zoster virus in 11 countries in the European region. Vaccine 2007; 25: 7866–7872.CrossRef Google Scholar PubMed

55.Shkedy, Z, et al. Modeling age dependent force of infection from prevalence data using fractional polynomials. Statistics in Medicine 2006; 5: 1577–1591.CrossRef Google Scholar

56.Thiry, N, et al. The seroepidemiology of primary varicella-zoster virus infection in Flanders (Belgium). European Journal of Pediatrics 2002; 161: 588–593.CrossRef Google Scholar PubMed

57.Whitaker, HJ, Farrington, CP. Infections with varying contact rates: application to varicella. Biometrics 2004; 60: 615–623.CrossRef Google Scholar PubMed

58.Ades, AE, Nokes, DJ. Modeling age- and time-specific incidence from seroprevalence: toxoplasmosis. American Journal of Epidemiology 1993; 137: 1022–1034.CrossRef Google Scholar PubMed

59.Mossong, J, et al. Parvovirus B19 infection in five European countries: seroepidemiology, force of infection and maternal risk of infection. Epidemiology and Infection 2008; 136: 1059–1068.CrossRef Google Scholar PubMed

60.Gay, NJ. Analysis of serological surveys using mixture models: application to a survey of parvovirus B19. Statistics in Medicine 1996; 15: 1567–1573.3.0.CO;2-G>CrossRef Google Scholar PubMed

61.Bollaerts, K, et al. Estimating the force of infection directly from antibody levels. Statistics in Medicine (in press).Google Scholar

62.Kanaan, MN, Farrington, CP. Matrix models for childhood infections: a Bayesian approach with applications to rubella and mumps. Epidemiology and Infection 2005; 133: 1009–1021.CrossRef Google Scholar PubMed

63.Coutinho, FAB, et al. Modelling heterogeneities in individual frailties in epidemic models. Mathematical and Computer Modelling 1999; 30: 97–115.CrossRef Google Scholar

64.Anderson, RM, May, RM. Infectious Diseases of Humans: Dynamics and Control. Oxford: Oxford University Press, 1991.CrossRef Google Scholar

65.Wallinga, J, Teunis, P, Kretzschmar, M. Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents. American Journal of Epidemiology 2006; 164: 936–944.CrossRef Google Scholar PubMed

66.Ogunjimi, B, et al. Using empirical social contact data to model person to person infectious disease transmission: an illustration for varicella. Mathematical Biosciences 2009; 278: 80–87.CrossRef Google Scholar

67.Goeyvaerts, N, et al. Estimating transmission parameters and the basic reproduction number using social contact data and serological data on varicella zoster virus in Belgium. Journal of the Royal Statistical Society, Series C (in press).Google Scholar

68.Hens, N, et al. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. New York: Springer-Verlag (in press).CrossRef Google Scholar

69.Wilson, EB, Worcester, J. Contact with measles. Proceedings of the National Academy of Sciences USA 1941; 27: 7–13.CrossRef Google Scholar

Fig. 1. Muench's catalytic model for λ=0·15 and various choices for k and l [see equation (1)].

Fig. 2. (a) UK rubella and (b) Belgium parvovirus B19 infection by age (years): the observed seroprevalence per integer age value with size proportional to the sample taken (•), the fitted seroprevalence curve (–––, upper curve) and the FOI curve (- - -, lower curve). Four different models were used: Muench's constant FOI model (black curves); Farrington's exponentially damped model (red curve), a spline model (green curve) and its monotonized version (green-blue-green curve) as applied by Hens et al. [10]. Note that, by definition, the latter curve (partly) overlaps with the green curve for rubella (parvovirus B19).

Table 1. Historical overview of key-contributions to the estimation of the force of infection (FOI) from summation data in the 20th century

Fig. 3. Flow chart of a practical guide to estimate the force of infection (FOI) from seroprevalence data with reference to the literature on what to do and how to do it.

Article contents

Seventy-five years of estimating the force of infection from current status data

Summary

Keywords

INTRODUCTION

Standing on the shoulders of Muench

Parametric methods

Non-parametric methods

A practical guide to estimate the FOI

Muench today: a road map to another 75 years?

DISCUSSION

ACKNOWLEDGEMENTS

APPENDIX Glossary of words in alphabetical order

DECLARATION OF INTEREST

References

REFERENCES

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests