INTRODUCTION
Rheumatic fever (RF) and its sequela rheumatic heart disease (RHD) cause an important burden of disease in New Zealand (NZ), Australia and developing countries [Reference Ibrahim-Khalil1, Reference Ogunbi2]. In NZ, RF is now almost exclusively a disease of Māori and Pacific children [Reference Milne3–Reference Jaine, Baker and Venugopal6]. RF is not usually fatal, but may produce arthritis, arthralgia, chorea and RHD [Reference Jones7, Reference Ralph and Carapetis8]. Between 42% and 60% of RF cases develop RHD if not treated monthly with intramuscular antibiotics over a period of at least 10 years [Reference Ralph and Carapetis8]. RHD is a leading cause of preventable mortality in NZ Māori and Pacific populations. The mean age of death for Māori with RHD is 57·4 years, and 55·4 years for Pacific peoples [Reference Milne3]. By contrast, the NZ population as a whole has a life expectancy of 80·9 years [9]. The economic impact of RHD and RF is substantial, with direct hospital costs alone amounting to NZ$12 million per annum over 2000–2009 [Reference Milne3].
NZ has three major sources of RF surveillance information: national hospitalization data, national notification data, and regional registers. All of these surveillance sources have major limitations that negatively affect their ability to generate national case totals and descriptive epidemiological data [Reference Loring10–Reference Moxon13]. These limitations reduce the ability of RF surveillance to support strategy-focused activities (e.g. describing the incidence and distribution of disease to help identify and evaluate interventions) and also control-focused activities (e.g. case management, including delivery of penicillin prophylaxis) to reduce the incidence and impact of this disease [Reference Baker, Easther and Wilson14].
The National Health Index (NHI) number is used to identify patients across all healthcare events. This number is linked to variables such as sex and ethnicity, allowing such variables to be checked, thus improving the internal completeness of datasets. The NHI also links healthcare events over time so readmissions, recurrences and hospital transfers can be identified with a fairly high degree of certainty.
District health boards are required to electronically submit data on all publicly funded hospitalizations. The Ministry of Health collates these data to form the National Minimum Dataset [Reference Craig, Jackson and Han15]. Because hospitalization is recommended for all RF cases [Reference Atatoa-Carr, Lennon and Wilson16], the hospitalization dataset should be a comprehensive surveillance data source [Reference Loring10]. However, such data contain misdiagnosed and miscoded cases. An analysis of Waikato RF hospitalization data found that this source over-counted cases by 25% [Reference Atatoa-Carr, Bell and Lennon12]. An analysis of Auckland hospitalization data found that RF cases in this region were over-counted by 30% [Reference Moxon13].
RF was made a notifiable disease in 1986 [Reference Millen17]. The notification dataset includes the case's demographic and diagnostic details [18]. Severe under-notification of cases has been documented, with incompleteness ranging between 10% and 50% in different regions [Reference Loring10].
The NZ population of 4·4 million people includes 1·4 million concentrated in the Auckland region [19]. National healthcare delivery is managed by 20 district health boards. There are 11 regional registers [Reference Yap20], which aim to track patients and therefore support delivery of prophylactic antibiotics to prevent RF recurrences. Consequently, cases are scrutinized carefully before being placed on these databases [21, 22]. The registers' use as a surveillance tool is a by-product of this patient management function [Reference Atatoa-Carr, Lennon and Wilson16, 18]. Of the regional registers, the Auckland register is the longest running and most complete, covering about 45% of recorded cases in NZ. It is also seen as having the most rigorous case definition and therefore supplying the most accurate estimates of RF incidence, although data collection is restricted to the Auckland region [Reference Moxon13, Reference Yap20].
Due to the limitations described here, none of the existing surveillance data sources can provide a complete or accurate base for describing the incidence and distribution of RF across NZ [Reference Loring10, Reference Jackson and Lennon11, Reference Atatoa-Carr, Lennon and Wilson16]. One of the top ten goals of the NZ Government sector is to achieve a two-thirds reduction in the incidence of initial bouts of RF by mid-2017, reaching the rate of 1·4/100 000 [23]. However, the actual current incidence is unclear. To estimate the likely incidence of RF in NZ, we performed capture–recapture analyses using national hospitalization and notification data covering the 15-year period 1997–2011. The resulting incidence rate estimate may be useful as a baseline for measuring progress towards the Government's RF reduction target. These analyses formed part of a comprehensive surveillance sector review, which led to recommendations to the Ministry of Health on potential improvements to the NZ surveillance sector, with the aim of supporting optimal RF control and prevention activities [Reference Oliver, Pierse and Baker24].
METHODS
Data sources
We obtained national RF hospitalization data covering the period 1988–2011 from the Ministry of Health Analytical Services Section. We extracted cases with a principal diagnosis of RF (ICD-9 390–392, ICD-10 I00-I02). Each record included an encrypted NHI number, used to distinguish first admissions from later ones.
We received national RF notification data compiled by Environmental Science and Research Ltd (ESR) for the period 1997–2011 containing encrypted NHIs for most entries.
We also received data from four regional registers covering the following areas: Hawke's Bay, Rotorua, Waikato and Wellington. All registers contained encrypted NHIs and all except the Rotorua register contained additional case demographic information.
We received a selection of summary statistics on the Auckland Register from D. Lennon (Professor of Pediatrics, University of Auckland). Unfortunately the raw data were not made available to us. The odds ratios of matching these groups to the notification and hospitalization datasets were calculated from the statistics provided.
Creating data subsets
The program R v. 2·15·0 was used throughout the analysis [25]. Entries lacking encrypted NHIs were removed. Where we detected repeated entries, all entries for that individual were removed except the first.
Our initial hospitalization dataset contained only NZ residents with a principal diagnosis of RF (assigned between 1997 and 2011), who had never previously been assigned a principal or additional diagnosis of RF or a principal diagnosis of RHD. Hospital transfers were excluded, so only the first hospitalization was recorded for each case. In doing this, we attempted to make the dataset exclusive to new initial RF presentations. This method is in accordance with the approach recently adopted by the Ministry of Health to measure the incidence of new RF cases. The initial notification dataset contained individuals notified between 1997 and 2011, who had no previous episodes of RF recorded in their first notification, and could not be matched to a hospitalization event for RHD prior to RF being assigned as a principal diagnosis. Basic descriptive analyses were performed on the initial hospitalization and initial notification datasets.
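The subsetting steps described in this section can be illustrated with a minimal Python sketch (the original analysis used R). The record layout, the field names `nhi` and `date`, and the rule that "first" means earliest by date are assumptions made for illustration only; the diagnosis-code and residency filters are omitted.

```python
from datetime import date

def first_entries(records):
    """Drop records without an encrypted NHI and keep each person's earliest entry."""
    earliest = {}
    for rec in records:
        nhi = rec.get("nhi")
        if not nhi:  # entries lacking an encrypted NHI are removed
            continue
        kept = earliest.get(nhi)
        if kept is None or rec["date"] < kept["date"]:
            earliest[nhi] = rec
    return sorted(earliest.values(), key=lambda r: r["date"])

def in_study_period(records, start_year=1997, end_year=2011):
    """Keep only records dated within the study period (inclusive)."""
    return [r for r in records if start_year <= r["date"].year <= end_year]

# Hypothetical admissions; "A" was first admitted before the study period.
admissions = [
    {"nhi": "A", "date": date(1995, 3, 1)},
    {"nhi": "A", "date": date(2000, 6, 1)},
    {"nhi": "B", "date": date(2003, 2, 9)},
    {"nhi": "B", "date": date(1999, 8, 15)},
    {"nhi": None, "date": date(2001, 1, 1)},
]

# Deduplicate over the full extract first, then restrict to 1997-2011.
initial = in_study_period(first_entries(admissions))
print([r["nhi"] for r in initial])
```

Deduplicating before applying the date filter matters in this sketch: an individual whose first recorded admission predates 1997 is dropped rather than being counted as a new case at a later readmission.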
Matching individuals between datasets
An overlap occurred when an encrypted NHI in one dataset could be matched to the same encrypted NHI in the other. Most analyses involved matching data from the initial notification dataset to the initial hospitalization dataset, and vice versa.
In order to discover the number of individuals who could not be matched, we subtracted matched individuals in each dataset from the total number in the dataset. When investigating whether people with certain characteristics were more likely to be matched, we performed stratified analyses using logistic models. These models were used to calculate odds ratios (ORs) and 95% confidence intervals (CIs) of matching subgroups, compared to a reference subgroup. Reference subgroups were usually selected on the basis that they contained the highest number of individuals.
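As a rough illustration of the matching and odds-ratio steps, the sketch below counts matched and unmatched identifiers with set operations and computes an OR with a Wald 95% CI from a 2×2 table. For a logistic model with a single categorical predictor, the fitted OR equals this cross-product ratio; whether the original models included further covariates is not stated, so this is an assumption. All counts and identifiers in the example are hypothetical.

```python
import math

def match_counts(ids_a, ids_b):
    """Count identifiers found in both datasets and those unique to each."""
    a, b = set(ids_a), set(ids_b)
    matched = len(a & b)
    return {
        "matched": matched,
        "unmatched_a": len(a) - matched,
        "unmatched_b": len(b) - matched,
    }

def odds_ratio(matched_grp, unmatched_grp, matched_ref, unmatched_ref, z=1.96):
    """OR of matching for a subgroup versus a reference subgroup, with Wald CI."""
    or_ = (matched_grp * unmatched_ref) / (unmatched_grp * matched_ref)
    se_log = math.sqrt(
        1 / matched_grp + 1 / unmatched_grp + 1 / matched_ref + 1 / unmatched_ref
    )
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical encrypted identifiers in two datasets.
counts = match_counts(["a", "b", "c"], ["b", "c", "d", "e"])

# Hypothetical subgroup: 20 matched, 80 unmatched; reference: 50 matched, 50 unmatched.
or_, lower, upper = odds_ratio(20, 80, 50, 50)
print(counts, round(or_, 2))
```

An OR below 1 with a CI that excludes 1 would indicate that the subgroup is less likely to be matched than the reference subgroup.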
All analyses containing register data were completed using the reduced RF hospitalization dataset and overall notification dataset, rather than the initial datasets. It would be erroneous to use initial datasets for the register analyses as people with RHD, previous RF attacks and additional RF diagnoses may be receiving prophylaxis. The reduced hospitalization dataset included all first RF hospitalizations between 1997 and 2011 that had been assigned principal or additional diagnoses of RF, regardless of whether individuals had been hospitalized with RHD previously. The overall notification dataset included all notifications between 1997 and 2011.
Capture–recapture analysis
When a population is detected only by imperfect surveillance systems, its total size can still be estimated [Reference Chapman26–Reference Gemmell, Millar and Hay30]. Two such systems exist in the context of this study: the initial notification and initial hospitalization datasets. Using the equation derived by Chapman [Reference Chapman26], the likely true number of initial RF cases may be estimated as:
\[ N = \frac{(M + 1)(C + 1)}{R + 1} - 1 \]
where N is the number of true initial RF cases (according to the Chapman estimate); M is the total cases in the initial hospitalization dataset; C is the total cases in the initial notification dataset; and R is the number of cases in both the initial hospitalization dataset and the initial notification dataset.
Based on the central limit theorem, the standard error of the estimate of N can be calculated by:
\[ \mathrm{SE}(N) = \sqrt{\frac{(M + 1)(C + 1)(M - R)(C - R)}{(R + 1)^{2}(R + 2)}} \]
95% CIs can then be calculated using the equation:
\[ N \pm 1.96 \times \mathrm{SE}(N) \]
Dividing the number of initial cases recorded in the dataset in question by N gives an estimate of its sensitivity [Reference Chapman26].
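A minimal Python sketch of the Chapman estimate follows, using the variance formula usually paired with that estimator (an assumption about the exact form used in this study). The example counts are the unadjusted figures reported in the Results: 1953 hospitalized individuals, 1021 matched individuals, and 1179 notified individuals (1021 matched plus 158 unmatched). The output is illustrative and is not claimed to reproduce any scenario in Table 3.

```python
import math

def chapman_estimate(m, c, r, z=1.96):
    """Chapman estimate of population size with its standard error and CI.

    m: cases in the first source, c: cases in the second source,
    r: cases found in both sources.
    """
    n_hat = (m + 1) * (c + 1) / (r + 1) - 1
    variance = (m + 1) * (c + 1) * (m - r) * (c - r) / ((r + 1) ** 2 * (r + 2))
    se = math.sqrt(variance)
    return n_hat, se, (n_hat - z * se, n_hat + z * se)

def sensitivity(recorded, n_hat):
    """Share of the estimated true cases that a single source recorded."""
    return recorded / n_hat

M, C, R = 1953, 1179, 1021  # hospitalized, notified, matched
n_hat, se, ci = chapman_estimate(M, C, R)
print(round(n_hat), round(se, 1), [round(x) for x in ci])
print(round(sensitivity(M, n_hat), 2), round(sensitivity(C, n_hat), 2))
```

Because the estimate treats every recorded case as a true case and the sources as independent, it corresponds only to an unadjusted calculation.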
The Chapman estimate assumes that the datasets are independent and that only true cases are recorded. Neither of these assumptions is likely to be satisfied. We have therefore presented five scenarios where we have adjusted the data to reflect the lack of independence and data inaccuracies.
The baseline scenario, scenario 1, assumes the datasets are independent and accurate. This is most likely incorrect [Reference Atatoa-Carr, Bell and Lennon12, Reference Moxon13]. In scenario 2 we have adjusted for dataset inaccuracies by assuming that the notification dataset overstates cases by 2·7% and the hospitalization dataset overstates cases by 25·0% (as found by a Waikato analysis) [Reference Atatoa-Carr, Bell and Lennon12]. The positive predictive value (PPV) for the initial notification dataset was therefore set at 97·3% and the PPV for the initial hospitalization dataset was set at 75·0%. In scenario 3 we have the same assumptions and PPVs as scenario 2, but have also adjusted for positive dependence between the datasets. Consequently we have reduced the overlap rate to account for hospitals directly notifying cases.
In scenario 4 we used the results of a case audit in the Auckland region, which indicated slightly different PPVs for the hospital and notification datasets (i.e. 67·0% and 78·7%, respectively). The PPV of the overlap section was 88·0% [Reference Atatoa-Carr, Bell and Lennon12, Reference Moxon13]. However, the Auckland register is considered the best regional register, and thus these findings may not reflect the situation across the whole country. As 47·6% of all RF cases come from Auckland, we present scenario 5. In scenario 5 we have assigned the PPVs calculated using the Auckland analysis [Reference Moxon13] a weight of 50·0%. PPVs used in scenarios 2 and 3 have also been weighted at 50·0%.
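The 50/50 weighting of PPVs in scenario 5 can be sketched as a simple weighted average. The helper names are hypothetical, and scaling a recorded count by its PPV is shown only as one plausible way of applying a PPV; the exact adjustment used in each scenario, including the treatment of the overlap, is not specified here.

```python
def blend_ppv(ppv_a, ppv_b, weight_a=0.5):
    """Weighted average of two positive predictive values."""
    return weight_a * ppv_a + (1 - weight_a) * ppv_b

# PPVs quoted in the text for each audit.
WAIKATO = {"hospitalization": 0.750, "notification": 0.973}   # scenarios 2 and 3
AUCKLAND = {"hospitalization": 0.670, "notification": 0.787}  # scenario 4

# Scenario 5: each audit's PPVs weighted at 50%.
scenario5_ppv = {k: blend_ppv(AUCKLAND[k], WAIKATO[k]) for k in WAIKATO}

def expected_true_cases(recorded, ppv):
    """Assumed adjustment: recorded cases scaled by the source's PPV."""
    return recorded * ppv

print({k: round(v, 3) for k, v in scenario5_ppv.items()})
```

Under these assumptions the blended PPVs sit midway between the Waikato-based and Auckland-based values for each dataset.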
RESULTS
RF case data
The distribution of initial RF cases arising during 1997–2011 according to major national data sources is shown in Table 1. There were considerably higher proportions of Māori and people of other ethnicities in the notification dataset than in the initial hospitalization dataset (P = 0·0003).
RF, Rheumatic fever.
* All cases entered into the Auckland register over the period 1998–2010
† All cases entered into the Hawke's Bay and Waikato registers over the period 1997–2011.
Māori were greatly over-represented in the initial hospitalization dataset, compared to national hospitalization data (i.e. all hospitalizations for 5- to 15-year-olds over 1997–2011, not only those diagnosed with RF or RHD). The proportion of Māori in the initial hospitalization dataset was 2·1 times greater. Pacific peoples were also over-represented; the initial hospitalization dataset contained a 3·6 times greater proportion. There was a slightly higher proportion of males in the initial hospitalization dataset. RF is much more common in socio-economically deprived groups, with 64% of hospitalized cases coming from the most deprived 20% of NZ census area units. By contrast, 30·1% of all hospitalizations were from the two most deprived deciles.
Matched RF data
Altogether 2111 individuals were identified in the initial notification and initial hospitalization datasets. The majority of individuals recorded as hospitalized with RF (1021/1953 individuals, i.e. 52·3%) could be matched to the notification dataset. Of those on the initial notification dataset, 158 (13·4%) notifications were not matched to the initial hospitalization dataset (Fig. 1).
The likelihood of matching cases between datasets was influenced by a patient's age, sex, ethnicity and whether their disease presentation met the Jones criteria. If older than the reference group (5–15 years), a patient's odds of matching between both datasets declined. The OR for hospitalized patients being matched to the initial notification dataset was 0·46 (95% CI 0·36–0·60) for the 16–30 years age group, and 0·08 (95% CI 0·05–0·15) for those aged >30 years, compared to the reference group. Similarly, the OR of notified patients being matched to the initial hospitalization dataset was 0·30 (95% CI 0·20–0·44) for the 16–30 years age group, and 0·15 (95% CI 0·07–0·35) for those aged >30 years. Notified women were less likely to be matched to the initial hospitalization dataset (OR 0·65, 95% CI 0·46–0·93), and also less likely to be matched from the initial hospitalization dataset to the initial notification dataset (OR 0·83, 95% CI 0·70–0·997). Māori had the highest odds of matching to the initial notification dataset. If the Jones criteria were not met, notified patients were less likely to be matched to the initial hospitalization dataset (OR 0·27, 95% CI 0·25–0·64) (Table 2).
OR, Odds ratio; CI, confidence interval.
Bold values are statistically significant.
* Compared to reference groups: age 5–15 years; male; Māori ethnicity; meets the Jones criteria.
† Auckland register data covers cases added to the register between 1998 and 2010.
‡ Upper North Island registers are Hawke's Bay and Waikato registers.
Aucklanders in the initial hospitalization dataset were significantly less likely to be matched to the initial notification dataset, compared to those from Wellington, Waikato, Hawke's Bay, Northland and Tairawhiti. Wellington and Tairawhiti notifications were less likely than all other areas to be matched to the initial hospitalization dataset.
We also matched from a selection of regional registers to the hospitalization and notification datasets. Of those on the regional registers, 51% could be matched to the overall notification dataset and 59% to the reduced hospitalization dataset. Of those on the overall notification dataset, only about 80% were recorded as receiving register-based prophylaxis in the region they were notified as being in. Both the overall notification dataset and reduced hospitalization datasets contained cases that could not be matched to their local register.
Estimated incidence of RF
Table 3 shows the best estimates of the likely true case numbers according to the scenarios outlined in the Methods section.
CI, Confidence interval.
Scenario 1 is the baseline with no adjustment; scenario 2 is adjusted for the over-reporting in the datasets (based on a Waikato audit); scenario 3 is adjusted for the over-reporting and positive dependency. Scenario 4 adjusts for both the over-reporting and positive dependency (based on the Auckland data).
Scenario 5 uses the average values from scenarios 3 and 4 to give the most plausible estimate of the true number of cases.
Overall, notification data are consistently less sensitive than hospitalization data. Estimates of their sensitivity vary between 50% and 70%, with the best estimate (based on scenario 5) being 62%. Hospitalization data, according to our estimates, are between 70% and 97% sensitive, with the best estimate being 82%.
It is likely that there has been a 165% increase in the annual number of initial true RF cases over the period 1997–2011, with between 146 and 158 new cases occurring during 2011.
The annual number of RF notifications is consistently lower than overall hospital diagnoses (i.e. principal and additional diagnoses of RF including those with previous RF and RHD admissions); however, the notification curve roughly follows the peaks and troughs seen in this hospitalization curve. This pattern may be indicative of ICD code misclassification. The overall hospital diagnoses curve is consistently higher than the initial RF curves and the discrepancy has widened slightly in recent years. This pattern suggests RF cases are increasingly being hospitalized for complications and/or comorbidities. From 2005, overall hospitalizations and notifications broadly increased, both peaking in 2010. Our estimated number of cases very closely follows the number of initial RF hospitalizations (Fig. 2).
DISCUSSION
Here we present the first national estimate of new RF cases (excluding those with previous RF/RHD) in NZ residents, based on combined national notification and hospitalization data and knowledge gained from regional RF registers. The capture–recapture analysis indicates there were, on average, about 113 cases arising per year over the period 1997–2011.
RF hospitalization rates of around 300 events per year, and 100–160 new initial cases annually (Fig. 2), are comparable to those seen in Indigenous Australian populations, which are known to have among the highest RF rates in the world [Reference Parnaby and Carapetis31]. These rates may also be comparable to true South African incidence rates; however, under-diagnosis is likely to be a major issue in that region [Reference Ralph and Carapetis8, Reference Nkgudi32]. Importantly, this analysis showed that none of the current national RF surveillance systems is complete or accurate. The finding that just over 50% of hospitalized cases could be matched to the notification dataset is consistent with under-notification documented in previous literature [Reference Loring10]. The relatively poor sensitivity of the notification dataset also indicates that under-notification is a serious problem, particularly in the populous Auckland region. The overall notification and reduced hospitalization datasets contained cases that could not be matched to their local register, which indicates that the registers may be incomplete.
Despite the higher sensitivity, a high rate of misdiagnosis and/or case miscoding affects the hospitalization dataset. Our findings support previous research, by showing that 47·7% of all hospitalized cases assigned RF codes could not be matched to the notification dataset [Reference Loring10]. It is possible that the higher PPV observed with the notification dataset might result from clinicians being more likely to notify severely unwell patients.
Hospitalized cases have diagnostic codes assigned at discharge. This is an imperfect process and does not include mechanisms to revise diagnoses subsequently. Furthermore, there are no ICD codes denoting ‘possible’, ‘probable’ or ‘definite’ RF status. Cases may be notified as ‘probable’, ‘confirmed’ or ‘under investigation’. Thus clinical information concerning case status [Reference Atatoa-Carr, Lennon and Wilson16] does not translate well to national data sources.
The proportion of people aged >30 years assigned RF codes is almost three times higher than the proportion of notified cases in this age group. As an initial episode of RF is rare in people aged >30 years [Reference Jaine, Baker and Venugopal6, Reference Chapman26, Reference Taranta and Markowitz33], it seems likely that diagnoses in this group were frequently not confirmed, resulting in few notifications and inaccurate hospitalization data.
It is concerning that the odds of matching female and older cases were lower than the odds of matching younger males between both national datasets. Our findings here may, in part, be attributable to a higher rate of miscoding in these groups. However, there is pre-existing evidence that women and older patients with cardiovascular pathologies are less likely to have their symptoms investigated or receive treatment. Such findings could be at least partially attributed to clinician preconceptions and underuse of active therapies [Reference Anderson and Pepine34]. Our results suggest these findings may also hold relevance to RF in NZ. This suggestion may be supported by the fact that while there are more male RF patients in NZ [Reference Harvey, Craney and Kelly27], a significantly greater proportion of women die from RHD [Reference Milne3]. Additionally, previous research has shown that Māori and Pacific women with mechanical or bioprosthetic cardiac valves have a 7- to 8-fold increased risk of dying, compared to European and Asian women [Reference North35]. Together, this research indicates that management and treatment outcomes for Māori and Pacific women with RF and RHD are of particular concern.
Calculating an overall total for RF is complicated by the fact that there is clinically detected RF, and RF that is not detected. Patients may be hospitalized with RHD without having been diagnosed first with RF. Five scenarios for the total number of detected RF cases are presented, of which scenario 5 is probably the most robust. This indicates the likely true number of RF cases arising in NZ over the 15-year period 1997–2011 ranges from 1668 to 1728, with an average of 113 cases annually.
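As a quick arithmetic check, dividing the bounds of the scenario 5 range by the 15 years of the study period gives annual figures that bracket the reported average of about 113 cases per year:

```latex
\[
\frac{1668}{15} \approx 111.2, \qquad \frac{1728}{15} = 115.2
\]
```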
Our major limitation is that the capture–recapture analysis is designed to work with datasets which are largely independent, but the notification and hospitalization datasets are most likely non-independent. The independence assumption is almost always violated in capture–recapture analyses; attempts to reduce the effect of bias are therefore recommended [36]. We have no way of knowing the extent to which the datasets are non-independent or how greatly we violated the assumption of independence in our analyses. The Chapman estimate formula, however, has been designed to reduce the effect of bias when working with non-independent datasets [Reference Chapman26, Reference Brugal37, Reference Corrao38]. We have also adjusted the estimates to take account of the estimated (and large) inaccuracies in these datasets. Capture–recapture analysis has been widely used to enhance infectious disease surveillance [Reference Nkgudi32], notably for calculating likely true case numbers using multiple incomplete data sources and for assessing the magnitude of under-reporting, much in the manner performed here [Reference Giorgi39–Reference van Hest48].
If an individual had more than one NHI number, they may have been counted multiple times in our analyses (although the Ministry of Health indicates that it is now unusual for younger patients in NZ to have multiple NHIs). A number of cases in the notification dataset had not been assigned NHI numbers and could not be linked to any recorded NHI number and consequently were excluded from the analysis. If the datasets are independent then the conclusions will not be affected by excluding such cases. If, however, the datasets are not independent as is plausible, then given the positive dependence between the datasets we are slightly overestimating the total number of cases and underestimating the cases in the older datasets. Our analyses would have also been affected by inaccuracies in ICD coding; however, we attempted to correct for this by using different error scenarios (Table 3).
We assume that well maintained regional registers are more accurate than notification or hospitalization data. Both the Waikato [Reference Atatoa-Carr, Bell and Lennon12] and Auckland [Reference Moxon13] RF case audits indicate that these registers are more sensitive and specific than the national datasets. The PPVs used in our scenarios have been calculated using dataset sensitivity estimates based on these findings.
The major implication of this research is the need for NZ to develop a single national RF register. Such a register could take over the functions of the current regional registers and, ideally, the RF notification database. It could build-in additional quality features (such as case definitions used by the Auckland Regional register) and an expanded dataset covering important RF risk factors. This proposed register has potential to save time and effort by requiring clinicians to report cases to a single database instead of two. It would also provide a much more comprehensive base for national RF surveillance through enabling effective case management, to support disease control and by producing strategy-focused surveillance information [Reference Jackson and Lennon11, Reference Oliver, Pierse and Baker24].
Our findings also suggest the need to investigate clinicians' awareness and index of suspicion of RF in individuals who are not male, Māori and aged 5–15 years. The possibility of under-diagnosis and reduced hospitalization of female cases should also be investigated. If these concerns are verified by subsequent research, intervention will be necessary.
What is already known
RF is an important cause of morbidity and mortality in NZ, especially in Māori and Pacific children. Accurate national RF epidemiological information cannot be generated, as major limitations affect all three sources of national RF surveillance data (i.e. the notification database, hospitalization records, and regional registers). Capture–recapture analyses are useful when calculating the likely true size of a population in situations where that population is detectable only using flawed surveillance systems.
What this study adds
Over the period 1997–2011, it is likely that 1668–1728 new RF cases occurred in NZ, an average of 113 per year. The hospitalization dataset is our most comprehensive national RF surveillance system and it detects about 82% of these cases, while the notification dataset detects about 62%. Male, Māori patients aged 5–15 years are most likely to be matched between hospitalization and notification datasets, and vice versa. This implies they are more likely than people of other demographics to be both hospitalized and notified with RF. There is a clear need to improve information linkage and develop a more comprehensive national surveillance system, such as a national RF register.
ACKNOWLEDGEMENTS
We acknowledge our co-advisors: Tomasz Kiedrzynski and Nicholas Jones. We thank Chris Lewis, from the Ministry of Health Analytical Services Section, for encrypting the notification data; and Jane Zhang, who provided the hospitalization and mortality data. We acknowledge John Tagg, Graham Mackereth, John Malcolm, Rob Siebers, and Nicholas Jones, who all reviewed this research. We also thank the staff who provided register data from the following organizations: Rotorua General Practice Group, Capital and Coast DHB, Hawke's Bay DHB and Waikato DHB; and Diana Lennon, who provided Auckland register statistics. This project was supported by a scholarship awarded to J.O. by the Ministry of Health.
DECLARATION OF INTEREST
None.