Anxiety in pregnancy and after giving birth (the perinatal period) is highly prevalent but under-recognised. Robust methods of assessing perinatal anxiety are essential for services to identify and treat women appropriately.
Aims
To determine which assessment measures are most psychometrically robust and effective at identifying women with perinatal anxiety (primary objective) and depression (secondary objective).
Method
We conducted a prospective longitudinal cohort study of 2243 women who completed five measures of anxiety and depression (Generalized Anxiety Disorder scale (GAD) two- and seven-item versions; Whooley questions; Clinical Outcomes in Routine Evaluation (CORE-10); and Stirling Antenatal Anxiety Scale (SAAS)) during pregnancy (15 weeks, 22 weeks and 31 weeks) and after birth (6 weeks). To assess diagnostic accuracy a sample of 403 participants completed modules of the Mini-International Neuropsychiatric Interview (MINI).
Results
The best diagnostic accuracy for anxiety was shown by the CORE-10 and SAAS. The best diagnostic accuracy for depression was shown by the CORE-10, SAAS and Whooley questions, although the SAAS had lower specificity. The same cut-off scores for each measure were optimal for identifying anxiety or depression (SAAS ≥9; CORE-10 ≥9; Whooley ≥1). All measures were psychometrically robust, with good internal consistency, convergent validity and unidimensional factor structure.
Conclusions
This study identified robust and effective methods of assessing perinatal anxiety and depression. We recommend using the CORE-10 or SAAS to assess perinatal anxiety and the CORE-10 or Whooley questions to assess depression. The GAD-2 and GAD-7 did not perform as well as other measures and optimal cut-offs were lower than currently recommended.
Early detection of ST-segment elevation myocardial infarction (STEMI) on the prehospital electrocardiogram (ECG) improves patient outcomes. Current software algorithms optimize sensitivity but have a high false-positive rate. The authors propose an algorithm to improve the specificity of STEMI diagnosis in the prehospital setting.
Methods:
A dataset of prehospital ECGs with verified outcomes was used to validate an algorithm to identify true and false-positive software interpretations of STEMI. Four criteria implicated in prior research to differentiate STEMI true positives were applied: heart rate <130, QRS <100, verification of ST-segment elevation, and absence of artifact. The test characteristics were calculated and regression analysis was used to examine the association between the number of criteria included and test characteristics.
Results:
There were 44,611 cases available. Of these, 1,193 were identified as STEMI by the software interpretation. Applying all four criteria had the highest positive likelihood ratio of 353 (95% CI, 201-595) and specificity of 99.96% (95% CI, 99.93-99.98), but the lowest sensitivity (14%; 95% CI, 11-17) and worst negative likelihood ratio (0.86; 95% CI, 0.84-0.89). Both the positive likelihood ratio (r² = 0.90) and specificity (r² = 0.85) correlated strongly with an increasing number of criteria.
Conclusions:
Prehospital ECGs with a high probability of true STEMI can be accurately identified using these four criteria: heart rate <130, QRS <100, verification of ST-segment elevation, and absence of artifact. Applying these criteria to prehospital ECGs with software interpretations of STEMI could decrease false-positive field activations, while also reducing the need to rely on transmission for physician over-read. This can have significant clinical and quality implications for Emergency Medical Services (EMS) systems.
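As a rough check on how the reported likelihood ratios relate to sensitivity and specificity, the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity can be applied to the rounded point estimates above. This is an illustrative sketch only; the function name is hypothetical, and because the inputs are rounded the result will only approximate the published values.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test from its sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded point estimates reported for applying all four criteria.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.14, specificity=0.9996)
print(f"LR+ = {lr_pos:.0f}, LR- = {lr_neg:.2f}")
```

With these rounded inputs the positive likelihood ratio comes out near 350 and the negative likelihood ratio near 0.86, in line with the reported 353 and 0.86.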
This study aimed to compare Greek Australian and English language normative data with regard to impairment rates yielded within a healthy Greek Australian older adult sample. We also examined whether optimal cut scores could be identified that sensitively and specifically distinguish healthy Greek Australians from those with a diagnosis of Alzheimer’s disease (AD).
Method:
Ninety healthy Greek Australian older adults and 20 demographically matched individuals with a diagnosis of AD completed a range of neuropsychological measures, including the Wechsler Adult Intelligence Scale-Fourth Edition, Greek Adaptation (WAIS-IV GR), verbal and visual memory, language and naming, and executive functions. Impairment rates derived from the use of either Greek Australian or English language normative data were calculated and compared, using a 1.5 standard deviation criterion to denote impairment. Receiver operating characteristics curve analysis was used to investigate the sensitivity and specificity of alternate cut scores.
Results:
Impairment rates derived from the Greek Australian normative data showed that rates of impairment generally fell within the expected 7% range. In contrast, impairment rates for all tests derived using English language normative data were significantly higher and ranged from 11%–66%. Comparisons between healthy and AD participants with moderate dementia showed significant differences across all measures. Area under the curve results ranged from .721 to .999 across all measures, with most tests displaying excellent sensitivity and specificity.
Conclusions:
English language normative data were found to be inappropriate for use with Greek Australian elders, potentially leading to erroneous diagnostic outcomes. The use of minority group specific normative data and associated cut points appears to partially ameliorate this issue. Clinical implications are discussed alongside future research directions.
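The "expected 7% range" mentioned in the results follows from the 1.5 standard deviation criterion if scores are assumed to be normally distributed in the normative population (an assumption made here for illustration; the abstract does not state it). A minimal sketch, with a hypothetical function name:

```python
from statistics import NormalDist


def expected_impairment_rate(cutoff_sd):
    """Share of a standard normal population scoring below -cutoff_sd."""
    return NormalDist().cdf(-cutoff_sd)


rate = expected_impairment_rate(1.5)
print(f"Expected rate below -1.5 SD: {rate:.1%}")
```

Under that normality assumption, roughly 6.7% of healthy individuals would fall below the cut-off by chance, which is why rates of 11%–66% signal poorly fitting norms.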
This is a brief presentation of the evidence from a systematic literature review of the diagnostic accuracy in suspected traumatic shaking. The national and international reaction to this systematic literature review is also addressed, along with rebuttal of the criticism and an interpretation of the hostile reception of the review. We argue that despite the fact that a scientific controversy often includes competing theories about mechanisms, the shaken baby controversy also includes a controversy about correlation knowledge, because its function is to corroborate (or falsify) the applied theories about mechanisms. Moreover, we argue that long personal experience and groupthink within child protection teams have influenced the development of biased gold standards, resulting in turn in circular reasoning: hence most of the shaken baby literature is flawed.
The Tilburg Frailty Indicator (TFI) is a validated tool for determining frailty in older adults. This study examined the validity and accuracy of the TFI Part B (TFI-B) in a North American context. Seventy-two individuals ≥ 65 years of age recruited from a rural geriatric medicine clinic completed a set of self-reported and performance-based measures, including TFI-B. Frailty level was determined using modified Fried’s Frailty Phenotype (FFP). Pearson correlation coefficients (r) assessed the concurrent relationships between the TFI-B and other measures. Accuracy of the TFI-B in classifying frailty level was assessed using the area under the curve (AUC). The TFI-B scores showed low correlations (r < 0.4) with gait speed and grip, suggesting that the TFI-B did not consider frailty as merely a physical problem. The AUC of 0.82 indicated that the TFI-B scores accurately classified frail versus non-frail individuals. A score of ≥ 5 on the TFI-B showed satisfactory sensitivity/specificity (73%/77%) and excellent negative predictive value (91.95%). This indicates that a TFI-B score of < 5 can be used to rule out frailty.
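Because a negative predictive value depends on how common frailty is in the tested population, it can help to see how the NPV moves with prevalence for the reported sensitivity and specificity. The sketch below uses the standard Bayes relationship; the prevalence values are hypothetical and are not taken from the study.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV from test characteristics and the prevalence of the condition."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Reported sensitivity/specificity; prevalence values are illustrative assumptions.
for prevalence in (0.10, 0.20, 0.40):
    npv = negative_predictive_value(0.73, 0.77, prevalence)
    print(f"prevalence {prevalence:.0%}: NPV {npv:.1%}")
```

The NPV falls as prevalence rises, so a rule-out threshold validated in one clinic population may perform differently where frailty is more common.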
Primary health care (PHC) professionals may play a crucial role in improving early diagnosis of depressive disorders. However, only 50% of cases are detected in PHC. The most widely used screening instrument for major depression is the Patient Health Questionnaire (PHQ), including the two-, eight- and nine-item versions. Surprisingly, there is neither enough evidence about the validity of PHQ in PHC patients in Spain nor indications about how to interpret the total scores. This study aimed to gather validity evidence to support the use of the three PHQ versions to screen for major depression in PHC in Spain. Additionally, the present study provided information for helping professionals to choose the best PHQ version according to the context.
Methods
The sample was composed of 2579 participants from 22 Spanish PHC centers participating in the EIRA-3 study. The reliability and validity of the three PHQ versions for Spanish PHC patients were assessed based on responses to the questionnaire.
Results
The PHQ-8 and PHQ-9 showed high internal consistency. The results obtained confirm the theoretically expected relationship between PHQ results and anxiety, social support and health-related QoL. A single-factor solution was confirmed. Regarding the level of agreement with the CIDI interview (used as the criterion), our results indicate that the PHQ has good discriminative power. The optimal cut-off values were: ⩾2 for PHQ-2, ⩾7 for PHQ-8 and ⩾8 for PHQ-9.
Conclusions
PHQ is a good and valuable tool for detecting major depression in PHC patients in Spain.
Major depression has become one of the most frequent diagnoses in Germany. It is also quite prominent in cases referred for medicolegal assessment in insurance, compensation or disability claims. This report evaluates the validity of clinicians’ diagnoses of major depression in a sample of claimants. In 2015, n = 127 consecutive cases were examined for medicolegal assessment. All had been diagnosed with major depression by clinicians. All testees underwent a psychiatric interview and a physical examination, and answered questionnaires for depressive symptoms according to DSM-5, embitterment disorder, post-concussion syndrome (PCS) and unspecific somatic complaints. Performance and symptom validity tests were administered. Only 31% of the sample fulfilled the diagnostic criteria for DSM-5 major depression according to self-report, while none did so according to psychiatric assessment. Negative response bias was found in 64% of cases, feigned neurologic symptoms in 22%. Symptom exaggeration was indiscriminate rather than depression-specific. By self-report (i.e. symptom endorsement in questionnaires), 64% of the participants qualified for embitterment disorder and 93% for PCS. In conclusion, clinicians’ diagnoses of depression seem frequently erroneous. The reasons are improper assessment of the diagnostic criteria, confusion of depression with bereavement or embitterment and a failure to assess for response bias.
Evidence-based diagnostic methods have clinical and research applications in neuropsychology. A flexible Bayesian model was developed to yield diagnostic posttest probabilities from a single person’s neuropsychological score profile by utilizing sample descriptive statistics of the test battery across diagnostic populations of interest.
Methods:
Three studies examined the model’s performance. One simulation examined estimation accuracy of true z-scores. A diagnostic accuracy simulation utilized descriptive statistics from two popular neuropsychological tests, the Wechsler Adult Intelligence Scale–IV (WAIS-IV) and Repeatable Battery for the Assessment of Neuropsychological Status (RBANS). The final simulation examined posterior predictive accuracy of scores to those reported in the WAIS manual.
Results:
The model produced minimally biased z-score estimates (root mean square errors: .02–.18) with appropriate credible intervals (95% credible interval empirical coverage rates: .94–1.00). The model correctly classified 80.87% of simulated normal, mild cognitive impairment, and Alzheimer’s disease cases using a four subtest WAIS-IV and the RBANS compared to accuracies of 60.67–65.60% from alternative methods. The posterior predictions of raw scores closely aligned to percentile estimates published in the WAIS-IV manual.
Conclusion:
This model permits estimation of posttest probabilities for various combinations of neuropsychological tests across any number of clinical populations with the principal limitation being the accessibility of applicable reference samples. The model produced minimally biased estimates of true z-scores, high diagnostic classification rates, and accurate predictions of multiple reported percentiles while using only simple descriptive statistics from reference samples. Future nonsimulation research on clinical data is needed to fully explore the utility of such diagnostic prediction models.
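The general idea of turning a score profile into posttest probabilities can be sketched with Bayes' rule and normal likelihoods. This is a deliberately simplified illustration, not the authors' model: it assumes test scores are independent within each diagnostic group, and every group mean, standard deviation, prior, and test name below is hypothetical.

```python
from statistics import NormalDist


def posttest_probabilities(profile, groups, priors):
    """Posterior probability of each group given one person's z-score profile.

    profile: {test_name: observed z-score}
    groups:  {group_name: {test_name: (mean, sd)}}  (hypothetical reference statistics)
    priors:  {group_name: prior probability}
    Assumes scores are conditionally independent given the group (a simplification).
    """
    unnormalized = {}
    for name, parameters in groups.items():
        likelihood = 1.0
        for test, score in profile.items():
            mean, sd = parameters[test]
            likelihood *= NormalDist(mean, sd).pdf(score)
        unnormalized[name] = priors[name] * likelihood
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


# Hypothetical reference statistics for two tests across three populations.
groups = {
    "normal": {"memory": (0.0, 1.0), "speed": (0.0, 1.0)},
    "MCI": {"memory": (-1.0, 1.0), "speed": (-1.0, 1.0)},
    "AD": {"memory": (-2.0, 1.0), "speed": (-2.0, 1.0)},
}
priors = {"normal": 1 / 3, "MCI": 1 / 3, "AD": 1 / 3}
posterior = posttest_probabilities({"memory": -1.9, "speed": -2.2}, groups, priors)
print({name: round(p, 3) for name, p in posterior.items()})
```

A fuller treatment would model the correlations between tests and the uncertainty in the reference statistics, which is where the accessibility of suitable reference samples becomes the limiting factor.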
Depression is common in persons experiencing mild cognitive impairment (MCI), with 32% (95% CI 27, 37) overall experiencing depression. Persons with MCI who have depression have more cognitive changes compared to those without depression. To understand how we can detect depressive symptoms in persons with MCI, we undertook a systematic review to identify tools that were validated compared with a reference standard.
Design:
We searched MEDLINE, EMBASE, PsycINFO, and Cochrane from inception to April 25, 2021, and conducted a gray literature search. Title/abstract and full-text screening were completed in duplicate. Demographic information, reference standards, prevalence, and diagnostic accuracy measures were then extracted from included articles (PROSPERO CRD: CRD42016052120).
Results:
Across databases, 8,748 abstracts were generated after removing duplicates. Six hundred and sixty-five records underwent full-text screening, with six articles included for data extraction. Nine tools were identified compared to a reference standard, with multiple demonstrating a sensitivity of 100% (Brief Assessment Schedule Depression Cards, Beck Depression Inventory-II, Cornell Scale for Depression in Dementia, Zung Self-Rated Depression Scale, and the Neuropsychiatric Inventory). The second highest sensitivity reported was 89% (Patient Health Questionnaire-9). Too few studies were available for a meta-analysis.
Conclusions
Multiple depression detection tools have been examined amongst MCI outpatients, with several showing high sensitivity. However, this evidence is only present in single studies, with little demonstration of how differing MCI types affect accuracy. More research is needed to confirm the accuracy of these tools amongst persons with MCI. At this time, several tools could be suitable for use in cognitive clinics.
Clinical neuropsychology has been slow in adopting novelties in psychometrics, statistics, and technology. Researchers have indicated that the stationary nature of clinical neuropsychology endangers its evidence-based character. In addition to a technological crisis, there may be a statistical crisis affecting clinical neuropsychology. That is, the frequentist null hypothesis significance testing framework remains the dominant approach in clinical practice, despite a recent surge in critique on this framework. While the Bayesian framework has been put forward as a viable alternative in psychology in general, the possibilities it offers to clinical neuropsychology have not received much attention.
Method:
In the current position paper, we discuss and reflect on the value of Bayesian methods for the advancement of evidence-based clinical neuropsychology.
Results:
We aim to familiarize clinical neuropsychologists and neuropsychological researchers with Bayesian methods of inference and provide a clear rationale for why these methods are valuable for clinical neuropsychology.
Conclusion:
We argue that Bayesian methods allow for a more intuitive answer to our diagnostic questions and form a more solid foundation for sequential and adaptive diagnostic testing, representing uncertainty about patients’ observed test scores and cognitive modeling of test results.
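One way to picture sequential diagnostic testing in a Bayesian frame is to update the probability of a condition after each test result using likelihood ratios in odds form. The sketch below is a generic illustration rather than anything proposed in the paper; the pre-test probability and the likelihood ratios are hypothetical, and chaining them this way assumes the tests are conditionally independent given the true status.

```python
def update_probability(prior_probability, likelihood_ratio):
    """Convert a prior probability to a posterior via odds and a likelihood ratio."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


probability = 0.20  # hypothetical pre-test probability
for likelihood_ratio in (4.0, 0.5, 6.0):  # hypothetical results of three tests
    probability = update_probability(probability, likelihood_ratio)
    print(f"after LR {likelihood_ratio}: {probability:.2f}")
```

Each posterior becomes the prior for the next test, which is what makes the approach a natural fit for adaptive test selection.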
Ictal semiology interpretation for differentiating psychogenic nonepileptic seizures (PNESs) and epileptic seizures (ESs) is important for the institution of appropriate treatment. Our objective was to assess the ability of different health care professionals (HCPs) or students to distinguish PNES from ES based on video-recorded seizure semiology.
Methods:
This study was designed following the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines. We showed in a random mix 36 videos of PNES or ES (18 each) and asked 558 participants to classify each seizure. The diagnostic accuracy of various groups of HCPs or students for PNES versus ES was assessed, as well as the effect of patient age and sex. Measures of diagnostic accuracy included sensitivity, specificity, and area under the curve (AUC).
Results:
The descending order of diagnostic accuracy (AUC) was the following (p ≤ 0.001): (1) neurologists and epileptologists; (2) neurology residents; (3) other specialists and nurses with experience in epilepsy; and (4) undergraduate medical students. Although there was a strong trend toward statistical difference, with AUC 95% confidence intervals (CIs) that were not overlapping, between epileptologists (95% CI 93, 97) compared to neurologists (95% CI 88, 91), and neurologists compared to electroencephalography technicians (95% CI 82, 87), multiple pairwise comparisons with the conservative Tukey–Kramer honest significant difference test revealed no statistical difference (p = 0.25 and 0.1, respectively). Patient age and sex did not have an effect on diagnostic accuracy in neurology specialists.
Conclusion:
Visual recognition of PNES by HCPs or students varies overall proportionately with the level of expertise in the field of neurology/epilepsy.
Dopaminergic imaging is an established biomarker for dementia with Lewy bodies, but its diagnostic accuracy at the mild cognitive impairment (MCI) stage remains uncertain.
Aims
To provide robust prospective evidence of the diagnostic accuracy of dopaminergic imaging at the MCI stage to either support or refute its inclusion as a biomarker for the diagnosis of MCI with Lewy bodies.
Method
We conducted a prospective diagnostic accuracy study of baseline dopaminergic imaging with [123I]N-ω-fluoropropyl-2β-carbomethoxy-3β-(4-iodophenyl)nortropane single-photon emission computerised tomography (123I-FP-CIT SPECT) in 144 patients with MCI. Images were rated as normal or abnormal by a panel of experts with access to striatal binding ratio results. Follow-up consensus diagnosis based on the presence of core features of Lewy body disease was used as the reference standard.
Results
At latest assessment (mean 2 years) 61 patients had probable MCI with Lewy bodies, 26 possible MCI with Lewy bodies and 57 MCI due to Alzheimer's disease. The sensitivity of baseline FP-CIT visual rating for probable MCI with Lewy bodies was 66% (95% CI 52–77%), specificity 88% (76–95%) and accuracy 76% (68–84%), with positive likelihood ratio 5.3.
Conclusions
It is over five times as likely for an abnormal scan to be found in probable MCI with Lewy bodies as in MCI due to Alzheimer's disease. Dopaminergic imaging appears to be useful at the MCI stage in cases where Lewy body disease is suspected clinically.
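Overall accuracy is a prevalence-weighted average of sensitivity and specificity, so the reported figures can be loosely cross-checked. The sketch below assumes the accuracy calculation compared the 61 probable MCI with Lewy bodies cases against the 57 MCI due to Alzheimer's disease cases, leaving out the 26 possible cases; that reading is an assumption, and the inputs are rounded.

```python
def overall_accuracy(sensitivity, specificity, n_positive, n_negative):
    """Accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    prevalence = n_positive / (n_positive + n_negative)
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Rounded estimates from the abstract; group sizes per the assumption above.
accuracy = overall_accuracy(0.66, 0.88, n_positive=61, n_negative=57)
print(f"accuracy ≈ {accuracy:.1%}")
```

The result lands close to the reported 76%, with the small difference attributable to rounding of the inputs.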
The links between ADHD and addictive disorders have been the subject of a large number of studies showing a high prevalence rate of ADHD in substance abusing populations as well as an increased risk of substance use disorder (SUD) in ADHD patients that may be independent of other psychiatric conditions. High prevalence of ADHD has also been highlighted among individuals suffering from other addictive disorders such as pathological gambling. Adequate diagnosis of ADHD in SUD patients is challenged by phenomenological aspects of addiction and by frequently associated other psychiatric disorders that overlap with key symptoms of ADHD. A detailed comprehensive search for child and adult symptoms including the temporal relationship of ADHD, substance use and other psychiatric disorders should maximize the validity and the reliability of adult ADHD diagnosis in this population. Further, a follow-up evaluation of ADHD symptoms during treatment of SUD may reduce the likelihood of misdiagnosis. Finally, it should be noted that when SUD occurs with ADHD, it is associated with a greater severity of SUD compared to other SUD patients. This has been shown with an earlier age at onset, antisocial behavior, risk for depression, chronicity of substance use, need for hospitalization and likelihood of a complicated course. Recent data suggest that the effects of ADHD on SUD outcomes are independent of other psychiatric comorbidities. This highlights the need for earlier implementation of preventive interventions for substance use or behavioral addiction in children/adolescents with ADHD and the necessity to consider this disorder in the treatment of addictive disorders. The benefits and risks of methylphenidate (MPH) in adult patients with addiction and ADHD are discussed.
Outcomes for people with schizophrenia are improved by expedient diagnosis and specific treatment. ICD-11 and DSM-5 have reduced the importance of Schneider's first rank symptoms (FRS) in the diagnosis of schizophrenia; however, FRS may still offer a useful triage tool for the early identification of schizophrenia and initiation of antipsychotic therapy in high-demand and resource-poor settings. This commentary considers a Cochrane review that assesses the diagnostic accuracy of one or multiple FRS in diagnosing schizophrenia in adults and adolescents.
Item 9 of the Patient Health Questionnaire-9 (PHQ-9) queries about thoughts of death and self-harm, but not suicidality. Although it is sometimes used to assess suicide risk, most positive responses are not associated with suicidality. The PHQ-8, which omits Item 9, is thus increasingly used in research. We assessed equivalency of total score correlations and the diagnostic accuracy to detect major depression of the PHQ-8 and PHQ-9.
Methods
We conducted an individual patient data meta-analysis. We fit bivariate random-effects models to assess diagnostic accuracy.
Results
16 742 participants (2097 major depression cases) from 54 studies were included. The correlation between PHQ-8 and PHQ-9 scores was 0.996 (95% confidence interval 0.996 to 0.996). The standard cutoff score of 10 for the PHQ-9 maximized sensitivity + specificity for the PHQ-8 among studies that used a semi-structured diagnostic interview reference standard (N = 27). At cutoff 10, the PHQ-8 was less sensitive by 0.02 (−0.06 to 0.00) and more specific by 0.01 (0.00 to 0.01) among those studies (N = 27), with similar results for studies that used other types of interviews (N = 27). For all 54 primary studies combined, across all cutoffs, the PHQ-8 was less sensitive than the PHQ-9 by 0.00 to 0.05 (0.03 at cutoff 10), and specificity was within 0.01 for all cutoffs (0.00 to 0.01).
Conclusions
PHQ-8 and PHQ-9 total scores were similar. Sensitivity may be minimally reduced with the PHQ-8, but specificity is similar.
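Choosing the cutoff that maximizes sensitivity + specificity is equivalent to maximizing Youden's J (sensitivity + specificity − 1). A minimal single-sample sketch of that selection is shown below; it is not the bivariate random-effects approach used in the meta-analysis, and the scores and diagnoses are synthetic values invented for illustration.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' counts as a positive screen."""
    pairs = list(zip(scores, has_condition))
    true_pos = sum(1 for s, d in pairs if d and s >= cutoff)
    false_neg = sum(1 for s, d in pairs if d and s < cutoff)
    true_neg = sum(1 for s, d in pairs if not d and s < cutoff)
    false_pos = sum(1 for s, d in pairs if not d and s >= cutoff)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


def best_cutoff(scores, has_condition, candidate_cutoffs):
    """Return the candidate cutoff with the largest Youden's J."""
    def youden_j(cutoff):
        sens, spec = sensitivity_specificity(scores, has_condition, cutoff)
        return sens + spec - 1.0
    return max(candidate_cutoffs, key=youden_j)


# Synthetic questionnaire totals and reference-standard diagnoses (not study data).
scores = [2, 4, 5, 7, 9, 10, 12, 14, 15, 18]
has_condition = [False, False, False, False, False, True, False, True, True, True]
print(best_cutoff(scores, has_condition, range(5, 16)))
```

In practice the optimal cutoff from a single small sample is unstable, which is one reason pooled estimates across many studies are preferred.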
Introduction: Despite their widespread use, measures of classification accuracy (i.e. sensitivity and specificity) have several limitations that conceal relevant information and may bias decision-making. Assessing the predictive ability of clinical tools instead may provide more useful prognostic information to support decision-making, particularly in an Emergency setting. We sought to contrast classification accuracy versus predictive ability of the Systemic Inflammatory Response Syndrome (SIRS) and quick Sepsis-related Organ Failure Assessment (qSOFA) Sepsis scores for determining mortality risk among patients with infection transported by paramedics. Methods: A one-year cohort of patients with infections transported to the Emergency Department by paramedics was linked to in-hospital administrative databases. Hospital mortality was determined for each patient at the time of discharge. We calculated sensitivity and specificity of SIRS and qSOFA for classifying hospital mortality across different score thresholds, and estimated discrimination (assessed using the C statistic) and calibration (assessed visually) of prediction. Prediction models for hospital mortality were constructed using the aggregated SIRS or qSOFA scores for each patient as a predictor, while accounting for clustering by institution and adjusting for differences in patient age and sex. Predicted and observed risk were plotted to assess calibration and change in risk across levels of each score. Results: A total of 10,409 patients with infection who were transported by paramedics were successfully linked, with an overall mortality rate of 9.2%. The median SIRS score among non-survivors was 2, while the median qSOFA score was 1. SIRS score had higher sensitivity estimates than qSOFA for classifying hospital mortality at all thresholds (0.11 – 0.83 vs. 0.08 – 0.80), but the qSOFA score had better discrimination (C statistic 0.76 vs. 0.71) and calibration. The risk of hospital mortality predicted by the SIRS score ranged from 6.6-24% across score values, whereas the risk predicted by the qSOFA score ranged from 8.6-53%. Conclusion: Assessing the predictive ability of the SIRS and qSOFA scores reveals that the qSOFA score provides more information to clinicians about a patient's mortality risk despite having worse sensitivity. This study highlights important limitations of classification accuracy for diagnostic test studies and supports a shift toward assessing predictive ability instead.
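For readers less familiar with the C statistic, it can be read as the probability that a randomly chosen non-survivor has a higher score than a randomly chosen survivor, with ties counted as one half. The sketch below computes it directly from that definition on a small synthetic sample; it is unadjusted, unlike the study's models, and the scores and outcomes are invented for illustration.

```python
def c_statistic(scores, died):
    """Concordance: P(score of a death > score of a survivor), ties count 0.5."""
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    concordant = 0.0
    for death_score in deaths:
        for survivor_score in survivors:
            if death_score > survivor_score:
                concordant += 1.0
            elif death_score == survivor_score:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


# Synthetic severity scores (0-3) and outcomes; not study data.
scores = [0, 0, 1, 1, 2, 3, 1, 2, 3, 0]
died = [False, False, False, False, False, False, True, True, True, False]
print(f"C statistic = {c_statistic(scores, died):.3f}")
```

Unlike sensitivity at a single threshold, this summary uses the whole range of score values, which is why a score can discriminate better overall while being less sensitive at any given cutoff.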
Acute aortic dissection (AAD) is a time sensitive, difficult to diagnose, aortic emergency. We sought to explore the quality of history taking in AAD and assess its impact on misdiagnosis.
Methods
We studied a retrospective cohort of patients >18 years old who presented to two tertiary care emergency departments from January 1st 2004 – December 31st 2012 and were diagnosed with an acute aortic dissection (AAD) on CT, MRI or TEE. Trained reviewers extracted data using a standardized data collection form. Five pain characteristics (character, onset, duration, quality, and radiation) were defined a priori.
Results
Data were collected for 194 cases of acute aortic dissection, with a mean age of 65 (SD 14.1) and 66.7% male; 34 (17.6%) were missed on initial presentation. Only 20 (14.8%) patients were asked all 5 questions. The most common initial incorrect diagnoses were acute coronary syndrome (16, 47%), pulmonary embolism (5, 14.7%) and stroke (4, 11.7%). If <2 questions were asked, 1 in 5 cases were missed, 4 times greater than if >2 were asked (P < 0.01).
Conclusion
Clinicians should ask and document the character, onset, duration, radiation and severity of pain in any patient presenting with chest, abdominal or flank pain. A focused history still remains the keystone to reducing misdiagnosis.
In the absence of a perfect reference standard, classical techniques result in biased diagnostic accuracy and prevalence estimates. By statistically defining the true disease status, latent class models (LCM) constitute a promising alternative. However, LCM is a complex method which relies on parametric assumptions, usually including conditional independence between tests, and might suffer from data sparseness. We carefully applied LCMs to assess new campylobacter infection detection tests for which bacteriological culture is an imperfect reference standard. Five diagnostic tests (culture, polymerase chain reaction and three immunoenzymatic tests) of campylobacter infection were collected in 623 patients from Bordeaux and Lyon Hospitals, France. Their diagnostic accuracy was estimated with standard and extended LCMs with a thorough examination of model goodness-of-fit. The model including a residual dependence specific to the immunoenzymatic tests best complied with LCM assumptions. Asymptotic results of goodness-of-fit statistics were substantially impaired by data sparseness and empirical distributions were preferred. Results confirmed moderate sensitivity of the culture and high performances of immunoenzymatic tests. LCMs can be used to estimate diagnostic test accuracy in the absence of a perfect reference standard. However, their implementation and assessment require specific attention due to data sparseness and limitations of existing software.
Objective: Administrative data validation is essential for identifying biases and misclassification in research. The objective of this study was to determine the accuracy of diagnostic codes for acute stroke and transient ischemic attack (TIA) using the Ontario Stroke Registry (OSR) as the reference standard. Methods: We identified stroke and TIA events in inpatient and emergency department (ED) administrative data from eight regional stroke centres in Ontario, Canada, from April of 2006 through March of 2008 using ICD–10–CA codes for subarachnoid haemorrhage (I60, excluding I60.8), intracerebral haemorrhage (I61), ischemic stroke (H34.1 and I63, excluding I63.6), unable to determine stroke (I64), and TIA (H34.0 and G45, excluding G45.4). We linked administrative data to the Ontario Stroke Registry and calculated sensitivity and positive predictive value (PPV). Results: We identified 5,270 inpatient and 4,411 ED events from the administrative data. Inpatient administrative data had an overall sensitivity of 82.2% (95% confidence interval [CI95%]=81.0, 83.3) and a PPV of 68.8% (CI95%=67.5, 70.0) for the diagnosis of stroke, with notable differences observed by stroke type. Sensitivity for ischemic stroke increased from 66.5 to 79.6% with inclusion of I64. The sensitivity and PPV of ED administrative data for diagnosis of stroke were 56.8% (CI95%=54.8, 58.7) and 59.1% (CI95%=57.1, 61.1), respectively. For all stroke types, accuracy was greater in the inpatient data than in the ED data. Conclusion: The accuracy of stroke identification based on administrative data from stroke centres may be improved by including I64 in ischemic stroke type, and by considering only inpatient data.
Introduction: Previous investigations of the diagnostic accuracy of point-of-care ultrasound (POCUS) in distal radius fractures (DRF) report a wide range of sensitivities (71%-98%) and specificities (73%-100%) when performed by medical professionals, which may reflect inconsistencies in POCUS training or sonographer experience. The purpose of this study was to determine the accuracy of POCUS performed by pre-clerkship medical students with minimal POCUS training compared to standard radiography in diagnosing DRF in adult patients with traumatic wrist injuries, in order to assess POCUS as an alternative to traditional radiographic imaging. Methods: This prospective observational study was conducted from June to September 2015. The study population consisted of adults presenting to the emergency department (ED) with distal forearm pain secondary to traumatic injury within the past seven days and for whom radiographic imaging was ordered. Patients were evaluated using POCUS performed by medical students with no prior experience who had received one hour of POCUS training taught by an emergency ultrasound fellowship-trained ED physician. A pre-test probability of fracture was stratified as low or high and documented independently by the treating physician. Students were blinded to pre-test probability and radiography results. Results: Of the 52 patients enrolled, 18 had DRF diagnosed by radiographic imaging. Compared to radiography, student-performed POCUS had 72% overall sensitivity (95% CI, 47%-90%) and 85% specificity (95% CI, 69%-95%), with 81% overall accuracy. In the high pre-test probability group (N = 20), POCUS had 80% sensitivity (95% CI, 52%-96%) and 60% specificity (95% CI, 15%-95%). In the low pre-test probability group (N = 32), POCUS had 33% sensitivity (95% CI, 1%-91%) and 90% specificity (95% CI, 73%-98%). 
Conclusion: POCUS performed by medical students demonstrated reasonable success in diagnosing DRF, with overall sensitivity and specificity in keeping with published data. Within the low pre-test probability group, the diagnostic accuracy of POCUS suggests that ultrasound was an unreliable alternative to radiographic imaging for DRF in this cohort. Future analysis of whether DRF missed by POCUS relate to adequacy of POCUS training, image capture, or sonographer experience will further clarify the utility of POCUS as a diagnostic alternative.
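The wide confidence intervals in the results reflect the small sample. An exact (Clopper-Pearson) binomial interval can be sketched with a simple bisection on the binomial distribution. The count of 13 detected fractures out of 18 used below is back-calculated from the reported 72% sensitivity and is an assumption, not a figure stated in the abstract; the abstract also does not say which interval method the authors used.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(successes, trials, alpha=0.05):
    """Exact two-sided binomial confidence interval found by bisection."""
    def bisect(move_up):
        low, high = 0.0, 1.0
        for _ in range(60):
            mid = (low + high) / 2.0
            if move_up(mid):
                low = mid
            else:
                high = mid
        return (low + high) / 2.0

    if successes == 0:
        lower = 0.0
    else:
        # P(X >= successes | p) increases with p; find where it reaches alpha/2.
        lower = bisect(lambda p: 1.0 - binomial_cdf(successes - 1, trials, p) < alpha / 2)
    if successes == trials:
        upper = 1.0
    else:
        # P(X <= successes | p) decreases with p; find where it drops to alpha/2.
        upper = bisect(lambda p: binomial_cdf(successes, trials, p) > alpha / 2)
    return lower, upper


# 13 of 18 is inferred from the reported 72% sensitivity (assumption).
lower, upper = clopper_pearson(13, 18)
print(f"sensitivity 13/18 = {13 / 18:.0%}, 95% CI {lower:.0%} to {upper:.0%}")
```

With so few fractures in the low pre-test probability group, the same calculation yields an interval spanning most of the 0 to 100% range, which is why that subgroup estimate should be read with caution.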