CLINICIAN'S CAPSULE
What is known about the topic?
Traditional conference evaluations may be limited in their ability to assess knowledge translation.
What did this study ask?
How do Twitter metrics complement traditional conference evaluation?
What did this study find?
There is no correlation between Twitter metrics and traditional evaluation, but Twitter metrics could be a complementary tool to evaluate conference knowledge dissemination.
Why does this study matter to clinicians?
Clinicians might wish to adopt Twitter to receive disseminated conference content.
INTRODUCTION
Medical conferences are the largest venues for continuing medical education. In addition to networking, they aim to disseminate and advance research and to enhance practice through education.Reference Davis1,Reference Ioannidis2 In 2018, 1,303 health and medical conferences were held in the United States, with another 118 in Canada.3 However, conferences produce only minimal changes in physician knowledge or patient outcomes.3–Reference Forsetlund, Bjørndal and Rashidian5 Conferences are typically evaluated with traditional metrics, which are inadequate for assessing knowledge translation for several reasons. Traditional evaluation metrics focus mostly on social experience, overall satisfaction, and processes rather than on learning outcomes or commitment to change.Reference Domino, Chopra, Seligman, Sullivan and Quirk6 They aggregate individual responses, obscuring potentially diverse input.Reference Haustein, Peters, Sugimoto, Thelwall and Larivière7,Reference Chapman, Aalsburg Wiessner, Storberg-Walker and Hatcher8 Response rates of 30% to 40% also raise questions about their quality and utility.Reference Neves, Lavis and Ranson9 Overall, they provide little insight into how much conference research knowledge is disseminated into practice.Reference Maloney, Tunnecliff and Morgan10
Given the emerging prevalence of social media in this digital age, alternative metrics might complement traditional evaluation metrics and provide greater insight into how conference knowledge disseminates. Twitter, an online micro-blogging service launched in 2006, enables people to communicate in real time through short entries of limited length (initially 140 characters, now expanded to 280), called “tweets.” Anyone reading a tweet can repost it; these reposts are known as “retweets.” Twitter aligns with the social-constructivist pedagogy of seeking, sharing, and collaborating.Reference Chapman, Aalsburg Wiessner, Storberg-Walker and Hatcher8,Reference Maloney, Tunnecliff and Morgan10 It has transformed communication in journalism, public health research, and emergency response. Academics use Twitter to obtain and share real-time information, expand professional networks, and contribute to wider conversations.Reference Mohammadi, Thelwall, Kwasny and Holmes11 Educators use Twitter as a virtual community of practice, sharing innovations and feedback informally despite the brevity of interactions.Reference Rosell-Aguilar12–Reference Choo, Ranney and Chan14 Twitter metrics have also been compared with traditional metrics of knowledge translation for published articles. They can predict highly cited articles within the first 3 days of publication, and they correlate significantly with traditional bibliometric indicators (readership, citations) in some journals.Reference Haustein, Peters, Sugimoto, Thelwall and Larivière7,Reference Neves, Lavis and Ranson9,Reference Amath, Ambacher, Leddy, Wood and Ramnanan15 They also correlate with citations in Google Scholar™ (a free online search engine for scholarly literature, including articles, theses, books, and abstracts). Twitter metrics predict top-cited articles with 93% specificity and 75% sensitivity.Reference Neves, Lavis and Ranson9,Reference Eysenbach16
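For reference, the standard definitions of these two diagnostic-accuracy measures are shown below, treating a “positive” as an article predicted from its Twitter metrics to be top-cited; this mapping onto the cited studies is our gloss rather than their notation.

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
```

Here TP counts top-cited articles correctly predicted, FN top-cited articles missed, TN other articles correctly not flagged, and FP other articles wrongly flagged. A sensitivity of 75% therefore means that three of every four top-cited articles were identified, and a specificity of 93% means that 93% of the remaining articles were correctly not flagged.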
In medical conferences, Twitter has been shown to disseminate research knowledge.Reference Mishori, Levy and Donvan17–Reference Scott, Hsu, Johnson, Mamtani, Conlon and DeRoos26 Attendees tweet the results being presented.Reference Chaudhry, Glode, Gillman and Miller22 People absent from the conference read these tweets, and some retweet them to their own followers, creating a second tier of knowledge dissemination; some add their own comments to the retweets. This retweeting can continue across many tiers, diffusing knowledge further.Reference McKendrick, Cumming and Lee21 With Twitter activity at conferences increasing in recent years, some authors have urged conference organizers to promote Twitter use for maximum effectiveness.Reference Gricks, Woo and Lai27 Beyond broadcasting content, Twitter might be useful for evaluation and community discussion. With this in mind, authors have suggested that Twitter could serve as a novel real-time speaker impact evaluation toolReference Roland, May, Body and Carley23 (see Table 1). Twitter metrics might therefore capture what traditional evaluation metrics frequently miss. To our knowledge, no study has compared Twitter metrics with traditional evaluation metrics as a speaker impact evaluation tool. We, therefore, asked:
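The tiered spread described above can be modeled as a breadth-first traversal of a retweet graph. The sketch below is illustrative only: the function name, its input format (a mapping from each account to the accounts that retweeted it), and the account names are hypothetical and were not part of this study's methods.

```python
from collections import deque


def dissemination_tiers(original_tweeters, retweeted_by):
    """Return {account: tier}, where tier 1 is an account that tweeted the
    content directly and tier k + 1 is an account that retweeted a tier-k account.

    original_tweeters: accounts that tweeted the content (e.g., attendees).
    retweeted_by: hypothetical input mapping each account to the accounts
        that retweeted it.
    """
    tiers = {account: 1 for account in original_tweeters}
    queue = deque(tiers)  # breadth-first search visits tier 1, then tier 2, ...
    while queue:
        source = queue.popleft()
        for retweeter in retweeted_by.get(source, []):
            if retweeter not in tiers:  # record each account at its earliest tier only
                tiers[retweeter] = tiers[source] + 1
                queue.append(retweeter)
    return tiers


# Illustrative cascade with made-up account names.
cascade = {
    "attendee_a": ["reader_b", "reader_c"],
    "reader_b": ["reader_d"],
    "reader_d": ["reader_e"],
}
print(dissemination_tiers(["attendee_a"], cascade))
# {'attendee_a': 1, 'reader_b': 2, 'reader_c': 2, 'reader_d': 3, 'reader_e': 4}
```

Breadth-first order matters here: it guarantees that an account reachable by several retweet paths is assigned its earliest (lowest) tier.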
1. Do Twitter metrics correlate with traditional evaluation metrics?
2. How do these two metrics measure speaker impact differently?
METHODS
This study used a retrospective design. The hashtag (a metadata tag on Twitter) #CAEP14, for the Canadian Association of Emergency Physicians 2014 conference, was prospectively registered with Symplur, an online Twitter management tool, so that all tweets bearing the hashtag were archived. Attendees were encouraged to tweet using this hashtag. All tweets dated from the start of the conference to 30 days afterward were collected.
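Symplur performed the archiving in this study; the inclusion rule itself (hashtag present, posted within the 30-day window) can be sketched as follows. The `Tweet` record, the function name, and the start date are hypothetical placeholders, and the sketch does not use Symplur or any Twitter API.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Tweet:
    text: str
    posted: date


HASHTAG = re.compile(r"#\w+")


def in_study_window(tweets, hashtag, conference_start, days_after=30):
    """Keep tweets that carry `hashtag` (case-insensitive, whole tag only) and
    were posted from the conference start through `days_after` days later."""
    tag = hashtag.lower()
    end = conference_start + timedelta(days=days_after)
    return [
        t for t in tweets
        if tag in {h.lower() for h in HASHTAG.findall(t.text)}
        and conference_start <= t.posted <= end
    ]


start = date(2014, 6, 1)  # placeholder start date for illustration only
sample = [
    Tweet("Great talk on airway management #CAEP14", date(2014, 6, 2)),
    Tweet("Late reflection #caep14", date(2014, 7, 15)),  # more than 30 days later
    Tweet("Unrelated post #CAEP145", date(2014, 6, 2)),   # different hashtag
]
print(len(in_study_window(sample, "#CAEP14", start)))  # 1
```

Matching whole hashtags, rather than substrings, keeps a tag such as #CAEP145 from being counted as #CAEP14.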
Two authors (S.Y. and S.D.) independently assessed each tweet for inclusion. Table 2 describes the inclusion and exclusion criteria for each tweet.
We developed a classification system (see Table 3) to distinguish original tweets from retweets, and content-disseminating tweets from tweets that generated further discussion. All researchers discussed and agreed upon a coding scheme. Two authors coded the first 200 tweets together to ensure a uniform approach to coding, and then independently coded the remaining tweets. All tweets were revisited, and coding discrepancies were resolved by consensus among the study team. Based on previous literature, we chose manual coding because automated content analysis is hampered by the brevity and unconventional written expression of tweets.Reference Kim, Hansen, Murphy, Richards, Duke and Allen28
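The step of revisiting tweets and resolving discrepancies starts from a list of where the two coders disagree. A minimal sketch of that comparison follows, assuming each coder's work is stored as a mapping from tweet ID to class label; the function name and label values are hypothetical.

```python
def discrepancies_for_consensus(coder_a, coder_b):
    """Compare two coders' labels and list what needs consensus review.

    coder_a, coder_b: hypothetical dicts mapping tweet ID -> class label
    (e.g., "1", "1R", "2", "2R", "excluded").
    Returns (conflicts, uncoded): IDs both coders labeled differently, and
    IDs that only one coder labeled.
    """
    conflicts = sorted(
        tid for tid in coder_a.keys() & coder_b.keys() if coder_a[tid] != coder_b[tid]
    )
    uncoded = sorted(coder_a.keys() ^ coder_b.keys())
    return conflicts, uncoded


a = {101: "1", 102: "1R", 103: "2"}
b = {101: "1", 102: "2R", 104: "1"}
print(discrepancies_for_consensus(a, b))  # ([102], [103, 104])
```

The output is only a work list; the resolution itself remains a consensus judgment by the study team.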
With the consent of CAEP, all available speaker evaluations from the CAEP 2014 conference were collected confidentially. One author (S.Y.) reviewed the speaker evaluations and abstracted the value corresponding to the rating item “The speaker was an effective communicator.” We chose this item because it was the only one specific to a speaker rather than to the whole session (in which multiple speakers might present). We believed this was the most appropriate unit because the tweets were aimed at specific speakers rather than the whole session.
We calculated descriptive statistics using proportions, means, or medians with standard deviations or interquartile ranges (IQRs), as appropriate. Means and proportions were compared using t-tests, with reported p-values. Linear correlations (Pearson's r) were calculated between conference evaluation scores and Twitter metrics.
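The summary statistics named above can be sketched with Python's standard library. This is an illustration, not the study's analysis code: the function names and the per-session values are hypothetical, and quartile conventions differ between packages, so an IQR computed this way may not match the software the authors used.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has no variation")
    return sxy / math.sqrt(sxx * syy)


def median_iqr(values):
    """Median and (Q1, Q3); quartiles use the 'inclusive' method, one of several conventions."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), (q1, q3)


# Hypothetical per-session values, not study data.
evaluation_scores = [3.4, 3.6, 3.7, 3.5, 3.9]
tweet_counts = [2, 10, 4, 0, 8]
print(median_iqr(evaluation_scores))
print(round(pearson_r(evaluation_scores, tweet_counts), 2))
```

This sketch returns only the coefficient; in practice, `scipy.stats.pearsonr` returns both r and its p-value.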
To quantify the impact of tweets, we proposed a novel theory-based Twitter Discussion Index. This index includes all original tweets (Class 1, disseminating tweets; Class 2, engagement tweets) and retweets (Classes 1R and 2R). We defined the Twitter Discussion Index as follows:
We weighted the components of this equation according to our view of Twitter as aligned with social-constructivist pedagogy.Reference Chapman, Aalsburg Wiessner, Storberg-Walker and Hatcher8,Reference Maloney, Moss and Ilic29 Through Class 1 tweets, users construct and disseminate their own understanding. Class 2 tweets create important networks of interactivity and virtual communities of practice, and are vital to further layers of discussion. Nonoriginal retweets (Classes 1R and 2R) are endorsements and amplify the spread of the message.
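Because the index weights its components by tweet class, its general shape can be sketched as a weighted count. This sketch assumes a linear weighted sum, which is itself an assumption; the exact form of the published equation may differ. The weights below are arbitrary placeholders, not the values used in this study, and the function name is hypothetical.

```python
def discussion_index(counts, weights):
    """Weighted sum of a session's tweet counts by class (assumed linear form).

    counts: tweets per class for one session, e.g. {"1": 10, "1R": 20, ...}.
    weights: one weight per class. These are user-supplied placeholders;
        the published index weights are not reproduced here.
    """
    unknown = counts.keys() - weights.keys()
    if unknown:
        raise ValueError(f"no weight given for class(es): {sorted(unknown)}")
    return sum(w * counts.get(cls, 0) for cls, w in weights.items())


placeholder_weights = {"1": 2, "2": 3, "1R": 1, "2R": 1}  # arbitrary, for illustration
session_counts = {"1": 10, "1R": 20, "2": 3, "2R": 5}      # hypothetical session
print(discussion_index(session_counts, placeholder_weights))  # 54
```

Keeping the weights as an explicit input makes it easy to recompute the index for a different theoretical weighting without changing the counting step.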
RESULTS
Description of Twitter data
In total, 3,804 tweets contained the hashtag #CAEP14 and fell within the prescribed date range. Of these, 2,419 (63.59%) were included; the others were excluded because they were unrelated to content (e.g., personal communications, logistics about room locations, reminders for upcoming sessions). Forty-eight percent (48%) of sessions received at least one tweet (mean = 11.7 tweets; 95% CI of 0 to 57.5; range, 0–401). Included tweets were classified as follows: 634 (26.21%) were Class 1; 1,276 (52.75%) were Class 1R; 190 (7.85%) were Class 2; and 319 (13.19%) were Class 2R (Table 4).
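The reported proportions can be reproduced from the raw counts. The short check below uses only numbers stated in this section (the helper name `pct` is ours) and also gives the combined shares of disseminating (Classes 1 and 1R) and discussion (Classes 2 and 2R) tweets.

```python
def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / whole, 2)


total_with_hashtag = 3804
class_counts = {"1": 634, "1R": 1276, "2": 190, "2R": 319}
included = sum(class_counts.values())

print(included, pct(included, total_with_hashtag))  # 2419 63.59
for cls, n in class_counts.items():
    print(cls, pct(n, included))  # 1 26.21 / 1R 52.75 / 2 7.85 / 2R 13.19
print(pct(class_counts["1"] + class_counts["1R"], included))  # 78.96 (disseminating)
print(pct(class_counts["2"] + class_counts["2R"], included))  # 21.04 (discussion)
```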
Plenary sessions and sessions that encouraged audience input received many more tweets (mean = 219.8 tweets per session, compared with an overall conference mean of 11.7 tweets per session).
Comparison between Twitter metrics and traditional evaluation metrics
The conference included 274 sessions in total: 111 standard presentations, 71 posters, and 92 abstracts. Only the 111 standard presentations had traditional evaluation forms for attendees to complete. Of these 111 standard presentations, 85 (76.58%) received traditional evaluation metrics and 71 (63.96%) received tweets.
Fifty-seven (57 of 111; 51.35%) standard presentations received both traditional evaluation metrics and tweets. Of the 26 standard presentations with no traditional evaluation metrics, 14 received tweets. Of the 40 standard presentations that received no tweets, 28 received traditional evaluation metrics.
Across all sessions (standard presentations, posters, and abstracts), 48% (131 of 274) received tweets (mean = 11.7 per session). Among posters and abstracts, for which no traditional evaluation metrics were available, 28% (20 of 71) of moderated posters and 43% (40 of 92) of posters or oral abstracts received tweets (see Figure 1).
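As a consistency check, the 111 standard presentations can be arranged as a 2 × 2 table (traditional evaluation vs. tweets), with the inner cells derived from the reported margins of 85 evaluated, 71 tweeted, and 57 with both. The helper below is a hypothetical illustration; every input is a count reported in this section.

```python
def two_by_two(n_total, n_evaluated, n_tweeted, n_both):
    """Derive the four cells of an evaluation-by-tweets table from its margins."""
    evaluation_only = n_evaluated - n_both
    tweets_only = n_tweeted - n_both
    neither = n_total - n_both - evaluation_only - tweets_only
    if min(evaluation_only, tweets_only, neither) < 0:
        raise ValueError("margins are inconsistent")
    return {"both": n_both, "evaluation only": evaluation_only,
            "tweets only": tweets_only, "neither": neither}


print(two_by_two(n_total=111, n_evaluated=85, n_tweeted=71, n_both=57))
# {'both': 57, 'evaluation only': 28, 'tweets only': 14, 'neither': 12}

# Sessions of any type with at least one tweet: standard presentations + posters + abstracts.
tweeted = 71 + 20 + 40
print(tweeted, round(100 * tweeted / 274))  # 131 48
```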
In sessions with both tweets and evaluation scores (n = 57), neither the number of tweets (any class) nor the Twitter Discussion Index correlated significantly with the evaluation scores. The median traditional evaluation score was 3.61 of 5 (IQR, 3.4 to 3.7) (see Figure 2).
DISCUSSION
Interpretation of findings
Medical conferences aim to bridge the gap between research and practice,Reference Davis1 but static traditional evaluation metrics are not designed to assess knowledge translation. Given the emerging prevalence of social media in this digital age, we studied whether Twitter metrics can complement traditional evaluation metrics and provide greater insight into how medical conferences translate knowledge.
We found no correlation between traditional evaluation metrics and Twitter metrics. This was not due to discordant results between the two (such as a highly tweeted session receiving low traditional evaluation scores). Rather, we were unable to correlate the measures because traditional evaluation metrics were often unavailable and the available scores fell within a narrow range (median score of 3.61 of 5; IQR, 3.4 to 3.7).
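Why a narrow range of scores hampers correlation (restriction of range) can be illustrated with a small simulation. The data below are invented, not study data: a score on a 1 to 5 scale and an outcome that tracks it plus random noise. The same underlying relationship yields a much weaker Pearson correlation once only scores within a narrow band, here chosen to match the reported IQR, are retained.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(2014)  # fixed seed so the illustration is reproducible
# Invented data: a score spread over 1-5 and an outcome equal to the score plus noise.
scores = [random.uniform(1, 5) for _ in range(5000)]
outcome = [s + random.gauss(0, 1) for s in scores]
r_full = pearson_r(scores, outcome)

# Keep only observations whose score falls in a narrow band (3.4 to 3.7).
narrow = [(s, o) for s, o in zip(scores, outcome) if 3.4 <= s <= 3.7]
r_narrow = pearson_r([s for s, _ in narrow], [o for _, o in narrow])

print(f"full range r = {r_full:.2f}; narrow range r = {r_narrow:.2f}")
```

With the same noise, the narrow band leaves little score variation to explain outcome variation, so r shrinks even though the underlying relationship is unchanged.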
Although tweets were generated at a similar rate in sessions with traditional evaluation metrics, a substantial percentage of sessions that had no traditional evaluation metrics by design also generated tweets. The number of tweets varied widely among sessions. Most tweets focused on knowledge content rather than logistics or processes, similar to previous studies.Reference Nomura, Genes, Bollinger, Bollinger and Reed19,Reference Neill, Cronin, Brannigan, O'Sullivan and Cadogan25 The majority of tweets (78.96%) disseminated content (Classes 1 and 1R), and the rest (21.04%) sparked further debate as “discussion” tweets (7.9% were “discussion” tweets and 13.2% were retweets of those).
Comparison to previous studies
Our Twitter metrics were similar to those reported for previous conferences.Reference Nomura, Genes, Bollinger, Bollinger and Reed19,Reference Chaudhry, Glode, Gillman and Miller22,Reference Jalali and Wood24,Reference Neill, Cronin, Brannigan, O'Sullivan and Cadogan25 These metrics might provide information about knowledge translation and dissemination that traditional evaluation metrics lack. This advantage derives from the nature of the social media platform: real-time, accessible, searchable, and focused on knowledge acquisition and transfer. We discuss these qualities below.
Twitter metrics are real-time. Didactic conference sessions impose a single focus of attention and restrict individuals to the role of either speaker or listener.Reference Ross, Terras, Warwick and Welsh20 Feedback, collaboration, and interaction are often missing.Reference Reinhardt, Ebner, Beham, Costa, Hornung-Prahauser and Luckmann30 By contrast, Twitter engages. Discussion typically involves active debate, despite the character limit.Reference Scanfeld, Scanfeld and Larson31,Reference Jalali, Sherbino, Frank and Sutherland32 Desai argued that real-time feedback might be less subject to bias than traditional evaluation metrics.Reference Desai33 Twitter feedback may improve presentation quality, particularly if speakers are informed of the need for clear key messages. Even though higher Twitter metrics might reflect more knowledge translation and dissemination, there are caveats. Factors such as audience size, and content that is extreme, shocking, humorous, or controversial, can influence the number of tweets.Reference Choo, Ranney and Chan14 Tweets can also form an echo chamber of comments representing shared opinion rather than knowledge translation.Reference Choo, Ranney and Chan14 Others have cautioned that sensationalism may inflate tweet numbers, or that presenters may “sterilize” their content for fear of being misquoted or misinterpreted.Reference Roland, May, Body and Carley23 Because there is little anonymity on Twitter, users might be deterred from posting negative feedback. Retractions and errata from authors are rare and might go unnoticed on Twitter.Reference Choo, Ranney and Chan14 Content analysis of tweets might mitigate these risks in future studies, but no formal fact-checking process is yet in place.
Traditional evaluation metrics are reviewed only by organizers and speakers, whereas Twitter is accessible to the wider community. Tweets are archived and searchable, which makes them attractive for future reference. Unlike other social media sites (e.g., Facebook, private institutional sites), Twitter reaches beyond a specific group: posted messages are public by default and can be searched and tracked by hashtag. Each Twitter user can create public posts to initiate discussions and to participate in debates.Reference Bruns and Stieglitz34,Reference Boyd, Golder and Lotan35 Bakshy et al. found that tweets tend to propagate in a power-law distribution, with a small number of tweets being retweeted thousands of times.Reference Bakshy, Hofman, Mason and Watts36 These retweeters are key to wide knowledge dissemination.Reference Mishori, Levy and Donvan17 Because retweeters do not need to be present at the conference, the impact does not depend on the number of conference attendees. Whereas traditional evaluation metrics focus on satisfaction and reactions rather than learning, tweets are largely about learning points, aligning with just-in-time learning and knowledge transfer. Moreover, despite their brevity, tweets were often robust and clinically relevant.Reference Chaudhry, Glode, Gillman and Miller22,Reference Brown, Riddell, Jauregui, Yang, Nauman and Robins37,Reference Riddell, Brown, Robins, Nauman, Yang and Jauregui38 Tweets might also improve long-term retention through mechanisms such as retrieval practice (by audience members who tweet), feedback (from correcting others on Twitter), and spaced repetition (from tweets and retweets).Reference Butler and Raley39
STRENGTHS AND LIMITATIONS
To our knowledge, our study is the first to compare traditional evaluation metrics with Twitter metrics, and it introduces a novel way to differentiate levels of contemporaneous tweets.
We captured only tweets that bore the conference hashtag (#CAEP14). Related tweets without the hashtag, or with an incorrect hashtag, might therefore have been missed. In addition, the Twitter Discussion Index does not discriminate between positive and negative tweets. Previous studies have shown that Twitter metrics can also mark strong disagreement, research error, or frank misconduct.Reference Carpenter and Cone40 Although we encountered no tweets expressing strong disagreement in our study, this is a potential limitation of the Twitter Discussion Index.
Finally, this study includes data from only one conference in a single year, which limits the generalizability of its conclusions.
RESEARCH IMPLICATIONS
Given that Twitter metrics could be informative, we propose that our Twitter Discussion Index be treated as a measure of “disseminative impact,” similar to the way published articles that generate “buzz” in altmetrics go on to receive high citation counts.Reference Trueger, Thoma, Hsu, Sullivan, Peters and Lin41 It might complement traditional evaluation metrics in future conference evaluations as a key performance indicator of engagement and impact.Reference Neiger, Thackeray and Van Wagenen42
CONCLUSION
Traditional evaluation metrics are inadequate for evaluating medical conference presentations for knowledge translation. Tweets by conference attendees could amplify knowledge translation and dissemination. Tweets are real-time, accessible, searchable, and describe knowledge transfer. We found Twitter metrics to be a more nuanced evaluation tool that complements traditional evaluation metrics, and we propose a novel index for using this tool. We recommend that conference organizers adopt Twitter metrics and the Twitter Discussion Index as measures of knowledge translation and dissemination.
Acknowledgements
The authors thank the Board of CAEP for the use of their conference evaluation results. Author contributions: S.Y. conceived the idea for the research. S.Y., S.D., and J.R.F. each contributed to the design of the study. The study was executed by S.D. and S.Y. A.J. led the analysis along with S.Y., S.D., and A.C.L. S.Y. wrote the first draft of the manuscript, and subsequent versions were reviewed, commented on, and revised critically by all authors. All authors approved the final manuscript for submission.
Competing interest
None declared.
Financial support
This research received funding from the Department of Emergency Medicine at The Ottawa Hospital.