When responding to natural disasters, emergency management agencies need access to real-time information. One potential tool is social media data analysis. In recent years, the use of social media for public health surveillance and during natural disasters has been proposed. Reference Finch, Snook and Duke1–Reference Muniz-Rodriguez, Ofori and Bayliss5 Social media offers emergency management agencies a tool to communicate emergency information, warnings, and updates through their profiles using short messages, photos, and videos. Reference Finch, Snook and Duke1,Reference Kim and Hastak6 Content analysis and sentiment analysis can help classify information extracted from social media messages into different categories and help identify those in need of assistance and the geographical areas affected by an event. Reference Muniz-Rodriguez, Ofori and Bayliss5,Reference Adams, Raeside and Khan7,Reference Kiatpanont, Tanlamai and Chongstitvatana8
The possible roles of social media data analysis during natural disasters have been studied before. Reference Finch, Snook and Duke1,Reference Muniz-Rodriguez, Ofori and Bayliss5 Researchers used social media data analysis to study the content of shared posts during emergencies, identify users’ locations, develop mapping applications as a visual aid for emergency responders, and communicate emergency warnings. Reference Sherchan, Pervin and Butler3,Reference Kim and Hastak6,Reference Andrews, Gibson and Domdouzis9–Reference Tang, Zhang and Xu13 However, several limitations were identified in the analyses, including a low number of geolocated tweets, large datasets with a small share of natural disaster-related posts, and tweets being posted from areas not affected by the disaster. Reference Muniz-Rodriguez, Ofori and Bayliss5 Given these limitations, an imputation method was developed to impute Twitter user geolocations, using the social network connections of Twitter users and the accounts they follow. Reference Muniz-Rodriguez14
We applied this imputation method to analyze the social media behavior of users who followed schools and school districts in Georgia during Hurricane Matthew. Based on the identified hashtags and information from the National Hurricane Center, Hurricane Matthew was selected as the case study to validate the imputation method used to impute the Twitter users’ locations. 15–17 Hurricane Matthew was a category 5 storm that affected the Caribbean islands, Georgia, and North and South Carolina from September 28 to October 9, 2016. The southwest and coastal regions of Georgia were heavily affected, recording winds equivalent to those of a category 2 hurricane.
This retrospective case study, using a secondary dataset, showcased how the imputation method mentioned above can be applied to impute Twitter users’ locations and its potential to facilitate the communication efforts of emergency responders if applied in real-time. This study aims: (1) to describe the topics and sentiment of Twitter users who follow schools’ and school districts’ accounts in Georgia before and during Hurricane Matthew; and (2) to evaluate the association between retweet frequency and topics posted by Twitter users during Hurricane Matthew.
Methods
Data Collection
The analysis uses secondary data from the social media platform Twitter, as described in Ahweyevu et al. Reference Ahweyevu, Chukwudebe and Buchanan18 Ahweyevu and collaborators downloaded publicly available public school and school districts data for the state of Georgia from the National Center for Education Statistics (NCES) (nces.edu.gov) and identified Twitter profiles for the schools and school districts. Reference Ahweyevu, Chukwudebe and Buchanan18 For details on the data collection process, refer to Ahweyevu et al. Reference Ahweyevu, Chukwudebe and Buchanan18
Missing Data Imputation
We developed a method to impute the location of Twitter users who do not share a self-reported location in their profiles. Reference Muniz-Rodriguez14 A location at the Metropolitan Statistical Area (MSA) level, in Georgia, USA, was assigned to Twitter users who shared no location or shared a location that was not real. An MSA is defined as a region with a minimum of 1 community with at least 50,000 people. 19 There are 14 MSAs in Georgia. The public schools’ and public school districts’ Twitter accounts in these MSAs were identified. Reference Ahweyevu, Chukwudebe and Buchanan18 The imputation method used the follower-“followee” relation as a proxy to impute a location to users.
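The follower-"followee" proxy can be illustrated with a minimal sketch. The assignment rule is not described in this article, so the majority vote below is an assumption made purely for illustration, and the account names and the `impute_msa` helper are hypothetical.

```python
from collections import Counter

# Hypothetical mapping of school/district accounts to the MSA they sit in.
ACCOUNT_MSA = {
    "school_a": "Savannah",
    "school_b": "Savannah",
    "district_c": "Hinesville",
}

def impute_msa(followed_accounts, account_msa=ACCOUNT_MSA):
    """Return the most common MSA among the followed school accounts.

    Majority vote is an assumption of this sketch, not the published rule.
    Returns None when the user follows no known school or district account.
    Ties go to the MSA encountered first.
    """
    msas = [account_msa[a] for a in followed_accounts if a in account_msa]
    if not msas:
        return None
    return Counter(msas).most_common(1)[0][0]
```

A user following two Savannah schools and one Hinesville district would be assigned to Savannah under this rule; other rules (for example, weighting by account size) are equally plausible.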
The total sample size was 27,598 followers from 53 school or district accounts. Reference Muniz-Rodriguez14 The analysis presented in this article used the sample of Twitter users and their imputed locations to explore their social media behavior during Hurricane Matthew.
Selection of Hurricane-Related Tweets
The hurricane-related tweets from users in the imputed sample were extracted by the keywords of “hurricane” and “hurricanes”. From a total of 26,274 hurricane-related tweets extracted, 3,753 tweets were posted during Hurricane Matthew, from September 28 to October 9, 2016. Three datasets were created to analyze the tweet content shared by the users. Dataset 1 comprised only those tweets considered original content posted by the users (1,679 tweets). Dataset 2 included tweets identified as retweets in the sample (2,033 tweets), and dataset 3 contained replies to tweets in the sample (41 tweets). Given its very small size, dataset 3 was excluded from further analysis.
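The selection and splitting steps can be sketched as follows. This is an illustration only: the field names (`text`, `posted`, `is_retweet`, `in_reply_to`) are hypothetical, and a case-insensitive substring match on "hurricane" (which also covers "hurricanes") is assumed because the exact matching rule is not stated.

```python
from datetime import date

# Study window for Hurricane Matthew, as stated in the text.
START, END = date(2016, 9, 28), date(2016, 10, 9)

def classify(tweet):
    """Label a tweet dict as 'original', 'retweet', or 'reply' (hypothetical fields)."""
    if tweet.get("is_retweet"):
        return "retweet"
    if tweet.get("in_reply_to"):
        return "reply"
    return "original"

def select_matthew_tweets(tweets):
    """Keep keyword-matching tweets inside the window, grouped by tweet type."""
    groups = {"original": [], "retweet": [], "reply": []}
    for t in tweets:
        has_keyword = "hurricane" in t["text"].lower()
        in_window = START <= t["posted"] <= END
        if has_keyword and in_window:
            groups[classify(t)].append(t)
    return groups
```

As a consistency check on the reported counts, 1,679 original tweets, 2,033 retweets, and 41 replies sum to the 3,753 tweets posted during the event.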
Content Analysis
Content analysis was done to describe the topics mentioned by Twitter users who followed schools’ and school districts’ accounts in Georgia before and during Hurricane Matthew. The steps were repeated for original content tweets and retweets to assess the differences in content per type of Twitter post and for counties in the actual hurricane path. We implemented a probabilistic topic model known as the latent Dirichlet allocation (LDA) model, which is a Bayesian mixture model Reference Grün and Hornik20 to determine the importance of a term in the analyzed text corpus. Reference Blei21 The LDA model was trained using 90% of the dataset in this project, and the model was tested using the remaining 10% of the data. Reference Ghatak22,Reference Kumar23 Before model fitting, the number of topics (k) was determined by running model simulations with k = 5 to k = 100 in increments of 5, Reference Adnan, Yin and Jackson24 with 30 iterations, using the training datasets to assess the value of k. The optimal number of topics was 30 for both dataset 1 (original tweets) and dataset 2 (retweets).
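The candidate grid for k and the 90/10 hold-out split can be sketched as below. The sketch covers only these two bookkeeping steps, not the LDA fit itself; the function names, the random shuffle, and the seed are assumptions for illustration.

```python
import random

def candidate_topic_counts(start=5, stop=100, step=5):
    """Candidate values of k: 5, 10, ..., 100."""
    return list(range(start, stop + 1, step))

def train_test_split(docs, train_fraction=0.9, seed=0):
    """Shuffle documents and split them into training and test sets.

    A seeded shuffle is an assumption of this sketch; the article does not
    state how the 90% training portion was drawn.
    """
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]
```

The grid contains 20 candidate values of k, so each dataset would require 20 model fits on its training portion before a value of k is chosen.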
Sentiment Analysis
Sentiment analysis was applied to describe the sentiment of Twitter users who followed schools’ and school districts’ accounts in Georgia before and during Hurricane Matthew. A lexicon-based method was implemented to calculate the average sentiment of words in the tweets. Reference Silge and Robinson25 Two lexicons, Afinn and Bing, Reference Silge and Robinson25 were compared in a preliminary analysis. The Afinn lexicon was preferred, so the subsequent analysis used sentiment scores based on Afinn. Next, general descriptive frequencies were studied for original tweets and retweets. Finally, the overall changes in sentiment scores were plotted over time.
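A lexicon-based score of this kind can be sketched as the mean score of the lexicon words found in a tweet. The toy lexicon below is made up for illustration (the real Afinn list assigns integer scores from -5 to +5), and the tokenization rule and the `tweet_sentiment` helper are assumptions.

```python
import re

# Made-up scores for illustration only; not the actual Afinn values.
TOY_LEXICON = {"safe": 1, "hope": 2, "damage": -3, "destroyed": -3}

def tweet_sentiment(text, lexicon=TOY_LEXICON):
    """Mean score of the lexicon words in a tweet, or None if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)
```

Tweets with no lexicon words return `None` rather than 0 so that they are not mistaken for neutral tweets when scores are averaged by day.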
Hurdle Regression Model to Evaluate the Association Between Retweet Frequency and Tweet Topics
We fitted hurdle regression models to evaluate the association between retweet frequency and topics posted by Twitter users during Hurricane Matthew. The response variable, the number of retweets a tweet received, was analyzed in association with the independent variable, the topic categories obtained from content analysis, with US Census demographic data as covariates. 26 The hurdle model was divided into 2 components. The first part was a zero-mass component that modeled the chance of a tweet having zero retweets. The second part was a truncated Poisson model that considered only the positive retweet counts to estimate the relative change in retweet count among tweets that were retweeted. Reference Love27,Reference Rodriguez28 The level of significance was specified as 0.05 a priori.
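The two-part structure can be made concrete with the probability mass function of a hurdle Poisson distribution. This is a sketch without covariates; in the fitted models, the zero probability and the Poisson rate would each depend on the predictors. The names `p_zero` and `lam` are introduced here for illustration.

```python
import math

def hurdle_poisson_pmf(k, p_zero, lam):
    """P(Y = k) for a hurdle Poisson count.

    p_zero is the probability of zero retweets (zero-mass part);
    lam is the rate of the zero-truncated Poisson part.
    """
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    if k == 0:
        return p_zero
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    # Rescale by 1 - P(Poisson = 0) so the positive counts sum to 1 - p_zero.
    return (1.0 - p_zero) * poisson_k / (1.0 - math.exp(-lam))
```

Because zeros are handled entirely by the first part, a predictor can lower the chance of any retweet while raising the expected count among retweeted tweets, which is the pattern several content categories show in the results.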
Results
Descriptive Statistics
Hurricane-related tweets were identified through their hashtags (n = 168,184). “Hurricanemaria” (n = 16,346; 9.7%), “Hurricaneharvey” (n = 12,728; 7.6%), and “Hurricanematthew” (n = 11,508; 6.8%) were identified as the 3 most common hurricane-related hashtags in the tweets collected from followers of schools and school districts in Georgia. Based on tweet frequency and time of posting, our analysis focused on major hurricanes in the Atlantic region and those that directly affected the state of Georgia (Supplementary Materials, Figure S1).
Description of the Topics and Sentiment of Tweets From Users Who Followed Schools’ and School Districts’ Accounts in Georgia Before and During Hurricane Matthew
The topics identified by the LDA model in each dataset were manually categorized into 10 different categories (Table 1). The top 3 categories of tweets were “awareness,” “preparedness,” and “call for help or action” for original tweets and retweets datasets (Supplementary Materials, Table S6). Users in the Hinesville MSA, 1 of the MSAs in the hurricane path, posted the highest number of original tweets related to preparing for the weather event (Supplementary Materials, Figure S4).
When focusing on the emergency cycle phases, it was found that most original tweets were posted during the preparedness phase of the emergency response cycle and were mainly associated with content categories “preparedness,” “awareness,” and “call for action or help.” The posting frequency of original tweets decreased during the response phase, but high numbers of “awareness” tweets and “call for help or action” tweets were found. Compared with prior phases, the response phase had the fewest tweets captured in the dataset; however, the “awareness” category was the most frequently identified one (Figure 1). When focusing on the retweets during Hurricane Matthew, it was observed that all categories had a higher number of tweets during the emergency cycle’s preparedness phase than other phases, with “awareness,” “call for help or action,” and “preparedness” as the 3 most common categories (Figure 2).
Analysis of tweet count by MSA during Hurricane Matthew showed a spike in tweet frequency near the end of the preparedness phase of the emergency response cycle for all MSAs. The number of original tweets decreased as the response phase started, with the lowest number of original tweets detected during the recovery phase for all MSAs. Savannah and Hinesville MSAs had the highest number of original tweets during the recovery phase (Table 1).
Sentiment values decreased across the phases of the emergency response cycle, accompanied by a decline in the number of Twitter posts related to Hurricane Matthew. On September 28, both original tweets and retweets reflected a positive sentiment score. On this day, the National Hurricane Center declared that the weather system had developed into Tropical Storm Matthew. 29 Overall, among both original tweets and retweets, negative sentiment increased through the preparedness phase, followed by an increase in positive sentiment during the response phase. As the day of landfall in Georgia approached, negative sentiment values increased. In the days after hurricane landfall, overall sentiment started to show more positive values for original tweets and retweets (Supplementary Materials, Figure S5; Figure S6).
Hurdle Regression Model to Evaluate the Association Between Retweet Frequency and Content Categories Posted by Twitter Users During Hurricane Matthew
A multivariable hurdle regression model was adjusted for confounding variables to evaluate the association between retweet frequency and Twitter content categories (Table 2). The logistic model component presents the adjusted odds ratio (aOR) of a tweet being retweeted; the truncated Poisson model component presents the adjusted risk ratio (aRR) of retweet count if retweeted. As seen in Table 3, compared with tweets in the preparedness category, tweets in the hurricane damage category were less likely to be retweeted (aOR: 0.84; 95% confidence interval [CI]: 0.63, 1.12); however, if retweeted, they were retweeted 53% more (aRR: 1.53; 95% CI: 1.52, 1.53). Likewise, tweets in the awareness category were less likely to be retweeted (aOR: 0.83; 95% CI: 0.69, 1.00); however, if retweeted, they were retweeted 74% more (aRR: 1.74; 95% CI: 1.74, 1.74). Similarly, tweets “calling for help” had 30% lower odds of being retweeted (aOR: 0.70; 95% CI: 0.57, 0.85); if retweeted, their retweet count was 1.62 times (95% CI: 1.61, 1.62) that of tweets in the preparedness category. Location is important when studying Twitter behavior. If the user who posted the tweet was in Hurricane Matthew’s path, the odds of their tweet being retweeted were 5% lower (aOR: 0.95; 95% CI: 0.75, 1.19), and if it was retweeted, its retweet count was reduced by 89% (aRR: 0.11; 95% CI: 0.11, 0.11) (Table 3).
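The percentages quoted alongside each ratio follow from simple arithmetic, sketched below. The helper names are hypothetical; the second helper applies the usual reading that a 95% CI for a ratio that includes 1 is not statistically significant at the 0.05 level.

```python
def percent_change(ratio):
    """Convert a ratio (aOR or aRR) into a rounded percent change from the reference."""
    return round((ratio - 1.0) * 100)

def ci_excludes_one(lower, upper):
    """True when a ratio's confidence interval lies entirely above or below 1."""
    return lower > 1.0 or upper < 1.0

# Values taken from the paragraph above.
damage_count_change = percent_change(1.53)      # retweet count, damage category
help_odds_change = percent_change(0.70)         # odds of retweet, calling for help
path_count_change = percent_change(0.11)        # retweet count, user in hurricane path
damage_odds_significant = ci_excludes_one(0.63, 1.12)
help_odds_significant = ci_excludes_one(0.57, 0.85)
```

Read this way, the lower odds of a retweet for the damage category (CI 0.63 to 1.12) are compatible with no difference, whereas the lower odds for tweets calling for help (CI 0.57 to 0.85) are not.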
Note: The timeframe for each response phase was determined based on the reviewed literature, the emergency cycle phases, and the official FEMA incident period for Hurricane Matthew in Georgia (October 4, 2016, to October 15, 2016). Reference Muniz-Rodriguez, Ofori and Bayliss5,30,31
Abbreviation: RT, retweet.
Abbreviations: aOR, adjusted odds ratio; aRR, adjusted relative risk; CI, confidence interval; REF, reference category.
When we stratified our data by the phase of the emergency cycle, our results demonstrated that the timing of the tweet (in terms of the phase of the emergency cycle) was an important factor to consider in social media analysis for emergency response. If a tweet was posted during the preparedness phase from within the path of the hurricane, it was 1.16 (95% CI: 0.85, 1.58) times as likely to be retweeted, and if the post was retweeted, being posted from the hurricane path reduced the retweet count by 91% (aRR: 0.09; 95% CI: 0.09, 0.09). During the preparedness phase of the emergency cycle, the retweet count of tweets in the “damage” category, if retweeted, was 73% higher than that of tweets in the preparedness category (aRR: 1.73; 95% CI: 1.72, 1.73); the retweet count for tweets in the “call for help or action” category, when retweeted, was 1.40 times (95% CI: 1.39, 1.40) that of tweets in the preparedness category. Also compared with retweeted tweets in the preparedness category, the retweet count was 1.89 times as high (95% CI: 1.88, 1.89) for tweets in the warning category, 1.82 times as high (95% CI: 1.81, 1.82) for those in the shelter category, and 1.58 times as high (95% CI: 1.58, 1.59) for those in the emotion or religious categories (Table 4). When the same model was fitted to tweets posted only during the response phase of the emergency cycle, tweets from users who resided in counties in the path of the hurricane were 9% less likely to be retweeted (aOR: 0.91; 95% CI: 0.63, 1.32), and if retweeted, their retweet count was lowered by 74% (aRR: 0.26; 95% CI: 0.26, 0.26) (Table 5).
Discussion
This case study incorporates the results from a new imputation method of Twitter users’ locations Reference Muniz-Rodriguez14 into a retrospective analysis of a Hurricane Matthew-related Twitter corpus. The analysis identified higher tweet frequency in the preparedness phase and a decline in tweets after the response phase. Also, the results showed that tweets posted by those in the actual path of the hurricane and those in low-income areas were less likely to be retweeted, presenting a challenge if help is needed in these areas. Our results highlight the strengths and limitations of Twitter data analysis for public health emergency response.
The literature suggests that less than 1% of Twitter users share their exact geolocations with geographical coordinates and that users with privacy settings share their location when they feel safe. Reference Fu, White and Chan4,Reference Liang and Shen32 The lack of geolocated data presents a challenge for public health agencies interested in harvesting social media information for emergency response purposes. Our analysis uses the locations of schools and school districts with Twitter accounts as a proxy for user location, imputing the location of 67.0% of the sample. Reference Muniz-Rodriguez14 Public health agencies can use this newly available information to understand the needs, worries, and awareness of individuals residing in the MSA included in our analysis.
This study analyzed Twitter data and observed its possible uses as a tool by emergency response agencies during the preparedness and response phases. The “awareness” category was identified as the most frequent category in both original (37.64%) and retweeted (37.0%) content associated with Hurricane Matthew. The majority of tweets in the “awareness” category were related to weather information pertinent to Hurricane Matthew. The identification of the “awareness” category as the most common content category in the sample was consistent with findings of social media data analysis during flooding and earthquake events. Reference Muniz-Rodriguez, Ofori and Bayliss5,Reference Andrews, Gibson and Domdouzis9,Reference Grasso and Crisci33–Reference Kryvasheyeu, Chen and Obradovich35 Other common content categories were “preparedness” and “call for help or action.” A higher number of retweets from the “damage” category were detected during the response phase than the preparedness phase. An increase in negative sentiment as the hurricane approached the state was observed in the results. A similar pattern was observed during Hurricane Sandy. Reference Zou, Lam and Cai36 A change to more positive sentiment, expressing hope through religious language, was detected after landfall.
The analysis identified a higher number of original tweets and retweets pertinent to Hurricane Matthew during the preparedness and response phases than the other cycle stages, with tweets peaking days before the hurricane landfall. Similar to the results found by other social media researchers, a low number of tweets were posted after landfall and during the recovery phase in our sample. Reference David, Ong and Legara37,Reference Kim, Bae and Hastak38 The low number of tweets found during the recovery and mitigation phases suggests that Twitter is not a viable tool for long-term follow-up of areas affected by natural disasters. Previous research found that most social media communication from emergency management agencies is 1-sided, meaning the agency does not interact with its followers. Reference Muniz-Rodriguez, Ofori and Bayliss5,Reference Tang, Zhang and Xu13 The increased number of tweets observed during the preparedness phase of the emergency can represent an increased awareness of the event, and public health professionals can take this opportunity to run communication campaigns to help close the information gap.
Retweeted content can help information go viral, and its role in social media communication strategies has been studied. For example, Liang et al. found that on Twitter, Ebola-related information primarily reached a user’s followers (the “broadcast model”). For a tweet to spread beyond the immediate group of followers, having individuals with many followers (such as celebrities) retweet a public health agency’s tweet may be key. This suggests that the identities of Twitter users and their followers can influence the reach of a tweet. Reference Liang, Fung and Tse39 This study did not find that celebrities were the most retweeted accounts in our sample; instead, individual personal accounts were more frequently retweeted, contrary to other studies. Reference Tang, Zhang and Xu13,Reference Liang, Fung and Tse39,Reference Comunello, Parisi and Lauciani40 Higher Twitter activity levels were observed in geographical areas (MSAs) outside of the hurricane path, contrary to other studies. Reference Grasso and Crisci33,Reference Zahra, Ostermann and Purves41 Twitter users outside the hurricane path and those in the hurricane path posted tweets related to “awareness” and “call for action or help,” which can be driven by the news cycle and proximity of the storm. Reference Zou, Lam and Cai36 Users in the hurricane path were less likely to be retweeted than those outside the hurricane path. Therefore, the development of a content analysis guide for training is highly recommended. For example, it may include a step-by-step checklist to complete the analysis, the questions the analysis can answer, and the specialists who can assist in the analysis if necessary.
The regression modeling results provided no evidence to support the hypothesis that higher levels of hurricane-related Twitter activity are associated with the actual hurricane path. During the emergency response phase, the results showed that original tweets that were retweeted from low-income areas had an increased retweet count as the poverty percentage in the area increased, although this association was not statistically significant. This can help emergency responders quickly identify those that could have been heavily affected by the event.
Public Health Implications
This case study demonstrates that retrospective Twitter data analysis can provide emergency response agencies with insights into the needs of social media users who might be affected by natural disasters. However, it is important to recognize that the analysis is time-consuming. It is difficult to perform all the data identification, data cleaning and processing, and content and sentiment analyses in real time. Therefore, to apply this type of analysis in practice, it is recommended to conduct data verification before the start of the Atlantic hurricane season or during the planning phase of emergency management agencies to avoid delays in the emergency communication response.
Strengths and Limitations
There are several limitations to this study. The results are not generalizable to the general population of the state of Georgia. The findings only apply to Twitter followers of schools and school districts in our sample. Also, the user locations analyzed in our study were based on the locations of the schools or school districts they followed. We were not able to verify the veracity of the locations at the time of the analysis. Results were based on post frequency; network analyses for information dissemination were not conducted.
Public health researchers previously employed the dataset used in this case study to detect unplanned school closures, establishing the social media platform’s usefulness to detect a higher number of school closures than the current systems. Reference Ahweyevu, Chukwudebe and Buchanan18,Reference Jackson, Mullican and Tse42 The analysis presented in this research project gave an existing dataset a new purpose, demonstrating how we can repurpose public health datasets from 1 field into a completely new area.
Conclusions
In times where social media is a core component of public health interventions, emergency response should not be the exception. Despite not being able to pinpoint a location if the social media user does not share coordinates, our results showed that our imputation method could help impute users’ geolocations and, thereby, through Twitter data analysis, help provide an overview of the situation in areas affected by natural disasters. It can help understand the needs of social media users in at-risk areas before the event takes place. Future research to further test the imputation method should focus on official emergency response agencies’ pages and their followers.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/dmp.2022.285.
Acknowledgment
This manuscript is part of K.M.R.’s doctoral dissertation project, titled “Social Media Data Analysis, a Tool for Public Health Emergency Management During Natural Disasters,” Fall 2020. K.M.R. thanks the Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, for her graduate assistantship. The authors thank Bryan Omar Sepulveda Bahamundi for his work and assistance in the content analysis portion for this project.
Funding
No external funding reported.
Conflict of interest
No conflict of interest declared.
IRB statement
This research project was approved by the Georgia Southern University Institutional Review Board with a B2 exemption category for the use of Twitter data with project number H15083.