The novel coronavirus disease 2019, COVID-19, impacted the daily lives and careers of millions, resulting in a flood of information and intense dialogue. Along with the public health crisis, the pandemic triggered economic and social disruption. In the United States, conversation surrounding the virus was also marred by political polarization. It is vital for governments and public health agencies to understand the nature of the public discourse surrounding COVID-19 to guide educational campaigns and inform public policy research.
Traditionally, stance has been evaluated with surveys, but there are several shortcomings (ie, high costs, poor response rate, limited sample size, dishonest answers, and closed questions). The growing flow of information on the Internet, commonly known as Big Data, provides a new resource for meaningful insights in the digital age. Athique (2020) notes “[t]here has never been a time in which media systems have been able to convey such detailed and universal coverage of a historical event in real time, with the added capacity to keep us all in touch and to give us a voice too.” Reference Athique1 Big Data, unlike survey research, relies on structuring large volumes of user-generated data. “Big Data allows us to finally see what people really want and really do, not what they say they want and say they do.” Reference Stephens-Davidowitz2 Sources like social media and search engines have become powerful tools for analyzing real-time changes in public attitude.
Social media houses much of the sharing and consumption of news and information in the modern media environment. The demographics of users on apps like Facebook, Instagram, Twitter, and WhatsApp have historically been characterized by a younger audience, but social media platforms have lately become more representative of the general population. The past decade has seen a 2-fold increase in ages 50 and older who report using at least 1 app. 3 The growth of social media has also seen a decrease in the number of people who look to traditional media outlets for news. Two-thirds of American adults say that they “often” or “sometimes” use social media for news and approximately 1 in 5 say that it is their primary source of news. Reference Infield4,Reference Shearer and Matsa5 Twitter was a significant platform for sharing and responding to public health information and misinformation during the COVID-19 pandemic.
People on Twitter tend to be more news-focused than those on other platforms. Roughly three-quarters of Twitter users find their news on the site and two-thirds of users describe Twitter as “good” or “extremely good” for sharing health news. Reference Shearer and Matsa5 Rufai and Bunce Reference Rufai and Bunce6 remark that Twitter is a “powerful public health tool for world leaders to rapidly and directly communicate information on COVID-19 to citizens”. On the other hand, Shahi et al. Reference Shahi, Dirkson and Majchrak7 assert that more than 4 in 5 tweets may contain false claims. Due to the high volume and velocity of data production on social media, there is a reduced ability to distinguish facts from noise. Roozenbeek et al. Reference Roozenbeek, Schneider and Dryhurst8 state that “increased susceptibility to misinformation negatively affects people’s self-reported compliance with public health guidance about COVID-19, as well as people’s willingness to get vaccinated against the virus and to recommend the vaccine to vulnerable friends and family.” Infield Reference Infield4 also maintains that American adults who rely on social media as their primary source of information were the most likely to believe misinformation, and the least engaged and least knowledgeable of current events. The confusing nature of information-sharing on social media may have resulted in individuals misinterpreting or disregarding public health data.
Google search data provides useful insights into understanding the discourse around COVID-19. “People’s search for information is, in itself, information.” Reference Stephens-Davidowitz2 Google Trends measures Web-based interest in topics by collating search data. “Google Trends has served and still serves as an excellent tool for infoveillance and infodemiology… newspapers and newscasts can influence Web queries, it provides a way to quantify the Web interest in a specific topic more efficiently than any other methods historically used (eg, population surveys).” Reference Rovetta9 A total of 83% of Americans use Google as their main search engine, making Google the most popular search engine in the United States. Reference Purcell, Brenner and Rainie10 Due to its widespread usage in the United States, Web-based interest is an important factor in studying COVID-19 discourse—providing an insight into the size of the conversation about the pandemic.
Using Twitter and Google Trends data together with historical COVID-19 data, such as cases and deaths, a predictive model of sentiment was developed through a machine learning approach. Given the rapid spread of misinformation during the pandemic, it remains unknown how COVID-19 health and policy information affected changes in public opinion.
Literature Review
Twitter is a valuable source of big data due to its accessibility, widespread usage, availability of open-source code, and unidirectional structure. Reference Bossetta11 COVID-19 discourse has recently been examined on Twitter by means of frequency analysis of likes, comments and retweets, word-cloud mapping, stance detection, sentiment analysis, and network modeling. Reference Rufai and Bunce6,Reference Tsai and Wang12–Reference Fuentes and Peterson14 A growing body of researchers have shown that sentiment analysis and topic modeling can be used to successfully investigate emotions and sentiment using natural language processing. Reference Hu, Wang and Luo13,Reference Schweinberger, Haugh and Hames15–Reference Lyu and Luli17 Schweinberger et al. Reference Schweinberger, Haugh and Hames15 chose to model topics and sub-topics across different phases of the pandemic. Singh et al. Reference Singh, Bansal and Bode18 demonstrated that Twitter conversations may be used to predict the spread and outbreak of COVID-19. Hu et al. Reference Hu, Wang and Luo13 and Hussain et al. Reference Hussain, Tahir and Hussain16 generated word clouds, analyzed the geo-temporal patterns of Twitter sentiment related to COVID-19, and linked changes in sentiment to key events and topics. Ahmed et al. Reference Ahmed, Rabin and Chowdhury19 also generated word clouds and conducted a sentiment analysis to study the effects of lockdown and reopening procedures.
Google Trends is commonly used in conjunction with Twitter and/or health data for health research. For the MERS outbreak in 2015, Shin et al. Reference Shin, Seo and An20 found high correlations between the number of confirmed MERS cases and Twitter sentiment and Google interest. For the COVID-19 pandemic, Diaz and Henriquez Reference Díaz and Henríquez21 compared Twitter sentiment and Google interest with fluctuations in the stock market and number of people under lockdown. Mavragani and Gkillas Reference Mavragani and Gkillas22 investigated the relationship between Google Trends data and COVID-19 cases and deaths. Turk et al. created a predictive model for COVID-19 cases using Google Trends and virtual consultation data. Alshahrani and Babour Reference Alshahrani and Babour23 used Twitter and Google Trends to analyze search behaviors and predict new COVID-19 cases.
Zhang et al., Reference Zhang, Saleh and Younis24 furthermore, demonstrated that machine learning, specifically a unigram random forest (RF) model, is a powerful tool to predict coronavirus sentiment. RF regression models tend to outperform classical approaches in analyzing highly non-linear and complex relationships. Reference James, Witten and Hastie25 Cornelius et al. Reference Cornelius, Akman and Hrozencik26 used RFs to predict COVID-19 patient mortality. Iwendi et al. Reference Iwendi, Bashir and Peshkar27 used RF models to predict severity of COVID-19 cases using patient geographical, travel, health, and demographic data. RFs are also able to produce a summary of the importance of predictors. A thorough search of relevant literature did not yield any studies that have directly examined the effect of historical COVID-19 records (ie, cases, deaths, vaccinations, positive tests, hospitalizations, school closures, travel bans, etc.) and Google Trends data in determining social media sentiment. RFs are a useful tool to develop a model of using COVID-19 public health data and Google interest to predict Twitter sentiment over the course of the pandemic.
It is important to note that negative and positive events are not treated equally in public discourse. Individuals have been known to perceive negative experiences more intensely than positive ones. 28–Reference Baucum, Cui and John30 There may be evidence that negative events are more contagious than positive events. 28 On the other hand, certain key topics relating to the pandemic may be perceived more positively than expected. Yousefinaghani et al. Reference Yousefinaghani, Dara and Mubareka31 show that vaccine-related tweets tend to be more positive than negative. Stay-at-home tweets are also shown to be more positive than negative. Reference Ridhwan and Hargreaves32 In the context of the prolonged stress experienced by many during the pandemic, higher levels of resilience may be associated with an increase in positive emotions. Reference Israelashvili33 The complex nature of COVID-19 discourse suggests that negative sentiment may not have been the dominant emotion expressed on Twitter.
Research Questions
Q1: What were the public positive and negative sentiments on Twitter in the United States during the COVID-19 pandemic?
This question is investigated by comparing the 8 Twitter emotion types and their dynamics over time using data from January 1, 2020, to September 1, 2021, in the United States. The exploratory study determines whether public sentiment was evenly split between positive and negative, and whether all emotions were equally common or some were more common than others. For example, fear likely dominated the conversation because of the various economic, social, and health challenges experienced due to COVID-19 in the United States.
Q2: How did Google Trends and real-time historical COVID-19 data relate to sentiment on Twitter in the United States during the COVID-19 pandemic?
This question is investigated by comparing Twitter emotion data with Google Trends data and their dynamics over time using data from January 1, 2020, to September 1, 2021. The analysis examines the relationship of Google Trends and historical COVID-19 data to sentiment and emotion on Twitter over the period studied in the United States. For example, rapid increases in cases and deaths were likely significantly related to changes in sentiment and emotions on Twitter.
Data Collection
Twitter data were sampled daily from January 1, 2020, to September 1, 2021, for tweets originating in the United States using the full-archive search Twitter API. Zepecki et al. Reference Zepecki, Guendelman and DeNero34 outlined a methodological framework to retrieve Internet data for health research, suggesting that interest be measured with respect to a list of top queries. After an exploratory analysis, Twitter and Google APIs were queried using the list of keywords “covid”, “coronavirus”, “covid19”, “corona”, “pandemic”, “quarantine”, “lockdown”, and “outbreak”. These terms were the most frequently used in discussions of COVID-19 on social media platforms. They were determined through topic analysis of all tweets over a period, as demonstrated in the studies by Schweinberger et al. Reference Schweinberger, Haugh and Hames15 and Hu et al. Reference Hu, Wang and Luo13 Future studies may first conduct a topic analysis, then pull relevant tweets for a more representative sample. A unigram (1-word) method was chosen because of its optimal use in RF models. Reference Zhang, Saleh and Younis24 A total of 2,500,000 tweets were pulled, and just under 900,000 unique tweets were identified for this study.
Shortly after COVID-19 was discovered, there was little discussion about the virus. Some days, therefore, have a small number of tweets, which leaves the subsequent analysis vulnerable to sampling error. To avoid this, sampling was constructed at 3 locations throughout each day as outlined by Kim et al. Reference Kim, Jang and Kim35 Geo-tweet information, which is available when users activate location access, provides a finer geographical scale; however, not all users activate this function. According to Twitter, only 30-40% of tweets contain information about profile location. 36 Geographical analysis was therefore deemed insufficiently generalizable, and state-level and city-level granularity was not included in this study. Tweets were preprocessed to remove retweets, references to screen names, hashtags, spaces, numbers, punctuation, URLs, retweet headers, time codes, stop-words, and duplicate tweets.
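The preprocessing steps listed above can be sketched in Python as follows. This is a minimal illustration rather than the authors' pipeline: the function names `clean_tweet` and `preprocess`, the regular expressions, and the tiny stop-word list are all assumptions made for the example.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "for", "on", "it"}

def clean_tweet(text):
    """Return a list of lowercase tokens, or None if the tweet is a retweet."""
    if re.match(r"^RT\s+@\w+", text):           # retweets start with a retweet header
        return None
    text = re.sub(r"https?://\S+", " ", text)    # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)         # remove screen names and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)     # remove numbers and punctuation
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def preprocess(tweets):
    """Clean tweets and drop duplicates, preserving first-seen order."""
    seen, cleaned = set(), []
    for raw in tweets:
        tokens = clean_tweet(raw)
        if not tokens:                            # retweet or nothing left after cleaning
            continue
        key = " ".join(tokens)
        if key not in seen:
            seen.add(key)
            cleaned.append(tokens)
    return cleaned
```

Deduplicating after cleaning, rather than before, treats tweets that differ only in links, mentions, or punctuation as the same text. Whether the study applied the steps in this order is not stated.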
Google Trends data were obtained through the Google Trends API using the gtrendsR package in R. Google Trends returns data at daily granularity only if the time frame is shorter than 9 months. Daily estimates were therefore retrieved for each month, along with monthly data for the entire time frame. The daily estimates for each month were then multiplied by a weight calculated from the monthly data to produce daily estimates from January 1, 2020, to September 1, 2021. Google Trends estimated interest is shown in Figure 1.
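The rescaling step can be illustrated with a short Python sketch. The text does not say how the weight was derived from the monthly data, so the definition used here (the month's interest divided by 100) is an assumption, as are the function name `rescale_daily` and the dictionary layout.

```python
def rescale_daily(daily_by_month, monthly_interest):
    """Put per-month daily series on one common scale.

    daily_by_month:   {"2020-03": [daily values, each 0-100 within that month], ...}
    monthly_interest: {"2020-03": monthly value, 0-100 over the whole period, ...}
    """
    rescaled = {}
    for month, daily in daily_by_month.items():
        # Assumed weight: the month's share of the peak monthly interest.
        weight = monthly_interest[month] / 100.0
        rescaled[month] = [value * weight for value in daily]
    return rescaled
```

With this weighting, a day that scores 100 within a month whose overall interest is 50 ends up at 50 on the common scale, so days from different months become comparable.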
Historical data about the virus were supplied by Our World in Data from the COVID-19 Data Repository by the Center for Systems Science and Engineering at Johns Hopkins University, government sources, and peer-reviewed research. This dataset includes confirmed cases, confirmed deaths, vaccinations, hospital and intensive care unit (ICU) data, tests and positivity, the reproduction rate of the virus, policy responses, and other variables of interest. Missing data were substituted with estimated values from near neighbors as outlined by Kang. Reference Kang37 New cases and new deaths over time are visualized in Figures 2 and 3, respectively.
Data Summary
The summary statistics of each variable included in this study, including historical COVID-19 health and policy data, Twitter sentiment (positive, negative, trust, surprise, sadness, joy, fear, disgust, anticipation, anger), and Google Trends interest, are given in Table 1. Note that vaccinations and boosters contained many null values because vaccines were only available later in the pandemic.
Methods
Corpus-linguistic techniques were used to create a word cloud of the most used words in sampled tweets. The National Research Council Lexicon dictionary (NRC-Lex) was used to conduct sentiment analysis. The NRC-Lex dictionary is based on 8 emotion classifications (joy, sadness, anger, fear, trust, disgust, surprise, anticipation) and 2 sentiments (positive and negative). Frequencies of each emotion and sentiment were obtained as time series.
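Lexicon-based scoring of this kind amounts to looking up each word and tallying the categories attached to it. The sketch below uses a hypothetical four-word lexicon invented for illustration, not actual NRC entries, and the function names are likewise assumptions.

```python
from collections import Counter

# Hypothetical miniature lexicon, NOT the NRC entries: each word maps to the
# emotion and sentiment categories it is assumed to evoke.
TOY_LEXICON = {
    "outbreak": {"fear", "negative"},
    "vaccine": {"trust", "positive"},
    "hope": {"anticipation", "joy", "positive"},
    "death": {"fear", "sadness", "negative"},
}

def count_emotions(tokens, lexicon=TOY_LEXICON):
    """Tally emotion and sentiment categories for one tokenized tweet."""
    counts = Counter()
    for token in tokens:
        for category in lexicon.get(token, ()):   # unknown words contribute nothing
            counts[category] += 1
    return counts

def daily_counts(tweets_by_day, lexicon=TOY_LEXICON):
    """Aggregate category counts per day to form a time series."""
    return {
        day: sum((count_emotions(t, lexicon) for t in tweets), Counter())
        for day, tweets in tweets_by_day.items()
    }
```

Because one word can carry several categories, a single tweet may add to more than one emotion count, and emotion totals need not sum to the number of tweets.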
Sentiment prediction was achieved using RF models. Twitter sentiment counts, Google Trends estimated interest, and historical COVID-19 data were aggregated by day, and 10 RF models were developed, one for each sentiment type. A training dataset was formed with two-thirds of the data, and a test set was formed with the remaining rows. Mean absolute percentage error (MAPE) was calculated for training and test sets. Variable importance can be calculated for RF models based on node purity and minimal depth. Both indexes are effective, but node purity was chosen as the primary method for this study. Unimportant variables were discarded to prevent overfitting, and a new model was refitted for each sentiment type using the most important variables. Restricting the model to relevant variables improves the performance of RFs.
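The split and the error metric can be sketched in a few lines of Python. The text does not say whether rows were assigned to the training set at random, so the shuffled split below is an assumption, as are the function names and the fixed seed.

```python
import random

def train_test_split(rows, train_fraction=2 / 3, seed=0):
    """Shuffle rows (assumed random assignment) and split into train and test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

For example, predictions of 110 and 180 against observed counts of 100 and 200 are each off by 10%, giving a MAPE of 10%. Because MAPE divides by the observed value, days with zero counts would need to be excluded or handled separately.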
Random Forests
RFs are a substantial modification of bootstrap aggregation (bagging), a variance-reduction technique for an estimated prediction function. An RF builds a large collection of de-correlated trees, each identically distributed, and then averages them. Reference Breiman38 Trees are ideal candidates for sentiment analysis because they can capture complex interaction structures inherent in highly correlated text data. Trees have relatively low bias if grown sufficiently deep. However, trees are notoriously noisy and thus benefit from averaging. Growing trees on randomly perturbed samples and averaging them helps avoid overfitting. The algorithm is as follows.
1. For b = 1 to B:
(I) Draw a bootstrap sample ${Z^*}$ of size N from the training data.
(II) Grow an RF tree ${T_b}$ to the bootstrapped data, by recursively repeating the following steps for each terminal node of the tree, until the minimum node size ${n_{min}}$ is reached.
i. Select m variables at random from the p variables.
ii. Pick the best variable as split-point among the m.
iii. Split the node into 2 daughter nodes.
2. Output the ensemble of trees $\{ {T_b}\} _{b = 1}^B$.
3. $\hat f_{rf}^B(x) = {1 \over B}\sum\nolimits_{b = 1}^B {T_b}(x)$.
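The algorithm above can be sketched for regression in plain Python. This is a teaching sketch under stated assumptions, not the software used in the study: the function names, the squared-error split criterion, and the exhaustive search over observed cutpoints are choices made for the example.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def _best_split(X, y, idx, features):
    """Return (sse, feature, threshold) for the best split, or None if none exists."""
    best = None
    for j in features:
        # Candidate cutpoints: every observed value except the largest,
        # so both daughter nodes are guaranteed to be non-empty.
        for t in sorted({X[i][j] for i in idx})[:-1]:
            left = [y[i] for i in idx if X[i][j] <= t]
            right = [y[i] for i in idx if X[i][j] > t]
            ml, mr = _mean(left), _mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return best

def _grow(X, y, idx, m, n_min, rng):
    """Recursively grow one tree on the (bootstrapped) row indices in idx."""
    if len(idx) <= n_min:                         # minimum node size reached
        return _mean([y[i] for i in idx])
    features = rng.sample(range(len(X[0])), m)    # step i: m of the p variables
    split = _best_split(X, y, idx, features)      # step ii: best variable and cutpoint
    if split is None:                             # all sampled variables are constant
        return _mean([y[i] for i in idx])
    _, j, t = split
    left = [i for i in idx if X[i][j] <= t]       # step iii: two daughter nodes
    right = [i for i in idx if X[i][j] > t]
    return (j, t, _grow(X, y, left, m, n_min, rng), _grow(X, y, right, m, n_min, rng))

def _predict_tree(tree, x):
    while isinstance(tree, tuple):                # internal nodes are tuples, leaves are floats
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def fit_forest(X, y, B=50, m=1, n_min=2, seed=0):
    """Grow B trees, each on its own bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample Z* of size N
        forest.append(_grow(X, y, idx, m, n_min, rng))
    return forest

def predict_forest(forest, x):
    """Average the B tree predictions at the point x."""
    return _mean([_predict_tree(tree, x) for tree in forest])
```

A call such as `fit_forest(X, y, B=25, m=1, n_min=1)` followed by `predict_forest(forest, x)` returns the average of the 25 tree predictions at `x`. Production libraries use far faster split searches, so this version is only practical for small examples.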
After the Bth recursion, the tree sequence $\{ T(x;\;{\theta _b}(Z))\} _{b = 1}^B$ has been grown, and the RF predictor at a single target point x is
$$\hat f_{rf}^B(x) = {1 \over B}\sum\nolimits_{b = 1}^B T(x;\;{\theta _b}(Z)),$$
where ${\theta _b}$ parameterizes the bth RF tree in the sequence in terms of split variables, cutpoints at each node, and terminal-node values.
Increasing B does not cause the RF to overfit, as
$$\hat f_{rf}(x) = {E_{\theta |Z}}\,T(x;\;\theta (Z)) = \mathop {\lim }\limits_{B \to \infty } \hat f_{rf}^B(x),$$
where $\hat f_{rf}^B(x)$ is an average over B realizations of θ(Z), and the distribution of θ(Z) is conditional on the training data Z. Using fully grown trees results in one less tuning parameter and seldom costs much. The robustness is largely due to the relative insensitivity of misclassification cost to the bias and variance of the probability estimates in each tree. Let ρ(x) be the sampling correlation between any pair of trees used in the averaging,
$$\rho (x) = {\rm corr}[T(x;\;{\theta _1}(Z)),\;T(x;\;{\theta _2}(Z))],$$
where ${\theta _1}(Z)$ and ${\theta _2}(Z)$ are a randomly drawn pair of RF trees grown to the randomly sampled Z. ${\sigma ^2}(x)$ is the sampling variance of any single randomly drawn tree, ${\sigma ^2}(x)$ = Var(T(x; θ(Z))).
Then
$${\rm Var}\,\hat f_{rf}(x) = \rho (x)\,{\sigma ^2}(x).$$
The conditional covariance of a pair of tree fits at x is zero because the bootstrap and feature sampling are independent and identically distributed (i.i.d.). On many problems, the performance of RFs is very similar to boosting, and they are simpler to train and tune. Hastie et al. Reference Hastie, Tibshirani and Friedman39 made grand claims that RFs are “most accurate”, “most interpretable”, and the like, with very little tuning required.
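As a worked step, the variance of an average of B identically distributed trees with common variance σ²(x) and pairwise correlation ρ(x) follows from a standard identity. The derivation below is a textbook result stated under those assumptions, not an equation taken from this study.

```latex
\begin{aligned}
\mathrm{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
 &= \frac{1}{B^2}\left[\sum_{b=1}^{B}\mathrm{Var}\,T_b(x)
    + \sum_{b \neq b'}\mathrm{Cov}\big(T_b(x),\,T_{b'}(x)\big)\right] \\
 &= \frac{1}{B^2}\left[B\,\sigma^2(x) + B(B-1)\,\rho(x)\,\sigma^2(x)\right] \\
 &= \rho(x)\,\sigma^2(x) + \frac{1-\rho(x)}{B}\,\sigma^2(x).
\end{aligned}
```

The second term vanishes as B grows, leaving ρ(x)σ²(x). Adding trees therefore never increases the variance, and lowering the correlation between trees through random variable selection is what reduces the variance of the ensemble.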
Sentiment Analysis
To address the first research question, frequency counts from the sentiment analysis of sampled tweets using the terms “covid”, “coronavirus”, “covid19”, “corona”, “pandemic”, “quarantine”, “lockdown”, and “outbreak” were totaled independent of time to produce the findings in Figure 4. Figure 4 shows that, over the course of the period studied, sentiment tended to be more positive than negative. Fear was the most popular emotion, followed closely by trust. Other emotions were less common, including anticipation, sadness, anger, joy, surprise, and disgust.
Another perspective on sentiment is given with the word cloud in Figure 5, which shows the most popular words in the Twitter sample. The most popular words were “quarantine” and “trump”. Figure 5 also portrays how words were associated with emotions from the sentiment analysis. Note, this word cloud was weighted toward the keywords that were used, and did not include all popular words due to spacing constraints.
The temporal trajectories of observed and predicted sentiment are also plotted over time. Figure 6 used the complete dataset, Figure 7 used the training dataset for the RF (420 observations), and Figure 8 used the predicted values of each RF model. Green signifies positive sentiment, while red is negative sentiment. The other colors (purple, orange, blue, aquamarine, chartreuse, black, yellow, and pink) correspond to trust, surprise, sadness, joy, fear, disgust, anticipation, and anger, respectively. Notably, all sentiment types tended to follow similar trends.
Visually, the predictive models appear to have performed quite well, closely matching the actual data. The MAPE and the reported percentage of variation explained quantify how well the RF models fit the data. MAPE was produced for both the training and the test sets to investigate overfitting and generalizability (Table 2). A MAPE score of less than 20% was considered excellent, while scores from 20% to 30% were considered good. The MAPE for the test set was consistently 2 to 3 times higher than that for the training set, indicating some overfitting; however, the MAPEs for all training and test sets were relatively low. Additionally, the percentage of variation explained was adequate for all models. The surprise sentiment model performed the worst.
The important variables for each RF model are now detailed for each sentiment type with plots of observed and predicted sentiment provided for reference.
Results
Positive Sentiment Random Forest
The variable importance plot for the positive sentiment RF is shown in Figure 9. As a proof of concept, minimal depth and frequent interactions are also plotted for comparison with the important variables identified by node purity (Figures 10 and 11). Based on cross-validation with the mean of the minimal depth distribution and the interactions (Figures 10 and 11), “date”, “total_cases_per_million”, “total_cases”, and “est_hits” are the important variables for the positive sentiment RF.
Node purity and minimal depth provided similar results for deciding important variables. Interaction methods were deemed too complex to interpret and were not used for the analysis. The observed and predicted positive sentiments over time are shown in Figure 12. Positive sentiment increased during the start of the pandemic, then was stable later; another wave was observed starting in 2021.
Negative Sentiment Random Forest
The variable importance plot for the negative sentiment RF is shown in Figure 13. “est_hits”, “date”, “total_cases”, and “total_cases_per_million” are the important variables for the negative sentiment RF. Notably, Google Trends interest appears to be the most important variable for prediction.
The observed and predicted negative sentiment over time is shown in Figure 14. Negative sentiment increased at the beginning of the COVID-19 pandemic and fluctuated over time.
Positive and negative emotions exhibit distinct trend patterns over time (see Appendix). Sentiment frequency over time diagrams were redrawn to better illustrate trend patterns. All positive sentiments, including trust, surprise, joy, and anticipation, increased dramatically at the start of COVID-19 in 2020 and fluctuated over time, with a second peak observed at the start of 2021, but the overall shape was flat (Figure 15). In contrast, negative emotions such as sadness, anger, and disgust increased rapidly at the start of the pandemic, with a minor drop later, and then remained stable with a degree of fluctuation, before continuing to rise and reaching a peak in late 2021 (Figure 16). Of interest, fear sentiment appeared in a first wave at the start of COVID-19, then fell noticeably, and then returned with a spike at the end of 2021, but at a lower level than the initial jump (Figure 17).
Discussion and Conclusions
The number of people using social media platforms and search engines has increased dramatically during the digital age. The consumption of news on social media has grown, bringing both lower engagement and a diminished understanding of current events. In the United States, the Internet became a significant source of misinformation during COVID-19 amid social, economic, and public health crises. Twitter and Google Trends provide valuable insights into public discourse surrounding COVID-19. This study presented the results of a sentiment analysis of tweets, Google Trends interest, and historical COVID-19 health and policy data over the course of the pandemic and built a predictive model for sentiment.
Sentiment analysis revealed that people mentioned “quarantine” and “trump” the most. These were some of the most important topics during the pandemic; however, they were weighted toward the keywords in the tweet sample. For example, “quarantine” may not have been as important as the word cloud represented because it was also one of the keywords used to find relevant tweets. Positive sentiments were more common than negative sentiments, while fear and trust were the most common emotions. The sentiment analysis in the present study agreed with Hu et al., Reference Hu, Wang and Luo13 Hussain et al., Reference Hussain, Tahir and Hussain16 and Ahmed et al. Reference Ahmed, Rabin and Chowdhury19
Google Trends interest showed a sharp peak at the beginning of the pandemic, which seemed to be related to the first peaks in COVID-19 cases and deaths. This indicates that people in the United States searched for COVID-19 primarily at the beginning of the pandemic as cases and deaths were first appearing. Google Trends estimated interest agreed with analyses by Mavragani and Gkillas, Reference Mavragani and Gkillas22 Turk et al., Reference Turk, Tran and Rose40 and Alshahrani and Babour. Reference Alshahrani and Babour23
RF models were used to predict sentiment types. The most important factors for all models were date, COVID-19 cases, COVID-19 deaths, and Google Trends estimated interest. These models showed that Google Trends and public health data were both important indicators for changes in sentiment. For positive sentiment, the most important factor was date, but for negative sentiment, the most important factor was Google Trends interest. This makes sense given the relationship of Google Trends interest to COVID-19 cases and deaths. The number of people vaccinated did not affect sentiment as much as the number of cases or deaths. Vaccinations were undervalued in the present analysis—due to the large time range there are too many zero values to notice an effect. It is worth noting that, for fear and joy sentiments, COVID-19 tests were also an important variable. Positive emotions during COVID-19 might be linked to the recovery progress, vaccine development, new hopes of technologies development, and resilience. Reference Israelashvili33
Anger, disgust, and sadness sentiments increased during the pandemic, suggesting that people in the United States were not emotionally prepared for such a long pandemic. Fear sentiment showed a large wave at the beginning of COVID-19 in 2020, dropped gradually, and then jumped again at the end of 2021. This suggests that fear does not persist for long, but may return if the triggering event persists. Joy is a positive sentiment, so, like the positive sentiment trend, it was flat with waves, reflecting hope at the beginning of 2020 when COVID-19 started and at the beginning of 2021. Anticipation, surprise, and negative sentiments showed a series of fluctuating waves. This appears to indicate that people were invested in analysis and information-seeking behaviors, as evidenced by Google Trends interest.
However, there were several limitations. Twitter tends to represent a younger audience, and does not include the entire conversation surrounding COVID-19. In addition, elderly, poor, and underprivileged people are underrepresented on the Internet. More work needs to be done to smooth the noise in sentiment scores. The present analysis only accounts for the keywords used to query Twitter and Google, and does not represent all possible topics. For a more representative sample, all available tweets and searches could have been sampled and those related to COVID-19 identified using topic analysis. Future research may also use a different sentiment/emotion database to obtain a more diverse picture than the 10 sentiment types in this study.
In this study, “vaccine(s)” was not included in the keyword search. Sentiment related to vaccines is an important aspect of the public’s perception of the pandemic, as the widespread availability and acceptance of vaccines is seen as key to controlling the spread of the virus and eventually bringing the pandemic to an end. However, the decision was made not to include vaccines in queries to maintain a clear interpretation of the relationships between overall sentiment of COVID-19 on Twitter and the predictors. Vaccine sentiment may have introduced nuanced correlations in the presence of misinformation and politics. Future studies could conduct an in-depth topic analysis to identify terms relating to COVID-19 and stratify keywords into sub-topics, including vaccines.
The current research focused on text-based emotion analysis, because text is still the primary way people express their feelings toward other persons, events, or things. However, a multi-platform approach, such as using CrowdTangle, could draw on richer sources of information and provide a more comprehensive view of public sentiment. Future research will consider incorporating data from additional platforms and accounting for noise in the data, such as sarcasm and irony.
Extracting emotions behind text is still an immense and complicated task in current literature. The study contributes to existing literature by directly examining the effect of health data and Google Trends interest to Twitter sentiment over the duration of the pandemic. The information from this study can be used to acquire a better understanding of COVID-19’s emotional impact on people and communities, as well as their fears, concerns, and coping mechanisms. Furthermore, tracking the emotional patterns of COVID-19-related tweets over time can offer a more thorough picture of how public views and perceptions of the pandemic are changing. Overall, monitoring COVID-19-related tweets for emotion change can support public health research and help inform strategies to address the impacts of the pandemic on individuals and communities.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/dmp.2023.101
Competing interests
The authors declare that there are no conflicts of interest.