There have been many epidemics and pandemics in history, such as Ebola, Middle East respiratory syndrome coronavirus (MERS-CoV), and SARS, which caused many deaths. Reference McMullan1 The COVID-19 pandemic first broke out in Wuhan, China, in December 2019 and spread rapidly worldwide. Reference Zhang and Ma2 Because of its high transmission and death rates, COVID-19 is a significant threat that can affect various aspects of physical and mental health. Reference Rothan and Byrareddy3 The World Health Organization (WHO) stated that COVID-19 is an urgent global public health problem and that all countries should take a role in detecting and preventing infection. 4 Due to high infection and death rates, countries had to impose restrictions in many areas. Reference Dutta and Smita5 Governments worldwide implemented mask, social distancing, and isolation measures to prevent the spread of the coronavirus, and COVID-19 vaccines were developed in some countries. Reference Le, Cramer and Chen6,Reference Li, Tenchov and Smoot7
Social media applications (Twitter, Facebook, Instagram, Reddit, and others), known as Web 2.0 applications that enable interaction between users, have become popular sources of information about health and vaccines in recent years. Reference Appel, Grewal and Hadi8 Social and statistical studies have shown that the time users spend on social media monthly, weekly, and daily affects human behavior (Statista, 2019). Social media is not only a critical interface for sharing vital information but has also become a medium through which false information spreads. Reference Gölbaşı and Metintas9 Vaccine hesitancy and opposition negatively affect immunization services. Anti-vaccination campaigns and the normalization of anti-vaccination views undermine confidence in public health. The public continues to be exposed to anti-vaccine conspiracy theories on social media and other online platforms as vaccine hesitancy increases. The hesitancy is reported to be related to growing doubts about the scientific consensus. Reference Bernard, Bowsher and Sullivan10
Researchers have focused on emotions from the beginning of COVID-19 and have extensively studied the increase in negative emotion, Reference Berkovic, Ackerman and Briggs11–Reference Chen, Min and Zhang13 anxiety, Reference Lyu, Han and Luli14 and their regulation Reference Restubog, Ocampo and Wang15 in the public related to the COVID-19 pandemic. Although the COVID-19 vaccines are essential in preventing the disease, vaccination has traditionally faced public fears, hesitations, and even opposition. Reference Ball16,Reference Abbasi17 Identifying human emotions from social media is essential for public awareness and policy development. Reference Alamoodi, Zaidan and Zaidan18 It was stated in the study of Singh et al. that many epidemics and pandemics can be quickly brought under control if health-care professionals take into consideration social media data. Reference Singh, Singh and Bhatia19
According to the “Digital 2020: Global Digital Overview” report, Twitter is a free microblogging platform with approximately 340 million registered users. Turkey is among the countries that use Twitter the most: with 11.8 million Twitter users, it ranks 6th in the world and 2nd in Europe. 20,21 The most followed Twitter accounts in Turkey belong to government officials, politicians, sports clubs, government agencies, media figures, and organizations. 22 Twitter data can be used in research because they are easily accessible. Reference Mathur, Kubde and Vaidya23 The popularity of Twitter and its greater adoption compared with other social platforms motivated its use in the present study. The increased use of Twitter and other social media during the COVID-19 pandemic has made it possible to examine the effects of the pandemic in different countries through sentiment analysis and opinion mining. Reference Mathur, Kubde and Vaidya23 Sentiment analysis of Twitter data, which provides essential information about public health crises, is carried out with Natural Language Processing (NLP) methods. NLP is the computational analysis of texts, here in English and Turkish, using artificial intelligence (AI). Sentiment analysis methods are preferred for analyzing societies or groups of different scales, especially in mining appreciation, comments, and opinions.
Big data analysis of social media discussions about the COVID-19 vaccines is limited. Reference Bonnevie, Gallegos-Jeffrey and Goldbarg24,Reference Hussain, Tahir and Hussain25 Research involving the latest social media data is needed to understand the public debate about the COVID-19 vaccines during the pandemic. Individuals and patients need to benefit from the opinions and advice of others when making decisions about medical problems. Reference Abualigah, Alfar and Shehab26 Knowing the content of the COVID-19 vaccination discussion on Twitter can help explain users’ attitudes and their acceptance of, or hesitancy toward, the COVID-19 vaccines. The current study is the first known study in Turkey to conduct sentiment analysis using Twitter data for the COVID-19 vaccines. Public debate and concerns about the COVID-19 vaccines need extensive monitoring. Because the public is at great risk from COVID-19, risk communication techniques should be used effectively in public health interventions. Notable changes in critical issues and sentiments need to be recognized, and public awareness of public health education and campaigns needs to be raised. In addition, it is crucial for health professionals to understand the emotional aspect of Twitter messages and to make the right interventions to reduce concerns about the COVID-19 vaccines. Nurses always maintain close contact with patients and healthy people. Therefore, in individual and social health assessments, nurses in particular need to know and understand the feelings of the public and take precautions. The current study aims to examine discussions about COVID-19 vaccination on Twitter in Turkey and to conduct sentiment analysis.
Research Questions
1. What are the sentiment polarities of the statements on Twitter about COVID-19 vaccines?
2. What is the frequency of words in Turkish tweets about COVID-19?
3. What is the relationship between document segments and relative frequency?
4. What are the accuracy values of the machine learning (ML) algorithms for tweets about COVID-19 vaccines?
Methods
Sentiment analysis of Twitter data was performed in the present study with the Natural Language Processing (NLP) method of artificial intelligence (AI). For a flowchart of the proposed model, see Figure 1.
Data Collection
The tweets were retrieved retrospectively on April 18, 2022, and cover the period from March 10, 2020, when the first COVID-19 case was seen in Turkey, to April 18, 2022. Tweets containing the following Turkish hashtags were collected: #covid19asi (#covid19vaccine), #covid19asisi (#covid19vaccination), #coronaasi (#coronavaccine), #coronaasisi (#coronavaccination), #koronaaşısı (#koronavaccine), #koronaaşı (#koronavaccination), #biontech, #sinovac, and #turkovac. In total, 10,308 tweets were accessed. Twint is one of the tools frequently used to collect such data. Reference Agustiningsih, Utami and Al Fatta27 Tweets were collected using the Twint library, which is not subject to application programming interface (API) restrictions.
The data analysis was done in Google Colab using the Python 3.0 programming language and the compatible Pandas, NumPy, Matplotlib, NLTK, Scikit-learn, and TextBlob libraries. In addition to dictionary-based methods, this research used several machine learning (ML) models for sentiment analysis on an annotated dataset of COVID-19 vaccine tweets. The resulting dataset was used for training and testing the ML classifiers. The models were trained and tested with the naive Bayes (NB), random forest (RF), XGBoost (XGB), and logistic regression (LR) algorithms. Performance was evaluated using accuracy, reported separately for four feature representations: count vectors, word-level TF-IDF, N-gram TF-IDF, and character-level (Charlevel) features.
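To make the four feature representations and the accuracy metric concrete, the following is a minimal, standard-library sketch. It is not the study's code: the function names, the example tweets, and the specific TF-IDF weighting (raw term count multiplied by ln(N/df) + 1) are assumptions chosen for illustration, and library implementations such as Scikit-learn may weight and normalize differently.

```python
import math
from collections import Counter


def word_ngrams(text, n=1):
    """Split on whitespace and return word n-grams (n=1 gives single words)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the raw string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def count_vector(features):
    """Count vector: how often each feature occurs in one document."""
    return Counter(features)


def tfidf_vectors(docs_features):
    """TF-IDF per document, with tf = raw count and idf = ln(N / df) + 1.

    This is one common variant, assumed here for illustration only.
    """
    n_docs = len(docs_features)
    # Document frequency: in how many documents each feature appears.
    df = Counter(f for feats in docs_features for f in set(feats))
    vectors = []
    for feats in docs_features:
        tf = Counter(feats)
        vectors.append({f: tf[f] * (math.log(n_docs / df[f]) + 1.0) for f in tf})
    return vectors


def accuracy(y_true, y_pred):
    """Share of predictions that match the annotated labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical tweets and labels, not taken from the study data.
docs = ["biontech aşı oldum", "sinovac aşı oldum"]
word_level = tfidf_vectors([word_ngrams(d, 1) for d in docs])
ngram_level = tfidf_vectors([word_ngrams(d, 2) for d in docs])
char_level = tfidf_vectors([char_ngrams(d, 3) for d in docs])
example_accuracy = accuracy(["pos", "neu", "neu", "neg"],
                            ["pos", "neu", "pos", "neg"])
```

In this toy example, a word that appears in only one tweet (biontech) receives a larger TF-IDF weight than a word shared by both tweets (aşı), which is the property that distinguishes TF-IDF features from plain count vectors.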
Data Preprocessing
Figure 1 presents the data preprocessing flowchart.
Sentiment Analysis
Dictionary-based methods calculate a polarity score for each tweet and use it to label the tweet as positive, negative, or neutral. Sentiment analysis of the Twitter data was performed on the data preprocessed and prepared at this stage. The NLP analysis included examining the keywords that appear in the search topics and exploring the sentiment expressed about everything related to COVID-19 vaccines, together with word frequency statistics and word clouds. Polarity and subjectivity scores of the tweets, ranging from -1 to 1 and from 0 to 1, respectively, were calculated. A subjectivity score of 0 means that the tweet is objective, and a score of 1 means that the tweet is subjective. Polarity scores below 0 (down to -1) were classified as negative, scores equal to 0 as neutral, and scores above 0 (up to 1) as positive. The polarity of the text was determined with the TextBlob library. Studies that perform Turkish sentiment analysis report quite high accuracy. Reference Demircan, Seller and Abut28–Reference Balli, Guzel and Bostanci33
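The thresholding rule described above can be sketched in a few lines of Python. The polarity scores are taken as given inputs (in the study they come from TextBlob); the function names and the example scores below are hypothetical and serve only to illustrate the rule.

```python
from collections import Counter


def classify_polarity(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity < 0:
        return "negative"
    if polarity > 0:
        return "positive"
    return "neutral"  # exactly 0


def label_shares(polarities):
    """Return the percentage of each label, rounded to two decimals."""
    if not polarities:
        raise ValueError("at least one polarity score is required")
    counts = Counter(classify_polarity(p) for p in polarities)
    total = len(polarities)
    return {label: round(100 * counts[label] / total, 2)
            for label in ("positive", "negative", "neutral")}


# Hypothetical scores for illustration only; not taken from the study data.
example_scores = [0.5, 0.0, 0.0, -0.2, 0.0]
shares = label_shares(example_scores)
```

Because only a score of exactly 0 counts as neutral under this rule, any tweet in which the dictionary finds no scored words falls into the neutral class.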
Ethical Approval
Written permission was obtained from the Twitter developers to use the data. Due to the type of research, ethical permission was not required. The tweets used in the current study are accessible to the public as they are published openly. Still, the privacy of the tweeters was taken into consideration. Due to the impossibility of obtaining informed consent from all tweeters, no personal and sociodemographic data were analyzed. The data were collected using an automated process. Hence, the privacy of those who tweeted was protected. 34
Findings
Tweets about the COVID-19 vaccines were classified according to basic emotion types using sentiment analysis. It was found that 7.50% of the tweeters were positive, 0.59% negative, and 91.91% neutral about the COVID-19 vaccines. In addition, some polarity examples of tweets can be seen in Table 1.
The word cloud was produced with Voyant Tools (https://voyant-tools.org/), 34 a well-known, open-source online tool that supports Turkish characters, and is shown in Figure 2. The word cloud shows the most used and prominent words in the Turkish tweets. The size of each word depends mainly on its frequency, so the figure is essentially a view of the high-frequency word statistics; the word that occupies the largest area is the most frequently repeated. Repetitive tweets, stop words, punctuation, and hashtags were removed beforehand. The underlying document has 15,754 total words and 6,040 unique word forms. The vocabulary density is 0.383, the readability index is 18.668, and the average number of words per sentence is 16.2. The most frequent words are biontech (415), aşı (237), sinovac (221), doz (185), and 2 (146). The word cloud in Figure 2 contains 105 words.
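The word statistics above can be approximated with a short standard-library sketch. It assumes that vocabulary density is the ratio of unique word forms to total words, which is consistent with the reported figures (6,040 / 15,754 ≈ 0.383). The stop-word list, function names, and example tweets are hypothetical, and the cleaning steps are a simplification of what the study and Voyant Tools actually apply.

```python
import re
import string
from collections import Counter

# Hypothetical, tiny stop-word list; the list used in the study is not given.
STOP_WORDS = {"ve", "bir", "bu", "için"}


def tokenize(tweet):
    """Lowercase a tweet, drop hashtags and ASCII punctuation, split into tokens.

    Note: str.lower() is not Turkish-aware (dotted and dotless i), so a real
    pipeline would need locale-specific case handling.
    """
    text = re.sub(r"#\w+", " ", tweet.lower())
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def word_stats(tweets):
    """Return word counts and vocabulary density (unique forms / total words)."""
    unique_tweets = list(dict.fromkeys(tweets))  # drop verbatim repeats, keep order
    tokens = [tok for tweet in unique_tweets for tok in tokenize(tweet)]
    counts = Counter(tokens)
    density = len(counts) / len(tokens) if tokens else 0.0
    return counts, density


# Hypothetical tweets for illustration only.
tweets = [
    "biontech aşı oldum #biontech",
    "sinovac aşı 2 doz",
    "biontech 2 doz oldum",
    "biontech aşı oldum #biontech",  # verbatim repeat, removed before counting
]
counts, density = word_stats(tweets)

# The same ratio applied to the totals reported in the text.
reported_density = round(6040 / 15754, 3)
```

Here `counts.most_common(5)` would give the kind of ranked list reported in the text, and `reported_density` reproduces the stated value of 0.383 from the two reported totals.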
The links between the frequently used words in the analysis are shown in Figure 3. The relationship between document segments and relative frequency is shown in Figure 4. The count vector, word-level TF-IDF, N-gram TF-IDF, and character-level (Charlevel) accuracy values of the ML algorithms are shown in Table 2. When the accuracy values of the ML algorithms used in this study were examined, the XGB algorithm had higher scores than the other algorithms.
Discussion
Defined as “the cornerstone of public health safety” by the WHO, surveillance aims to detect high morbidity and mortality rates, take control measures, and report any event to WHO. 35 Historically, public health workers have used data from multiple sources to measure disease burden and other health outcomes, prevent and control disease, and guide health-care activities. With the spread of the Internet and the advent of modern technology, potential new sources of data are emerging. In recent years, researchers have accepted that social media platforms such as Twitter and Facebook can also provide data on health and behavior at the national level. Reference Neiger, Thackeray and Burton36
Social media applications are characterized as a valuable source of information for health research. Reference Conway, Hu and Chapman37 Internet users follow health-related issues on social media and share personal health information. Social media data have been used in studies on communicable diseases such as influenza, as well as on noncommunicable conditions such as mental health problems. Twitter data are reported to have been used to assist public health efforts related to surveillance, event detection, pharmacovigilance, prediction, disease tracking, and geographic identification, with positive results. Reference Edo-Osagie, De La Iglesia and Lake38 Tavoschi et al. used 693 Twitter records on vaccines in Italy and automatically classified them as pro-vaccine, anti-vaccine, or abstaining. Reference Tavoschi, Quattrone and D’Andrea39
The current study found that 7.50% of tweeters in Turkey had a positive opinion, 0.59% a negative opinion, and 91.91% a neutral opinion about the COVID-19 vaccines. Similarly, a study that conducted sentiment analysis of COVID-19 vaccine tweets by Japanese Twitter users between August 1, 2020, and June 30, 2021, reported that emotions were generally neutral. Reference Niu, Liu and Nagai-Tanima40 Vaccination in Turkey started after the first COVID-19 vaccine was given to Minister of Health Fahrettin Koca on January 13, 2021. 41 The vaccination scheme of the Ministry of Health of the Republic of Turkey against COVID-19 is shown in Figure 5.
The first vaccination efforts in Turkey started with Sinovac. Biontech, which was produced later, was then introduced. Sinovac was reported to be less protective against some variants than Biontech; therefore, Turkish society regarded Biontech as a reliable vaccine. In February 2022, Minister of Health Dr. Koca announced that the Turkovac vaccine, which was produced in Turkey and received Emergency Use Approval, would be given as a booster (reminder) dose. However, the Phase 3 study of the vaccine has not yet been published. In addition, claims that the number of cases in Turkey had been reported incorrectly appeared in the media, after which the Ministry corrected the COVID-19 case numbers and began publishing them. These statements undermined public confidence. A study that used Twitter data to identify self-orientalist discourses about the Turkovac vaccine found the view that the East does not or will not have the capacity to follow modern scientific developments as much as the West. Reference Çankal42 There are also discourses about whether Turkovac is real. In this context, policy makers need to act quickly, follow the pulse of the public on social media, and make reliable statements. Given people’s perceptions of vaccines and the widespread disruptions caused by the COVID-19 pandemic in Turkey, society seems to be neutral toward the benefit of the vaccine. The success of the COVID-19 vaccine in achieving herd immunity depends on widespread adoption of the vaccine. The expansion of the anti-vaccine community, which spreads largely through social media, is likely to discourage such adoption, giving the virus an edge.
Although COVID-19 vaccines were available at the time of the research, it can be argued that the high number of neutral opinions in this study reflects a prevailing feeling of insecurity, because Sinovac was known to have low effectiveness and Turkovac had not yet been used. In addition, the fact that Biontech is an mRNA vaccine, a technology used for the first time in vaccines, has given rise to disinformation among the public. Vaccination dates may also have contributed to neutral thinking (Figure 5). A big data analysis of Twitter users in 10 countries shows that positive attitudes toward the COVID-19 vaccine declined over the period studied. Reference Greyling and Rossouw43 This situation may be directly proportional to the decrease in trust.
In the wordcloud given in this research (Figure 2), it was seen that the most popular vaccine among people living in Turkey is Biontech. In a study conducted in the United States, the most popular vaccine among people was Pfizer. Reference Na, Cheng and Li44 This is thought to be due to the difference in vaccines produced and imported in different countries.
Unlike the present study, some studies have found that the public has a negative opinion. A sentiment analysis of COVID-19 vaccine discussions using Twitter data in Indonesia reported that 39% of individuals had a positive opinion, 56% a negative opinion, and 1% a neutral opinion. Reference Pristiyono and Al Ihsan45 The study of Bonnevie et al., which measured the rise of anti-vaccination during the COVID-19 pandemic in the United States, stated that anti-vaccine users on Twitter increased by 80% over the period. Reference Bonnevie, Gallegos-Jeffrey and Goldbarg24 The most common negative emotion was reported to be fear. Reference Kwok, Vadde and Wang46 A sentiment analysis study conducted in Canada indicated that most of those tweeting were hesitant about vaccination. Reference Griffith, Marani and Monkman47 A study that analyzed sentiment toward the COVID-19 vaccines in tweets from the United Kingdom and the United States reported that approximately 40% of the population of both countries had negative emotions. People living in the United States were more concerned about the side effects and safety of the vaccine because of the few deaths that occurred after vaccination. Reference Na, Cheng and Li44 In a sentiment analysis conducted in Korea between February 23 and March 22, 2021, tweets with negative views were relatively common. Reference Shim, Ryu and Lee48 A study examining tweets in the United States and India shows that negative feelings toward the COVID-19 vaccines dominate in both countries. Reference Sharma and Sharma49 A sentiment analysis of COVID-19 vaccines conducted in Iran stated that negative attitudes toward domestic and imported vaccines increased in some periods. Reference Nezhad and Deihimi50 The negative feelings of the public toward the COVID-19 vaccine are reported to stem from the dominance of anti-vaccine and vaccine-hesitant groups on social media.
When the accuracy values of the ML algorithms used in this study were examined, the XGB algorithm had higher scores than the other algorithms. A COVID-19 vaccine sentiment analysis study that used a worldwide dataset stated that the LSTM-GRNN algorithm outperforms TextBlob and deep learning models. Reference Reshi, Rustam and Aljedaani51 ML models were also used in a study that analyzed dialogues about opposition to the COVID-19 vaccine. These models were RF, Support Vector Machines (SVM), Multilayer Perceptron (MLP), Gradient Boost (GB), and Long Short-Term Memory (LSTM). Among them, the best-performing models were RF, SVM, and GB, in that order. Reference Paul and Gokhale52 A study that examined views of Sinovac and Pfizer using Twitter data in Indonesia stated that the SVM algorithm performs well. Reference Nurdeni, Budi and Santoso53 The study of Ritonga et al. stated that the NB algorithm performs well in sentiment analysis of Twitter data related to the COVID-19 vaccine. Likewise, in another study using NB, the results were highly accurate. Reference Villavicencio, Macrohon and Inbaraj54 In a study where Bi-LSTM, SVM, and NB models were trained on Twitter data, the Bi-LSTM model performed better than the others. Reference To, To and Huynh55 As can be seen, accuracy values differ across studies that use different datasets and algorithms. Therefore, this study proposes XGB, the best-performing algorithm on its data.
Studies reporting positive feelings toward the COVID-19 vaccines are also found in the literature. A study that used ML methods for sentiment analysis of vaccine discussions on Twitter in Australia between January and October 2020 showed that approximately two-thirds of Australian people had a positive opinion about the COVID-19 vaccines and approximately one-third had a negative opinion. In a sentiment analysis study of tweets collected from the United States, United Kingdom, Canada, India, Australia, Ireland, and Nigeria over 4 months, from December 1, 2020, to March 31, 2021, positive opinion of the AstraZeneca/Oxford, Pfizer/BioNTech, and Moderna vaccines remained stable. Reference Marcec and Likic56 In a study conducted in the United States between November 1 and December 16, 48.3% of the tweets were positive, 36.1% were neutral, and 15.6% were negative. Reference Rahul, Jindal and Singh57 Another study conducted in the United States found that the largest share of Twitter posts about COVID-19 vaccines had pro-vaccine sentiment (45.7%), followed by neutral sentiment (28.6%) and anti-vaccine sentiment (25.7%). Reference Scannell, Desens and Guadagno58 Lyu et al. carried out a sentiment analysis of COVID-19 vaccine-related tweets posted from March 11, 2020, to January 31, 2021. Reference Lyu, Han and Luli14 In that study, the most dominant emotion in the pre-April COVID-19 vaccine tweets was fear. However, as of the week of April 1, 2020, fear gave way to trust, which then persisted. The feeling of trust peaked on November 9, 2020, when it was announced that the Pfizer vaccine was 90% effective. The public is thought to hold positive opinions about the vaccine because trust has increased and is expected to continue increasing over time.
Public health professionals and those who use Twitter as a dissemination tool can benefit from the presence of trust evident in a subset of tweets. Public health professionals can use language that allows for confidence building when composing tweets. Trusting public health officials and their actions during a crisis is critical and a common theme in public health literature. Public health professionals can build trust on issues of public health importance by creating tweets that engage the public through social media, education, regular and timely communication, and evidence-based information. Reference Papadopoulos, Sargeant and Majowicz59
Conclusions
Few of the tweets reviewed were pro-vaccine and positive in sentiment; however, anti-vaccination takes precedence. Most tweets express neutral emotions. For research question 1, 7.50% of the tweeters were positive, 0.59% negative, and 91.91% neutral about the COVID-19 vaccines. For research question 2, the most frequent words are biontech (415), aşı (237), sinovac (221), doz (185), and 2 (146). For research question 3, in the relationship between document segments and relative frequency, the most correlated word is biontech (Figure 4). For research question 4, when the accuracy values of the ML algorithms for tweets about COVID-19 vaccines are examined, the XGB algorithm has the highest accuracy (Table 2).
It is crucial for the Turkish government to actively encourage its citizens to get vaccinated and to help them understand the importance of vaccination. The best way to educate citizens about the positive side of vaccination is to address the fears they have expressed in social media posts about COVID-19 vaccines. Effective use of social media by the Ministry of Health and professional organizations to reach the broader public, by providing accurate information about vaccines and presenting explanations and scientific studies in plain language, can help eliminate the confusion in society. Trusting public health officials and their actions during a crisis is critically important. Establishing and monitoring risk communication is indispensable in managing and controlling an extraordinary public health emergency such as COVID-19. Public health professionals experienced in risk communication should engage in health education and health promotion interventions, and they can create tweets that engage the public through education, regular and timely communication, and evidence-based information on social media. In addition, public health authorities can analyze tweets using hashtags to measure and try to understand the feelings of the people. The use of ML algorithms in sentiment analysis provides fast results with high accuracy values; in this study, the accuracy value of the XGB algorithm was found to be high. Health policies can be shaped according to the health status of the people, which can increase confidence in matters of public health importance. Future studies may apply other ML algorithms and analyze the feelings of population subgroups about the COVID-19 vaccines. In addition, studies that evaluate tweets posted since the introduction of the COVID-19 vaccines would be helpful.
Limitations
The limitations of this research are that Twitter represents community engagement, the demographic information available from user profiles is limited, the group with low media literacy that does not use Twitter is mostly elderly, and the sampled tweets do not fully represent the discussion of COVID-19 vaccines. The high representation of minorities among Twitter users makes it difficult to assess inequalities in health services. The accuracy of the Twint library is also limited, as is the accuracy of sentiment analysis performed in Turkish. The strengths of the current study are that tweets are captured instantly because they are sent in real time, that AI is used to evaluate the tweets, and that big data are analyzed faster than humans can analyze them. The rapid change of tweet content is a weakness of the present study. The results of the study show the insights that tweets can provide for a health-related event.