Introduction
Physician rating websites have become an increasingly popular avenue for patients to evaluate prospective healthcare providers.Reference Gao, McCullough, Agarwal and Jha1–Reference Emmert, Meier, Pisch and Sander5 These websites provide demographic information about physicians and their practices, along with patient-driven perspectives on overall experiences with those physicians and practices. They have emerged as patients increasingly use the internet to navigate the medical landscape and expect a higher level of transparency from the healthcare system.Reference Emmert, Meier, Pisch and Sander5
Recent studies have demonstrated the critical role of physician rating websites when patients seek a new physician for care. A survey conducted by the National Research Corporation Health in 2018 found that 37 per cent of patients used physician rating websites as a first step to finding a healthcare provider, while 61 per cent avoided a healthcare provider based on negative reviews.6 Similarly, a survey carried out in 2014 found that 65 per cent of respondents were highly familiar with physician rating websites as a resource, and over half of respondents found physician rating websites ‘somewhat important’ or ‘very important’ in their search for a new care provider.Reference Hanauer, Zheng, Singer, Gebremariam and Davis2 One study found public reporting was instrumental in navigating patients towards specific healthcare providers for elective procedures.Reference Emmert, Gemza, Schöffski and Sohn7
Physician rating websites have been previously analysed within the field of otolaryngology.Reference Sobin and Goyal8,Reference Goshtasbi, Lehrich, Moshtaghi, Abouzari, Sahyouni and Bagheri9 A 2018 study interested in the impact of online otolaryngologist presence on online ratings found that younger otolaryngologists with a greater online presence had higher physician rating website ratings.Reference Calixto, Chiao, Durr and Jiang10 A 2021 study comparing hospital-generated online otolaryngologist ratings with physician rating website otolaryngologist ratings similarly found that professionals with over 30 years of experience tended to have lower ratings.Reference Basa, Jabbour, Rohlfing, Schmoker, Lawlor and Levi11 However, these two studies strictly analysed and compared numerical online rating scores, and only one study assessed descriptive comments using a qualitative analysis, but without a standardised approach.
Sentiment analysis can be used for more rigorous evaluation and standardisation of physician rating website reviews of otolaryngologists, as well as investigation of demographic features and key words and phrases. Sentiment analysis is a tool that employs natural language processing to quantify subjective information, such as written comments. The machine learning component of sentiment analysis allows an unbiased analysis of several thousands of reviews. Although natural language processing has been previously applied to quantitatively characterise online patient reviews in other fields, such as spine surgery and hand surgery, it has not been used to evaluate online reviews of otolaryngologists from physician rating websites.Reference Tang, Arvind, Dominy, White, Cho and Kim12,Reference Tang, Arvind, White, Dominy, Kim and Cho13
The current study therefore aimed to use a natural language processing approach to analyse physician rating website reviews of otolaryngologists in the USA and quantitatively describe patients’ perspectives from interactions with their physicians.
Methods
Data acquisition
In March 2022, data were collected from Healthgrades.com, which is a large, robust and publicly available database of online reviews. Healthgrades.com is a top recommended website when searching for online reviews and ratings of physicians. Online reviews and accompanying star ratings, which ranged from 1 to 5 stars, were scraped in bulk for otolaryngologists across all subspecialties associated with academic otolaryngology – head and neck surgery programmes. Scraping is the automated process of extracting information from webpages. The list of surgeons was cross-checked with their hospital websites. Physicians were excluded if they had no online ratings, had fewer than five reviews or were listed past page 100 on the Healthgrades.com database.Reference Tang, Arvind, White, Dominy, Kim and Cho13 The cut-off of five reviews was determined quantitatively to be optimal, as described in the ‘Data analysis’ section below.
Natural language processing
The sentiment analysis of the online reviews was conducted using the publicly available algorithm Valence Aware Dictionary and sEntiment Reasoner (‘VADER’). VADER is a widely used Python package and has been validated for a sentiment analysis application.Reference Hutto and Gilbert14 VADER analyses written text and translates qualitative information into a quantitative compound sentiment analysis score. The algorithm assigns this ‘sentiment’ score by analysing positive and negative words and connotations of a sentence. For this, the VADER package relies on a dictionary of commonly used positive and negative words, developed by 10 independent raters. The raters were trained and checked for inter-rater reliability. They assigned a score from −4 to +4 for the words in the dictionary, with 0 representing a neutral sentiment. Finally, VADER produces a compound sentiment score of the inputted sentences from −1 to +1, normalising the scores to reflect −1 as a negative sentiment and +1 as a positive sentiment. Of note, the VADER package also accounts for potential modifiers of words, such as ‘very’ and ‘not’. Positive modifiers of words are given higher scores, while negative modifiers preceding words reverse the score.
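The scoring steps described above can be illustrated with a minimal sketch. This is not the VADER package itself: the word list, the valence values, the intensifier increment and the function name `toy_compound` are all hypothetical and chosen only for illustration. The normalisation x / sqrt(x² + alpha) with alpha = 15 is, as far as can be told from the open-source implementation, the form VADER uses to map the summed valence into the range −1 to +1, and negation is simplified here to a plain sign reversal, as in the description above.

```python
import math

# Hypothetical mini-lexicon on the -4..+4 scale described above.
# These values are made up for illustration; they are not VADER's dictionary.
TOY_LEXICON = {"kind": 2.0, "excellent": 3.0, "rude": -2.0, "worst": -3.0}
BOOSTERS = {"very": 0.5}        # assumed increment for an intensifier
NEGATIONS = {"not", "never"}    # a preceding negation reverses the score


def toy_compound(text, alpha=15.0):
    """Sum word valences, then squash the sum into the open interval (-1, 1)."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = TOY_LEXICON.get(tok)
        if valence is None:
            continue  # word carries no sentiment in the toy lexicon
        prev = tokens[i - 1] if i > 0 else ""
        if prev in BOOSTERS:
            # Push the valence further from zero, keeping its sign.
            valence += math.copysign(BOOSTERS[prev], valence)
        elif prev in NEGATIONS:
            valence = -valence
        total += valence
    # Normalise: larger absolute sums approach -1 or +1 but never reach them.
    return total / math.sqrt(total * total + alpha)


for phrase in ("kind", "very kind", "not kind", "worst doctor, very rude"):
    print(phrase, round(toy_compound(phrase), 3))
```

In this sketch an intensifier moves the compound score further from zero and a negation flips its sign, which mirrors the qualitative behaviour described for the real package.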
Data analysis
First, the VADER score was validated using a linear regression analysis comparing each otolaryngologist's average sentiment analysis score with their average star score; the regression was visualised using Matplotlib version 3.2.1. The R2 value at each potential cut-off for the number of reviews per physician was used to determine the optimal cut-off for the study's exclusion criteria, as previously described.Reference Tang, Arvind, Dominy, White, Cho and Kim12,Reference Tang, Arvind, White, Dominy, Kim and Cho13
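A minimal sketch of the cut-off search follows, assuming each physician is summarised as a tuple of (number of reviews, average sentiment score, average star score). The function names and the records are hypothetical; for a simple linear regression with one predictor, R2 equals the squared Pearson correlation, which is what the sketch computes.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, equal to R2 for a one-predictor linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # no variation, so R2 is undefined
    return (sxy * sxy) / (sxx * syy)


def r2_by_cutoff(physicians, cutoffs):
    """Return {cut-off: R2} using only physicians with at least that many reviews."""
    results = {}
    for cutoff in cutoffs:
        kept = [(s, t) for n, s, t in physicians if n >= cutoff]
        if len(kept) < 3:
            continue  # too few physicians left for a meaningful fit
        sentiments, stars = zip(*kept)
        results[cutoff] = r_squared(sentiments, stars)
    return results


# Made-up records: (n_reviews, mean_sentiment, mean_star).
toy_physicians = [
    (1, 0.9, 2.0), (2, -0.5, 5.0), (5, 0.2, 3.5), (6, 0.5, 4.2),
    (8, 0.7, 4.8), (12, 0.6, 4.5), (20, 0.4, 4.0),
]
scores = r2_by_cutoff(toy_physicians, cutoffs=[1, 5])
best_cutoff = max(scores, key=scores.get)
print(scores, best_cutoff)
```

In the toy records, physicians with only one or two reviews have averages that disagree with their star scores, so excluding them raises R2; this is the kind of pattern a cut-off search is meant to detect.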
All available demographic characteristics of otolaryngologists were collected. The association between demographic variables (gender, age and location) and average star ratings and sentiment scores was evaluated using Student's t-tests and one-way analysis of variance (ANOVA) tests. Age was categorised into four groups: less than 40 years, 40–49 years, 50–59 years and more than 60 years. Differences in sentiment scores and star scores between age groups were tested using a one-way ANOVA test. Physicians in the USA were grouped by location of practice into five regions: West, Midwest, Southwest, Southeast and Northeast, as defined by National Geographic.15 Differences between regions were likewise tested using a one-way ANOVA test. Statistical significance was set at a p-value of less than 0.05.
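For readers unfamiliar with the one-way ANOVA, the F statistic compares the variability between group means with the variability within groups. The sketch below computes it from first principles on made-up numbers; the function name is hypothetical, and it stops at the F statistic and its degrees of freedom. Obtaining a p-value additionally requires the F distribution, for which a library routine such as `scipy.stats.f_oneway` would normally be used.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three hypothetical groups (not study data).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```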
A linguistic analysis was performed to assess the prevalence of various words in both positively and negatively rated online reviews of otolaryngologists. Individual word frequency analysis was conducted on positive reviews that received a sentiment score greater than 0.5 and on negative reviews that received a sentiment score lower than 0.5. Similarly, word pairs (bigrams) were analysed to include more context and phrases that might otherwise have been missed.
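The word and bigram counts can be sketched with the Python standard library. The stop-word list, the example reviews and their scores are hypothetical; the study's actual preprocessing (for example, any stemming or lemmatisation) is not described here and is not reproduced. The score thresholds are passed in by the caller, and the example uses the thresholds as stated above.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "was", "is", "to", "i", "of", "for", "my"}


def tokenise(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def count_terms(reviews, keep):
    """Count words and adjacent word pairs in reviews whose score passes `keep`."""
    words, bigrams = Counter(), Counter()
    for text, score in reviews:
        if not keep(score):
            continue
        tokens = tokenise(text)
        words.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # consecutive pairs
    return words, bigrams


# Made-up (review text, sentiment score) pairs.
toy_reviews = [
    ("The staff was friendly and helpful", 0.8),
    ("Friendly helpful doctor", 0.7),
    ("Complete waste of time", -0.6),
]
pos_words, pos_bigrams = count_terms(toy_reviews, keep=lambda s: s > 0.5)
neg_words, neg_bigrams = count_terms(toy_reviews, keep=lambda s: s < 0.5)
print(pos_bigrams.most_common(2))
print(neg_bigrams.most_common(2))
```

Counting pairs after stop-word removal is one way that pairs such as ‘friendly, helpful’ can arise from text like "friendly and helpful"; whether the study removed stop words in this way is an assumption.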
Significant words and bigrams of positive and negative reviews from the univariable model were included in a multivariable logistic regression to determine each word and bigram's odds of a positive review while adjusting for other words. In this model, a positive review was defined as one with a sentiment score greater than 0.5, which was previously used as a cut-off score in similar studies.Reference Tang, Arvind, Dominy, White, Cho and Kim12,Reference Tang, Arvind, White, Dominy, Kim and Cho13
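To make the adjusted odds ratios concrete, the sketch below fits a small logistic regression in which each predictor is the presence (1) or absence (0) of a word and the outcome is whether the review is positive. It uses plain gradient ascent on made-up counts rather than the statistical software used in the study, which is not specified here; the function names, the two words and the data are all hypothetical. The exponentiated coefficient of a word is its odds ratio adjusted for the other words in the model.

```python
import math


def fit_logistic(rows, labels, lr=0.5, steps=5000):
    """Gradient ascent on the mean log-likelihood; returns [intercept, b1, b2, ...]."""
    beta = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of a positive review
            err = y - p
            grad[0] += err
            for j, xi in enumerate(x, start=1):
                grad[j] += err * xi
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def cell(has_kind, has_rude, n_positive, n_negative):
    """Expand a count of reviews into individual ((kind, rude), label) records."""
    features = (has_kind, has_rude)
    return [(features, 1)] * n_positive + [(features, 0)] * n_negative


# Made-up review counts for each combination of the two words.
data = cell(1, 0, 9, 1) + cell(0, 1, 1, 9) + cell(0, 0, 5, 5) + cell(1, 1, 2, 2)
rows = [features for features, _ in data]
labels = [label for _, label in data]

beta = fit_logistic(rows, labels)
odds_ratios = {word: math.exp(b) for word, b in zip(("kind", "rude"), beta[1:])}
print(odds_ratios)
```

The toy counts are constructed so that the fitted odds ratio for ‘kind’ is close to 9 and that for ‘rude’ close to 1/9, each adjusted for the other word. Confidence intervals would additionally require standard errors, which this sketch does not compute.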
As this study did not involve human subjects and solely used publicly available data, it was exempt from institutional review board review.
Results
The analysis included 18 546 online reviews of 1240 otolaryngologists, 813 males and 427 females, across the country.
Model validation
The model was validated using a linear regression analysis of the average sentiment scores from the reviews versus average star scores of the physicians, which yielded a significant positive correlation between the average sentiment and average star scores (R2 = 0.637; p < 0.001) (Fig. 1).
Score by otolaryngologist's age, gender and location
Average sentiment scores were significantly different between the age groups: less than 40 years = 0.630; 40–49 years = 0.587; 50–59 years = 0.536; more than 60 years = 0.528 (p < 0.001). Average star scores were also significantly different between the four age groups (4.699, 4.452, 4.329 and 4.287, respectively; p < 0.001) (Table 1).
Male otolaryngologists received higher sentiment scores (0.609 vs 0.531; p < 0.001) and star scores (4.624 vs 4.342; p < 0.001) than female otolaryngologists (Table 1).
Average sentiment scores were significantly different by location within the USA: otolaryngologists in the Northeast (0.529) had notably lower sentiment scores than otolaryngologists in the West (0.563), Midwest (0.569), Southwest (0.605) and Southeast (0.569) (p = 0.009). Average star score analysis was not significantly different among the five regions, though otolaryngologists in the West (4.40) and Northeast (4.38) had lower average star scores compared with otolaryngologists in the Midwest (4.52), Southwest (4.54) and Southeast (4.44) (p = 0.07) (Table 1).
Linguistic analysis
The most frequently used words in the positive reviews included: ‘best’ (n = 2040), ‘excellent’ (n = 2019), ‘caring’ (n = 1790), ‘kind’ (n = 1514), ‘friendly’ (n = 1476) and ‘comfortable’ (n = 886). Negative reviews had a high frequency of the following words: ‘pain’ (n = 363), ‘rude’ (n = 334), ‘minutes’ (n = 305), ‘problems’ (n = 279), ‘wait’ (n = 268) and ‘issue’ (n = 245) (Table 2).
Bigram frequency analysis showed that positive words together yielded positive reviews, such as ‘friendly, helpful’ (n = 126), ‘wonderful, experience’ (n = 88), ‘recommend, friend’ (n = 75) and ‘staff, excellent’ (n = 47). Bigram analysis of negative reviews showed that bigrams related to patient emotions and scheduling (doctor availability and punctuality) were most associated with a negative review. Examples of negatively associated bigrams include ‘worst, doctor’ (n = 28), ‘waiting, hour’ (n = 17) and ‘complete, waste’ (n = 16) (Table 3).
A multivariable logistic regression was conducted using frequently used words in reviews and clinically relevant words in otolaryngology. Words strongly associated with a positive review were: ‘confident’ (adjusted odds ratio = 19.39; 95 per cent confidence interval (CI) = 10.01–37.58), ‘kind’ (adjusted odds ratio = 7.19; 95 per cent CI = 5.85–8.85), ‘recommend’ (adjusted odds ratio = 3.76; 95 per cent CI = 3.45–4.11) and ‘comfortable’ (adjusted odds ratio = 8.70; 95 per cent CI = 6.49–11.65). Words associated with decreased odds of a positive review included: ‘dismissive’ (adjusted odds ratio = 0.26; 95 per cent CI = 0.13–0.51), ‘arrogant’ (adjusted odds ratio = 0.09; 95 per cent CI = 0.04–0.24) and ‘pain’ (adjusted odds ratio = 0.62; 95 per cent CI = 0.54–0.71).
Words commonly used in patient–physician interactions in all subspecialties of otolaryngology were also analysed in the multivariable model. Words associated with a positive patient review included ‘natural’ (adjusted odds ratio = 6.66; 95 per cent CI = 2.95–15.01) and ‘breathe’ (adjusted odds ratio = 1.62; 95 per cent CI = 1.31–2.02), while words associated with decreased odds of a positive review included ‘hearing loss’ (adjusted odds ratio = 0.70; 95 per cent CI = 0.51–0.97) and ‘blood’ (adjusted odds ratio = 0.39; 95 per cent CI = 0.25–0.61) (Table 4).
Other clinically relevant words across all subspecialties, such as ‘dizzy’ (95 per cent CI = 0.26–2.04), ‘vertigo’ (95 per cent CI = 0.55–1.01), ‘earwax’ (95 per cent CI = 0.51–1.35), ‘ear pain’ (95 per cent CI = 0.27–1.19), ‘sleep’ (95 per cent CI = 0.97–1.50), ‘snoring’ (95 per cent CI = 0.34–1.51), ‘crooked’ (95 per cent CI = 0.44–3.96) and ‘wrinkles’ (95 per cent CI = 0.23–2.24), did not achieve statistical significance.
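The reported odds ratios and confidence intervals can be read back on the log-odds scale, which is where a logistic regression estimates them. The sketch below assumes the intervals are Wald-type, that is, symmetric around the coefficient on the log scale with a multiplier of 1.96; this is a common convention but is an assumption here, and the function names are hypothetical. Under that assumption, the coefficient is ln(OR), the standard error is (ln(upper) − ln(lower)) / (2 × 1.96), and a word is statistically significant at the 5 per cent level when its interval excludes 1.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Back-calculate the log-odds coefficient and its standard error."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def ci_excludes_one(ci_low, ci_high):
    """True when the interval lies entirely above or entirely below 1."""
    return ci_low > 1.0 or ci_high < 1.0


# Values taken from the results reported above.
reported = {
    "confident": (19.39, 10.01, 37.58),
    "pain": (0.62, 0.54, 0.71),
}
for word, (odds_ratio, low, high) in reported.items():
    beta, se = wald_summary(odds_ratio, low, high)
    # For a Wald interval, the geometric mean of the limits recovers the odds ratio.
    midpoint = math.sqrt(low * high)
    print(word, round(beta, 3), round(se, 3), round(midpoint, 2), ci_excludes_one(low, high))

# 'vertigo' was reported with an interval of 0.55-1.01, which spans 1.
print("vertigo", ci_excludes_one(0.55, 1.01))
```

For the two words shown, the geometric mean of the interval limits agrees with the reported odds ratio to within rounding, which is consistent with the Wald-type assumption.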
Discussion
Patients have been increasingly utilising physician rating websites over the past decades when they evaluate and establish care with new healthcare providers.Reference Gao, McCullough, Agarwal and Jha1,Reference Bernstein and Mesfin16–Reference Sabin18 The current study showed that more positive reviews were received by younger and male otolaryngologists compared with their older and female counterparts. Additionally, physicians practising in the Northeast USA had lower sentiment scores than their colleagues in other regions, with a similar but non-significant trend in star scores.
Our finding that male gender for an otolaryngologist was significantly associated with a positive review differs from prior studies. Specifically, Calixto et al. found no difference among provider genders in online ratings.Reference Calixto, Chiao, Durr and Jiang10 Similarly, provider gender has been shown not to have an association with Press Ganey scores within the otolaryngology literature.Reference Shintani Smith, Cheng, Kern, Cameron and Micco19,Reference Tracy, Jabbour, Rubin, Sobin, Lawlor and Basa20 However, a preference towards male general surgeons in both patient-report and Press Ganey scores aligns with the current results, which may reflect longstanding gender bias within a traditionally male-dominated field.Reference Nuyen, Altamirano, Fassiotto and Alyono21,Reference Adudu and Adudu22 One possible explanation is that there may be variability in gender preference depending on whether the encounter is a clinical or a surgical consult. It is also possible that there is a self-selecting group of patients interacting with physician rating websites that differs from the population within prior studies. Dunivin et al. showed that there are differences in the frequency and content of online reviews between patient genders, perhaps biasing the reviews towards favouring male physicians.Reference Dunivin, Zadunayski, Baskota, Siek and Mankoff23
Our finding that younger otolaryngologists received higher ratings than older otolaryngologists is more pronounced compared with prior physician rating website analysis within otolaryngology, which found that age had only a small impact on content found on rating websites.Reference Calixto, Chiao, Durr and Jiang10 Prior studies that have found more favourable reviews for younger physicians have suggested that older physicians have been in practice for longer and thus had more time to accumulate negative reviews, or speculated that younger physicians may have better interpersonal skills because of an increased emphasis on this during medical training over recent years.Reference Basa, Jabbour, Rohlfing, Schmoker, Lawlor and Levi11,Reference Sama, Matichak, Schiller, Li, Donnaly and Damodar24
Our study found that otolaryngologists practising in the Northeast and Western USA had the worst sentiment scores and lowest star scores. These results are similar to those found by Goshtasbi et al., who reported that neuro-otologist review ratings were lower among those in the Western USA compared with the South.Reference Goshtasbi, Lehrich, Moshtaghi, Abouzari, Sahyouni and Bagheri9 Our results may support existing anecdotal evidence that hospitals in certain regions (e.g. the Northeast and California) tend to get lower ratings because of higher patient expectations.Reference Ginocchio, Duszak and Rosenkrantz25 While age, gender and region of practice are not within a physician's control, it is important to recognise these apparent patient biases in their interactions.
Finally, we investigated frequently repeated key words and phrases in physicians’ reviews. We found that physician personality words such as ‘confident’ and ‘kind’ were associated with the highest odds of a positive review. Similarly, patients who indicated they were ‘comfortable’ with their physician were significantly more likely to leave a positive review. Reviews containing the word ‘natural’, likely written about facial plastics otolaryngologists, also had markedly higher odds of being positive, although with our approach it is unclear whether ‘natural’ refers to the interaction with the physician or to the patient's appearance. Our findings notably show that words highly associated with positive reviews are related to physician bedside manner and patient emotions rather than the actual quality of the treatment performed or clinical outcome. In contrast, key words describing negative physician traits, such as ‘dismissive’ or ‘arrogant’, were associated with significantly lower odds of a positive review.
These findings should be taken into consideration by practising otolaryngologists, given the significant influence of physician rating websites on physician selection by prospective patients. A 2014 study by Hanauer et al. linked physician rating website reviews with patient behaviour, revealing that 37 per cent of patients avoided a physician because of bad online ratings, and 35 per cent chose physicians with favourable online ratings.Reference Hanauer, Zheng, Singer, Gebremariam and Davis2 Negative outcomes were more often mentioned in a review, while positive features of bedside interaction with a physician were more strongly associated with positive reviews than the clinical outcome. Therefore, incorporating the positive bedside manner features and avoiding preventable clinical outcomes are essential to improving the patient experience. Our findings underscore the outsized role that bedside manner and interpersonal skills play in patients’ perception and rating of a physician.
There are several limitations to our approach that are worth noting. First, we were unable to capture all otolaryngologists in the USA because of website limitations. However, we believe that our sample of over 1000 otolaryngologists can accurately represent the whole population on Healthgrades.Reference Elfil and Negida26 Second, we were limited by our decision to bulk scrape data from only Healthgrades; nevertheless, the number of physicians analysed is larger than in any previously published study in general otolaryngology.Reference Goshtasbi, Lehrich, Moshtaghi, Abouzari, Sahyouni and Bagheri9–Reference Basa, Jabbour, Rohlfing, Schmoker, Lawlor and Levi11 We were only able to freely extract data from this website, because other websites such as Google, Vitals, Yelp or Zocdoc cannot be filtered, or their source code does not allow scraping. Therefore, based on previously published work, we focused on scraping a large dataset from Healthgrades in the interest of feasibility.Reference Tang, Arvind, Dominy, White, Cho and Kim12,Reference Tang, Arvind, White, Dominy, Kim and Cho13
In addition, we were not able to break down the results by specific subspecialty, although the results could vary in a subgroup analysis. There is a concern for selection bias, as individuals submitting reviews may be more passionate, either positively or negatively, about their physician. In fact, this potential selection bias was highlighted by the fact that most reviews were skewed towards five-star ratings (n = 16 012) followed by one-star ratings (n = 1885) in the current study. Given the rating-agnostic sampling methods for reviews in this study, this finding suggests that patients’ ratings of their otolaryngologist may be on a binary scale: either they give a ‘satisfactory’ rating of five stars or they leave a rating of fewer than five stars. Additionally, providers may ask patients who had a positive experience to leave a review, further contributing to an unbalanced pool of reviews.
• Physician review websites are frequently consulted by patients to choose healthcare providers
• In this study, words of positive otolaryngologist comportment were most associated with the best sentiment and star scores
• Younger age and male gender of the otolaryngologist were associated with better sentiment and star scores
• Words representing clinical outcomes had less predictive power in determining the outcome of a review
Our results could also be affected by patients’ characteristics, such as gender or age; however, this information was unavailable on the publicly accessible websites. Lastly, it is not possible to scrutinise data points to determine whether or not each review comes from a unique reviewer on Healthgrades, leaving open the possibility that single patients generated multiple reviews for a given physician.
Conclusion
The present study suggests that patient reviews are determined by a combination of factors both within (e.g. bedside manner) and outside (e.g. provider age, gender and practice location) the otolaryngologist's control. Online indications of patient satisfaction with their physicians are important to understand, given the increasing popularity of physician rating websites and the outsized role that these websites appear to play in a patient's selection of a new healthcare provider. Based on this, otolaryngologists can understand what patients desire and alter their practice to better fit patients’ goals when providing care.
Data availability statement
The data supporting the findings of this study are publicly available.
Competing interests
None declared.