I. Introduction
Wine is widely perceived as an experience good whose market structure is typically one of monopolistic competition, where information plays a key role in consumers’ purchasing decisions and their willingness to pay (WTP). Research into wine prices or WTP allows for a better understanding of consumer behavior, including reactions and sensitivity to price changes, helps to identify market trends and anticipate changes in supply and demand, and supports decision-making aimed at increasing the economic performance of wineries (Le Fur et al., 2023). Therefore, knowledge about the predictors, or determinants, that influence consumers’ WTP is strategically important for wineries to increase their profitability and to develop more effective pricing and marketing strategies aligned with consumer preferences.
The WTP has been generally estimated through the hedonic price function that models the WTP as a function of predictive objective and subjective product attributes, as well as control variables as a set of consumers’ socioeconomic characteristics. Thus, the estimation of WTP has been dominated by a theory-driven paradigm, in which the researcher imposes a structure on data, and models consumer decisions based on utility theory. The best model is selected by comparing the econometric results based on alternative functions differing both in terms of functional form and in the selected explanatory variables.
Nevertheless, alongside the restrictive assumptions about consumer behavior, it remains uncertain whether the chosen econometric model adequately represents the data generation process and whether it can be used for predictive and assertive inferences (Rodrigues et al., 2022). To overcome these potential drawbacks, alternative modelling and estimation techniques have recently emerged based on data-driven analysis supported by machine learning (ML) algorithms, complementing and enlarging the traditional choice modelling approach (e.g., van Cranenburgh et al., 2022). This approach has recently been applied to the field of wine (e.g., Niklas and Rinke, 2020; Rinke and Ho, 2023). In this line of research, van Cranenburgh et al. (2022) point out the need for additional work integrating and comparing the results of the two modelling paradigms, the econometric approach and the ML one, to which this paper aims to contribute. Associated with this objective, the main research questions are: (i) what are the main predictors of WTP for Portuguese sparkling wine consumers? and (ii) how similar are the results obtained with the econometric and ML approaches?
Thus, the contribution of this article is threefold: (i) identifying the main drivers of WTP of Portuguese sparkling wine consumers; (ii) exploring the determinants of WTP for wine, employing a dual lens of traditional econometric methods and contemporary ML techniques; and (iii) providing useful information for wineries to outline better marketing strategies for sparkling wine.
This article is structured as follows: section II presents an overview of the main potential predictors of WTP for wine, including the sparkling wine category; section III describes the data and methods, i.e., the ordered probit model and the ML approach; section IV describes and discusses the results; and, finally, section V presents the conclusions of the study.
II. Theoretical background
The WTP for wine has been the subject of extensive research in several countries over the last 30 years. The literature has revealed a series of determinants that influence WTP, using econometric models based on Lancaster's (1966) theory of consumer behavior, which emphasizes the importance of product characteristics. Generally, WTP is assumed to be influenced by intrinsic and extrinsic wine cues, which consumers use according to experiential and psychological factors, such as wine knowledge (Dodd et al., 2005) and involvement (Cox, 2009), as well as socioeconomic and demographic variables, including income, age, education, and socioeconomic status (Elliot and Barth, 2012; Lange et al., 2002; Lerro et al., 2020; Skuras and Vakrou, 2002). Outreville and Le Fur (2020) present a descriptive review of empirical studies on the determinants of wine price during the 1993–2018 period. In the same line, Le Fur et al. (2023) include a recent and detailed literature review and a bibliometric analysis of academic research on wine prices in economics, covering 180 articles published in journals between 1992 and 2022. Both literature reviews provide useful knowledge on price predictors and the methodologies that have been used, highlighting the role of the theory-driven paradigm and the associated econometric estimation of hedonic price functions.
Early studies usually employed hedonic pricing models to analyze the formation of wine prices (e.g., Combris et al., 1997; Oczkowski, 1994), evaluating the price premium associated with specific characteristics. More recent econometric efforts have expanded their framework to include experimental auctions (Vecchio, 2013) and discrete choice experiments (D'Alessandro and Pecotich, 2013; Gonçalves et al., 2020; Palma et al., 2016), offering a deeper understanding of consumer preferences and their impact on WTP.
The results of previous studies on sparkling wines suggest distinct preferences and WTP for appellations, Prosecco in this case (e.g., Onofri et al., 2015; Rossetto and Gastaldello, 2018; Thiene et al., 2013), brands (Vecchio et al., 2019), and loyalty (Bassi et al., 2021). Culbert et al. (2017) reveal that the production method influences the sensory profile of Australian sparkling white wine styles, and that Charmat is the preferred method. In turn, Lange et al. (2002) compare two mechanisms, the hedonic test and the Vickrey auction, to reveal consumers’ WTP for Champagne, concluding that, in general, participants are willing to pay more for wines from big brands, and older consumers are willing to pay higher prices than younger people for reserve wines. Pickering et al. (2022), focusing on the effect of label information, compare WTP and quality perception under different information scenarios for a set of two simulated sparkling wine labels (Champagne and Prosecco). They conclude that those who consider themselves more knowledgeable about sparkling wines are willing to pay more for the Prosecco wine style and that WTP increases with the amount typically paid for both wine styles. Researchers have also found that when sparkling wine is purchased for a special occasion (e.g., a celebration), consumers are often willing to spend more (Morton et al., 2004; Velikova et al., 2016). Verdonk et al. (2017) point out that when consumers buy wine as a gift, they are more willing to purchase expensive, prestigious brands of sparkling wine, including Champagne.
In parallel with advances in econometric models, the development of data science and ML, well described in the literature, has opened new avenues for the analysis of wine WTP. ML techniques, known for their ability to deal with extensive data sets and discover nonlinear relationships, have demonstrated high potential in the wine industry. For example, some studies have applied various ML algorithms to predict wine quality (Jain et al., 2023), taste preferences (Cortez et al., 2009), and wine price (Niklas and Rinke, 2020), thereby indirectly shedding light on WTP.
In summary, the literature review allowed us to conclude that the main predictors of WTP can be organized into four groups of variables: (i) socioeconomic characteristics (gender, age, education, marital status, place of residence, income); (ii) personal and behavioral aspects (consumer knowledge, who buys, motives of purchase, knowledge of production method, consumption frequency, consumption of other beverages); (iii) collective (region of origin, collective brand) and individual (brands and awards) reputations; and (iv) attributes of the sparkling wine (e.g., category, sweetness, production method, and organic production).
III. Data and methods
a. Data
Based on the findings provided by the literature review and advice from experts in the sparkling wine market, the authors developed a survey that includes three groups of questions: (i) purchasing and consumption patterns; (ii) discrete choice experiment; and (iii) sociodemographic information. Data collection was carried out online and managed by a company specialized in market research in Portugal, in September 2022. This company guaranteed a representative sample in terms of gender, age, income, and professional status. The sample has information from 800 people over 18 years old who consumed sparkling wine at least once a year. Table A1 (Appendix A) describes the variables used in this study and briefly characterizes the data collected. Next, we will present the main characteristics of the database.
Regarding the price of a bottle of sparkling wine (PriceRan), only 8% of the respondents revealed willingness to spend more than €15, while the majority (>70%) said they would spend less than €10 per bottle.
Concerning the socioeconomic variables, 51% of respondents are male; 28.9% of the sample is aged between 55 and 64, followed by 18.4% in the 35 to 44 age group; 83.6% of the sample lives in urban areas (residence); 42% earn between €1,000 and €1,999.99 net income per month. The analysis of the professional status of the interviewees reveals that 58% are employees, 16% are self-employed, 15% are retired, 6% are unemployed, and 5% are students. More than 90% of respondents purchase sparkling wine for celebrations, which is in line with evidence reported in other studies.
Concerning the personal and behavioral variables, 61% of respondents are responsible for purchasing sparkling wine (BuySparkling variable). When asked about their knowledge of sparkling wines (SparklingKnow), 51% of those interviewed say they know a little about sparkling wines, almost 30% consider themselves to have moderate knowledge, 16.6% are new to this market, and fewer than 3% state that they know a lot or consider themselves experts in sparkling wine. More than 70% claim to know the traditional/classic production method (ClassTrad), 16.4% know the Charmat method, 14.9% know both methods, and 26.9% do not know either of them. More than 90% buy sparkling wine for celebrations (BuyCeleb), 77.1% for other events or parties (BuyParty), 33.6% to consume during meals (BuyFood), and 28.5% buy sparkling wine as a gift (BuyGift). Around 37% consume sparkling wine between 5 and 10 times/year and 2.4% consume it at least once a week (SparklingCons). Sparkling wine is consumed as an aperitif (32%), in cocktails (45%), during (54%) and after (59%) meals. The majority of sparkling wine consumers in the sample report consuming other beverages, such as red wine (ConsRed, 82%), white wine (ConsWhite, 85%), beer (ConsBeer, 86%), spirits (ConsSpirits, 60%), sangria (a wine-based drink flavored with fruit and spices originating in Portugal and Spain; ConsSangria, 64%), and soft drinks (ConsSoft, 71%). Concerning the wines’ collective reputation, i.e., the region of origin, the choices were Bairrada (35%), Champagne (15.4%), and Távora (14.5%), but 20.6% have no preferred region (NotPDO). Concerning the attributes of the sparkling wine and personal preferences, the individual brand, the importance of awards and being organic are highlighted.
In summary, the data collected reveal a heterogeneous sociodemographic profile of Portuguese sparkling wine consumers, considering their preferences, consumption and purchasing habits, as well as the evaluation of the product's attributes for decision-making.
IV. Methods
a. Ordered probit model
Given the nature of the dependent variable (the price range, or WTP, for sparkling wine ordered by classes), an econometric ordered probit model is estimated. Under this model, there is a latent continuous metric underlying the ordinal responses: the latent continuous dependent variable $y_i^*$ is determined by a vector of explanatory variables ($x_i$) and a disturbance term ($\varepsilon_i$). The ordered probit regression therefore allows the ordered outcome, with $J$ categories of WTP (four in our case), to be modelled as a linear function of the observed vector $x_i$. The latent regression is specified as follows:
$$y_i^* = \beta' x_i + \varepsilon_i, \quad i = 1, \ldots, N,$$
where $i$ indexes the $N$ observations, $y^*$ is the unobserved $N \times 1$ dependent variable, $\beta$ is the $K \times 1$ vector of parameters to be estimated, $x$ ($N \times K$) contains the covariates (predictors), assumed to be independent of $\varepsilon$, and $\varepsilon$ ($N \times 1$) is the error term including unobservable factors. The probabilities underlying this model are given by
$$P(y_i = j \mid x_i) = \Phi(\mu_j - \beta' x_i) - \Phi(\mu_{j-1} - \beta' x_i), \quad j = 1, \ldots, J,$$
where $\Phi(\cdot)$ stands for the standard normal cumulative distribution function, and $\mu_j$, $j = 1, \ldots, J-1$, are the unknown threshold parameters (with $\mu_0 = -\infty$ and $\mu_J = +\infty$), between which the categorical responses are estimated. The model is estimated by maximum likelihood, with the likelihood function based on the implied probabilities.
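As an illustration only, the category probabilities and log-likelihood of an ordered probit model can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the estimation routine used in the study: the function names (`norm_cdf`, `ordered_probit_probs`, `log_likelihood`) are hypothetical, the errors are assumed standard normal, and the thresholds are assumed to be supplied in increasing order.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(x, beta, thresholds):
    """Return [P(y = 1 | x), ..., P(y = J | x)] for J - 1 increasing thresholds.

    Each probability is Phi(mu_j - beta'x) - Phi(mu_{j-1} - beta'x),
    with mu_0 = -inf and mu_J = +inf.
    """
    xb = sum(b * v for b, v in zip(beta, x))
    # Pad the thresholds with -inf and +inf to cover the outer categories.
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [norm_cdf(c - xb) for c in cuts]
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) - 1)]


def log_likelihood(X, y, beta, thresholds):
    """Sum of log P(y_i | x_i); y holds category labels coded 1..J."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        total += math.log(ordered_probit_probs(x_i, beta, thresholds)[y_i - 1])
    return total
```

In practice the parameters and thresholds would be obtained by maximizing `log_likelihood` with a numerical optimizer; the sketch only evaluates the function for given values.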
b. Machine learning
In the data-driven approach, the identification and ordering of statistically significant predictors of WTP among a vast set (pool) of potential explanatory variables, named features in this context, is a supervised ML task known as classification, in which an algorithm “learns” to classify new observations from examples of input and output labelled/categorical data (Kotsiantis et al., 2007; Mohamed, 2017; Nasteski, 2017; Osisanwo et al., 2017; Singh et al., 2016). This approach includes the use of feature selection or feature ranking algorithms (FRA) followed by the development (training/calibration) of a WTP classification model.
In this study, features were ordered and classification models were trained and tested with (i) Orange, an open-source ML and data visualization software that creates data analysis workflows visually and offers a large and diverse toolbox, and (ii) MATLAB Classification Learner (CL), which allows the user to explore the data, select features, and train, validate, and tune classification models for binary or multiclass problems using supervised ML from the Statistics and Machine Learning Toolbox 12.4. The choice of software package depends on the user's specific needs and technical proficiency. Both Orange and MATLAB CL are widely used because they are particularly appealing to users seeking an accessible and intuitive tool for data mining and ML (Ciaburro, 2017; Demšar et al., 2013).
Different FRAs were tested and used to rank the features, namely the minimum redundancy maximum relevance (mRMR) algorithm, univariate FRA for classification using chi-square tests ($\chi^2$), the ReliefF algorithm with k nearest neighbors, one-way ANOVA for each predictor variable grouped by class, the Kruskal–Wallis test (KW), information gain (InforGain), gain ratio, the Gini index, and the fast correlation-based filter. These FRAs support categorical and continuous features and are well described in the literature (e.g., Guyon and Elisseeff, 2003; Radovic et al., 2017). Nevertheless, we present a brief description of the FRAs in Table B1 (Appendix B). The selected features are presented and sorted in descending order of scores. For $\chi^2$, ANOVA and KW, the features are ranked using the $p$-values, since the scores correspond to $-\log(p)$.
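To make the idea of score-based feature ranking concrete, the following Python sketch ranks categorical features by information gain or by Gini impurity decrease and sorts them in descending order of score. It is a simplified illustration, not the Orange or MATLAB implementation: the function names and the toy data are hypothetical, and only categorical features are handled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(feature, labels, impurity):
    """Impurity of the labels minus the weighted impurity within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    within = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - within


def rank_features(features, labels, impurity=entropy):
    """Return (name, score) pairs sorted by descending score."""
    scores = {
        name: impurity_decrease(column, labels, impurity)
        for name, column in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: feature "A" separates the classes, "B" does not.
toy_labels = ["low", "low", "high", "high"]
toy_features = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1]}
ranking = rank_features(toy_features, toy_labels)
```

Passing `impurity=gini` to `rank_features` gives the Gini-based ranking; with `entropy` the score is the information gain.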
The 42 different classification models used are of nine types: decision trees, discriminant analysis, support vector machines, logistic regression, nearest neighbors, naive Bayes, kernel approximation, neural network, and ensemble classifiers. A detailed description, comparison and review of these classifiers are easily found in the MATLAB Help Center and in the literature (e.g., Guyon and Elisseeff, 2003; Kotsiantis et al., 2007; Mohamed, 2017; Nasteski, 2017; Osisanwo et al., 2017; Radovic et al., 2017; Singh et al., 2016). However, we present a brief description of these algorithms in Table C1 (Appendix C). The accuracy ratio (AR), defined as the ratio of cases correctly predicted during calibration, is used as a measure of the performance of the classification models calibrated with the features/variables selected by each FRA. The AR of each FRA is the maximum performance achieved by the classification models.
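The AR and the rule "the AR of each FRA is the maximum achieved by the classification models" can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up predictions, not output from the study.

```python
def accuracy_ratio(y_true, y_pred):
    """Share of cases whose predicted class equals the observed class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def best_model_ar(y_true, predictions_by_model):
    """Return (model_name, AR) for the classifier with the highest AR."""
    ars = {
        name: accuracy_ratio(y_true, y_pred)
        for name, y_pred in predictions_by_model.items()
    }
    best = max(ars, key=ars.get)
    return best, ars[best]


# Hypothetical observed WTP classes (1..4) and predictions from two models.
observed = [1, 2, 3, 4]
predictions = {
    "model_a": [1, 2, 3, 1],  # three of four cases correct
    "model_b": [1, 1, 1, 1],  # one of four cases correct
}
best_name, best_ar = best_model_ar(observed, predictions)
```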
V. Results and discussion
The results of an ordered probit model with ordered price intervals (in euro) as the dependent variable, which is a proxy for WTP for a bottle of sparkling wine, include the ordered list of the 34 covariates or predictors, their respective regression coefficients and standard errors (Table 1 and Table A2, Appendix A). It is important to highlight that the null hypothesis of normally distributed errors is not rejected, that the likelihood ratio test confirms that the parameters are globally significant at the 1% level, and that the ratio of correctly predicted cases (AR) is 45% (Table 1). Ordered by decreasing level of significance (1%, 5%, and 10%), the 15 statistically significant predictors in the ordered probit model of WTP (Table 1) are the Champagne brand, income, the importance of awards (ImpDCE2), being the one who purchases the product, the importance given by the respondent to a set of variables related to organic production (ImpDCE6), production method (ImpDCE5), and sweetness (ImpDCE4), being a consumer of red wine, buying sparkling wine as a gift, the importance of the brand, gender, place of residence, region of production, and not being a protected designation of origin (PDO).
AR = accuracy ratio defined as the ratio of cases correctly predicted
For the sake of simplicity, only the results of the five FRAs that present the highest AR among the ML algorithms available in CL and Orange (Table 1) will be presented, although it is important to note that the results of different FRAs are very similar. For example, in CL, the results obtained with $\chi^2$, ANOVA and KW are very alike. The features ranked by ANOVA and KW in the first eleven positions are the same and sorted in the same order. Three of the last four features selected by these two algorithms are also the same, although they are ordered differently. Features selected with $\chi^2$ are also similar and only differ in the order between some pairs of successive features. The other algorithms (ReliefF, mRMR, InforGain) selected only some of the same features, not always in the same order. However, approximately the same set of features tends to be selected for the top seven positions, namely Income, Champagne, NotPDO, ClassTrad, ImpBrand, BuySparkling, and Tavora. The features selected with the other FRAs of Orange are relatively similar. For example, the set of features in the top seven positions is very similar, except for BuySparkling and Tavora, which are only selected once. On the other hand, BuyGift, ImpDCE1, and ImpDCE3 are selected twice by the Orange FRAs. InforGain and Gini select the same variables but in a slightly different order. The other three FRAs used present some similarities in the common features selected.
Regarding the performance of classification models, the AR is slightly higher for the ordered probit model (45%) than for the ML methods, for which it ranges between 41% and 43%. AR is only slightly higher for models available in Orange than in CL. In the case of the CL FRAs, the classifiers with the highest performance are the linear discriminant analysis model for $\chi^2$ and KW, weighted KNN (nearest neighbor classifier) for ReliefF, Ensemble Subspace Discriminant (ensemble classifier) for mRMR, and Kernel Naïve Bayes for ANOVA. For Orange, the SVM provides the highest AR for $\chi^2$, Neural Networks for InforGain and Gini, and Naïve Bayes for gain ratio and ReliefF.
Since, for $\chi^2$, ANOVA and KW, the scores correspond to $-\log(p)$, it is important to mention that for these FRAs the number of features with $p$-values below 5% is 11 for ANOVA, 12 for $\chi^2$ and 14 for KW. However, classification models were calibrated with a number of predictors ranging between 12 and 15, with very similar AR. These results suggest that the inclusion of additional, but less important, predictors does not lead to better-performing models. Results obtained with ML methods suggest a tendency toward similarity in the main predictors/features selected by the different methods. This finding is likely a consequence of the fact that the initial features (predictors) in the ML approach are the same as the covariates in the probit model, the inclusion of which is supported by previous studies.
In general, the findings highlight the relevance of the same variables, namely income, the Champagne brand, not being PDO, the traditional/classic production method and the importance of the brand. Additionally, the ordered probit stresses the significance of the six variables that express the personal importance given by the respondents to a set of variables related to production region, awards, categories, sweetness, production method, and organic production. Although all ML models confirm the importance of income, Champagne and ClassTrad, the evidence reported for the other variables is different, which draws attention to the need to choose the best method based on a given performance measure.
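The link between a $-\log(p)$ score and the 5% significance cut-off can be illustrated with a short Python sketch. It assumes that the scores use the natural logarithm, so that $p < 0.05$ is equivalent to a score above $-\ln(0.05) \approx 3.0$; the function name and the toy scores are hypothetical and are not results from the study.

```python
import math


def significant_features(scores, alpha=0.05):
    """Return feature names with p < alpha, sorted by descending score.

    `scores` maps feature name -> -log(p), assuming the natural logarithm.
    """
    cutoff = -math.log(alpha)  # p < alpha  <=>  -log(p) > -log(alpha)
    kept = [(name, s) for name, s in scores.items() if s > cutoff]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical scores for three placeholder features.
toy_scores = {"f1": 10.0, "f2": 3.5, "f3": 1.2}
selected = significant_features(toy_scores)
```

A score can be mapped back to its $p$-value with `math.exp(-score)`, which makes it easy to check how many features fall below any chosen significance level.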
The results are in line with the findings of previous studies. The brand, the Champagne appellation and income are strong determinants of Portuguese consumers’ WTP for sparkling wine (e.g., Pickering et al., 2022). The greatest importance of these three characteristics for modelling WTP was evidenced with the ordered probit model, four of the five MATLAB classifiers and one Orange model, although two other Orange classifiers select Champagne and income in the first three positions. Individuals are willing to pay more for big brands, confirming that Champagne has a strong collective reputation as an indicator of status (e.g., Combris et al., 2006; Dal Bianco et al., 2018; Lange et al., 2002; Pickering et al., 2022; Verdonk et al., 2017). Pickering et al. (2022) compared Champagne and Prosecco wine style labels and found that respondents with higher incomes are willing to pay more for both sparkling wine styles than their counterparts. In the same line, as expected, the WTP for sparkling wine is influenced by the absence of a PDO, suggesting that the collective reputation of the terroir that comes from the designation of origin affects the WTP for sparkling wine.
Additionally, there seems to exist some differentiation between male and female consumers, in line with findings from previous studies. Female consumers are found to consume significantly more sparkling wine than men. Women are slightly more willing to pay for sparkling wine than men, which could reflect the perception that sparkling wine is “feminine” or a “women's drink,” possibly due to its connotations of glamor and romanticism (Bruwer and McCutcheon, 2017; Charters, 2005). Furthermore, women tend to be the main shoppers in their households, which may also explain differences in the WTP (e.g., Marshall and Anderson, 2000). Our results show that being responsible for purchasing wine increases the WTP of sparkling wine because it enhances the involvement with wine, which is related to the amount typically spent on a bottle of wine, as spending consumers are the most involved with wine in general (Thach and Olsen, 2015).
A relationship is also observed between the consumption of sparkling wine and the consumption habits of other drinks, in the sense that red wine consumption seems to be positively correlated with the WTP for sparkling wine. This relationship may be associated with the strong presence of a traditional consumption model, in which families buy more wine, especially still red wine, to consume with meals (Dal Bianco et al., 2018). In this sense, there is a complementary effect between the consumption of red wine with meals and the WTP for sparkling wine for consumption outside meals.
In line with other studies (e.g., Charters, 2005; Charters et al., 2011, for still wine), our results suggest that sparkling wine is perceived as a separate product type from other beverages (e.g., still white wine, beer, soft drinks, spirits, and sangria). One reason for this result may be that households that consume significant quantities of wine tend to purchase cheaper products, and sensitivity to branded prices is not a significant determinant of their WTP for sparkling wines (Dal Bianco et al., 2018). As demonstrated previously, the present study also confirms that consumers are usually willing to spend more on sparkling wines purchased for special occasions, such as festive events/seasons and offers/gifts, which attests to the importance of the purchasing context (e.g., Morton et al., 2004; Velikova et al., 2016; Verdonk et al., 2017).
Finally, broad consistency was observed for personal preferences underlying the choice in the decision-making process and WTP, highlighting the importance of wine cues (e.g., ImpDCE1 to ImpDCE6) (e.g., Ferreira et al., 2021): production region (e.g., Verdonk et al., 2017); awards or reputation; categories; sweetness (e.g., Combris et al., 2006; Verdonk et al., 2017); production method (e.g., Culbert et al., 2017); and being organic (e.g., Schäufele and Hamm, 2017). This outcome reinforces the importance of the characteristics revealed to the consumer (on the bottle and label) in explaining differences in the consumers’ price (e.g., Combris et al., 2006; Lecocq and Visser, 2006).
VI. Conclusion
This study analyzes consumers’ WTP for sparkling wine in Portugal. Research on this topic has typically been backed by a theoretical model grounded in utility theory, employing econometric estimation methods tailored to the data structure and study objectives. This study deepens the analysis by comparing the results of a traditional model used to understand the determinants of sparkling wine WTP (ordered probit model) of Portuguese consumers, with the evidence produced by recently emerging alternative methods rooted in ML algorithms.
The results suggest that there is no absolute supremacy of any of the approaches in terms of global performance, although the ordered probit model presents a slightly better performance in the accuracy rate of correctly predicted cases. The two approaches tend to select the same main predictors, highlighting the relevance of the income variable, the Champagne brand, not being a PDO and being a red wine consumer as the main predictors of WTP for sparkling wine in Portugal. However, the advances suggested by ML are quite variable depending on the algorithm and platforms used, which draws attention to the need to choose the best method based on another specific performance measure. In this sense, it should be pointed out that ML classification models are characterized by being parameterizable algorithms. Thus, although a large number of methods were used in this study, this number could have been much higher if other options/parameterizations had been chosen. On the other hand, although the use of ML tools does not require prior knowledge of the relationship to be modelled, it benefits from knowledge of the characteristics of each algorithm and the relationships to be modelled.
This paper contributes to consolidating knowledge on the modelling of consumer behavior and provides useful information for wineries’ marketing strategies. Specifically, the results indicate that to increase sparkling wine sales at a higher price, wineries should segment the market according to income, focusing on higher-income niches that are also red wine consumers. At the same time, they should follow the Champagne strategy as a benchmark, create dynamics of collective and individual reputation, reinforce and benefit from the PDO, boost wine routes, carry out visits and tastings in the vineyard and in the cellar, participate in competitions and tastings, and develop cooperative actions with hotels and restaurants that value sparkling wine. Moreover, since the ordering of WTP predictors changes with the method used, a detailed analysis and weighting of the different orderings, i.e., a quantified sensitivity analysis, is recommended to ensure the robustness of the winery’s marketing plan for a target market.
The authors are aware that the methods used and, consequently, the results obtained in this study can be extended, for instance, by applying and integrating the two analytical paradigms in the choice modelling perspective, as highlighted by van Cranenburgh et al. (2022). Moreover, in this study, the starting features of the ML methods are the same as the covariates (predictors) of the ordered probit model, a choice supported by previous studies. It therefore remains an open research question whether ML methods are better suited to unstructured big data for which there is limited prior knowledge about the influences on WTP. Accordingly, a suggested direction for future research is to apply ML techniques to data from digital platforms such as Google Trends and Vivino.
Acknowledgments
The authors thank an anonymous reviewer and the editor for insightful and constructive comments.
This work was supported by national funds, by the FCT—Portuguese Foundation for Science and Technology under the project UIDB/04011/2020 (https://doi.org/10.54499/UIDB/04011/2020) and the project UIDB/04033/2020 (https://doi.org/10.54499/UIDB/04033/2020).
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Competing interests
The authors declare they have no competing interests.
Appendix A
*** significant at the 1%
** significant at the 5%
* significant at the 10%
Appendix B
Appendix C
Notes: The results of DT, DA, LRC, and NBC are easy to interpret while KNN, KAC, EC, and NN classifiers are difficult to interpret. The results of SVMs are easy to interpret if linear but hard for all other kernel types; All classification models accept exclusively numerical or categorical predictors and partly numerical and partly categorical predictors, except DA and EC. However, Classification Learner only offers users the models available according to input data type.