Introduction
Feature models of semantic memory assume that the representation of a concept is composed of a set of features or properties (Jackson et al., 2015; Martin, 2016; Soshi et al., 2017). However, these features do not contribute homogeneously to the definition of different concepts. A concept's representation comprises a large number of features, derived from the verbal and nonverbal experience, both individual and culturally mediated, and always embedded in particular contexts and circumstances, that the person has had with the object to which the concept refers. Some of these features are closer to the core meaning of a concept because they are more relevant for identifying it. These properties are crucial for facilitating communication among individuals within a particular community. For example, when the concept giraffe is mentioned, speakers of the same language typically think of an animal with a long neck. Moreover, research has demonstrated that these features are shared even across different languages (Vivas et al., 2020). Additionally, there are properties that members of a community may share but that are not essential for defining the concept (e.g., eats leaves for giraffe). Finally, there are features that are idiosyncratic and specific to an individual's particular representation. This last group includes creative (e.g., uses a scarf) and bizarre features (e.g., I can climb it to get to the roof), as well as misconceptions (e.g., it's oviparous). These three types of features can be seen as lying on a continuum, ranging from those most central to those most peripheral to a concept's meaning (for further discussion, see Vivas et al., 2021). The current article focuses on the features closest to the core meaning, which are fundamental for mutual comprehension.
Several measures have been proposed to characterize the importance of a feature for the central meaning of a concept. Cue validity was the first such measure, assessed when Rosch and Mervis (1975) and Rosch (1978) studied the internal structure of categories. Cue validity captures the conditional probability that an object belongs to a category given that it has a particular characteristic. Within a feature norm, it is calculated as the production frequency of the feature for a concept divided by the sum of the production frequencies of that feature across all the concepts in which it appears (McRae et al., 2005). For example, the feature flies has a production frequency of 25 for a given concept in the English norms and a total production frequency of 712, so the corresponding value is .035. The maximum value of 1 is obtained when the feature is produced for that concept only. More recently, cue validity has been shown to be a valuable indicator in different processes, for instance, in the processing of words and non-words during reading (Tiwari et al., 2020), in changes in attentional selection (Lou et al., 2022), and in predicting the impact of the emotional content of images (Denefrio et al., 2017).
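For readers who work with feature norms programmatically, this calculation is easy to express. The following R sketch is our own illustration (the function name and inputs are hypothetical, not part of the norms):

```r
# Cue validity: production frequency of a feature for one concept, divided
# by the summed production frequency of that feature across all concepts.
cue_validity <- function(freq_for_concept, total_freq_across_concepts) {
  freq_for_concept / total_freq_across_concepts
}

cue_validity(25, 712)  # the feature "flies" in the English norms: ~.035
```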
Lately, attention has focused on feature distinctiveness, that is, the degree to which a characteristic is unique to a particular concept. Feature distinctiveness has been proposed as a good measure to examine category-specific semantic deficits (Devlin et al., 1998), to characterize prototypicality (Garrard et al., 2001), and to study its influence on language acquisition, visual lexical decision, and semantic decision (Siew, 2020, 2021). It is calculated as "the inverse of the number of concepts in which the feature appears in the norms" (McRae et al., 2005, p. 552). For example, the feature flies appears in 46 concepts in the English norms, so the calculation is 1/46, yielding a distinctiveness value of .022. Here again, a value of 1 means that the feature is highly distinctive for that concept, as it does not appear in any other concept. This measure was also central to theories explaining category-specific semantic deficits in neurological patients. For instance, the Conceptual Structure Account (CSA; Taylor et al., 2007) proposes that living and nonliving things differ in the degree to which their distinctive and shared features are correlated with other features of the concept. Distinctive features of living things tend to be weakly related to other features of the concept (e.g., has stripes), making them more vulnerable to the effects of brain damage. In contrast, distinctive features of nonliving things tend to be highly correlated with other important features (e.g., used for cutting and has a handle). This idea was further examined by Duarte and colleagues (Duarte et al., 2009), who tested the hypothesis in Alzheimer's disease (AD). They observed that distinctive features appear to be affected regardless of domain (living and nonliving things). The same result was obtained by Catricalà and colleagues (Catricalà et al., 2015). Additionally, Duarte et al. (2009) noted that in moderate stages of the disease, distinctive features of living things tend to be more affected.
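Distinctiveness is equally simple to compute; this R sketch (again with illustrative names) reproduces the flies example:

```r
# Distinctiveness: the inverse of the number of concepts in which the
# feature appears in the norms (McRae et al., 2005).
distinctiveness <- function(n_concepts_with_feature) {
  1 / n_concepts_with_feature
}

distinctiveness(46)  # "flies" appears in 46 concepts: ~.022
```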
More recently, relevance was proposed as a more elaborate measure combining dominance (i.e., production frequency in feature production norms) with distinctiveness (Sartori & Lombardi, 2004). According to these authors, the former is the local component, indicating the importance of the feature for the concept, while the latter, the global component, indexes the importance of that feature relative to the rest of the concepts included in the norms' database. As an example, the feature makes honey has a production frequency (dominance) of 27 in the Spanish norms and a distinctiveness value of 1 for the concept bee, leading to a value of 233.384 according to the relevance formula. Mechelli et al. (2006) suggest that semantic relevance explains category effects in the medial fusiform gyri. Additionally, this measure has proven to be the best predictor of name retrieval accuracy in a naming-to-description task. Sartori et al. (2005) showed that, for both healthy participants and participants with AD with semantic impairment, the semantic relevance of a concept description predicts response accuracy in name retrieval better than the distinctiveness and dominance of the same description. However, it is important to note that semantic significance had not yet been defined at that time and, as we will see later, this measure adds a small adjustment to semantic relevance. More recently, evidence has shown that damage to relevant features in AD and in the semantic variant of primary progressive aphasia tends to be highly correlated with failures on picture naming tasks (Catricalà et al., 2015).
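The quoted value can be reproduced under a tf-idf-style reading of the measure. The R sketch below assumes relevance = dominance × log2(N/CPF); this assumption matches the 233.384 example exactly, although Sartori and Lombardi's (2004) full formulation may differ in its normalization details:

```r
# Assumed relevance computation: dominance (local component) weighted by a
# log-scaled distinctiveness term (global component). n_concepts is the size
# of the norms; cpf is the number of concepts in which the feature appears.
relevance <- function(dominance, cpf, n_concepts = 400) {
  dominance * log2(n_concepts / cpf)
}

relevance(dominance = 27, cpf = 1)  # "makes honey" for bee: ~233.384
```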
The newest variable proposed to measure feature importance is significance. This metric was developed by Montefinese and colleagues (Montefinese et al., 2013), and Montefinese et al. (2014) subsequently tested its predictive value with speeded verification tasks. It constitutes a combination of accessibility (dominance plus order of production in feature norms) and distinctiveness. As can be seen, this parameter adds the order of production to the values already included in relevance (dominance and distinctiveness). The order of production provides additional and pertinent information regarding the weight of the feature for that concept in a particular language community. For example, for the concept eagle, the feature flies tends to be produced early, while lays eggs generally appears in the fourth or fifth position in participants' productions in feature norms. These authors provided evidence that significance is a good predictor of verification latencies in a feature verification task (i.e., subjects judge whether a particular feature is related to a certain concept).
Although some of these measures are relatively stable across languages (Vivas et al., 2020), differences can also be observed for several concepts (Perniss et al., 2012; Vivas et al., 2020). For example, the concept turkey ("pavo" in Spanish) has a very different meaning for an Argentinean than for an American, as for the latter it is associated with Thanksgiving Day. This is why local norms must be developed, and doing so is the main aim of the current paper: to present semantic significance values for the Spanish-speaking population. In addition, to assess semantic significance against alternative measures of feature importance, we analyzed its effect on participants' response speed in two feature verification tasks (feature-concept and concept-feature) and compared it with the current gold-standard measure, relevance. Third, a comparison was performed with the Italian significance values.
Semantic significance calculation
To achieve the main aim of the current paper, values of semantic significance were calculated for the entire database of 400 concepts, comprising 3071 features, from the Argentinean Spanish feature production norms (Vivas et al., 2017) (see Supplementary Material). The formula employed here is quite similar to the one used by Montefinese et al. (2014, p. 358). The main difference lies in the calculation of accessibility. While they used the centile order of production plus dominance (see Montefinese et al., 2013, p. 445), our calculation relies on a measure called relative weight (RW). To obtain this measure, the following procedure is performed: each participant (labeled with subscript j) produces a list of nj features to describe the concept. Each of these features is given a weight determined by its position in the vector of features, divided by the length of the vector. Hence, the first feature has a weight of 1 (i.e., nj/nj), the second (nj-1)/nj, and, in general, the i-th feature a weight of (nj-(i-1))/nj. Then, for each feature produced by more than one participant, the respective weights are summed, forming a new vector (again ordered from higher to lower values). The resulting vector is normalized with respect to the standard Euclidean norm, yielding a vector whose entries are real numbers between 0 and 1. This procedure is performed by the Definition Finder software (Vivas et al., 2014), which can be downloaded from OSF.
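The relative-weight procedure can be summarized in a few lines of R. This is a minimal sketch of the algorithm as described above, not the Definition Finder implementation; the function name and the toy data are ours:

```r
# Relative weight (RW): position-based weights per participant, summed per
# feature across participants, then normalized by the Euclidean norm.
relative_weight <- function(feature_lists) {
  weights <- list()
  for (fl in feature_lists) {
    n <- length(fl)
    w <- (n - (seq_len(n) - 1)) / n            # i-th feature: (n-(i-1))/n
    weights[[length(weights) + 1]] <- setNames(w, fl)
  }
  all_w <- unlist(weights)
  summed <- tapply(all_w, names(all_w), sum)    # sum weights per feature
  summed <- sort(summed, decreasing = TRUE)
  summed / sqrt(sum(summed^2))                  # entries fall between 0 and 1
}

# Toy example: two participants describing "giraffe"
p1 <- c("has a long neck", "is an animal", "eats leaves")
p2 <- c("has a long neck", "has spots")
relative_weight(list(p1, p2))
```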
The final formula used to calculate significance is the following:

Significance = RW × LN(400/CPF)
where RW refers to relative weight, LN to the natural logarithm, and CPF to the number of concepts in which the feature appears. Four hundred is the full set of concepts available in the norms. The values of relevance and distinctiveness were extracted from the Spanish semantic feature production norms for both young (Vivas et al., 2017) and older adults (Vivas et al., 2022), following the formulae of Sartori et al. (2005) and McRae et al. (2005), respectively. We also included other salience measures already published in Vivas, J. et al. (2017) and Vivas, L. et al. (2022) (see Supplementary Material). It is worth noting that, as we already had data from the Spanish semantic feature production norms for both young and older adults, we calculated semantic significance for both groups. We believe this offers a valuable resource for researchers requiring centrality measures for the elderly as well, given the documented differences in semantic memory between the two populations (Mirasso et al., 2022; White et al., 2018; Yoon et al., 2004). However, the speeded verification task described in the following sections was performed exclusively with young adults due to their greater availability for data collection.
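Assuming the formula reconstructed above, significance then reduces to a one-line computation once RW and CPF are known; the input values below are purely illustrative:

```r
# Significance under the formula above: RW scaled by a log measure of how
# concept-specific the feature is. The rw and cpf values here are hypothetical.
significance <- function(rw, cpf, n_concepts = 400) {
  rw * log(n_concepts / cpf)   # log() is the natural logarithm in R
}

significance(rw = 0.91, cpf = 1)    # a highly distinctive feature
significance(rw = 0.91, cpf = 46)   # a widely shared feature such as "flies"
```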
Comparison with semantic relevance
To achieve the secondary aim of the current paper, we studied the potential of the significance measure to predict response times through a feature verification experiment. We compared this new measure to another well-established measure: semantic relevance (Sartori & Lombardi, 2004). These experiments were carried out solely with younger adults.
The experimental procedure comprises two speeded feature verification experiments. In Experiment 1, participants performed the task with a feature presented first, followed by a concept (feature-concept condition). In Experiment 2, the order was reversed, with the concept followed by a feature (concept-feature condition). These two alternatives were proposed because activation spreading likely differs when the task starts with a feature (e.g., "flies," which can refer to either a living or a nonliving object) than when it starts with a concept (e.g., plane) (Ramscar et al., 2010; Ursino et al., 2013).
Experiment 1
Method
Participants
One hundred and twenty participants (96 females and 24 males; mean age = 24.7 years; SD = 5.22) took part in Experiment 1. They were all undergraduate students at the National University of Mar del Plata or young professionals. All subjects reported having normal or corrected-to-normal vision. Eighty-five percent of them reported being right-handed. The presence of neurological or psychiatric diseases was an exclusion criterion. Moreover, they provided written informed consent before participating in the study. This study adhered to the Helsinki principles (World Medical Association, 2013) and was approved by the Ethical Committee of the National University of Mar del Plata.
Materials
Stimuli and semantic measures were selected from the Spanish semantic feature production norms (Vivas et al., 2017). From the full set of 400 concepts, we selected a subset of 130 according to the following criteria: a) the concept was not a compound word; b) it was not polysemous; c) it belonged to a well-defined semantic category (i.e., easily identified by most people); d) the category had more than four exemplars; and e) the concept was included in McRae et al.'s (2005) set of concepts (to facilitate future comparisons).
The stimuli were the same for both experiments and consisted of 130 concrete concepts belonging to 11 categories (animals, tools, fruits, musical instruments, furniture, buildings, clothing, vegetables, utensils, accessories, and vehicles). All were concrete (mean concreteness = 4.78, min = 4.39, max = 4.97) according to the Argentinean norms by Manoiloff and colleagues (Manoiloff et al., 2010). Mean familiarity on a scale from 1 to 5 was 2.79 (range: 1.31 to 4.91) (Manoiloff et al., 2010), and mean subjective lexical frequency on a scale from 1 to 5 was 2.68 (range: 1.08 to 5) (Martínez-Cuitiño et al., 2015). Each of the 130 concepts was paired with four features: one core feature, one partially shared feature, one idiosyncratic feature, and one non-related feature. The procedure used to operationalize these levels can be seen in Vivas et al. (2021). This yielded 520 concept-feature combinations. Four lists were constructed so that each participant received an acceptable number of trials. Concept-feature pairs were distributed such that each concept appeared only once per list and the feature types (core, partially shared, idiosyncratic, and non-related) were distributed evenly across lists. Each list contained 120 experimental concept-feature pairs plus 10 concept-feature pairs for the training session. The final lists can be found on OSF.
Procedure
Subjects were seated 60 cm in front of a 17-inch LCD computer screen (1280 × 960 pixels). Tasks were displayed using E-Prime 2.0. Subjects were instructed to press the right button if the feature was reasonably true for that concept and the left button if it was not. Responses were considered correct if the subject responded "true" to features related to the concept. Filler trials (i.e., unrelated features) were discarded.
Trials began with a set of 10 training items to ensure that participants were familiarized with the task, followed by the 120 experimental items. Each trial consisted of a feature presented for 2000 ms, followed by a fixation cross (inter-stimulus interval, ISI) for 500 ms, and then a concept that remained on the screen for 4000 ms or until the participant's response. The inter-trial interval (ITI) was 1500 ms (see Figure 1). Features were presented in black and concepts in blue; both were written in lowercase (Verdana, 24 pt) on a white background.
Statistical analysis
The following analyses were performed using R version 4.4.0 (R Core Team, 2021). Reaction times (RTs) greater than 3000 ms or lower than 500 ms were removed, as were erroneous responses (14.08% of responses for Experiment 1). The number of data points retained for Experiment 1 (feature-concept) was 6186. Each numeric variable was centered and scaled, except for RTs. Following Baayen and Milin (2010), RTs were log-transformed (Ln_RT) to meet normality assumptions (Z = 1.057; p = .214 for Experiment 1).
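As a sketch, the trimming and transformation steps might look as follows in R, assuming a data frame dat with an rt column in milliseconds, a logical correct column, and the two predictors (all column names are ours):

```r
# RT preprocessing as described above; column names are illustrative.
library(dplyr)

dat_clean <- dat %>%
  filter(correct,                  # drop erroneous responses
         rt >= 500, rt <= 3000)    # drop RTs outside 500-3000 ms

dat_clean$ln_rt <- log(dat_clean$rt)  # natural-log transform (Ln_RT)

# Center and scale each numeric predictor, but not RT itself
dat_clean$significance_z <- as.numeric(scale(dat_clean$significance))
dat_clean$relevance_z    <- as.numeric(scale(dat_clean$relevance))
```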
As the data from the experiments violate the independence assumption, with data points grouped by both participants and concepts, simple regressions could not be applied. Therefore, independent linear mixed model analyses were performed for each measure to identify the most explanatory variable, with Ln_RT as the dependent variable. As the correlation between significance and relevance is r = .93, separate analyses were performed for significance and relevance to avoid collinearity. The formula used for each linear mixed model was

Ln_RT ~ VAR + (1 | Subject) + (1 | Concept)
where "VAR" is replaced with significance or relevance, according to the specific model. Subjects and concepts were treated as random factors, and their intercepts were accounted for by the models. To compare the explanatory power of relevance and significance on RTs between models, the Akaike information criterion (AIC) was used.
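In R, such models can be fitted and compared with lme4; the sketch below uses our illustrative column names and fits by maximum likelihood so that AIC values are comparable:

```r
# Two parallel mixed models (one per predictor) with random intercepts for
# subjects and concepts, compared via AIC. Column names are illustrative.
library(lme4)

m_sig <- lmer(ln_rt ~ significance_z + (1 | subject) + (1 | concept),
              data = dat_clean, REML = FALSE)
m_rel <- lmer(ln_rt ~ relevance_z + (1 | subject) + (1 | concept),
              data = dat_clean, REML = FALSE)

AIC(m_sig, m_rel)  # lower AIC indicates the better-fitting model
summary(m_sig)     # fixed-effect estimates and t-values
```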
Results
The results of both models (relevance and significance) for Experiment 1 (feature-concept) are shown in Table 1. Both models show statistically significant fixed effects (intercept and significance/relevance) based on the t-values. The significance model's intercept has an estimate of 2.9832, and the significance term has an estimate of −0.0037. The relevance model's intercept has an estimate of 2.9897, and the relevance term has an estimate of −0.0006. While the significance and relevance terms are statistically significant, the R-squared values imply that factors not included in the models contribute to the variability in the outcome variable. In this case, individual differences in RTs and particularly complex feature-concept pairs may add variability to the results.
To assess the difference in predictive power between significance and relevance, model comparisons were made using the AIC. Table 2 contains the comparative values of the models. While the AIC might favor the significance model, the R-squared values from Table 1 suggest that both models explain a relatively small proportion of the variance. Therefore, the differences between the models are not large enough to deem one predictor better than the other. This suggests that significance and relevance have similar predictive power in explaining RTs in Experiment 1.
Discussion of experiment 1
Results indicated that both measures have the same predictive power in the feature-concept condition. Our results are in line with those obtained by Montefinese et al. (2013) in that both studies observed an effect of semantic significance in the feature verification task. However, they observed a superiority effect of significance over relevance, which was not observed in our experiment. Methodological differences in the calculation of significance between the two studies may explain this discrepancy; these differences will be discussed in detail in the general discussion.
Experiment 2
Method
Participants
A group of 120 people participated in Experiment 2 (87 females and 33 males; mean age = 24.43 years; SD = 5.25 years). They were all undergraduate students at the National University of Mar del Plata or young professionals. All subjects reported having normal or corrected-to-normal vision. The presence of neurological or psychiatric diseases was an exclusion criterion. Moreover, they provided written informed consent before participating in the study. This study followed the Helsinki principles (World Medical Association, 2013).
Materials
The materials for Experiment 2 were identical to those used in Experiment 1, but each pair was presented in reversed order (concept first, then feature).
Procedure
The procedure for Experiment 2 was identical to Experiment 1, except that the concept was presented prior to the feature description.
Statistical analysis
The statistical analysis for Experiment 2 was the same as for Experiment 1. For Experiment 2, 8.58% of responses were removed for the reasons mentioned above. The number of data points retained for Experiment 2 (concept-feature) was 6582.
Each numeric variable was centered and scaled, except for RTs. Following Baayen and Milin (2010), RTs were log-transformed (Ln_RT) to meet normality assumptions (Z = 0.696; p = .718 for Experiment 2).
Results
Table 3 contains the model estimates for both variables, relevance and significance, for Experiment 2 (concept-feature). Both the significance model and the relevance model show statistically significant results for the intercept and the predictor term (significance/relevance) based on their t-values. In the significance model, the intercept has an estimated value of 3.0421 with a standard error of 0.0105. The significance term has an estimated value of −0.0030 and a standard error of 0.0002. Similarly, the relevance model shows a significant intercept (estimate: 3.0400, standard error: 0.0001) and a significant relevance term (estimate: −0.0005, standard error: 0.0000). However, the conditional R-squared values for both models are low (0.03 for the significance model and 0.02 for the relevance model), indicating that these models explain a relatively small portion of the outcome variable's variance.
To assess the difference in predictive power between significance and relevance, model comparisons were made using the AIC, as in Experiment 1. Table 4 contains the comparative values of the models. While the AIC seems to indicate a better fit for the significance model, the R2 values (Table 3) imply that it is only marginally better than the relevance model. The differences between the models are therefore not large enough to deem one predictor better than the other, indicating that significance and relevance demonstrate comparable predictive power in explaining RTs in Experiment 2 (concept-feature).
Discussion of experiment 2
In Experiment 2, we once again observed the same effect as in Experiment 1. Both measures, significance and relevance, demonstrated an effect in the concept-feature condition of the feature verification task; however, we did not observe a superiority of semantic significance. Combining frequency and order of production, for each participant and across a population, requires additional work on the raw data. However, the order of elicitation of a feature reflects, in some way, the connectivity between a concept and that feature. For this reason, we expected significance to be a better variable for evaluating the effect of feature saliency in different tasks. Nevertheless, our findings showed that significance was not a better predictor of feature verification than relevance. Perhaps priming does not operate in the same way in production tasks as in property verification tasks, and this difference prevented us from observing, in this task, the predictive capacity of significance, which undoubtedly exists at production time. Cognitive and neuropsychological differences have been pointed out in the distinction between identification and production tasks as part of implicit memory processes (Prull & Spataro, 2017); however, this topic has not yet been clearly elucidated.
Comparison with Italian measures
Finally, to validate the significance values for the Spanish norms, a correlation analysis comparing these values to those of the Italian norms (Montefinese et al., 2014) was carried out. To this end, concepts common to both norms were identified (N = 65), and a total of 245 shared features were selected. The results showed a strong positive correlation between the two sets of values (r = .592; p < .001). Descriptive values for the Spanish and Italian norms are presented in Table 5.
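In R, this comparison amounts to aligning the shared concept-feature pairs and correlating their significance values; the sketch below assumes two data frames with concept, feature, and significance columns (names are ours):

```r
# Cross-language validation: merge shared concept-feature pairs from the
# Spanish and Italian norms, then correlate their significance values.
shared <- merge(spanish_norms, italian_norms,
                by = c("concept", "feature"),
                suffixes = c("_es", "_it"))

cor.test(shared$significance_es, shared$significance_it)  # Pearson r
```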
General discussion
One of the main contributions of the current study is to provide values of semantic significance for the set of 400 concrete concepts included in the Spanish feature production norms, for both young (Vivas et al., 2017) and older adults (Vivas et al., 2022). This provides the scientific community with new empirically obtained measures of feature importance.
As regards the feature-concept experiment, the analyses indicated that both variables, relevance and significance, explained RTs to the same extent. These results indicate that semantic significance is a good predictor of response latencies in the feature verification task. However, contrary to the findings of Montefinese et al. (2014), we did not observe a superiority of this measure over relevance, considering the delta R2 between the two models in both experiments. Similarly, in the concept-feature experiment, comparable results were obtained, indicating that significance is a good predictor of RTs but not superior to relevance. One explanation for these results could be the differences in how semantic significance was calculated, as we employed a different formula from that of Montefinese et al. Additionally, semantic significance was calculated based on the total number of concepts in each case; it is relevant to highlight that Montefinese's norms included 120 concepts, while Vivas' norms included 400. Therefore, the significance values are likely to differ to some extent. Also, the variables included in the mixed effects models were not exactly the same in each study, adding another potential source of difference.
In addition, the significance values for the Spanish norms were found to be equivalent to those reported for the Italian population, showing significant correlations between shared features. Additionally, descriptive values, including mean, standard deviation, minimum, maximum, and extreme percentile values, were similar for both norms. It is worth noting that the two norms comprise different numbers of concepts (400 for Spanish and 120 for Italian) and differ in the number of participants per concept. The observed correlations might have been stronger in the absence of these differences.
Finally, some limitations should be acknowledged. First, this experiment was not originally designed to measure the effect of significance, but rather to identify differences in response latencies based on continuous degrees of meaning representation. Therefore, the presence of different degrees of strength in the link between feature and concept (e.g., idiosyncratic features with a very weak relation to the concept) may have delayed response times, because the decision requires additional effort. Future research should address this issue by proposing feature-concept pairs that elicit unambiguous responses, such as true and false features that vary in their degree of significance.
Secondly, it is relevant to acknowledge that significance is a relative measure of feature salience, as are many other variables derived from feature norms. It is calculated with reference to the number of concepts included in those norms (in our case, 400), so it is relative to this specific corpus of concepts. Indeed, the degree of distinctiveness of a feature, which enters into the calculation of significance, may vary if the set of concepts is enlarged. It should be noted that semantic feature norms only include a subset of all possible concepts in a language, so adding more concepts to the norms will modify the values of these measures. For example, within the category of tools, if the majority of exemplars are cutting tools, adding other types of tools to the norms will likely change the values associated with the feature used for cutting across the different tools.
Conclusion
In the current study, we extend the use of a new measure of feature salience, semantic significance, originally proposed for the Italian population, to the Spanish-speaking population. Unlike previous measures such as production frequency, distinctiveness, and order of production, this measure incorporates additional information to calculate the importance of a feature. Therefore, it is expected to be a better variable for assessing the effect of feature salience in different tasks.
However, our findings indicate that significance did not prove to be a better predictor of feature verification than relevance. Further experiments should be conducted to validate its explanatory capacity in different tasks.
Replication package
Replication data and materials for this article can be found at https://osf.io/kyjb3/.