Semantic feature salience matters: new values of significance and relative weight for the Spanish feature norms for young and older adults

Leticia Vivas; Matías Yerro Avincetto; Sofía Romanelli; Francisco Lizarralde; Jorge Ricardo Vivas

doi:10.1017/S014271642400033X

Semantic feature salience matters: new values of significance and relative weight for the Spanish feature norms for young and older adults

Published online by Cambridge University Press: 21 October 2024

Leticia Vivas

Matías Yerro Avincetto

and

Leticia Vivas*: Affiliation:
IPSIBAT (UNMDP-CONICET), Facultad de Psicología, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
Matías Yerro Avincetto: Affiliation:
IPSIBAT (UNMDP-CONICET), Facultad de Psicología, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina Facultad de Ingeniería, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
Sofía Romanelli: Affiliation:
Instituto de Humanidades y Ciencias Sociales (INHUS, UNMDP/CONICET), Facultad de Humanidades, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
Francisco Lizarralde: Affiliation:
Facultad de Ingeniería, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
Jorge Ricardo Vivas: Affiliation:
IPSIBAT (UNMDP-CONICET), Facultad de Psicología, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
*: Corresponding author: Leticia Vivas; Email: [email protected]

Article contents

Abstract
Introduction
Semantic significance calculation
Experiment 1
Experiment 2
General discussion
Conclusion
Replication package
References

Rights & Permissions

Abstract

To determine the semantic importance of certain features has been a topic of interest for many semantic memory models. Diverse measures have been proposed in the last years. Semantic significance, the latest measure proposed by Montefinese and colleagues, is both sophisticated and comprehensive. Given the cultural and linguistic variability of semantic measures, this study presents values of significance and relative weight for the Spanish-speaking population corresponding to 400 concrete concepts. First, we presented data for both young and older adults. Second, we assessed the effect of significance on response times in two speeded feature verification tasks. Third, we compared significance values with the existing Italian significance norms. To evaluate the effect of significance to predict response times, two speeded verification tasks (Experiment 1 and 2) were carried out, selecting a total of a 130 concepts for analysis. In Experiment 1, subjects were presented with a feature followed by a concept, while in Experiment 2, the order of stimulus presentation was reversed (i.e., the concept was presented before the feature). An independent linear mixed model showed that significance was a good predictor of response latencies in Experiments 1 and 2. Moreover, results revealed a strong positive correlation between the Spanish and Italian significance values. Findings are discussed in terms of recent theories of semantic cognition.

Keywords

Relative weight semantic features semantic significance Spanish norms

Type: Original Article
Information: Applied Psycholinguistics , Volume 45 , Issue 5 , September 2024 , pp. 920 - 933

DOI: https://doi.org/10.1017/S014271642400033X [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Introduction

Feature models of semantic memory assume that concept representation is composed of a set of features or properties (Jackson et al., Reference Jackson, Hoffman, Pobric and Ralph2015; Martin, Reference Martin2016; Soshi et al., Reference Soshi, Fujimaki, Matsumoto and Ihara2017). However, the degree of contribution of these features to the definition of different concepts is not homogeneous. An important number of features constitute the representation of a concept. These features are derived from verbal and nonverbal experience, both individual and culturally mediated, always involved in certain context and circumstances, that the person has had with the object to which the concept refers. Some of these features are closer to the core meaning of concepts because they are more relevant to identify those concepts. These properties are crucial for facilitating communication among individuals within a particular community. For example, when the concept giraffe is mentioned, speakers of the same language typically think of an animal with a long neck. Moreover, research has demonstrated that these features are shared even across different languages (Vivas et al., Reference Vivas, Montefinese, Bolognesi and Vivas2020). Additionally, there are other properties that members of a community may share but are not essential for defining the concept (e.g., eats leaves for giraffe). And there are even other features which are idiosyncratic and specific to an individual’s particular representation. This last group includes creative (e.g., uses a scarf) and bizarre features (e.g., I can climb it to get to the roof), as well as misconceptions (e.g., it’s oviparous). These three types of features can be seen as continuous, ranging from the more central to the more peripheral feature of a concept’s meaning (for further discussion on this topic, see Vivas et al., Reference Vivas, Kogan, Yerro, Romanelli and Vivas2021). The current article will focus on those features closer to the core meaning, which are fundamental for mutual comprehension.

Several measures have been proposed to characterize the importance of a feature for the central meaning of a concept. Cue validity was the first measure assessed when Rosch and Mervis (Reference Rosch and Mervis1975), Rosch (Reference Rosch, Rosch and Lloyd1978) studied the internal structure of categories. Cue validity was proposed and is useful to calculate the conditional probability that an object belongs to a category if it has a particular characteristic. Into a features norm it is calculated as the production frequency of the feature divided by the sum of the production frequencies of that feature for all the concepts in which it appears (McRae et al., Reference McRae, Cree, Seidenberg and McNorgan2005). For example, the feature flies has a production frequency of 25 in the English norms and a total production frequency of 712, so the corresponding value is .035. The highest value would be 1 if the feature is only produced for that concept. More recently, cue validity has been shown to be a valuable indicator in different processes, for instance, in the processing of words and non-words during reading (Tiwari et al., Reference Tiwari, Mishra, Singh, Singh, Tiwari and Singh2020), in changes in attentional selection (Lou et al., Reference Lou, Lorist and Pilz2022), and to predict the impact of the emotional content of images (Denefrio et al., Reference Denefrio, Simmons, Jha and Dennis-Tiwary2017).

Lately, attention has focused on feature distinctiveness, those characteristics that are unique to a particular concept. Feature distinctiveness has been proposed as a good measure to examine category-specific semantic deficits (Devlin et al., Reference Devlin, Gonnerman, Andersen and Seidenberg1998), to characterize prototypicality (Garrard et al., Reference Garrard, Lambon Ralph, Hodges and Patterson2001), and to study its influence on language acquisition, visual lexical decision, and semantic decision (Siew, Reference Siew2020; Reference Siew2021). It is calculated as “the inverse of the number of concepts in which the feature appears in the norms” (McRae et al., Reference McRae, Cree, Seidenberg and McNorgan2005, p.552). For example, the feature flies appears in 46 concepts in the English norms, so the calculus would be 1/46 obtaining a value of .022 of distinctiveness. Here again a value of 1 would mean that the feature is highly distinctive for that concept, as it does not appear in other concepts. This concept was also central to theories explaining category-specific semantic deficits in neurological patients. For instance, Conceptual Structure Account (CSA) (Taylor et al., Reference Taylor, Moss and Tyler2007) proposes that living and nonliving things differ in the degree to which their distinctive and shared features are correlated with other features of the concept. Distinctive features of living things tend to be weakly related to other features of the concept (e.g., has stripes), making them more vulnerable to the effects of brain damage. In contrast, distinctive features of nonliving things tend to be highly correlated with other important features (e.g., used for cutting and has a handle). This idea was further discussed in a paper by Duarte and colleagues (Duarte et al., Reference Duarte, Marquié, Marquié, Terrier and Ousset2009), who studied this hypothesis in Alzheimer’s disease (AD). They observed that distinctive features appear to be affected regardless of the domain (living and nonliving things). This same result was obtained by Catricalà and colleagues (Catricalà et al., Reference Catricalà, Della Rosa, Plebani, Perani, Garrard and Cappa2015). Additionally, Duarte et al. (Reference Duarte, Marquié, Marquié, Terrier and Ousset2009) noted that in moderate stages of the disease, distinctive features of living things tend to be more affected.

More recently, relevance was proposed as a more elaborated measure combining dominance (i.e., production frequency in feature production norms) with distinctiveness (Sartori & Lombardi, Reference Sartori and Lombardi2004). According to these authors, the former is considered the local component, indicating the importance of the feature for the concept, while the latter, known as the global component, constitutes an index of the importance of that feature for the rest of the concepts included in the norms’ database. As an example, the feature makes honey has a production frequency (dominance) of 27 in the Spanish norms and a distinctiveness value of 1 for the concept bee leading to a value of 233.384 according to the relevance formulae. Mechelli et al. (Reference Mechelli, Sartori, Orlandi and Price2006) suggest that semantic relevance explains category effects in medial fusiform gyri. Additionally, this measure has proven to be the best predictor of name retrieval accuracy in a naming-to-description task. Sartori et al. (Reference Sartori, Lombardi and Mattiuzzi2005) showed that, for both normal participants and those with AD with semantic impairment, the semantic relevance of the concept description predicts response accuracy in name retrieval better than distinctiveness and dominance of the same description. However, it is important to note that semantic significance had not been defined at that time and, as we will see later, this measure adds a small adjustment to semantic relevance. More recently, evidence has shown that damage to relevant features in AD and the semantic variant of Primary Progressive Aphasia tends to be highly correlated with failures on picture naming tasks (Catricalà et al., Reference Catricalà, Della Rosa, Plebani, Perani, Garrard and Cappa2015).

The newest variable proposed to measure feature importance is significance. This metric was developed by Montefinese and colleagues (Montefinese et al., Reference Montefinese, Ambrosini, Fairfield and Mammarella2013) and posteriorly, Montefinese et al. (Reference Montefinese, Ambrosini, Fairfield and Mammarella2014) tested the predictive value with speeded verification tasks. It constitutes a combination of accessibility (dominance plus order of production in feature norms) and distinctiveness. As can be seen, this parameter adds the order of production to the values already included in relevance (dominance and distinctiveness). The order of production provides additional and pertinent information regarding the weight of the feature for that concept in a particular language community. For example, for the concept eagle the feature flies tends to be produced earlier, while lays eggs generally appears in the fourth or fifth position in the participant’s production in feature norms. The above-mentioned authors have provided evidence that significance is a good predictor of subjects’ verification latency in a feature verification task (i.e., subjects have to judge whether a particular feature is related to a certain concept).

Although some of these measures are relatively stable across languages (Vivas et al., Reference Vivas, Kogan, Romanelli, Lizarralde and Corda2020) differences can also be observed in several concepts (Perniss et al., Reference Perniss, Vinson, Seifart and Vigliocco2012; Vivas et al., Reference Vivas, Kogan, Romanelli, Lizarralde and Corda2020). For example, the concept turkey (“pavo” in Spanish) has a very different meaning for an Argentinean individual compared to an American English individual, as for the latter it is related to Thanks giving Day. This is the reason why local norms must be developed and, particularly, this is the main aim of the current paper: to present values of semantic significance for the Spanish-speaking population. Besides, in order to compare semantic significance in relation to alternative measures of feature importance, we analyzed its effect on participant´s response speed in two feature verification tasks (feature-concept and concept-feature) and compared its effect with the current gold standard measure, relevance. In the third place, a comparison was performed with the Italian significance values.

Semantic significance calculation

To achieve the main aim of the current paper, the values of semantic significance for the entire database of 400 concepts from the Argentinean Spanish feature production norms (Vivas et al., Reference Vivas, Vivas, Comesaña, Coni and Vorano2017) comprising 3071 features were calculated (see Supplementary Material). The formula employed here is quite similar to the one used by Montefinese et al. (Reference Montefinese, Ambrosini, Fairfield and Mammarella2014, p. 358). The main difference lies in the calculus of accessibility. While they calculated the centile order of production plus dominance (see Montefinese et al., Reference Montefinese, Ambrosini, Fairfield and Mammarella2013, p.445), our calculus included a measure called relative weight (RW). To obtain that measure, the following procedure is performed: each participant (which will be labeled with subscript j) has produced a list of n_j features to describe the concept. Each of these features is given a weight determined by its position in the vector of features, divided by the length of the vector. Hence, the first feature will have a weight equal to 1 (i.e., n_j/n_j), the second one (n_j-1)/n_j and the i-th feature, starting from the one with weight 1, will have a weight of (n_j-(i-1))/n_j. After this process, for each feature that was produced by more than one participant, a sum is performed over the respective weights and a new vector is thus formed (again, ordered from higher to lower values). The resulting vector is then normalized with respect to the standard Euclidean norm, obtaining a new vector whose entries are real numbers between 0 and 1. This procedure is performed by the Definition Finder software (Vivas et al., Reference Vivas, Lizarralde, Huapaya, Vivas and Comesaña2014), which can be downloaded from the OSF.

The final formula used to calculate significance is the following:

$$\rm {Sig{\rm{ }} = {\rm{ }}10{\rm{ }}^*RW^* LN\left( {400/CPF} \right)}$$

where RW refers to relative weight, LN to natural logarithm, and CPF to the number of concepts in which the feature appears. Four hundred refers to the full set of concepts available in the norms. The values of relevance and distinctiveness were extracted from the Spanish semantic feature production norms both for young (Vivas et al., Reference Vivas, Vivas, Comesaña, Coni and Vorano2017) and older adults (Vivas et al., Reference Vivas, Yerro, Romanelli, García Coni, Comesaña, Lizarralde, Passoni and Vivas2022) following the formulae of Sartori et al. (Reference Sartori, Lombardi and Mattiuzzi2005) and McRae et al. (Reference McRae, Cree, Seidenberg and McNorgan2005), respectively. We also included other salience measures already published in Vivas, J. et al. (Reference Vivas, Vivas, Comesaña, Coni and Vorano2017) and Vivas, L. et al. (Reference Vivas, Yerro, Romanelli, García Coni, Comesaña, Lizarralde, Passoni and Vivas2022) (see Supplementary Material). It is worth noting that, as we already had data from Spanish semantic feature production norms for both young and older adults, we decided to calculate semantic significance for both groups. We believe this could offer a valuable resource for researchers requiring centrality measures for the elderly too, given the documented differences in semantic memory between both populations (Mirasso et al, Reference Mirasso, Inveninato, Savastano and Vivas2022; White et al., Reference White, Storms, Malt and Verheyen2018; Yoon et al., Reference Yoon, Feinberg, Hu, Gutchess, Hedden, Chen and Park2004). However, the speeded verification task which will be elaborated on in the following paragraphs was exclusively performed with young adults due to their greater availability for data collection.

Comparison with semantic relevance

To achieve the secondary aim of the current paper, we studied the potential of the significance measure to predict response times through a feature verification experiment. We compared this new measure to another well-established measure: semantic relevance (Sartori & Lombardi, Reference Sartori and Lombardi2004). These experiments were carried out solely with younger adults.

The complete experimental procedure consists of two experiments consisting of speeded feature verification tasks. In Experiment 1, participants performed the task with the presentation of a feature followed by a concept (feature-concept condition). In Experiment 2, the order was reversed, with the concept followed by a feature (concept-feature condition). These two alternatives were proposed because it is likely that the activation spreading differs when the task starts with a feature (e.g., “flies,” which can refer to either a living or a nonliving object) compared to when it starts with a concept (e.g., plane) (Ramscar et al., Reference Ramscar, Yarlett, Dye, Denny and Thorpe2010; Ursino et al., Reference Ursino, Cuppini and Magosso2013).

Experiment 1

Method

Participants

One hundred and twenty participants (96 females and 24 males; mean age = 24.7 years; SD = 5.22) took part in Experiment 1. They were all undergraduate students at the National University of Mar del Plata or young professionals. All subjects reported having normal or corrected-to-normal vision. Eighty-five percent of them reported being right-handed. The presence of neurological or psychiatric diseases was an exclusion criterion. Moreover, they provided written informed consent before participating in the study. This study adhered to the Helsinki principles (World Medical Association, 2013) and was approved by the Ethical Committee of the National University of Mar del Plata.

Materials

Stimuli and semantic measures were selected from the Spanish semantic feature production norms (Vivas et al., Reference Vivas, Vivas, Comesaña, Coni and Vorano2017). From the full set of 400 concepts, we selected a subset of 130 according to the following criteria: a) the concept was not a compound word; b) it was not polysemic; c) it belonged to a well-defined semantic category (i.e., easily identified by most people); d) the categories had more than four exemplars; and finally e) they were included in McRae et al.’s (Reference McRae, Cree, Seidenberg and McNorgan2005) set of concepts (just in case some author would be interested in future comparisons).

The stimuli were the same for both experiments and consisted of 130 concrete concepts belonging to 11 categories (animals, tools, fruits, musical instruments, furniture, buildings, clothing, vegetables, utensils, accessories, and vehicles). They were all concrete (mean concreteness = 4.78, min = 4.39, max = 4.97 according to the Argentinean norms by Manoiloff and colleagues (Manoiloff et al., Reference Manoiloff, Artstein, Canavoso, Fernández and Segui2010). The mean familiarity in a scale from 1 to 5 was 2.79 (our data ranged from 1.31 to 4.91) (Manoiloff et al., Reference Manoiloff, Artstein, Canavoso, Fernández and Segui2010) and the mean subjective lexical frequency on a scale from 1 to 5 was 2.68 (our data ranged from 1.08 to 5) (Martínez-Cuitiño, Reference Martínez-Cuitiño, Barreyro, Wilson and Jaichenco2015). Each of the 130 concepts was paired with four features: one core feature, one partially shared feature, one idiosyncratic feature, and a non-related feature. The procedure to operationalize these levels can be seen in Vivas et al (Reference Vivas, Kogan, Yerro, Romanelli and Vivas2021). So, 520 combinations were obtained. Four lists were elaborated in order to provide participants trials with an acceptable number of items. In order to do that concept-feature pairs were distributed taking into account that each concept appeared only once in each list and distributing equivalently the kind of features (core, partially shared, idiosyncratic, and non-related). Each contained 120 experimental concept-feature pairs plus 10 concept-feature pairs for the training session. The final lists can be seen in OSF.

Procedure

Subjects were placed at 60 cm in front of an LCD 17 computer screen (1280 × 960 pixels). Tasks were displayed using E-prime 2.0. Subjects were instructed to press the right button if the feature was reasonably true for that concept and the left button if it was not. Responses were considered correct if the subject responded “true” for those related to the concept. Filler trials (i.e., unrelated features) were discarded.

Trials began with a set of 10 training items to ensure that the participants were familiarized with the task, followed by the 120 experimental items. Each trial consisted of a feature presented for 2000 ms, followed by a fixation cross (inter-stimulus interval—ISI) for 500 ms, and then a concept that remained on the screen for 4000 ms or until the participant’s response. The inter-trial interval (ITI) was 1500 ms (see Figure 1). Features were presented in black, and concepts were presented in blue. Both were written in lowercase fonts (Verdana, 24 pt), on a white background.

Figure 1. Example of items presentation.

Statistical analysis

The following analyses were performed using R version 4.4.0 (R core team, 2021). Reaction times (RTs) greater than 3000 ms and lower than 500 ms were removed as well as erroneous responses (14.08 % for Experiment 1). The number of data points considered for Experiment 1 (feature-concept) was 6186. Each numeric variable was centered and scaled, except for RTs. Following Baayen and Milin (Reference Baayen and Milin2010), RTs were log-transformed (Ln_RT) to meet normality assumptions (Z = 1.057; p = .214 for Experiment 1).

As the data from the experiments violate the independence assumption due to both participants and concepts grouping the data points, simple regressions could not be applied. Therefore, the next step was to perform independent linear mixed model analyses for each measure to identify the most explanatory variable, considering Ln_RT as the dependent variable. As the correlation between significance and relevance is r=0.93, independent analyses were performed for significance and relevance to avoid collinearity. The formula used for each linear mixed model was

$$\rm LogRT \sim VAR + (1| Subj) + (1| Conc)$$

where “VAR” is replaced with Significance or Relevance, according to their specific model. Subjects and Concepts were considered as random factors, and their intercepts were taken into account by the models. To compare the explanatory power of relevance and significance in RTs between models, the Akaike information criterion (AIC) was used.

Results

The results for both models (relevance and significance) for Experiment 1 (feature-concept) are shown in Table 1. Both models show statistically significant fixed effects (intercept and significance/relevance) based on the t-values. The significance model’s intercept has an estimate of 2.9832, and the significance term has an estimate of −0.0037. The relevance model’s intercept has an estimate of 2.9897, and the relevance term has an estimate of −0.0006. While the significance and relevance terms are statistically significant, the R-squared values imply that other factors not included in the models may be contributing to the variability in the outcome variable. In this case, individual RTs and complex feature-concept pairs may be contributing with added variability into the results.

Table 1. Model Estimates for Experiment 1 (Feature-Concept)

To assess the difference in predictive power between significance and relevance, model comparisons were made using the AIC. Table 2 contains the comparative values of the model. While the AIC criterion might favor the significance model, the R-squared values from Table 1 suggest that both models explain a relatively small proportion of the variance. Therefore, the differences between models are not big enough to evaluate one predictive variable as better than the other. This suggests that both significance and relevance have similar predictive power in explaining RTs in Experiment 1.

Table 2. Model comparison between predictive power of significance and relevance for experiment 1

Discussion of experiment 1

Results indicated that both measures have the same predictive power for the feature-concept condition. Our results are in line with those obtained by Montefinese et al. (Reference Montefinese, Ambrosini, Fairfield and Mammarella2013) in that both studies observed an effect of semantic significance in the feature verification task. However, they observed a superiority effect of significance compared to relevance, which was not observed in our experiment. Methodological differences in the calculation of significance between the two studies may explain this discrepancy, and these differences will be discussed in detail in the general discussion.