A growing literature explores the effect of economic inequality in citizens’ surrounding environment on their political attitudes and behavior. This literature typically relies on measures of income concentration or gap-size, which reflect under-tested presumptions about how citizens perceive the economic conditions surrounding them. Utilizing survey data to explore perception of economic inequality in Americans’ residential environment, this note finds that measures capturing income concentration or gap-size perform poorly relative to a measure capturing the joint prevalence of “haves” and “have-nots.” These results suggest that commonly used measures of economic inequality may not fully capture the features of people’s daily environment used to perceive the existence or magnitude of inequality. The results guide future research toward using contextual indicators that treat inequality as a compound phenomenon involving manifestations of poverty and affluence.
Conditioning on variables affected by treatment can induce post-treatment bias when estimating causal effects. Although this suggests that researchers should measure potential moderators before administering the treatment in an experiment, doing so may also bias causal effect estimation if the covariate measurement primes respondents to react differently to the treatment. This paper formally analyzes this trade-off between post-treatment and priming biases in three experimental designs that vary when moderators are measured: pre-treatment, post-treatment, or a randomized choice between the two. We derive nonparametric bounds for interactions between the treatment and the moderator under each design and show how to use substantive assumptions to narrow these bounds. These bounds allow researchers to assess the sensitivity of their empirical findings to priming and post-treatment bias. We then apply the proposed methodology to a survey experiment on electoral messaging.
We use data on Latino children in the United States who were randomly assigned calculation tests in English or Spanish to check for the so-called bilingual advantage, the notion that knowing more than one language improves individuals’ other cognitive skills. After controlling for different characteristics of children and their parents, as well as children's time in the US, we find a bilingual advantage among children who read or write in English and Spanish but not among those who only speak or understand both languages. In particular, bilingual readers or writers perform one-fourth to one-third of a standard deviation better than monolingual children, equivalent to the learning gains of an additional school year. Applying the Oster test, we find that selection on unobservables would need to be 3–4 times stronger than selection on observables to explain away our results. The bilingual advantage is stronger among children in two-parent households with siblings and for those at the upper end of the ability distribution.
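The logic of the Oster test can be sketched with the approximation from Oster (2019). Let beta_short and r2_short come from a regression without controls, beta_ctrl and r2_ctrl from the regression with the observed controls, and r2_max be the R² a regression on all observed and unobserved controls would reach. The bias-adjusted coefficient is then approximately beta_ctrl − delta·(beta_short − beta_ctrl)·(r2_max − r2_ctrl)/(r2_ctrl − r2_short), and the delta that drives it to zero measures how much stronger selection on unobservables must be than selection on observables. The sketch below uses only this approximation (Oster's full estimator is more involved), made-up inputs rather than the study's estimates, and the common heuristic r2_max = 1.3·r2_ctrl; it is not the authors' code.

```python
def oster_beta_star(beta_short, r2_short, beta_ctrl, r2_ctrl, r2_max, delta):
    """Approximate bias-adjusted coefficient for a given proportional-selection delta."""
    return beta_ctrl - delta * (beta_short - beta_ctrl) * (r2_max - r2_ctrl) / (r2_ctrl - r2_short)


def oster_delta(beta_short, r2_short, beta_ctrl, r2_ctrl, r2_max):
    """Delta at which the approximate bias-adjusted coefficient equals zero."""
    return beta_ctrl * (r2_ctrl - r2_short) / ((beta_short - beta_ctrl) * (r2_max - r2_ctrl))


# Hypothetical inputs, not taken from the study
beta_short, r2_short = 0.50, 0.10   # coefficient and R^2 without controls
beta_ctrl, r2_ctrl = 0.30, 0.30     # coefficient and R^2 with observed controls
r2_max = 1.3 * r2_ctrl              # heuristic upper bound on R^2
print(round(oster_delta(beta_short, r2_short, beta_ctrl, r2_ctrl, r2_max), 2))  # 3.33
```

A delta above 1 means unobservables would have to matter more than the observed controls for the effect to vanish.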
This chapter clarifies the theoretical arguments by discussing issues and questions that may arise in conceptualizing, testing, and evaluating not only comprehensive deterrence theory (CDT) but also deterrence research more generally. For example, it discusses the nature of punishment. Deterrence scholarship understandably has examined the idea that punishments may deter. What has not been systematically theorized or empirically studied is punishment itself. Historical accounts exist, of course. And numerous scholars certainly have detailed many aspects of certain types of punishment, such as the death penalty. However, deterrence scholarship lacks a coherent foundation for predicting the effects of a wide variety of legal punishments, or for distinguishing when one type of punishment meaningfully differs from another. Similarly, there is a great deal of confusion about legal vs. extralegal punishment as well as specific vs. general deterrence. The chapter examines these and other issues with an eye towards clarifying CDT and charting directions for improving deterrence scholarship.
Mass polarization is one of the defining features of politics in the twenty-first century, but efforts to understand its causes and effects are often hindered by empirical challenges related to measurement and data availability. To address these challenges and provide a common standard of analysis for researchers, this Element presents the Polarization in Comparative Attitudes Project (PolarCAP). PolarCAP clearly defines polarization as a property of group relations and uses a Bayesian measurement model to estimate smooth panels of ideological and affective polarization across ninety-two countries and forty-nine years. The author uses these data to provide a descriptive account of mass polarization across time and space. They further show how PolarCAP facilitates substantive inference by applying it to three sets of variables often hypothesized as causes or consequences of polarization: institutional design, economic crisis, and democracy. Open-source software makes PolarCAP easily accessible to scholars and practitioners.
The field of criminology is limited by a 'hidden' measurement crisis. It is hidden because scholars either are not aware of the shortcomings of their measures or have implicitly agreed that scales with certain properties merit publication. It is a crisis because the approaches used to construct measures do not employ modern systematic psychometric methods. As a result, the degree to which existing measures have methodological limitations is unknown. The purpose of this Element is to unmask this hidden crisis and provide a case study demonstrating how to build a measure of a prominent criminological construct through modern systematic psychometric methods. Using multiple surveys and item response theory, it develops a ten-item scale of procedural justice in policing. This can be used in primary research and to adjudicate existing measures. The goal is to reveal the nature of the field's measurement crisis and show a strategy for solving it.
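To make the item response theory step concrete, here is a minimal sketch of how item information can guide which items enter a short scale. It assumes dichotomous items under a two-parameter logistic (2PL) model, where an item's information at latent level theta is a²·P·(1 − P); Likert-type procedural justice items would instead need a polytomous model such as the graded response model. The item bank and the "sum information over a grid" rule are hypothetical illustrations, not the Element's procedure.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of endorsing an item at latent level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_items(bank, k, grid):
    """Keep the k items with the most information summed over a grid of theta values."""
    totals = {name: sum(item_information(t, a, b) for t in grid) for name, (a, b) in bank.items()}
    return sorted(totals, key=totals.get, reverse=True)[:k]


# Hypothetical item bank: name -> (discrimination a, location b)
bank = {"q1": (2.0, 0.0), "q2": (0.5, 0.0), "q3": (1.5, 0.5)}
print(select_items(bank, k=2, grid=[-2, -1, 0, 1, 2]))  # ['q1', 'q3']
```

Highly discriminating items dominate here; a real scale-building effort would also weigh content coverage, dimensionality, and fit.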
Over the past quarter-century, the literature on gender, peace, and security has evolved into a substantial interdisciplinary field. In this line of work, researchers have investigated the interplay between state security and women’s security, or how gender equality at the state level affects the occurrence of international and intranational conflict. The conclusion is that more gender-equal countries are less prone to engage in warfare, pointing toward a link between women’s security and national security. Various indicators have been used to capture gender equality in this literature, such as the representation of women in parliamentary roles, the proportion of women participating in the labor force, and school enrollment among girls relative to boys.
The standard measure of authoritarianism asks respondents about desirable qualities in children. Although these questions are gender-neutral, respondents may differ in the gender of the child they have in mind when answering. The items also may tap into gendered expectations about boys' and girls' behavior. We conducted three experiments that randomly assigned respondents to be asked about a child, a boy, or a girl in the items. We compare the means, measurement properties, and correlations between authoritarianism and other important variables across the conditions. Asking respondents about a girl creates significant differences in the level and measurement of authoritarianism, which are partially driven by respondents' sexism. Fortunately, there are few other significant differences in the correlates of authoritarianism.
Building on the availability of geospatial data, improvements in mapping software, and innovations in spatial statistics, political scientists are increasingly taking geography seriously. As we adopt the tools of geographers, we must also consider the methodological challenges they have identified. We focus on the modifiable areal unit problem (MAUP)—the idea that the size of aggregate spatial units and the location of their borders affect the empirical results we obtain. We first describe the logic of the MAUP, and then demonstrate the MAUP through simulations, showing MAUP-related inconsistency in regression results in randomly generated and real-world data. We identify MAUP concerns, and best practices, in top journals in political science. We conclude by suggesting how scholars may respond in theoretical and empirical terms to concerns about validity and reliability that arise from the MAUP.
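The scale component of the MAUP can be reproduced with a toy simulation in the spirit of the one described. In this hypothetical data-generating process, x and y share a smooth east-west trend plus independent individual noise; the same individuals are then averaged into 2×2, 5×5, and 10×10 grids. The sketch covers only the scale effect (not zoning), and all parameters are assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_points(n, seed=0):
    """Hypothetical individuals in the unit square; x and y share only the east-west trend."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        lon, lat = rng.random(), rng.random()
        x = lon + rng.gauss(0, 1)
        y = lon + rng.gauss(0, 1)
        points.append((lon, lat, x, y))
    return points


def aggregate(points, k):
    """Mean x and mean y within each cell of a k-by-k grid over the unit square."""
    cells = {}
    for lon, lat, x, y in points:
        key = (int(lon * k), int(lat * k))
        sx, sy, count = cells.get(key, (0.0, 0.0, 0))
        cells[key] = (sx + x, sy + y, count + 1)
    xs = [sx / count for sx, _, count in cells.values()]
    ys = [sy / count for _, sy, count in cells.values()]
    return xs, ys


points = simulate_points(5000, seed=1)
individual_r = pearson([p[2] for p in points], [p[3] for p in points])
# Same individuals, three different sets of areal units
grid_r = {k: pearson(*aggregate(points, k)) for k in (2, 5, 10)}
print(round(individual_r, 3), {k: round(r, 3) for k, r in grid_r.items()})
```

Averaging within cells removes most of the individual noise but keeps the shared trend, so the areal-unit correlation is far stronger than the individual-level one.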
The emergence of a systematic literature around land-surveying in the late first century AD affords an ideal opportunity to study the development of an ars within the scientific culture of specialized knowledge in the early Roman Empire. The variegated methods that belonged to the historical inheritance of surveying practice challenged the construction of a discrete and coherent disciplinary identity. The surveying writings of Frontinus and Hyginus evince several strategies intended to produce a systematic and explanatory conception of the ars. These include rationalizing explanations of key surveying terminology and practice with a view to natural first principles and an accounting of surveying methods in interdisciplinary perspective with astronomy, natural philosophy, and mathematics. While these earliest surveying works pose several unique challenges, they ultimately provide a precious window onto the challenges and opportunities that greeted the emergence of an ars in the fervid scientific culture of the period.
This paper presents a comprehensive analysis of interference events in automotive scenarios involving radar systems equipped with communication-assisted chirp sequence (CaCS). First, it examines how interference affects the radar and communication functionalities of CaCS systems depending on the orientation of the investigated nodes. For this purpose, a graph-based approach is combined with MATLAB simulations to illustrate, on the graph, where interference can potentially occur for the communication functionality compared with the radar functionality. Second, the paper examines the impact of interference on the synchronization between two communicating CaCS nodes. It extends a previous study to the operating frequency of current radar sensors, using chirp estimation, an adjusted version of the Schmidl & Cox algorithm, and correlation to synchronize the transmitter and receiver of two communicating CaCS nodes in the time-frequency plane. Finally, the proposed synchronization method is verified by measurements at 79 GHz with a system-on-chip, using the resulting correlation metric and mean square error as validation factors.
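The synchronization step builds on the Schmidl & Cox timing metric. Below is a minimal sketch of that metric on a toy complex-baseband signal, not the adjusted version or chirp-based scheme used in the paper. The preamble consists of two identical halves of length L. Here P(d) = Σ conj(r[d+m])·r[d+m+L] over m = 0..L−1, and M(d) = |P(d)|² / R(d)². Schmidl and Cox define R(d) as the energy of the second half only. This sketch instead assumes R(d) is half the energy of the full 2L window, which bounds M(d) by 1 and avoids spurious peaks when the second half falls on near-silent samples. The QPSK preamble, the noise level, and the offset are hypothetical, and frequency-offset estimation from the angle of P(d) is omitted.

```python
import random


def schmidl_cox_metric(r, L):
    """Timing metric M(d) = |P(d)|^2 / R(d)^2 for a preamble made of two identical halves of length L.

    P(d) correlates each sample with the one L samples later; R(d) is half the energy of the
    2L-sample window (a normalization choice of this sketch).
    """
    metric = []
    for d in range(len(r) - 2 * L + 1):
        p = sum(r[d + m].conjugate() * r[d + m + L] for m in range(L))
        energy = 0.5 * sum(abs(r[d + m]) ** 2 for m in range(2 * L))
        metric.append(abs(p) ** 2 / energy ** 2 if energy > 0 else 0.0)
    return metric


def estimate_start(r, L):
    """Index of the metric peak, taken as the estimated start of the preamble."""
    metric = schmidl_cox_metric(r, L)
    return max(range(len(metric)), key=metric.__getitem__)


# Toy baseband signal: a random QPSK half repeated twice, surrounded by silence, plus light noise
L = 16
rng = random.Random(7)
half = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(L)]
clean = [0j] * 40 + half + half + [0j] * 30
received = [s + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for s in clean]
print(estimate_start(received, L))  # 40
```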
Additional language speakers (ALSs) often experience anxiety due to challenges posed by their nonstandard pronunciation. Building on these insights, this paper introduces an instrument, the Accent Anxiety Scale (AAS), specifically designed to assess three sources of anxiety experienced by ALSs: (a) apprehension about negative evaluations from other individuals due to their distinctive speech style, (b) concerns about rejection from the target language community because of their “foreign” pronunciation, and (c) anxieties over potential communication hurdles attributed to the intelligibility of their pronunciation. We evaluated the psychometric robustness of the AAS by analyzing data from a total of 474 immigrant and international student ALSs at a predominantly English-speaking Canadian university. Study 1 focused on immigrants (N = 203) and employed exploratory factor and correlational analyses to isolate a concise set of internally consistent and valid items for each subscale. Study 2 extended these analyses to international students (N = 153) and employed confirmatory factor and correlational analyses to further validate the AAS in this population. Study 3 examined international students (N = 118) at two time points to establish the AAS’s temporal stability. These studies yielded robust psychometric evidence for the factor structure, reliability, and validity of the AAS. The findings not only support the use of the AAS as a research instrument but also offer implications for pedagogical strategies aimed at alleviating ALSs’ accent anxiety.
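Internal consistency of a subscale is commonly summarized with Cronbach's alpha, α = k/(k − 1)·(1 − Σσ²_item / σ²_total) for k items. The abstract does not say which reliability statistic the authors used, so the sketch below is an assumption, run on made-up ratings.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix (list of equal-length lists)."""
    k = len(responses[0])
    item_variances = sum(pvariance(item) for item in zip(*responses))
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - item_variances / total_variance)


# Hypothetical 1-5 ratings from five respondents on a three-item subscale
subscale = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5], [1, 2, 2]]
print(round(cronbach_alpha(subscale), 3))  # 0.956
```

Population and sample variances give the same alpha because the n/(n − 1) factor cancels in the ratio.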
Our understanding of politics often relies on the ideological placement of political actors—ranging from scaling legislative roll-call voting in the United States to text-based classifications of political parties in Europe. A particularly thorny problem remains estimating individual positions in legislatures with strong partisan discipline. We improve upon recently developed measurement strategies and propose a novel approach for estimating legislators’ ideological positions: an expert survey in which respondents compare pairs of representatives on a left-right dimension. The innovation of our approach lies in the combination of four particular features. First, we rely on political youth leaders who are insightful and easy to recruit. Second, the rating task does not involve numeric scaling and consists of simple pairwise comparisons. Third, we efficiently and automatically detect informative comparisons to reduce the cost and length of the survey without compromising our estimates. Fourth, we use a Bayesian Davidson model with random effects to generate an ideological position for each legislator. As an empirical illustration, we estimate the placement of the 709 members of the 19th German Bundestag. Several validity tests show that our model captures variation within and across political parties. Our estimates offer a thorough benchmark to validate alternative measurement strategies. The presented measurement strategy is flexible and easily extendable to diverse political settings because it can capture comparisons among political actors across time and space.
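The core of a Davidson model is the probability of each outcome of a pairwise comparison when ties are allowed. With worth parameters π = exp(θ) and a tie parameter ν ≥ 0, P(i wins) = π_i / (π_i + π_j + ν·√(π_i·π_j)), and the tie probability replaces the numerator with ν·√(π_i·π_j). Setting ν = 0 recovers the Bradley–Terry model. The sketch below covers only this likelihood. The Bayesian priors and the random effects of the authors' model are omitted. Reading a "win" as "judged further right" is an assumption.

```python
import math


def davidson_probs(theta_i, theta_j, nu):
    """(P(i judged further right), P(tie), P(j judged further right)) under the Davidson model."""
    pi_i, pi_j = math.exp(theta_i), math.exp(theta_j)
    tie = nu * math.sqrt(pi_i * pi_j)
    denom = pi_i + pi_j + tie
    return pi_i / denom, tie / denom, pi_j / denom


def log_likelihood(thetas, nu, comparisons):
    """Sum of log-probabilities; comparisons are (i, j, outcome) with outcome in {'i', 'tie', 'j'}."""
    index = {"i": 0, "tie": 1, "j": 2}
    return sum(math.log(davidson_probs(thetas[i], thetas[j], nu)[index[o]]) for i, j, o in comparisons)


# Hypothetical positions for three legislators and three expert judgments
thetas = [-0.5, 0.0, 1.2]
comparisons = [(2, 0, "i"), (1, 0, "tie"), (0, 2, "j")]
print(log_likelihood(thetas, nu=0.8, comparisons=comparisons))
```

A Bayesian version would place priors on the θs and ν (and on rater effects) and sample from the posterior this likelihood defines.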
The main principles underpinning measurement for healthcare improvement are outlined in this Element. Although there is no single formula for achieving optimal measurement to support improvement, a fundamental principle is the importance of using multiple measures and approaches to gathering data. Using a single measure falls short in capturing the multifaceted aspects of care across diverse patient populations, as well as all the intended and unintended consequences of improvement interventions within various quality domains. Even within a single domain, improvement efforts can succeed in several ways and go wrong in others. Therefore, a family of measures is usually necessary. Clearly communicating a plausible theory outlining how an intervention will lead to desired outcomes informs decisions about the scope and types of measurement used. Improvement teams must tread carefully to avoid imposing undue burdens on patients, clinicians, or organisations. This title is also available as Open Access on Cambridge Core.
A crucial first step in helping consumers improve their financial lives is understanding their financial circumstances and well-being. The Financial Well-Being (FWB) scale measures a consumer’s subjective well-being related to aspects of their financial circumstances. It is available in standard-length (10-item) and abbreviated (5-item) versions, but no research has compared how completing either version may alter consumers’ responses. Notably, the 5-item scale includes a higher share of reverse-coded (i.e., negatively framed) items. We hypothesize that the difference in item framing between scale versions influences participants’ feelings about their financial situation, predicting that completing the 5-item FWB scale will result in more negative responses than completing the 10-item FWB scale. To test this hypothesis, we implement a randomized survey experiment using the Understanding America Study. In our experiment with nearly 6,000 participants, we find that completing the 5-item rather than the 10-item FWB scale reduces FWB scores (average decline in the 5-item FWB score of 0.9 points, 95% CI [–1.552, –0.249]), increases the share of respondents with a “low” 5-item FWB score (by 5.0 percentage points, 95% CI [2.8, 7.1]), and also affects responses to individual scale items and self-rated FWB. This pattern is strongest among lower-income respondents, for whom the average FWB score declines by 2.3 points (95% CI [–3.385, –1.171]) and the share with a “low” 5-item FWB score increases by 8.1 percentage points (95% CI [4.1, 12.1]). These findings highlight that FWB scale choice can have unexpected consequences. We discuss the implications for research on FWB and on the measurement of well-being more broadly.
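Reverse coding is what makes negatively framed items comparable with positively framed ones. On a response scale from low to high, a reverse-coded answer is low + high − x. The sketch below only illustrates that idea. The item names and the 1–5 range are hypothetical, and the FWB scale's official score is computed from published scoring tables rather than from a simple sum.

```python
def reverse_code(score, low=1, high=5):
    """Map a response to the opposite end of the scale (low <-> high)."""
    if not low <= score <= high:
        raise ValueError("score outside the response scale")
    return low + high - score


def align_items(responses, reverse_items, low=1, high=5):
    """Flip negatively framed items so that a higher value always means better well-being."""
    return {item: reverse_code(v, low, high) if item in reverse_items else v
            for item, v in responses.items()}


# Hypothetical respondent: 'item_b' is negatively framed
print(align_items({"item_a": 4, "item_b": 2}, reverse_items={"item_b"}))  # {'item_a': 4, 'item_b': 4}
```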
A popular refrain in many countries is that people with mental illnesses have “nowhere to go” for care. But that is not universally true. Previously unexplored international data shows that some countries provide much higher levels of public mental health care than others. This puzzling variation does not align with existing scholarly typologies of social or health policy systems. Furthermore, these cross-national differences are present despite all countries’ shared history of psychiatric deinstitutionalization, a process that I conceptualize and document using an original historical data set. I propose an explanation for countries’ varying policy outcomes and discuss an empirical strategy to assess it. The research design focuses on the cases of the United States and France, along with Norway and Sweden, in order to control for a range of case-specific alternative hypotheses. The chapter ends with brief descriptions of contemporary mental health care policy in each of the four countries examined in this book.
Information on the time spent completing cognitive testing is often collected, but such data are not typically considered when quantifying cognition in large-scale community-based surveys. We sought to evaluate the added value of timing data over and above traditional cognitive scores for the measurement of cognition in older adults.
Method:
We used data from the Longitudinal Aging Study in India-Diagnostic Assessment of Dementia (LASI-DAD; N = 4,091) to assess the added value of timing data over and above traditional cognitive scores, using item-specific regression models for 36 cognitive test items. Models were adjusted for age, gender, interviewer, and item score.
Results:
Compared with Quintile 3 (median time), taking longer to complete specific items was associated (p < 0.05) with lower cognitive performance for 67% (Quintile 5) and 28% (Quintile 4) of items. Responding quickly (Quintile 1) was associated with higher cognitive performance for 25% of simpler items (e.g., orientation for year), but with lower cognitive functioning for 63% of items requiring higher-order processing (e.g., digit span test). Results were consistent across a range of analyses adjusting for factors including education, hearing impairment, and language of administration, and in models using splines rather than quintiles.
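Before the item-specific models can compare Quintiles 1, 2, 4, and 5 against Quintile 3, each completion time has to be binned. Here is a minimal sketch of that binning step for one item, assuming quintiles are computed within item across respondents and using Python's `statistics.quantiles` (3.8+) with its default "exclusive" interpolation. Both choices are assumptions, and the times are made up. The adjusted regression itself is not shown.

```python
from bisect import bisect_right
from statistics import quantiles


def time_quintiles(times):
    """Label each completion time 1 (fastest fifth) to 5 (slowest fifth)."""
    cuts = quantiles(times, n=5)  # four cut points between the five groups
    return [bisect_right(cuts, t) + 1 for t in times]


# Hypothetical completion times (seconds) for one test item
seconds = [12, 7, 30, 9, 15, 22, 11, 41, 18, 26]
print(time_quintiles(seconds))  # [2, 1, 5, 1, 3, 4, 2, 5, 3, 4]
```

Times that fall exactly on a cut point go to the upper group; the labels would then enter the regression as indicators with Quintile 3 as the reference.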
Conclusions:
Response times from cognitive testing may contain important information on cognition not captured in traditional scoring. Incorporation of this information has the potential to improve existing estimates of cognitive functioning.
Several methods used to examine differential item functioning (DIF) in Patient-Reported Outcomes Measurement Information System (PROMIS®) measures are presented, including effect size estimation. A summary is provided of factors that may affect DIF detection and of challenges encountered in PROMIS DIF analyses, e.g., anchor item selection. One issue in PROMIS was the potential for inadequately modeled multidimensionality to produce false DIF detection. Section 1 presents the unidimensional models used by most PROMIS investigators for DIF detection, as well as their multidimensional extensions. Section 2 presents an illustration that builds on previous unidimensional analyses of depression and anxiety short forms to examine DIF detection using a multidimensional item response theory (MIRT) model. The Item Response Theory-Log-likelihood Ratio Test (IRT-LRT) method was used for a real-data illustration with gender as the grouping variable. The IRT-LRT DIF detection method is a flexible approach for handling group differences in trait distributions, known as impact in the DIF literature; it was studied with both real and simulated data to compare its performance within the unidimensional IRT (UIRT) and MIRT contexts. Additionally, different effect size measures were compared for the data presented in Section 2. In the real-data illustration, using the IRT-LRT method within a MIRT context flagged more items than using it within a UIRT context. The simulations provided some evidence that, while the unidimensional and multidimensional approaches had similar Type I error rates, the multidimensional approach had greater power for DIF detection. Effect size measures presented in Section 1 and applied in Section 2 varied in terms of estimation methods, choice of density function, methods of equating, and anchor item selection.
Despite these differences, there was considerable consistency in results, especially for the items showing the largest values. Future work is needed to examine DIF detection in the context of polytomous, multidimensional data. PROMIS standards included incorporation of effect size measures in determining salient DIF. Integrated methods for examining effect size measures in the context of IRT-based DIF detection procedures are still in early stages of development.
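The IRT-LRT decision for a studied item comes down to a likelihood ratio test. One model constrains the item's parameters to be equal across groups, with anchors fixing the scale. A second model frees them. G² = −2(logL_constrained − logL_free) is then referred to a chi-square distribution whose df is the number of freed parameters (for example 2 for a dichotomous 2PL item, more for polytomous PROMIS items). The sketch below assumes the two log-likelihoods are already available from fitted models, which are not shown, and uses hypothetical values. The chi-square tail uses closed forms for integer df to stay in the standard library. With SciPy, `scipy.stats.chi2.sf` would do the same.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable for positive integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^((2r-1)/2) / (1*3*...*(2r-1))
    term, acc = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        acc += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * acc


def irt_lrt(loglik_constrained, loglik_free, df):
    """G^2 statistic and p-value for freeing the studied item's parameters across groups."""
    g2 = -2.0 * (loglik_constrained - loglik_free)
    return g2, chi2_sf(g2, df)


# Hypothetical log-likelihoods for one studied item
g2, p = irt_lrt(loglik_constrained=-5234.6, loglik_free=-5229.1, df=2)
print(round(g2, 2), round(p, 4))
```

A significant G² only flags the item; as the text stresses, effect size measures are still needed to judge whether the DIF is salient.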
A method is discussed which extends canonical regression analysis to the situation where the variables may be measured at a variety of levels (nominal, ordinal, or interval), and where they may be either continuous or discrete. There is no restriction on the mix of measurement characteristics (i.e., some variables may be discrete-ordinal, others continuous-nominal, and yet others discrete-interval). The method, which is purely descriptive, scales the observations on each variable, within the restriction imposed by the variable's measurement characteristics, so that the canonical correlation is maximal. The alternating least squares algorithm is discussed. Several examples are presented. It is concluded that the method is very robust. Inferential aspects of the method are not discussed.
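For an ordinal variable, admissible quantifications must preserve the order of the categories. A standard way to impose that restriction inside an alternating least squares loop is weighted monotone (isotonic) regression, computed with the pool-adjacent-violators algorithm. Nominal variables skip this step, and interval variables are restricted to linear transformations. The sketch below shows only this generic ordinal step, not the paper's full algorithm. The targets (e.g., per-category means of the current canonical variate) and the category counts are hypothetical.

```python
def isotonic_fit(values, weights=None):
    """Weighted least-squares non-decreasing fit via pool-adjacent-violators (PAVA)."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [weighted mean, total weight, number of pooled entries]
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool backwards while the newest block breaks the ordering constraint
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            pooled = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / pooled, pooled, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical ordinal variable with four ordered categories:
# current target per category and the category frequencies used as weights
targets = [-0.8, 0.4, 0.1, 0.9]
counts = [10, 30, 20, 40]
print(isotonic_fit(targets, counts))
```

Categories 2 and 3 violate the order, so they are pooled into their weighted mean; the pooled values then become the new quantifications for the next ALS pass.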
It is common practice in IRT to treat items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas items are usually given only continuous parameters, which are commonly of the fixed type, although exceptions occur. The present article shows that random item parameters make sense theoretically and that, in practice, the random item approach is promising for handling several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF. Corresponding to these issues, the article has three parts. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and in terms of goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not treated as a discrete phenomenon, and applying a robust regression approach based on the RIP difficulties yields quite good DIF identification results. In the RIM, no anchor sets are defined a priori; instead, a latent DIF class of items is used, so that posterior anchoring is realized (anchoring based on the item mixture). Both approaches are shown to be promising for the identification of DIF.
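The random-item view of the linear logistic test model with an error term can be made concrete as a data-generating process. Each item difficulty is b_i = Σ_k q_ik·η_k + ε_i with ε_i ~ N(0, σ²), so item properties explain difficulty only up to a random residual. Responses then follow the Rasch model, P(correct) = 1/(1 + exp(−(θ − b))). The sketch below only simulates from this model. The Q-matrix, effects, and σ are hypothetical, and estimation (typically done with mixed-model software) is not shown.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lltm_difficulties(q_matrix, eta, sigma, rng):
    """Random item difficulties b_i = sum_k q_ik * eta_k + eps_i, with eps_i ~ N(0, sigma^2)."""
    return [sum(q * e for q, e in zip(row, eta)) + rng.gauss(0.0, sigma) for row in q_matrix]


def simulate_responses(thetas, difficulties, rng):
    """Persons-by-items matrix of 0/1 responses drawn from the Rasch model."""
    return [[int(rng.random() < rasch_prob(t, b)) for b in difficulties] for t in thetas]


rng = random.Random(3)
q_matrix = [[1, 0], [1, 1], [0, 1], [1, 1]]  # hypothetical item properties
eta = [0.5, -1.0]                            # hypothetical property effects
difficulties = lltm_difficulties(q_matrix, eta, sigma=0.3, rng=rng)
data = simulate_responses([rng.gauss(0, 1) for _ in range(200)], difficulties, rng)
print([round(b, 2) for b in difficulties])
```

With sigma set to 0 the model reduces to the ordinary LLTM, in which item properties explain difficulties perfectly.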