Sackett et al. (2023) start their focal article by stating that they identified “previously unnoticed flaws” in range restriction (RR) corrections in most validity generalization (VG) meta-analyses of selection procedures reviewed in their 2022 article. Following this provocative opening statement, they discuss how both researchers and practitioners have handled (and should handle) RR corrections in estimating the operational validity of a selection procedure in both VG meta-analyses (whose input studies are predominantly concurrent studies) and individual validation studies (which serve as input to VG meta-analyses). The purpose of this commentary is twofold. We first provide an essential review of Sackett et al.’s (2022) three propositions serving as the major rationales for their recommendations regarding RR corrections (e.g., no corrections for RR in concurrent validation studies). We then provide our critical analyses of their rationales and recommendations regarding RR corrections to put them in perspective, along with some additional thoughts.
Essential review of Sackett et al.’s three propositions regarding RR corrections
Sackett et al. (2022) advance three propositions regarding RR and RR corrections in concurrent validation studies: (a) “any range restriction (on the predictor of interest, X) will only be indirect”; (b) “with rare exceptions Z is not highly correlated with X,” where Z is the direct basis for selection; and (c) “indirect range restriction restricts variance on the selection predictor of interest to only a very modest degree under virtually all realistic circumstances” (pp. 2041–2042). Testing these propositions using simulated data, Sackett et al. (2022, Table 1; 2023) state that when RR is indirect (e.g., in concurrent validation studies), ux (= restricted, incumbent-based SDx / unrestricted, applicant-based SDx) tends to be .90 or higher and, thus, the RR correction effect will be relatively small (see Oh et al., 2023, Table 2, for counterevidence). This serves as a basis for their recommendation against correcting for RR in concurrent studies and in most VG meta-analyses reviewed in their 2022 article, whose input is predominantly (approximately 80%) concurrent studies. Relatedly, Sackett et al. (2023) also state that the degree of RR “can be quite substantial” in predictive studies, whereas, as noted above, RR “is likely to be small anyway” in concurrent studies. This differential RR by validation design (predictive vs. concurrent) serves as another basis for their recommendation against applying RR data (i.e., the ux distribution) from applicant-based predictive studies to RR corrections in concurrent studies or in VG meta-analyses whose input is mostly concurrent studies. They argue that this (problematic in their view) “uniform RR correction” approach has been implemented in most previous VG meta-analyses, which is why they claim that RR corrections applied in such meta-analyses have “flaws.”
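As a point of reference for readers (our own illustration rather than a formula presented by Sackett et al.), the familiar Thorndike Case II formula, the form applied when a single u ratio is used to correct an observed validity coefficient, makes the role of ux explicit:

$\hat{\rho} = \dfrac{r / u_X}{\sqrt{1 - r^2 + r^2 / u_X^2}},$

where $r$ is the observed (restricted) validity and $u_X = SD_{X,\mathrm{restricted}} / SD_{X,\mathrm{unrestricted}}$. As $u_X$ approaches 1, the corrected and observed coefficients converge; the substantive disagreement, then, is over how close to 1 ux actually is in concurrent studies.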
In summary, Sackett et al.’s (2022, 2023) solution to the “previously unnoticed flaws” in RR corrections in most VG meta-analyses reviewed in their 2022 article is to question and, in most cases, undo RR corrections, on the premise that most input studies in those VG meta-analyses are concurrent studies. That is, most of their newly suggested meta-analytic mean operational validity estimates for various selection procedures are, basically, those uncorrected for RR. Along this line, Sackett et al. (2022) propose a sweeping recommendation for individual validation (and VG) studies: “Absent credible U (= 1/u) ratios for concurrent studies, and in light of the above demonstration that U ratios are likely to be close to 1.0 in concurrent studies, we recommend no range restriction correction for concurrent studies” (p. 2044).
Having reviewed the major underpinnings of Sackett et al.’s (2022, 2023) recommendations regarding RR corrections, we now offer our critical analyses of those recommendations, focusing on their potential pitfalls and challenges.
Critical analyses of Sackett et al.’s recommendations regarding RR corrections
Dichotomous thinking and sweeping recommendations
We are surprised, if not shocked, by Sackett et al.’s (2022, 2023) “dichotomous thinking” and sweeping recommendation for “no” RR corrections in any concurrent studies, regardless of which predictor is being considered. If their recommendation were taken at face value, the implication would be that all concurrent studies are alike in terms of the degree of RR (i.e., no RR) and, thus, that the resulting ux values are the same (ux = 1) across all such studies and across all predictors. Sackett et al. do not provide sufficient evidence in support of this. In fact, research evidence shows that the degree of RR varies by predictor: ux values tend to be smaller for cognitively loaded selection procedures than for noncognitive selection procedures. For example, average ux values for personality traits are in the .90 range (e.g., Schmidt et al., 2008), whereas average ux values for cognitive ability tests tend to be in the .70 range (e.g., Alexander et al., 1989).
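As a purely hypothetical illustration (the observed validity of .30 below is assumed for arithmetic convenience and is not drawn from any study discussed here), applying the Case II formula shown earlier yields:

$\hat{\rho}_{u_X=.90} = \dfrac{.30/.90}{\sqrt{1 - .30^2 + .30^2/.90^2}} \approx .33, \qquad \hat{\rho}_{u_X=.70} = \dfrac{.30/.70}{\sqrt{1 - .30^2 + .30^2/.70^2}} \approx .41.$

That is, whether the typical concurrent-study ux is closer to .90 or to .70 changes the correction from a trivial one to a substantial one, which is precisely why predictor-specific evidence on ux matters.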
In addition, Sackett et al.’s recommendation for no corrections for RR in concurrent studies, despite their disclaimer, does not appear to be limited to the input studies included in most VG meta-analyses reviewed in Sackett et al. (2022, Table 2); it also appears to extend to future individual validation studies (see the last section about “correcting individual studies” in Sackett et al., 2023). One issue here is their assumption about the magnitude and nature of the correlation between X and Z (“rzx”); as “rzx” increases, ux decreases. Sackett et al.’s assumption that “rzx” always falls in a certain range (e.g., ≤ .50), regardless of the nature of Z and X and their specific application in selection settings, lacks sufficient evidence and warrants caution (see Oh et al., 2023). For example, as noted in their focal article, we are likely to see increasingly more validation studies on “a comparison between the validity of legacy predictors and gamified assessments, asynchronous video interviews, natural language processing-based scoring of essays, and the like” or on a linear composite of multiple predictors, such as a cognitive ability test and a job-relevant personality (e.g., conscientiousness) measure, in an effort to balance validity (performance) and adverse impact (diversity). We believe this would produce much higher “rzx” correlations and thus much lower ux values than suggested in Sackett et al.’s articles, thereby warranting RR corrections.
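One way to see why the assumed size of “rzx” matters so much is the standard expression for incidental (indirect) restriction on X when selection occurs directly on Z, derived under the usual linearity and homoscedasticity assumptions (the numerical values below are ours and purely illustrative):

$u_X = \sqrt{1 - r_{ZX}^2\,(1 - u_Z^2)}.$

For example, with fairly severe selection on Z (uz = .60), rzx = .50 yields ux ≈ .92, in line with the values Sackett et al. report, whereas rzx = .80 (plausible when Z is a composite that contains X or a close proxy of it) yields ux ≈ .77, a level at which the RR correction is far from negligible.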
In summary, our major issue with Sackett et al.’s (2022, 2023) articles is their sweeping recommendations (i.e., no RR correction in concurrent validation studies regardless of which predictor is being considered) without compelling evidence. As Roberts (2022) notes, “scientific arguments should not be presented without evidence, as doing so shifts the burden of proof to the reader and pretends that the claims are established facts (burden of proof fallacy). Scientific arguments should also not be framed within dichotomous ‘either–or’ scenarios, as doing so is unsound (false dichotomy fallacy)” (p. 22).
Differential RR by validation (study) design
As briefly noted above, Sackett et al. (2022, 2023) advance statements that appear to invoke differential RR by validation design (concurrent vs. predictive) without compelling evidence. In particular, Sackett et al. (2023) state that:
“[T]he degree of RR can be quite substantial in predictive studies; however, Sackett et al. (2022) demonstrated that restriction will commonly (though not always) be quite small in concurrent studies. Applying a large correction factor derived from predictive studies to concurrent studies thus results in an overcorrection, which is often quite large.”
To put the statement above in perspective, it is worth noting other esteemed personnel selection scholars’ opinions and findings concerning Sackett et al.’s assertion. In particular, Morris (2023) raises a legitimate concern: “While it is important to consider the representativeness of the data used for artifact corrections, it is not clear that predictive and concurrent designs will necessarily produce substantially different levels of range restriction” (p. 238). Morris further elaborates that although RR can be more substantial in predictive than in concurrent studies (because RR can be direct), most, if not all, predictive studies are also subject to indirect RR, as is the case for concurrent studies; that is, few, if any, predictive studies are subject to direct RR (Schmidt et al., 2006). We agree with Morris that the field has yet to see compelling evidence for differential RR by validation design.
In fact, existing evidence does not appear to support Sackett et al.’s (2022) statements suggesting differential RR by validation design. For example, in their influential meta-analysis, Schmitt et al. (1984) report that “concurrent validation designs produce (observed) validity coefficients roughly equivalent to those obtained in predictive validation designs” (p. 37), thus illustrating that predictive and concurrent designs are unlikely to result in very different degrees of RR. The same conclusion was reached in the largest-scale meta-analysis currently available (Pearlman et al., 1980, p. 380). Based on all these findings, Schmidt et al. (1985) state, “contrary to general belief, predictive and concurrent studies suffer from range restriction to about the same degree” (p. 750). In summary, although there might be debate over how to correct for RR in concurrent studies, there is no compelling evidence for differential RR by validation design for each selection procedure reviewed in Sackett et al.’s (2022) article.
(Un)Availability of credible and sound information about RR
Sackett et al. (2022, 2023) repeatedly state that, without “credible” and “sound” information about RR, we should not attempt to correct for RR. This raises a critical question: What do Sackett et al. actually mean by “credible” and “sound” information about RR with which to correct for RR in concurrent studies? The answer is clear: applicant-based SDx. This, in turn, raises another critical question: Is such credible information on RR available from concurrent (and most validation) studies? The answer is clear: NO! Information such as applicant-based SDx is unavailable from concurrent studies because concurrent studies, by definition, are based on incumbents, not applicants. As such, we are left wondering how feasible it is to obtain credible or sound information about RR in concurrent studies. Importantly, a lack of credible information about RR does not necessarily mean that RR does not exist in concurrent studies. This naturally leads to the question of whether there are good alternatives to the applicant-based SDx that is absent in concurrent studies.
Alternatives to applicant-based SDx
In fairness to Sackett et al. (2022, 2023), they discuss a possible solution to the lack of applicant-based SDx in concurrent (and incumbent-based predictive) studies. Specifically, Sackett’s own study based on a cognitive ability test (Sackett & Ostgaard, 1994) shows that a large number of job-specific applicant-based SDx values collected across various jobs are, on average, 10% smaller than the applicant-based national norm SDx. Subsequent studies have been conducted based on this approach (e.g., Hoffman, 1995, for various ability and mechanical tests; Ones & Viswesvaran, 2003, for self-reported personality measures). With this approach, we know how much downward adjustment should be made to the applicant-based national workforce norm SDx to derive a reasonable estimate of the expected applicant-based SDx value, which can then be used to correct for RR in concurrent and other validation studies that lack unrestricted SDx values. However, as the following paragraph from Sackett et al. (2022) shows, their tone (note the phrase “an argument for skepticism about this approach” below) is not encouraging but rather skeptical, as they no longer view this as credible information regarding RR.
“Sackett and Ostgaard (1994) obtained applicant pool SDx values for a large number of jobs and then pooled the data across jobs as an estimate of workforce SDx; they reported that the applicant pool SDx values were on average 10% smaller than the workforce SDx estimate. So, based on Sackett and Ostgaard’s finding, it is at least hypothetically possible that it could be reasonable to pool incumbent data across jobs to estimate the unrestricted SDx, and then reduce that SDx by 10%. However, we offer an argument for skepticism about this approach, at least in terms of the U ratio estimate it produced in Hunter (1983).” (p. 2045)
What is odd to us is that Hunter’s (1983) approach was based only on incumbent data without any downward adjustments, whereas Sackett and Ostgaard’s (1994) approach mentioned above is rightfully based only on applicant data with proper downward adjustments. Thus, we are surprised that Sackett et al. (2022) discuss these two approaches side by side without clarifying the stark difference in input data (also see Roth et al., 2017). To be clear, we advocate the use of applicant-based national workforce norm SDx values with proper downward adjustments.
In the focal article, Sackett et al. (2023) note that it may be acceptable to use job/occupation-specific (vs. national workforce) norm-based applicant SDx. In theory, we do not disagree with this suggestion because it is analogous to using available local applicant-based SDx values (e.g., prior applicant data kept in an organization) to correct for RR in present concurrent studies (see Hoffman, 1995, Table 4). However, we are concerned about the (un)availability of job/occupation-specific norm-based applicant SDx values in many test manuals and large-scale validation studies. Sackett et al. appear to be quite restrictive and skeptical (rather than encouraging) about the applicant-based national workforce norm approach, as elaborated in their cautionary remarks. However, unlike Sackett et al. (2022, 2023), we believe that applicant-based national workforce norm SDx values, with proper downward adjustments, can serve as good surrogates for applicant-based SDx; this is better than assuming no RR in concurrent studies. At the same time, we agree with Sackett and Ostgaard (1994) and subsequent studies that applicant-based national workforce norm SDx should not be used liberally without proper adjustments (Roth et al., 2017).
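To illustrate how such an adjustment would work in practice (all numbers below are hypothetical and chosen only for arithmetic convenience), suppose a test’s national applicant norm reports SDx = 10 and the concurrent incumbent sample yields SDx = 7.5. Applying Sackett and Ostgaard’s (1994) average 10% downward adjustment to the national norm gives an expected job-specific applicant SDx of about 9, and hence

$u_X \approx \dfrac{7.5}{0.90 \times 10} \approx .83,$

which would then be used in the RR correction rather than assuming ux = 1 (i.e., no correction).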
Concluding thoughts
There is no disagreement among scholars that “the influence of selection (RR) upon the resulting validity coefficients becomes a very substantial matter where a high standard of selectivity exists” (Thorndike, 1949, p. 170). Looking back, the thoughts and evidence presented above suggest that there is no compelling evidence for Sackett et al.’s (2022, 2023) sweeping recommendation (i.e., one made without regard to which predictor is being considered and in which situation/application) that the effect of RR correction is so small that it is better not to attempt RR corrections in concurrent validation studies. Looking ahead, we agree with Morris (2023) that “additional work is needed to fully understand the representativeness of range restriction estimates and optimal correction procedures under typical conditions” (p. 238). For example, considering (a) that many firms increasingly use a multiple-hurdle model of personnel selection to save hiring-related time and costs and (b) that the traditional RR correction methods based on a compensatory model of selection discussed in this article do not directly apply to the multiple-hurdle model (see Mendoza et al., 2004, for exceptions), it will be fruitful for future studies to examine the long-overdue issue of how to correct for RR in such selection situations, along with developing more innovative and less restrictive RR correction methods. We also recommend the creation of publicly available, large, applicant-based data reservoirs that can be used collectively in personnel selection research.