1 Introduction
As one of the simplest heuristics in the “adaptive toolbox” (Gigerenzer, Todd, & The ABC Research Group, 1999), the recognition heuristic (RH) exploits recognition and may reach a high level of accuracy in inferential decisions. For example, if asked which of two cities, A or B, is larger, and one recognizes A but not B, one may simply follow the recognition cue and infer that A is the larger city. In domains in which the probability of recognizing an object is substantially related to its criterion value (here, the city’s size), such a simple strategy will yield many correct answers, far above chance. Goldstein and Gigerenzer (1999, 2002) formulated this strategy as the recognition heuristic and defined it as using only one piece of evidence, namely recognition of the two objects (yes/no). No other knowledge about the objects enters the inference process, so none could overturn the decision based on recognition. The RH thus represents a case of a non-compensatory, one-reason decision-making strategy. This claim in particular has raised some controversy in the past decade and has led to a multitude of new empirical findings. In other words, besides providing a precisely formulated and thus testable model, one merit of the RH certainly is that it challenged quite a number of researchers and, as a consequence, extended our knowledge of how inferential decision making may proceed. A new and exciting set of such studies is included in JDM’s special issue on “Recognition processes in inferential decision making” (the papers of which can be found in Volume 5, Issue 4, and Volume 6, Issues 1 and 5; see Marewski, Pohl, & Vitouch, 2010, 2011a, 2011b).
In the following section (Section 2), I recapitulate the basic features of the RH and its underlying assumptions, looking at its precursors and its fully laid-out version. In the main part of the paper (Section 3), I then discuss in detail the main points of controversy surrounding the RH and its framework. Note that I do not attempt a complete review of all theoretical arguments exchanged so far (see, e.g., Brighton & Gigerenzer, 2011; Bröder & Newell, 2008; Dougherty, Franco-Watkins, & Thomas, 2008; Gigerenzer, 2008; Gigerenzer & Brighton, 2009; Gigerenzer & Gaissmaier, 2011; Gigerenzer & Goldstein, 2011; Hilbig, 2010b, 2011; Hilbig, Erdfelder, & Pohl, 2010; Hilbig & Richter, 2011; Marewski, Gaissmaier, & Gigerenzer, 2010; Marewski, Gaissmaier, Schooler, Goldstein, & Gigerenzer, 2010; Marewski, Schooler, & Gigerenzer, 2010; Newell & Shanks, 2004; Pachur, Bröder, & Marewski, 2008; Pachur, Todd, Gigerenzer, Schooler, & Goldstein, in press; Tomlinson, Marewski, & Dougherty, 2011; see also the editorial to the first volume of this special issue: Marewski, Pohl, & Vitouch, 2010). Finally, in Section 4, I conclude with some general remarks and a short outlook.
2 The history of the recognition heuristic
The first ancestor of the RH was mentioned as the “familiarity cue” in Gigerenzer, Hoffrage, and Kleinbölting’s (1991) work on probabilistic mental models (PMM). There, in the context of paired comparisons of city names according to the cities’ size, the familiarity cue was defined as “whether one has heard of one city and not the other” (p. 509). For a given domain, this information was considered one of five probability cues that govern the construction of a PMM and thus choice behavior (between two alternatives) and the corresponding confidence judgments. In an experiment designed to test the PMM, the “RH” was born as an explanation for an unexpected finding, namely that German students who decided which of two cities was larger performed about equally well on German and U.S. cities (see Gigerenzer & Goldstein, 2011, and Hoffrage, 2011, for more details on this discovery). Apparently, the German students could exploit recognition (or the lack thereof) to reach such a high performance for the U.S. cities. Accordingly, Gigerenzer and Goldstein (1996), introducing models of bounded rationality based on the PMM framework, elevated recognition information to a “recognition principle” as the first step in their Take-The-Best (TTB) algorithm. The city-size task was elevated, too, namely to the status of a “drosophila” environment for studying satisficing algorithms (like TTB). The authors assumed that “the recognition principle is invoked when the mere recognition of an object is a predictor of the target variable (e.g., population). The recognition principle states the following: If only one of the two objects is recognized, then choose the recognized object. If neither of the two is recognized, then choose randomly between the two. If both of the objects are recognized then proceed to Step 2.” (p. 653)
Step 2 and further steps then describe how additional cues are searched for and evaluated until an inference can be drawn. The authors also stated that the proposed TTB algorithm (including the recognition principle) applies only to inferences from memory (where cue values have to be retrieved from memory), not to inferences from givens (where cue values are openly presented to the decision maker).
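The recognition principle followed by a TTB-style cue search can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function name `take_the_best`, the cue names, and the toy cities and cue values are hypothetical; cue values are coded 1 (positive), 0 (negative), or missing (unknown); and, as an assumption of this sketch, a cue is taken to discriminate when one object has a positive value and the other does not. The function also counts lookups, treating the recognition check as the first one.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical cue profiles: 1 = positive, 0 = negative, missing key = unknown.
CueProfile = Dict[str, Optional[int]]


def take_the_best(a: str, b: str, recognized: Set[str],
                  cues: Dict[str, CueProfile], cue_order: List[str],
                  rng: random.Random) -> Tuple[str, int]:
    """Return (choice, lookups); the recognition check counts as lookup 1."""
    rec_a, rec_b = a in recognized, b in recognized
    # Step 1 (recognition principle): exactly one recognized -> choose it.
    if rec_a != rec_b:
        return (a if rec_a else b), 1
    # Neither recognized -> choose randomly.
    if not rec_a and not rec_b:
        return rng.choice([a, b]), 1
    # Step 2 and beyond: inspect cues in the given order (assumed: descending
    # validity) and stop at the first cue on which exactly one object is positive.
    lookups = 1
    for cue in cue_order:
        lookups += 1
        va, vb = cues.get(a, {}).get(cue), cues.get(b, {}).get(cue)
        if va == 1 and vb != 1:
            return a, lookups
        if vb == 1 and va != 1:
            return b, lookups
    # No cue discriminates -> guess.
    return rng.choice([a, b]), lookups


# Toy example (hypothetical recognition set and cue values).
recognized = {"Berlin", "Hamburg", "Munich"}
cues = {
    "Berlin": {"capital": 1, "soccer_team": 1},
    "Hamburg": {"capital": 0, "soccer_team": 1},
    "Munich": {"capital": 0, "soccer_team": 1},
}
order = ["capital", "soccer_team"]
rng = random.Random(1)
print(take_the_best("Berlin", "Bielefeld", recognized, cues, order, rng))  # ('Berlin', 1)
print(take_the_best("Hamburg", "Berlin", recognized, cues, order, rng))    # ('Berlin', 2)
```

The lookup count is included only because later sections discuss how such step-wise procedures might translate into reaction-time predictions.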
Following these precursors, the RH was more fully laid out in Goldstein and Gigerenzer (1999, 2002). It now received the status of a heuristic in its own right, and the TTB algorithm was likewise relabeled a heuristic. RH, TTB, and several other heuristics were assumed to form the cognitive tools in an “adaptive toolbox” that human decision makers possess (Gigerenzer et al., 1999). “Adaptive” means that, depending on the task and situation, different, ecologically valid tools can be applied. Because these tools exploit regularities of the given environment, they allow good and fast decisions with minimal effort. Accordingly, these strategies were also termed “fast and frugal heuristics” (FFH), as opposed to more effortful and presumably time-consuming complex decision processes.
The RH was assumed to be domain-specific, that is, useful only in domains with a high correlation between the probability of recognition and the criterion value. Recognition was (and still is) treated in a binary fashion, that is, objects are either recognized or not. The most important feature, however, was that recognition should be used as the only cue (Gigerenzer & Goldstein, 1996): The authors stated that, if recognition discriminates between the alternatives (i.e., when one object is recognized and the other is not), then (1) no other information beyond recognition will be considered and therefore (2) nothing can overturn the inference based on the recognition cue (Goldstein & Gigerenzer, 2002, p. 82). The first hypothesis is known as “one-reason decision making”, the second as “non-compensatory strategy.” These rather bold proposals have evidently fueled the strongest reactions from other researchers (see below).
In addition, the authors introduced the concepts of recognition validity (α) and knowledge validity (β), which can be helpful in describing a domain or a sample. The recognition validity is the proportion of cases in which following the recognition cue leads to a correct inference (given that recognition discriminates). The knowledge validity is the proportion of correct decisions when both objects are recognized (so that recognition does not discriminate). Given that in some domains the recognition validity could exceed the knowledge validity, a peculiar effect was predicted, the “less-is-more effect” (LIME). The LIME entails that the overall inferential accuracy of a person who recognizes only about half of the objects in a domain could be higher than that of a person who recognizes all objects. The assumed reason for this, at first glance, surprising effect is that a person with full recognition can never use the more valid recognition cue, because all objects are recognized and recognition therefore never discriminates. Instead, this person has to rely on her (in this case) less valid knowledge. A person who recognizes fewer objects, however, can use the highly valid recognition cue more often and thus will more often be correct.
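The logic of the LIME can be made concrete with the expected-accuracy calculation implied by these definitions. With N objects of which n are recognized, a proportion 2n(N - n)/[N(N - 1)] of all pairs can be decided by recognition (correct with probability α), a proportion n(n - 1)/[N(N - 1)] must be decided by knowledge (correct with probability β), and the remaining pairs by guessing (correct with probability .5). The sketch below assumes, as a simplification, that α and β stay constant regardless of n; the function name and the values α = .8, β = .6, N = 100 are illustrative only.

```python
def expected_accuracy(n: int, N: int, alpha: float, beta: float) -> float:
    """Expected proportion correct across all pairs of N objects when n are recognized.

    Assumes alpha (recognition validity) and beta (knowledge validity) do not
    depend on n, and that guessing is correct with probability 0.5.
    """
    ordered_pairs = N * (N - 1)
    p_one = 2 * n * (N - n) / ordered_pairs         # recognition discriminates -> alpha
    p_both = n * (n - 1) / ordered_pairs            # both recognized -> knowledge, beta
    p_none = (N - n) * (N - n - 1) / ordered_pairs  # neither recognized -> guess
    return p_one * alpha + p_both * beta + p_none * 0.5


# Illustrative values in which recognition is more valid than knowledge (alpha > beta).
N, alpha, beta = 100, 0.8, 0.6
curve = {n: expected_accuracy(n, N, alpha, beta) for n in range(N + 1)}
best_n = max(curve, key=curve.get)
print(round(curve[50], 3), round(curve[N], 3))  # 0.676 0.6
print(best_n)                                   # 60
```

Under these assumptions, a person recognizing half of the objects is expected to score about .68, whereas a person recognizing all of them can only rely on knowledge and scores .60; accuracy peaks at an intermediate number of recognized objects.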
In both of their central publications on the RH, Goldstein and Gigerenzer (1999, 2002) presented a number of (partly identical, partly different) studies, consisting of experimental work and computer simulations, to support the conjectures outlined above. These two original publications have sparked much research in the years since, some of it yielding supporting and some more critical evidence (summarized below).
In their most recent presentation of the RH, Gigerenzer and Goldstein (2011) clarified the conditions and predictions of the RH theory. The authors also asserted that some of the critical papers that appeared in the past decade could not be considered adequate tests of the RH (see Section 3.3 and Pachur et al., 2008). Other findings, however, were considered crucial and led to an extension of the RH theory. Most importantly, Gigerenzer and Goldstein now posit that, before the RH is applied, an evaluation is run that tests whether or not the recognition cue should be used (see Sections 3.5 and 3.8 and Gigerenzer & Brighton, 2009; Marewski, Gaissmaier, Schooler et al., 2010; Pachur & Hertwig, 2006).
3 Controversial topics
A look at the literature (with its many critical papers, commentaries, and replies) makes clear that the “adaptive toolbox” approach and its postulated heuristics have led to a rather lively and sometimes heated debate (see, e.g., the discussion in Gigerenzer & Goldstein, 2011). In this section (the main part of the paper), I present a list of topics on which researchers diverge or that simply represent open questions to be addressed in the future. I have summarized them under eight headings (Table 1), which I describe and discuss in more detail in the following eight subsections.
3.1 Recognition as a memory-based process
While acknowledging that recognition should generally be treated as a continuous variable, Goldstein and Gigerenzer (1999, 2002) focused on the outcome of the recognition process, which is either “recognized” or “not recognized”, with only a small and negligible gray zone of uncertainty in between. Accordingly, the quality of these subjective recognition judgments, that is, whether they were correct and with what confidence they were made, was originally not considered (see Dougherty et al., 2008, and Newell & Fernandez, 2006, for critical discussions, and Gigerenzer, Hoffrage, & Goldstein, 2008, and Gigerenzer & Goldstein, 2011, for replies). This simplification of the recognition process nevertheless made it possible to predict an impressive portion of people’s inferences. Meanwhile, some researchers have asked whether and how the recognition process itself affects subsequent inferences. This question corresponds to Challenge 1 postulated by Tomlinson et al. (2011) and appears even more essential when one considers that the proposed heuristics (like RH and TTB) entail memory-search mechanisms that try to retrieve information (objects, cues, strategies, etc.) from memory. Hoffrage (2011), for example, reported that recognition of city names depended on the size of the reference class from which the cities were drawn, presumably because the reference class shifted the criterion for recognition. Further evidence for recognition as a continuous variable is that performance on pairs of two “unknown” objects is typically slightly above chance, suggesting that people applied a conservative criterion to “recognize” an object.
One approach that extended the RH theory was presented by Pleskac (2007), who considered recognition in a signal-detection framework (see also Schooler & Hertwig, 2005). He distinguished correctly recognized objects (“hits”) from falsely recognized ones (“false alarms”) and investigated how the proportions of these cases influence the performance of the RH in paired comparisons. Pleskac showed that people’s sensitivity and their decision criteria affect their performance: Generally, the performance of the RH decreases as the number of erroneously recognized objects increases. Another approach, based on a two-high-threshold model of recognition memory, was presented by Erdfelder, Küpper-Tetzel, and Mattern (2011; see also Bröder & Schütz, 2009). According to that model, recognition of an item depends on whether the memory strength of an “old” object is above the recognition threshold (leading to a “hit”) or not (leading to guessing), and on whether the memory strength of a “new” object is below the rejection threshold (leading to a “correct rejection”) or not (leading to guessing). Thus, objects can be in a “recognized with certainty” state, in an uncertain state, or in an “unrecognized with certainty” state. From these states and their combinations in pairs of objects, specific predictions about choices and reaction times can be derived. Erdfelder et al. corroborated these predictions in an empirical study, showing the importance of adding a third (uncertain) state to the simple yes/no recognition states used so far.
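To make the two formal accounts more concrete, the sketch below computes (a) the hit and false-alarm rates implied by a generic two-high-threshold model and (b) the standard equal-variance signal-detection indices d′ (sensitivity) and c (response criterion) from such rates. The parameter names are illustrative choices and need not match the notation of Erdfelder et al. (2011) or Pleskac (2007); the code shows only the textbook relations, not either author’s full model of RH performance.

```python
from statistics import NormalDist
from typing import Tuple


def two_high_threshold(p_old_detect: float, p_new_detect: float,
                       g: float) -> Tuple[float, float]:
    """Hit and false-alarm rates under a generic two-high-threshold model.

    p_old_detect: P(a known object exceeds the recognition threshold)
    p_new_detect: P(an unknown object falls below the rejection threshold)
    g: P(responding 'recognized' from the uncertain state)
    """
    hit = p_old_detect + (1 - p_old_detect) * g  # certain recognition or lucky guess
    false_alarm = (1 - p_new_detect) * g         # false alarms arise only from guessing
    return hit, false_alarm


def sdt_indices(hit: float, false_alarm: float) -> Tuple[float, float]:
    """Equal-variance signal-detection sensitivity d' and criterion c."""
    z = NormalDist().inv_cdf
    d_prime = z(hit) - z(false_alarm)
    c = -(z(hit) + z(false_alarm)) / 2           # c > 0 means a conservative criterion
    return d_prime, c


hit, fa = two_high_threshold(p_old_detect=0.6, p_new_detect=0.5, g=0.5)
print(round(hit, 2), round(fa, 2))  # 0.8 0.25
d_prime, c = sdt_indices(hit, fa)
```

In the two-high-threshold sketch, the guessing parameter g is the only route to false alarms, which is what distinguishes its uncertain state from the graded evidence assumed in signal-detection theory.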
Recognition processes are also considered in the fluency heuristic (FH; Hertwig, Herzog, Schooler, & Reimer, 2008; Schooler & Hertwig, 2005). According to this heuristic, people use the speed with which they recognize an object as another cue. Whenever both objects in a pair are recognized, so that the RH cannot be applied, the FH steps in: Given that the fluency of recognition discriminates between the two objects, the FH suggests that the more fluently recognized object be chosen as having the larger criterion value. This heuristic represents another case of one-reason decision making. In this context, it is, of course, of paramount interest to understand what determines fluency and how it is perceived and evaluated; in other words, a number of memory search and retrieval processes may play a role here. Hilbig, Erdfelder, and Pohl (2011) estimated the frequency of FH use in cases in which both objects were recognized and came to a negative conclusion, suggesting that fluency is only very rarely considered in isolation, as proposed by the FH.
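A minimal sketch of how the FH is assumed to step in when the RH cannot be applied is given below. The city names, retrieval times, and the 100-ms discrimination threshold are hypothetical values chosen for illustration, and falling back on guessing when fluency does not discriminate is a simplifying assumption of this sketch (a fuller model might consult knowledge instead).

```python
import random
from typing import Dict, Optional, Set


def recognition_then_fluency(a: str, b: str, recognized: Set[str],
                             retrieval_ms: Dict[str, float],
                             threshold_ms: float = 100.0,
                             rng: Optional[random.Random] = None) -> str:
    """Apply the RH if recognition discriminates, else the FH, else guess."""
    rng = rng or random.Random()
    rec_a, rec_b = a in recognized, b in recognized
    if rec_a != rec_b:                    # RH: choose the recognized object
        return a if rec_a else b
    if rec_a and rec_b:                   # both recognized: try fluency
        diff = retrieval_ms[a] - retrieval_ms[b]
        if abs(diff) >= threshold_ms:     # hypothetical just-noticeable difference
            return a if diff < 0 else b   # FH: choose the faster-retrieved object
    return rng.choice([a, b])             # neither heuristic applies: guess


times = {"Milan": 520.0, "Bari": 840.0, "Turin": 590.0}  # hypothetical retrieval times
rec = set(times)
print(recognition_then_fluency("Bari", "Milan", rec, times))  # Milan
```

Milan and Turin, whose hypothetical retrieval times differ by only 70 ms, would be decided by guessing in this sketch.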
Recently, the applicability of fluency was extended to cases in which only one object is recognized (Marewski, Gaissmaier, Schooler et al., 2010). The authors assumed that the retrieval time for the recognized object determines whether or not the RH is applied: When retrieval is slow, the decision maker is less likely to use the RH; when it is fast, the decision maker is more likely to follow it. Thus, slow retrieval could be seen as a further reason to stop using the RH (see Section 3.5).
The only sophisticated model so far that explicitly takes the memory processes underlying recognition into account was presented by Schooler and Hertwig (2005). They implemented the RH and FH in the ACT-R cognitive architecture (see, e.g., Anderson & Lebiere, 1998; Anderson & Schooler, 1991) and simulated people’s decision processes. In their model, the probability of an object’s retrieval (for the RH) and its retrieval time (for the FH) are both assumed to be functions of the strength of the object’s memory trace and its associative strength to the current retrieval cues. Accordingly, the model allows one to predict whether an object will be recognized and, if so, how long its retrieval will take. This is certainly an advantage over the earlier neglect of memory retrieval processes, and it also provides a promising test bed for the RH, FH, and other decision processes. However, ACT-R is also a highly complex model that requires quite a number of assumptions, which are not always obvious and could themselves be disputed. For example, specific parameter values can be (and have been) set differently in different model versions, so that the empirical predictions were not always clear-cut.
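For readers unfamiliar with ACT-R, the sketch below shows the standard equations that link an object’s memory strength to retrieval probability and retrieval time: base-level activation B = ln(sum of t_j^(-d)) over the lags t_j since past encounters, retrieval probability 1/(1 + exp(-(A - tau)/s)), and retrieval latency F * exp(-A). The parameter values (d = 0.5, tau = 0, s = 0.25, F = 1) and the encounter histories are illustrative placeholders, not those used by Schooler and Hertwig (2005), and the associative component of activation is represented only by an optional additive term.

```python
import math
from typing import Sequence


def activation(lags: Sequence[float], decay: float = 0.5, spread: float = 0.0) -> float:
    """ACT-R activation: base level ln(sum t_j^-d) plus an optional associative term."""
    base_level = math.log(sum(t ** -decay for t in lags))
    return base_level + spread


def retrieval_probability(a: float, tau: float = 0.0, s: float = 0.25) -> float:
    """Probability that activation a exceeds the retrieval threshold tau (noise s)."""
    return 1.0 / (1.0 + math.exp(-(a - tau) / s))


def retrieval_latency(a: float, latency_factor: float = 1.0) -> float:
    """Retrieval time (in units of latency_factor), decreasing exponentially with a."""
    return latency_factor * math.exp(-a)


# Hypothetical encounter histories (time units since each encounter).
frequent = activation([1, 5, 10, 20])  # an often-encountered city name
rare = activation([50])                # a city name met once, long ago
for name, a in [("frequent", frequent), ("rare", rare)]:
    print(name, round(retrieval_probability(a), 3), round(retrieval_latency(a), 3))
```

The same activation value thus drives both quantities: higher activation makes recognition more likely (relevant to the RH) and retrieval faster (relevant to the FH).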
Another question concerns what it actually means for an object to be recognized. Recognition is no doubt helpful in many situations. For example, when one meets people on the street, it helps to know whom to greet (because they are recognized as neighbors) and whom not (because they are not recognized and are thus presumably strangers). However, even in this simple situation, it is not recognition itself that is helpful, but rather the information associated with it. The recognized passerby may be someone severely disliked, or someone known for other reasons (for example, a famous actor or local politician). In these cases, recognition alone would not suffice to tell one what to do. One needs to remember who these persons are, that is, one needs to retrieve further information about them from memory. In other words, it is the combination of recognition and further knowledge that drives behavior in many everyday situations. Newell and Shanks (2004) summarized this by stating (p. 933) that “it is not pure recognition that determines an inference but recognition plus an appropriate reason for knowing why a particular object is recognized—or, at least, a correctly interpreted feeling of familiarity. It is not that an object is recognized and chosen without justification, but that the decision maker has a reasonable idea of why he or she recognizes the object and makes an inference on the basis of this secondary knowledge.”
This argument may illustrate why some researchers feel uneasy about the claim that inferences can be based on recognition alone. Of course, one may argue that recognition validity could be low in situations such as the greeting example above (so that the RH would be less useful), but such situations nevertheless represent cases in which recognition, to be useful, has to be combined with further knowledge. The same argument applies to the classical city-size task, in which cities are not only recognized, but recognized for being a state’s capital, being located on the coast, being a tourist site, or hosting a big automobile company. All this knowledge is intertwined with recognition and is probably retrieved in an instant (see Section 3.6). If that were true, the postulated “search memory” and “stop searching memory” assumptions of the RH might need to be replaced by inhibitory working-memory processes that try to prevent any already retrieved information beyond recognition from entering the decision-making process (see Section 3.5).
That recognition alone can represent important information can, paradoxically, be shown in cases where recognition is not helpful (Pohl, 2006, Exp. 1). In that experiment, I used a task in which recognition was not valid (α = .50) and people (presumably) had little additional knowledge. The task was to decide which of two Swiss cities is located farther away from the Swiss city of Interlaken, which is close to the geographical center of Switzerland. I found that when one city was recognized and the other not, some participants nearly always inferred that the recognized city was the correct one, while another group of participants used exactly the opposite strategy and nearly always chose the unrecognized city. Of course, both groups’ accuracy was only around chance (given that recognition was not valid and knowledge was not available), but perhaps recognition was used as the only “straw” to cling to, in order to have at least some sense of control in this rather extreme case of decision making. This could be taken as evidence that recognition is indeed an important cue in other situations as well.
In sum, it might be useful to look more closely at the memory processes that lead to the recognition (or rejection) of an object, not just to extend the RH theory, but rather because these processes presumably have direct consequences for people’s behavior and could therefore complement or sharpen the predictions made by the RH alone.
3.2 The RH as a cognitive process model
The heuristics in the adaptive toolbox were devised to replace earlier “one-label” or “as-if” models by providing more precise descriptions of the processes underlying inferential decision making (see, e.g., Gigerenzer, 1996). Indeed, some of the postulated heuristics proved quite successful in predicting people’s behavior (see, e.g., Gigerenzer & Gaissmaier, 2011). Yet the next, and in my view highly important, question is whether and how these heuristics can be translated into cognitive process models that describe how people actually proceed when making an inferential decision (Fiedler, 2010).
Surprisingly, Goldstein and Gigerenzer (1999, 2002) were quite reluctant to use the word “use” in the context of what decision makers are doing with the RH. Of course, the typically reported high adherence rates suggest that the RH is understood not only as a predictive device, but also as an explanation of the processes underlying the observed choices. In addition, the RH has been described in terms of working-memory processes (search, stop, decide) and has accordingly been depicted as a flow chart or as production rules (Gigerenzer & Goldstein, 1996; Schooler & Hertwig, 2005; see also Figure 1). Consistent with this, Pachur and Hertwig (2006) treated the RH as a cognitive process model and spoke consistently of people using or not using the RH (see also Pachur et al., in press).
The question, then, is how to derive adequate predictions from the RH, for example, for reaction times (RT). I think that the step-wise procedures described in the RH (as well as in other heuristics) should in principle allow one to derive such predictions (see, e.g., Glöckner & Bröder, 2011; Hilbig & Pohl, 2009). Moreover, several formulations suggest at least implicit conclusions about RT differences. For example, discussing TTB, Martignon and Hoffrage (1999, p. 137) pointed out that “in the kind of inference task we are concerned with, cues have to be searched for, and the mind operates sequentially, step by step and cue by cue.” Brandstätter, Gigerenzer, and Hertwig (2006) argued with respect to the priority heuristic that it “is intended to model both choice and process: It not only predicts the outcome but also specifies the order of priority, a stopping rule, and a decision rule.” (p. 427) In a similar vein, Pachur and Hertwig (2006) claimed that “recognition is first on the mental stage and ready to enter inferential processes when other probabilistic cues still await retrieval.” (p. 986) Using recognition should therefore be rather fast, while searching for further information should take additional time (see also Pachur et al., in press).
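As an illustration of how such ordinal predictions might be derived, the sketch below treats each step of a serial, step-wise procedure as adding a fixed duration to decision time. All durations are hypothetical placeholders, and the additivity of stage durations is an assumption of this toy model rather than a claim made by the RH; only the ordering of the predicted times carries meaning.

```python
def predicted_rt(pair_type: str, cues_searched: int = 0,
                 base_ms: float = 500.0, recognition_ms: float = 150.0,
                 per_cue_ms: float = 250.0) -> float:
    """Toy serial-stage prediction of decision time for one paired comparison.

    pair_type: 'one_recognized' (RH applies) or 'both_recognized' (further search).
    cues_searched: number of additional cues retrieved before search stops.
    """
    if pair_type == "one_recognized":
        # Recognition discriminates: decide without further memory search.
        return base_ms + recognition_ms
    if pair_type == "both_recognized":
        # Recognition does not discriminate: each retrieved cue adds time.
        return base_ms + recognition_ms + cues_searched * per_cue_ms
    raise ValueError(f"unknown pair type: {pair_type!r}")


print(predicted_rt("one_recognized"))  # 650.0
for k in (1, 2, 3):
    print(k, predicted_rt("both_recognized", k))  # 900.0, 1150.0, 1400.0
```

Whatever the actual durations, a strictly serial reading of the RH implies that choices in pairs where recognition discriminates should be faster than choices requiring cue search, which is the kind of prediction tested in the studies discussed next.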
Supporting evidence for a stepwise TTB process, with reaction times increasing the more cues had to be searched, was provided by Bröder and Gaissmaier (2007), who analyzed the data of those participants for whom TTB was the best model in predicting choices. Pachur and Hertwig (2006) found that inferences in line with the RH were slower when additional inconsistent information was present (Footnote 1). They also reported that under time pressure inferences more often followed the RH. The latter finding, however, stems from a comparison between different experiments and is therefore difficult to evaluate. Hilbig and Pohl (2009) tested several RT hypotheses derived from the RH and contrasted them with an alternative mechanism, namely the difference in evidence (or, in other words, the degree of conflict between the options). In three experiments, they found that most RT results were incompatible with the RH assumptions but supported the evidence-difference view.
In sum, more effort should be spent on deriving predictions for reaction times from the RH, and possibly also for confidence ratings (Glöckner & Bröder, 2011). An agreed-upon set of such predictions would help in designing experiments, and considering more measures than just choices would make it easier to disentangle different explanations.
3.3 Proper conditions of testing the RH
Some of the controversy regarding the RH concerned the proper conditions for testing it and, as a consequence, led to the dismissal of some critical papers as not having followed those conditions (Gigerenzer & Brighton, 2009; Gigerenzer & Goldstein, 2011; Pachur et al., 2008). For example, Pachur et al. (2008) listed eight criteria on which some of the critical studies deviated from the RH theory: (1) induced (rather than natural) recognition, (2) induced (rather than natural) cue knowledge, (3) criterion (instead of cue) knowledge, (4) menu-based inferences (i.e., based on openly given rather than memory-retrieved information), (5) a domain with low recognition validity, (6) unknown nature of additional cue knowledge, (7) artificial stimuli, and (8) cue knowledge available about the unrecognized object.
Firstly, such a list defines and limits the scope of situations in which the RH can be tested. On the one hand, this is positive, as it further specifies the RH theory. On the other hand, it restricts the range of potential RH uses to highly specific situations and thus leads straight to the next question: What does a decision maker do when one, two, or more of these criteria are not met? Are there different heuristics for each of these possible cases? This problem has not yet been satisfactorily answered (see Section 3.8).
Secondly, some of the criteria seem to contradict each other or are at least difficult to control simultaneously. For example, if knowledge may not be learned in the lab, how can the nature of additional cue knowledge be controlled? Nevertheless, Pachur et al. (2008) dismissed the critical findings of Pohl (2006) for exactly that reason, namely that it was unclear what the additional knowledge (which participants in those studies apparently had used) was based on, possibly including some criterion knowledge. Meanwhile, Hilbig, Pohl, and Bröder (2009) have shown that criterion knowledge indeed plays some role (see also Pachur & Hertwig, 2006), but that the main critical findings of Pohl (2006) remain intact when it is controlled for.
Thirdly, Goldstein and Gigerenzer themselves presented a number of studies that did not conform to the list provided by Pachur et al. (2008). For example, Goldstein and Gigerenzer (1999) reported a simulation study and an experiment on (artificial) cue learning. In the experiment, participants could even keep their notes (with the learned cue values) and use them during decision making (as “givens”), so that memory retrieval was not necessary. Goldstein and Gigerenzer (2002) also reported a study in which recognition was experimentally induced by repeatedly testing the same new objects in consecutive sessions (one week apart), which created a sense of (artificial) recognition of these objects in their participants (contrary to the criterion of naturally acquired recognition). Or take the case of criterion knowledge. In two of their experiments, Goldstein and Gigerenzer (2002) used sets of German or U.S. cities that included the respective largest cities, but did not discuss the potential role of criterion knowledge. Only in a third study was this problem acknowledged and the three largest cities excluded from the set. In the same paper, the authors wrote (p. 76): “It is also easy to think of instances in which an object may be recognized for having a small criterion value. Yet even in such cases the recognition heuristic still predicts that a recognized object will be chosen over an unrecognized object.”
This statement directly contradicts the last criterion in Pachur et al.’s (2008) list, but it is consistent with Oppenheimer (2003, p. B3), who stated that the RH should be used “even if the recognized city were known to be small.” This prediction (and the corresponding empirical test), despite matching Goldstein and Gigerenzer’s own statement, was later criticized as not fulfilling the proper conditions for testing the RH.
All this is, of course, somewhat confusing and may prevent one from “seeing” the proper criteria. One of the main goals of the most recent RH paper by Gigerenzer and Goldstein (2011) was therefore to clarify these conditions. They name three central conditions that define the applicability of the RH: (1) a substantial recognition validity, (2) inferences made from memory (and not from givens), and (3) recognition that stems from natural environments (and not from artificial manipulations). Applying this list to published papers would indeed lead one to dismiss some of the studies (some with supporting, some with critical findings). Of course, “dismissing” experiments does not imply that they were useless. Rather, they should be seen as testing the boundary conditions of the RH (see Gigerenzer & Goldstein, 2011).
Let me close this section with a short note on one of the relatively undisputed criteria, the recognition validity. The RH is assumed to be useful whenever recognition is a valid cue, but not when it is not (see Pohl, 2006, Exp. 1, for supporting evidence). Accordingly, Pachur et al. (2008) dismissed some of the critical studies because recognition validity was apparently low (Footnote 2). The underlying and not yet resolved problem, however, is that there is no such thing as an “objective” recognition validity, in the sense of a value that reflects properties of the real world. The computed validity always depends on two features, namely (a) the set of objects from a domain and (b) the tested participants. For example, if one takes the 20 largest cities of Italy, or the largest 30, or the largest 40, or the 20 cities on ranks 21 to 40, or 41 to 60, or a random sample of all Italian cities with more than 100,000 inhabitants, the resulting recognition validity will differ (see Hoffrage, 2011, for an empirical example). This is why it is important to define exactly the reference class from which the objects are drawn (see, e.g., Gigerenzer & Goldstein, 2011; Pachur et al., in press) and to use a rather large set of such objects to avoid the influence of single, “peculiar” objects. In addition, the recognition validity also depends on the sample (e.g., laypeople or experts in a domain, or inhabitants of the same country as the cities or of a different one). Given that different persons recognize different objects and different numbers of objects, individual recognition validities will vary. This is fine as long as individual validities are all that is needed. But to compare data at an aggregate level across experiments, overall recognition validities are necessary. In that case, the mean of the individual recognition validities is typically taken as a proxy.
But it should be clear that recognition validity represents an abstract concept that is difficult to capture in the real world.
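To make the dependence on the object set concrete, the following Python sketch computes a participant's recognition validity as the proportion of recognition cases (exactly one object recognized) in which the recognized object has the larger criterion value. The objects, criterion values, and recognition data are hypothetical, and ties in the criterion are assumed not to occur. Dropping a single object from the set changes the result for the same participant.

```python
from itertools import combinations

def recognition_validity(sizes, recognized):
    """Proportion of recognition cases in which the recognized object is the
    one with the larger criterion value.

    sizes: dict mapping object name -> criterion value (assumed all distinct)
    recognized: set of object names this participant recognizes
    """
    correct = cases = 0
    for a, b in combinations(sizes, 2):
        if (a in recognized) == (b in recognized):
            continue  # both or neither recognized: recognition does not discriminate
        cases += 1
        larger = a if sizes[a] > sizes[b] else b
        if larger in recognized:
            correct += 1
    return correct / cases if cases else float("nan")

# Hypothetical criterion values and recognition data for one participant.
sizes = {"A": 900, "B": 700, "C": 500, "D": 300}
recognized = {"A", "C"}
print(recognition_validity(sizes, recognized))                         # 0.75
print(recognition_validity({k: sizes[k] for k in "BCD"}, recognized))  # 0.5
```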
In sum, there has been some debate as to what counts as a proper test of the RH and what instead amounts to testing its boundaries. The current lists of crucial RH conditions (Gigerenzer & Goldstein, 2011; Pachur et al., 2008) help a lot in this regard and should sharpen future research.
3.4 Measures of using the RH
From the very beginning of research on the RH, researchers reported the adherence (or accordance) rate, that is, the percentage of cases in which a participant chose the recognized object whenever recognition discriminated. These figures were typically high (90% or higher) and, when depicted as individual percentages, revealed that a large portion of participants almost always chose the recognized object. The resulting histograms were quite impressive and can be found in many RH papers to this day (see Gigerenzer & Brighton, 2009; Goldstein & Gigerenzer, 1999, 2002; Hertwig et al., 2008; Marewski, Gaissmaier, Schooler et al., 2010; Pachur et al., 2008; Pachur & Hertwig, 2006; Reimer & Katsikopoulos, 2004). But what do these rates actually tell us? Bröder and Schiffer (2003) asserted that “simple counting of choices compatible with a model tells us almost nothing about the underlying strategy” (p. 197). The reason is that we should be careful not to confuse having chosen the recognized object with having applied the RH: a recognized object may be chosen for a number of reasons, recognition being only one of them, and other information may also have entered the decision process. Typically, further knowledge retrieved about recognized objects correlates positively with the objects’ criterion values; that is, knowledge is generally confounded with recognition. Thus, without further measures one cannot tell whether inferences were based on recognition, on other knowledge, on recognition plus other knowledge, or on guessing (Hilbig, 2010a).
Hilbig (2010b) demonstrated this obvious but often neglected fallacy convincingly by introducing a nonsense heuristic that nevertheless “explained” a significant proportion of choices. Tomlinson et al. (2011) listed this problem among their main challenges to current RH research.
Moreover, what role does guessing play? We would assume “guessing” if the adherence rate were 50%, because there would be no tendency to choose the recognized object more often than the unrecognized one. Let us assume we have an adherence rate significantly above chance, say 70%. Does that mean that the RH was (potentially) followed in 70% of the cases? Probably not: the remaining 30% (100 – 70) is likely due to guessing (assuming that nothing spoke explicitly against choosing the recognized object), and the same portion of RH-conforming choices, another 30%, would then presumably also have resulted from guessing. Only the remaining 40% (100 – 30 – 30) might be indicative of the RH, which is less impressive than the adherence rate of 70%. Of course, if adherence rates are as high as typically reported, guessing apparently plays only a minor role.
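The arithmetic above can be written as a small correction, assuming (as the text does) that every recognition case is either an RH choice or a 50:50 guess. Then adherence = p + (1 - p) / 2, where p is the share of RH-based choices, so p = 2 * adherence - 1. This is a sketch of that reasoning only, not an established estimator, and the function name is hypothetical.

```python
def rh_share_from_adherence(adherence):
    """Share of recognition cases attributable to the RH, assuming every other
    recognition case is a 50:50 guess (an illustrative assumption).

    Model: adherence = p + (1 - p) / 2, hence p = 2 * adherence - 1.
    """
    if not 0.5 <= adherence <= 1.0:
        raise ValueError("adherence below .50 is not explained by RH plus guessing")
    return 2 * adherence - 1

print(round(rh_share_from_adherence(0.70), 2))  # 0.4
print(round(rh_share_from_adherence(0.90), 2))  # 0.8
```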
Apart from guessing: What happened when someone chose the unrecognized object? Was knowledge involved that spoke against the recognized object? Again, from adherence (or non-adherence) rates we cannot tell. We need further information to understand what actually caused the observed choice behavior. Therefore, measures going beyond simple adherence rates were introduced, namely a discriminability parameter based on signal detection theory (d’; Pachur & Hertwig, 2006) and a discrimination index (DI; Hilbig & Pohl, 2008). Pachur and Hertwig focused on how well participants can discriminate between cases in which choosing the recognized object yields a correct inference and cases in which it yields a false one. Choosing the recognized object when it was correct then represents a hit, and choosing it when it was false a false alarm. From the proportions of hits and false alarms, they computed d’ as an estimate of a participant’s discrimination ability. This index should be zero if only recognition was used. It was not, suggesting that participants were to some extent able to distinguish between valid and invalid RH-based inferences. In a similar vein, the DI is the proportion of cases in which the recognized object was chosen when it was in fact the correct choice, minus the proportion of cases in which it was chosen when it was the false one. This index, too, should be zero if participants use only recognition and thus cannot discriminate between recognized objects being correct or false. If the index differs from zero (as Hilbig & Pohl, 2008, consistently found), some further information in addition to or instead of recognition must have been used. When applied to individual data, the DI suggested that the majority of participants did not use the RH.
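A minimal Python sketch of both measures for a single participant, assuming the hit and false-alarm counts have already been tabulated from the recognition cases. DI is computed as hit rate minus false-alarm rate, and d’ as z(H) - z(FA) using the standard normal quantile function from Python's `statistics` module. The function name and the example counts are hypothetical, and no correction for rates of 0 or 1 is applied (d’ is returned as NaN in that case).

```python
from statistics import NormalDist

def discrimination_measures(hits, n_rec_correct, false_alarms, n_rec_false):
    """DI and d' for one participant's recognition cases.

    hits: times the recognized object was chosen when it was correct
    n_rec_correct: recognition cases in which the recognized object was correct
    false_alarms: times the recognized object was chosen when it was false
    n_rec_false: recognition cases in which the recognized object was false
    """
    h = hits / n_rec_correct
    fa = false_alarms / n_rec_false
    di = h - fa  # discrimination index: hit rate minus false-alarm rate
    if 0 < h < 1 and 0 < fa < 1:
        z = NormalDist().inv_cdf  # standard normal quantile function
        d_prime = z(h) - z(fa)
    else:
        d_prime = float("nan")  # undefined at rates of 0 or 1 without a correction
    return di, d_prime

# Hypothetical participant: chose the recognized object in 18 of 20 cases where
# it was correct and in 6 of 10 cases where it was false.
di, d = discrimination_measures(18, 20, 6, 10)
print(round(di, 2), round(d, 2))  # 0.3 1.03
```

A participant who always chooses the recognized object, for example `discrimination_measures(20, 20, 10, 10)`, gets DI = 0 and an undefined d’ unless some correction for extreme rates is adopted.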
In a recent attempt to overcome the problems of adherence rates, Hilbig, Erdfelder, and Pohl (2010) proposed and validated a multinomial processing tree model, the r-model, as a measurement tool that yields bias-free estimates of the probability of RH use. Their general result was that these estimates are significantly smaller than adherence rates suggest, but still significantly above chance (see also Hilbig et al., 2011, for an extension of the r-model to measure use of the FH). Hilbig (2010a) compared the different measures of RH use (adherence, d’, DI, and r) in simulation studies and found that the r-model delivered the best results. Note, however, that the r-model is simply a measurement tool and not a theoretical model; that is, it does not explain why people did or did not use the RH in their inferences. It only estimates how often they did.
In sum, while adherence rates are a flawed measure of RH use because choosing the recognized object is confounded with using other knowledge, other measures have been introduced that allow better estimates of how often the RH was used. The r-model provides the latest of these measures and could prove a helpful tool in testing the RH.
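To see why adherence overstates RH use, the following sketch computes the adherence rate predicted by the recognition-case branch of the r-model, as we understand its tree structure from Hilbig, Erdfelder, and Pohl (2010): with probability r the recognized object is chosen on recognition alone; otherwise the choice rests on knowledge that is correct with probability b, while the recognized object is the correct one with probability a. This only illustrates the model's predictions; estimating r requires fitting the full multinomial model (including knowledge and guessing cases) to observed frequencies, which is not shown.

```python
def predicted_adherence(r, a, b):
    """Predicted probability of choosing the recognized object in a
    recognition case under an r-model-style tree (our reading of the model).

    r: probability of a recognition-only (RH) choice
    a: recognition validity (probability the recognized object is correct)
    b: knowledge validity (probability a knowledge-based choice is correct)
    """
    # With probability 1 - r the choice rests on knowledge. The recognized
    # object is then chosen if it is correct and knowledge is right (a * b),
    # or if it is false and knowledge is wrong ((1 - a) * (1 - b)).
    knowledge_picks_recognized = a * b + (1 - a) * (1 - b)
    return r + (1 - r) * knowledge_picks_recognized

print(round(predicted_adherence(r=0.5, a=0.8, b=0.7), 2))  # 0.81
```

With r = .50, a = .80, and b = .70, the predicted adherence rate is .81, even though the RH is used in only half of the recognition cases.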
3.5 Reasons for not using the RH
One way to explain evidence that contradicts the RH is to assume that people decide in each case whether the RH would be the best strategy to apply and, if not, use some other strategy. Pachur and Hertwig (2006, p. 993) stated that “people appear to decide case by case whether they will obey the recognition heuristic. Moreover, these decisions are not made arbitrarily but demonstrate some ability to discriminate between cases in which the recognition heuristic would have yielded correct judgments and cases in which the recognition heuristic would have led astray.”
They also assumed that the RH is typically chosen as the default strategy in recognition cases (i.e., whenever one object is recognized and the other is not), but that it can be “suspended” for a number of reasons and thus not applied to the current case. The reasons for suspending the RH include (1) availability of probabilistic cues with higher validities than the recognition validity; (2) source knowledge (i.e., knowing that an object is recognized for reasons other than its criterion value; e.g., most people recognize Chernobyl, not because of its size but because of the nuclear accident in 1986); and (3) conclusive criterion knowledge. These reasons could explain why the RH is not applied in every single case.
The third reason is probably the most obvious one. If criterion knowledge is available, that is, knowledge that allows a direct conclusion as to whether the recognized city is small or large, the decision (for or against the recognized city) can be deduced directly from that knowledge. A probabilistic inference such as the RH is then superfluous. The problem with this and the other two reasons for suspending the RH, however, is conceptual: Before the RH can be applied, all available knowledge needs to be retrieved and scanned for anything that speaks against applying the RH. Thus, memory search cannot stop as soon as recognition is assessed, as the RH assumes. Accordingly, Pachur and Hertwig (2006) suggested a two-stage process, in which recognition is followed by an evaluative step that determines whether the RH should be applied (see also Gigerenzer & Brighton, 2009, p. 132; Gigerenzer & Goldstein, 2011; Marewski, Gaissmaier, Schooler et al., 2010).
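The two-stage process can be sketched in Python as follows. The memory record (`Obj`) and its fields are hypothetical stand-ins for the three suspension reasons listed by Pachur and Hertwig (2006), and the order in which they are checked is an assumption. The sketch makes the conceptual point visible: even when the function ends up returning an RH-based choice, every stored item of knowledge about the recognized object has been inspected first.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Obj:
    # Hypothetical memory record for one object; all fields are illustrative.
    name: str
    recognized: bool
    known_to_be_small: Optional[bool] = None   # conclusive criterion knowledge, if any
    recognized_for_other_reason: bool = False  # source knowledge (e.g., famous for an event)
    contra_cue_validity: float = 0.0           # validity of the best cue against this object

def two_stage_inference(x, y, recognition_validity):
    """Stage 1 assesses recognition; stage 2 evaluates whether the RH applies."""
    if x.recognized == y.recognized:
        return None, "not a recognition case"
    rec, unrec = (x, y) if x.recognized else (y, x)
    # Reason (1): a probabilistic cue more valid than recognition points the other way.
    if rec.contra_cue_validity > recognition_validity:
        return unrec.name, "suspended: more valid cue"
    # Reason (2): source knowledge; recognition is not due to the criterion.
    if rec.recognized_for_other_reason:
        return None, "suspended: source knowledge"
    # Reason (3): conclusive criterion knowledge; the answer is deduced, not inferred.
    if rec.known_to_be_small is not None:
        return (unrec.name if rec.known_to_be_small else rec.name), "criterion knowledge"
    # Nothing speaks against recognition: apply the RH.
    return rec.name, "RH"

print(two_stage_inference(Obj("A", True), Obj("B", False), 0.8))
print(two_stage_inference(Obj("Chernobyl", True, recognized_for_other_reason=True),
                          Obj("B", False), 0.8))
```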
More recently, Marewski, Gaissmaier, Schooler et al. (2010) added another reason why the RH might not be applied. They assumed that retrieval time (i.e., the time needed to decide whether an object is recognized) can be used as a cue: When the recognized object is retrieved quickly, people should go with the RH more often than when it is retrieved slowly. The rationale is that objects about which further cue knowledge is available are typically retrieved (recognized) faster than objects without such knowledge. And since additional knowledge more often speaks for the recognized object than against it, it would be wise to go with the RH in such cases. In other words, fast recognition (fluency) could simply be taken as a proxy for the existence of additional information that speaks for the recognized object. Slow retrieval, in contrast, would signal that no such supporting information is available. In this case, one should hesitate to go with recognition and thus not use the RH.
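A minimal sketch of fluency-gated RH use, assuming recognition latencies are available for each recognized object. Operationalizing "fast" as at or below the participant's median latency is a hypothetical choice made only for illustration; the text does not specify how fast is fast enough.

```python
from statistics import median

def fluency_gated_choices(latency_ms, recognition_pairs):
    """Go with the RH only when the recognized object was recognized quickly.

    latency_ms: dict of recognition latency (ms) per recognized object
    recognition_pairs: list of (recognized, unrecognized) object names
    The median-split cutoff is a hypothetical operationalization.
    Returns the chosen object, or None where the RH would be set aside.
    """
    cutoff = median(latency_ms.values())
    return [rec if latency_ms[rec] <= cutoff else None
            for rec, _unrec in recognition_pairs]

latencies = {"A": 520, "B": 910, "C": 700}  # hypothetical data
print(fluency_gated_choices(latencies, [("A", "X"), ("B", "Y"), ("C", "Z")]))
# ['A', None, 'C']
```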
Two of the given reasons for suspending the RH have an important implication. If an inference is based on a more valid knowledge cue or on a slow recognition time, so that the RH is suspended, the inference may nevertheless result in choosing the recognized object. Simple adherence rates therefore generally overestimate use of the RH, and the extent of the overestimation depends on the proportion of such cases (see Section 3.4 and Gigerenzer & Goldstein, 2011; Hilbig, 2010b; Hilbig, Erdfelder, & Pohl, 2010; Hilbig & Pohl, 2008).
In sum, the decision process is more complicated than previously assumed. Moreover, proposing that, before the RH is applied, memory is searched to check whether anything better can be found or whether something speaks against using the RH appears tantamount to saying that recognition is used as a cue whenever nothing better is available. But then the RH is no longer a shortcut that intentionally ignores other, potentially useful information. Similar arguments may apply to the FH, for which it needs to be checked whether fluency can be used as a cue or whether it should be attributed to some other source (unrelated to the criterion) and therefore discarded.
3.6 The RH as a non-compensatory strategy
As discussed above (Section 3.4), choosing the recognized object is not the same as basing one’s decision on recognition alone (see Hilbig, 2010a, 2010b). In natural environments, recognition and knowledge are most likely confounded; that is, the probability of recognizing a city increases with its size, and so does the amount of cue knowledge indicating that it is large. Pohl (2006), for example, reported differences in adherence rates depending on (a) whether participants merely recognized the object’s name or knew more about it, and (b) whether the recognized object was actually the correct choice. When participants knew more about the recognized object or when it represented the correct choice, they chose it consistently and significantly more often (see also Hilbig & Pohl, 2008; Newell & Fernandez, 2006; Oeusoonthornwattana & Shanks, 2010; Pachur et al., 2008; Richter & Späth, 2006). These results suggest that more than (or something other than) recognition was used in these inferences, and it remains unclear how often (or whether at all) an inference was based on recognition alone. Note that these observations could stem from two kinds of cases: (1) the recognized object was chosen for reasons other than recognition alone, and (2) the unrecognized object was chosen despite non-recognition. The latter could represent compensatory inferences, in which the decision based on recognition was overturned by another cue (one that spoke for the unrecognized or against the recognized object), suggesting that something other than the non-compensatory RH was used in these cases. But such choices could also result from yet another non-compensatory mechanism that does not consider recognition at all.
One approach to tackling these problems is to formulate and test competing compensatory and non-compensatory models (see Marewski, Gaissmaier, Schooler et al., 2010).
In other cases, even if evidence contradicting the recognition cue is retrieved and considered, its impact might be too weak to overturn the inference based on recognition. This could be a matter of (subjective) validity. If recognition has a high validity and the additional knowledge a low one, the additional knowledge will most likely fail to dominate the decision. Only if the additional cue’s validity is large enough could it overrule recognition. Similarly, Pachur and Hertwig (2006) argued that one possible reason for not using the RH would be that an additional probabilistic cue has a higher validity than recognition (see also Newell & Shanks, 2004). In sum, given that the RH applies only in situations with high recognition validities, situations in which the validity of additional knowledge is even higher could be rare. Accordingly, most decisions in such domains might indeed be non-compensatory, based on recognition only.
One way to test this assumption is to control participants’ cue knowledge experimentally by introducing additional knowledge cues that validly contradict the recognition cue, and then to observe the resulting choices (Goldstein & Gigerenzer, 2002; Newell & Fernandez, 2006; Pachur et al., 2008; Richter & Späth, 2006). This procedure, however, does not conform to the “proper” criteria defined above and might not be accepted as a test of the RH (see Section 3.3). A summary of such research is given in Pachur et al. (in press). The evidence suggests that, on an aggregate level, mean adherence rates drop somewhat when additional, contradictory evidence is present, but that the effect is much smaller when analyzed on an individual level: a large portion of participants choose the recognized object irrespective of any contradictory evidence. Only some participants apparently change their strategy.
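The contrast between a non-compensatory and a compensatory use of recognition can be sketched in Python. For the compensatory rule we assume, purely for illustration, that recognition and each contradicting cue contribute log-odds weights log(v / (1 - v)), a naive-Bayes-style weighting that is not taken from the RH literature. With a single contradicting cue, this rule overrules recognition only if the cue's validity exceeds the recognition validity, which matches the argument above.

```python
from math import log

def log_odds(validity):
    return log(validity / (1 - validity))

def rh_choice(recognition_validity, contra_validities):
    # Non-compensatory RH: contradicting cues are ignored entirely.
    return "recognized"

def compensatory_choice(recognition_validity, contra_validities):
    """Weighted-additive sketch: recognition votes for the recognized object,
    each contradicting cue votes against it, with assumed log-odds weights.
    All validities are assumed to lie strictly between .5 and 1."""
    evidence = log_odds(recognition_validity) - sum(log_odds(v) for v in contra_validities)
    return "recognized" if evidence >= 0 else "unrecognized"

print(compensatory_choice(0.8, [0.6]))       # recognized
print(compensatory_choice(0.8, [0.9]))       # unrecognized
print(compensatory_choice(0.8, [0.7, 0.7]))  # unrecognized
```

The third call shows that, under this assumed rule, two individually weaker cues can jointly overrule recognition, a prediction that separates compensatory from non-compensatory models.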
3.7 Evidence for a Less-is-more effect (LIME)
The LIME is defined as a pattern of results in which recognizing fewer objects leads to more accurate inferences than recognizing more objects. One question concerns the conditions under which such an effect is predicted. Goldstein and Gigerenzer (1999, 2002) argued that (1) the recognition validity α must be higher than the knowledge validity β and that (2) α and β remain constant across the number of recognized objects (n). After presenting a simulation study, however, they added (Goldstein & Gigerenzer, 2002) that “the simplifying assumption that the recognition validity α and knowledge validity β remain constant is not necessary for the less-is-more effect to arise.” (p. 81)
The classical example of the LIME (with three Scottish brothers, or Parisian sisters; Goldstein & Gigerenzer, 1999, 2002) was unfortunately not very enlightening in this respect. The authors assumed that Brother A recognizes none of, say, 20 objects (n = 0), Brother B recognizes half (n = 10), and Brother C all (n = 20). Now consider the α and β values of these three persons. Brother A, recognizing no object, has to guess all the time; that is, he has neither a recognition nor a knowledge validity (both are undefined in this case). For Brother B the authors assume, for example, a recognition validity of .80 and a (lower) knowledge validity of .60. For Brother C, the recognition validity cannot be determined, because he recognizes all objects; his knowledge validity is also assumed to be .60. In sum, only one brother has a recognition validity (nothing can be known about the other two), and two brothers are assumed to have the same knowledge validity (nothing can be known about the third). Thus, these examples, too, leave it unclear how α and β actually behave, or should behave, as a function of n (see Dougherty et al., 2008, for a discussion of further critical details, and Gigerenzer et al., 2008, for a reply).
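For reference, under these constant-validity assumptions the expected proportion of correct inferences f(n), for N objects of which n are recognized, with all pairs equally likely and strict use of the RH in recognition cases, takes the form given by Goldstein and Gigerenzer (2002):

```latex
f(n) = 2\,\frac{n}{N}\,\frac{N-n}{N-1}\,\alpha
     + \frac{N-n}{N}\,\frac{N-n-1}{N-1}\,\frac{1}{2}
     + \frac{n}{N}\,\frac{n-1}{N-1}\,\beta
```

The three terms correspond to recognition cases (correct with probability α), guessing cases (correct half the time), and knowledge cases (correct with probability β). A LIME is present whenever f(n) exceeds f(N) for some n < N.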
Meanwhile, a number of studies have extended the originally formulated conditions. One finding is that people’s memory sensitivity in distinguishing between recognized and unrecognized objects should be high (Pleskac, 2007; see Section 3.1); another is that decision makers should actually behave as the RH assumes (see Hilbig et al., 2009). Pachur (2010) tested the above-mentioned validity dependencies (i.e., the correlations between n and both validities α and β) in computer simulations and found that they could strongly limit the LIME. Katsikopoulos (2010) showed that the relation α > β is not a necessary precondition (see also Beaman, Smith, Frosch, & McCloy, 2010; Davis-Stober, Dana, & Budescu, 2010; and Smithson, 2010, for still other variants of the LIME). Thus, there appear to be several situations in which a LIME may occur. Theoretically, the LIME could be quite large. Assuming strict adherence to the RH and extreme values, namely a recognition validity of 1.0 and a knowledge validity of .50, the effect reaches its maximum with a difference of 26.3 percentage points (Pohl, 2006). That is, a person recognizing all objects will show a percentage of correct inferences 26.3 points below that of someone who recognizes fewer objects, namely just the number that lets the high recognition validity be most effective (in this case, 50% of the objects).
Another question is whether the LIME has been demonstrated empirically so far. Most of its manifestations are based on simulations only. Goldstein and Gigerenzer (2002) thus admitted that “the curious phenomenon of a less-is-more effect is harder to demonstrate with real people than by mathematical proof or computer simulation.” (p. 83) In one of their studies, they interpreted a performance difference of 0.3% as showing a slight LIME (without reporting a statistical test). In a second study, they used experimentally induced recognition and found a significant LIME of 3.5%. However, that induction procedure was later criticized, by the authors themselves as well as by Pachur et al. (2008), as not conforming to the proper RH conditions (see also Marewski, Gaissmaier, Schooler et al., 2010; and Section 3.3). Pohl (2006) computed theoretically possible LIMEs in eight data sets and found that a LIME was not predicted in four sets (because α ≤ β) and was rather small in the remaining sets (ranging from 2.2 to 8.0%). Computing the actual LIME was unfortunately not possible, because the range of recognized objects was too small (but see Pachur, 2010, who computed predicted accuracy curves for those data). In Exp. 3, Pohl (2006) compared different domains (namely Belgian, Italian, and German cities). Participants recognized on average 6.6, 9.5, and 11.0 of the 11 cities in these three domains, and performance increased significantly with the number of recognized cities; that is, it showed a “more-is-more” effect (see also Pachur & Biele, 2007).
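The maximum cited above can be checked numerically. The sketch assumes the Goldstein and Gigerenzer (2002) constant-validity model: with n of N objects recognized and all pairs equally likely, the expected proportion correct is the share of recognition cases times α, plus the share of guessing cases times .5, plus the share of knowledge cases times β. The set size N = 20 is our assumption (it is the example set size used in Section 3.8); with α = 1.0 and β = .50 it yields a maximum of 26.3 percentage points at n = 10, consistent with the figure cited above.

```python
def expected_accuracy(n, N, alpha, beta):
    """Expected proportion correct when n of N objects are recognized, all
    pairs are equally likely, the RH is followed strictly, and alpha and beta
    are constant."""
    denom = N * (N - 1)
    recognition = 2 * n * (N - n) * alpha   # exactly one object recognized
    guessing = (N - n) * (N - n - 1) * 0.5  # neither recognized
    knowledge = n * (n - 1) * beta          # both recognized
    return (recognition + guessing + knowledge) / denom

def largest_lime(N, alpha, beta):
    """Return the accuracy-maximizing n and its advantage over recognizing all N."""
    acc = [expected_accuracy(n, N, alpha, beta) for n in range(N + 1)]
    best = max(range(N + 1), key=lambda n: acc[n])
    return best, acc[best] - acc[N]

best, gain = largest_lime(N=20, alpha=1.0, beta=0.5)
print(best, round(gain, 3))  # 10 0.263
```

For larger N the maximum shrinks toward 25 percentage points. Plugging in the brothers example from above (N = 20, α = .80, β = .60) gives about .68 for Brother B versus .60 for Brother C.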
Using a design in which inferences were recorded from groups rather than individuals, Reimer and Katsikopoulos (2004) reported LIMEs ranging from 2 to 8%, but without reporting statistical tests. Moreover, they used a rather lax criterion to define a LIME: Whenever there are two persons (or, in this case, groups) with different numbers of recognized objects, n₁ and n₂, such that n₁ < n₂, a LIME is said to occur if performance is higher for n₁ than for n₂ (see also Pachur, 2010). The problem is that such cases are bound to occur by chance alone (unless individual performance data are perfectly monotonically ordered along n). For example, Pachur et al. (in press) cited results from Snook and Cullen (2006) as showing a LIME, but they had picked two single participants out of the sample (see Fig. 5 of Snook & Cullen, 2006): one had the highest percentage of correct inferences (86%) and recognized about half the objects, and the other recognized the most objects but performed less well (76%). Hence, these two persons “show” a LIME. Such selective comparisons appear questionable as long as they are not guarded against chance results. The Snook and Cullen (2006, Fig. 5) data nicely demonstrate this problem, as it is easy to find pairs of persons with the opposite pattern. For example, the two persons with the highest numbers of recognized objects show a clear “more-is-more” effect. In a further analysis, Pachur (2010) again used the data from Snook and Cullen (2006) but ran a regression analysis. The results suggested a quadratic relationship between the number of recognized objects and accuracy, which would be indicative of a LIME (but see Figures 2 and 5 of Pachur, 2010).
In sum, several recent studies have explored more deeply the conditions under which a LIME could theoretically be expected, thus extending earlier formulations of this phenomenon. The empirical evidence for a LIME, however, remains scarce, and the reported effects are mostly small. Perhaps it is difficult to find real domains that exhibit exactly the conditions that theoretically foster a LIME.
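The weakness of the lax pairwise criterion can be illustrated with a short Python sketch that counts, over all pairs of persons (or groups), how often the one recognizing fewer objects performs better (a less-is-more pair) or worse (a more-is-more pair). The simulated data are hypothetical: accuracy is made to rise with n by construction, yet person-level noise will typically still produce some less-is-more pairs, so finding such pairs says little on its own.

```python
import random
from itertools import combinations

def pair_patterns(data):
    """Count less-is-more and more-is-more pairs under the lax pairwise criterion.

    data: list of (n_recognized, proportion_correct), one per person or group;
    pairs tied on either value are skipped.
    """
    lime = more_is_more = 0
    for (n1, acc1), (n2, acc2) in combinations(data, 2):
        if n1 == n2 or acc1 == acc2:
            continue
        if n1 > n2:  # order each pair so that n1 < n2
            (n1, acc1), (n2, acc2) = (n2, acc2), (n1, acc1)
        if acc1 > acc2:
            lime += 1
        else:
            more_is_more += 1
    return lime, more_is_more

print(pair_patterns([(5, 0.70), (10, 0.80), (15, 0.75)]))  # (1, 2)

# Hypothetical simulation: expected accuracy rises with n, plus person-level noise.
rng = random.Random(1)
ns = [rng.randint(5, 30) for _ in range(30)]
simulated = [(n, 0.5 + 0.008 * n + rng.gauss(0, 0.05)) for n in ns]
print(pair_patterns(simulated))  # (less-is-more pairs, more-is-more pairs)
```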
3.8 The RH as part of the toolbox
In typical experiments using paired comparisons (e.g., of city names), participants work through a series of pairs and infer which of the two objects in each pair is the larger one. For example, with a set of 20 objects and all possible pairwise combinations, participants work through 190 trials. Unless a participant recognizes either all or none of the objects, there will be different types of pairs, or “cases”, depending on how many of the objects in a pair are recognized: (1) Recognition cases, in which one object is known and the other is not. These are the central cases for studying the RH. (2) Guessing cases, consisting of two unknown objects, in which persons have nothing left but to guess (or to infer probabilistic cues from the names of the objects, e.g., to which country a city might belong, thus allowing inferences about its size). Recognition is not helpful here, because neither object is recognized. (3) Knowledge cases, consisting of pairs in which both objects are known. Again, recognition is of no help, since both are recognized. In this case, other knowledge has to be consulted to reach a decision. This could be the fluency of retrieving the objects from memory, such that persons infer that the more quickly retrieved object is the larger one (FH; Schooler & Hertwig, 2005; Hertwig et al., 2008; but see Hilbig et al., 2011, for conflicting findings). Or, if fluency is similar for both objects, further cue knowledge must be invoked. Here, yet another heuristic, namely Take-the-Best (TTB), comes into play. According to TTB, knowledge cues are searched one by one in order of their validity. As soon as one cue discriminates between the two objects, search stops and the decision is based on that cue. Again, further knowledge is ignored. If all else fails, one must guess.
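The number of pairs of each type follows directly from how many objects a participant recognizes. A small Python sketch, assuming all N(N - 1)/2 pairs are presented: with n of N objects recognized, there are n(N - n) recognition cases, C(N - n, 2) guessing cases, and C(n, 2) knowledge cases.

```python
from math import comb

def case_counts(N, n):
    """Number of pairs of each type when a participant recognizes n of N
    objects and all N * (N - 1) / 2 pairs are presented."""
    return {
        "recognition": n * (N - n),  # exactly one object recognized
        "guessing": comb(N - n, 2),  # neither recognized
        "knowledge": comb(n, 2),     # both recognized
    }

counts = case_counts(N=20, n=10)
print(counts, sum(counts.values()))
# {'recognition': 100, 'guessing': 45, 'knowledge': 45} 190
```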
This cascaded decision tree is depicted as a flow chart in Figure 1 (for similar descriptions, see Gigerenzer & Goldstein, 1996, Fig. 2; Schooler & Hertwig, 2005, Tables 1 and 2). The areas surrounded by dashed lines represent the three heuristics involved: the upper area includes the RH, the middle area the FH, and the bottom area the TTB heuristic. Note that this chart is only meant to summarize all potential decision steps, not to be a strictly serial process model of how a decision maker actually proceeds. The chart nevertheless shows that the paired-comparison task is more complicated than any of the “fast and frugal” heuristics suggests when viewed as a single strategy.
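The cascade can be sketched as a single Python function. This is a strictly serial rendering, which, as noted above, the chart itself is not meant to be. The `Item` record, the binary cue coding, and the 100 ms just-noticeable latency difference for the FH are hypothetical simplifications, and TTB here treats a cue as discriminating only when both values are known and differ. The evaluation stage discussed in Section 3.5 is omitted.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Item:
    # Hypothetical representation of what a participant knows about one object.
    name: str
    recognized: bool
    latency_ms: float = float("inf")          # retrieval time, used only if recognized
    cues: dict = field(default_factory=dict)  # cue name -> 1 (positive) or 0 (negative)

def infer_larger(a, b, cue_order, jnd_ms=100.0, rng=None):
    """Walk the RH -> FH -> TTB -> guessing cascade for one paired comparison.
    cue_order lists cue names in descending validity."""
    rng = rng or random.Random()
    # RH: exactly one object recognized.
    if a.recognized != b.recognized:
        return (a if a.recognized else b).name, "RH"
    if a.recognized and b.recognized:
        # FH: choose the faster-retrieved object if the difference is noticeable.
        if abs(a.latency_ms - b.latency_ms) >= jnd_ms:
            return (a if a.latency_ms < b.latency_ms else b).name, "FH"
        # TTB: stop at the first cue that discriminates.
        for cue in cue_order:
            va, vb = a.cues.get(cue), b.cues.get(cue)
            if va is not None and vb is not None and va != vb:
                return (a if va > vb else b).name, "TTB"
    # Neither object recognized, or nothing discriminates: guess.
    return rng.choice([a, b]).name, "guess"

a = Item("A", True, 400, {"capital": 1})
print(infer_larger(a, Item("B", False), ["capital"]))                      # ('A', 'RH')
print(infer_larger(a, Item("C", True, 650, {"capital": 0}), ["capital"]))  # ('A', 'FH')
print(infer_larger(a, Item("D", True, 450, {"capital": 0}), ["capital"]))  # ('A', 'TTB')
```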
To complicate matters further, according to the recently proposed evaluation stage, memory has to be searched in every single recognition case for any information that would argue against using the RH (see Section 3.5; Gigerenzer & Goldstein, 2011; Marewski, Gaissmaier, Schooler et al., 2010; Pachur & Hertwig, 2006). These evaluative processes are not shown in Figure 1. Together with the other two heuristics (FH and TTB), for which similar evaluative processes may hold (see Marewski, Gaissmaier, & Gigerenzer, 2010), they make it questionable whether deciding which of two objects is larger is still as “fast and frugal” as the FFH approach originally assumed.
Inherent in this description is also one of the main problems of the toolbox approach (see Glöckner, Betsch, & Schindler, 2010; Newell, 2005; Newell & Lee, 2010): How does one know which heuristic to use when? This corresponds to Challenge 3 posed by Tomlinson et al. (2011) and also to one of the five main research questions posited by Marewski, Schooler, and Gigerenzer (2010). Goldstein and Gigerenzer (2002; see also Gigerenzer & Gaissmaier, 2011) suggested that the knowledge of which strategy should be applied in which situation might be (a) genetically coded, (b) socially or culturally transmitted, or (c) learned individually (see also Rieskamp & Otto, 2006). While these mechanisms seem plausible, they also remain somewhat vague, so the strategy-selection problem is certainly an area in which more research is needed.
Heuristic selection in experimental studies is further complicated by the fact that this decision has to be made anew for each of the, say, 190 trials (given 20 objects and all combinations), in which the different types of pairs appear in random order. One cannot simply stick to the same heuristic from one trial to the next. This makes repeatedly traversing some or all of the potential decision steps (as depicted in Figure 1) look quite strenuous.
4 Conclusions
In this paper, I started with a short description of the development of the theory underlying the recognition heuristic (RH) and then discussed at length some of its controversial issues. Note that the selection of these issues and their treatment reflect my personal preferences and opinions. As such, this paper was not intended to be “neutral”, although I nevertheless strove for a (sometimes more, sometimes less) balanced presentation. Others would no doubt have focused on other topics and would presumably have come to other conclusions (see, e.g., Gigerenzer & Brighton, 2009; Gigerenzer & Gaissmaier, 2011; Gigerenzer & Goldstein, 2011; Marewski, Gaissmaier, & Gigerenzer, 2010; Marewski, Gaissmaier, Schooler et al., 2010; Marewski, Schooler, & Gigerenzer, 2010; Tomlinson et al., 2011). What I think all involved researchers would agree upon is that the RH and the FFH framework represent an enormous advance over previous conceptions. Moreover, their precise formulations and sometimes bold predictions fueled not only the described debate but also a wealth of empirical research, leading to the development of new methods and theoretical ideas.
For example, the RH was extended (a) from individual participants’ inferences to a “wisdom-of-crowd” measure (Gaissmaier & Marewski, 2011; Herzog & Hertwig, 2011); (b) from paired comparisons to multi-alternative decisions (Frosch, Beaman, & McCloy, 2007; Marewski, Gaissmaier, Schooler et al., 2010; McCloy, Beaman, & Smith, 2008); and (c) from inferences to preferences (Oeusoonthornwattana & Shanks, 2010). As such, the whole field has certainly benefited.
Let me summarize the empirical findings on the RH with a quote from Pachur et al. (Reference Pachur, Bröder and Marewski2008, p. 205) who stated that “it is now clear that the recognition heuristic—in particular in terms of the hypothesized non-compensatory use of recognition—is not used by all people all the time and under all circumstances.” And that (p. 206) “individuals appear to differ greatly in their reliance on recognition for inferences.” These conclusions may also lead the way to future research, that is, to further define the influences of domains, tasks, and individual characteristics on which strategy is preferred in which situation. Thus, one viable and legitimate question, asked from the early days of the FHH approach, is which heuristics like the RH are suited for which domains and tasks. This certainly helps to define the boundary conditions of the RH and any other heuristic. Another question, although also asked from the beginning, namely that of individual differences, might be more difficult to answer. First of all, if the “adaptive” use of such decision strategies like the RH reflects environmental regularities, why should individuals differ so much in their perception or evaluation of these regularities? Why should, within the same domain, some people rely on recognition almost always, and others only occasionally (as has been reported in some studies)? This in my view is still somewhat puzzling, although some preliminary and tentative answers regarding individual difference in use of heuristics have meanwhile appeared (e.g., Bröder, Reference Bröder2003; Hilbig, Reference Hilbig2008; Pachur, Mata, & Schooler, Reference Pachur, Mata and Schooler2009).
In sum, the general question concerning the RH would then be not whether it is used but rather when and by whom it is used (see, e.g., Gigerenzer & Brighton, 2009; Hilbig, Erdfelder, & Pohl, 2010; Hilbig, Scholl, & Pohl, 2010; Pachur & Hertwig, 2006; Pohl, 2006). When phrased in this way, the current controversy surrounding the RH loses much of its impetus, and one may wonder why such a simple question has sparked so much debate. One answer could be that some researchers have not stopped there but have instead questioned the RH as a valid tool and, ultimately, the whole FFH approach (see Dougherty et al., 2008; Fiedler, 2010; Glöckner & Betsch, 2008a; Glöckner et al., 2010; Hilbig, 2010b; Newell, 2005). Such a fundamental critique may be grounded in the FFH’s central assumption that a number of different tools are available, from which the decision maker has to choose the one that best fits a given environment. Moreover, according to the theory, once a potentially useful strategy is identified, the decision maker has to check whether any reason speaks against using that otherwise optimal tool. Pachur and Hertwig (2006) listed a number of such reasons why the RH could be “suspended” (see Section 3.5; Gigerenzer & Goldstein, 2011). Each of these reasons may lead to the selection, evaluation, and finally application of another tool. Such a series of strategy selection and evaluation steps, repeated for every single trial, seems quite cumbersome (see Section 3.8) and too complicated to work in practice. This is especially so if numerous paired comparisons have to be made in a row (e.g., 190 different ones for a set of 20 objects), as is typical of the experiments that have been run to study these heuristics.
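The figure of 190 comparisons follows from counting unordered pairs: with $n$ objects, each object can be paired with $n-1$ others, and every pair is counted twice in that product, so the number of distinct paired comparisons grows roughly with the square of the set size.

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{20}{2} = \frac{20 \cdot 19}{2} = 190
```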
In addition to this more theoretical argument, some researchers came to a negative conclusion regarding the FFH approach when summarizing the available empirical evidence.
For example, Hilbig (2010b, p. 923) concluded that “the empirical evidence available does not warrant the conclusion that heuristics are pervasively used.” Similarly, Fiedler (2010, p. 22) asserted that “it seems fair to conclude that strict empirical tests have resulted in a more critical picture of the validity and scope of the postulated heuristics.”
In this situation, alternative conceptions that posit fewer mechanisms, or only one, instead of multiple tools have been proposed and may gain ground (see Hilbig & Pohl, 2009). Without going into too much detail, I mention only two: the evidence-accumulation models, reappearing in the “adjustable-spanner” metaphor (Lee & Cummins, 2004; Newell, 2005; Newell, Collins, & Lee, 2007; Newell & Lee, 2010), and the recently proposed “Parallel Constraint Satisfaction” network model (PCS; Glöckner & Betsch, 2008a, 2008b; Glöckner et al., 2010; Glöckner & Bröder, 2011). One of the major advantages of these approaches is that they can be applied to all comparison types (not just recognition cases) and can easily combine compensatory and non-compensatory use of probabilistic cues within the same architecture, thereby avoiding the need to switch tools from one trial to the next. For example, Glöckner and Bröder (2011) tested the RH against the PCS, albeit in a situation different from the one for which the RH was proposed, namely with cue values openly available to the participants (as “givens”), including for unrecognized alternatives. Using a maximum-likelihood classification method (based on choices, response times, and confidence ratings), the authors found that the behavior of 77.5% of their participants was best explained by the PCS strategy and that only a small portion of participants (up to 7.5%) were classified as RH users. Newell and Lee (2010) also used a “givens” procedure and tested a sequential evidence-accumulation approach (SEQ) against TTB.
Using a minimum-description-length criterion (to account for the different complexities of the models), they reported that the pattern of results was best captured by their SEQ model, which treats TTB as a special case. Comparing these alternative models with the toolbox approach thus amounts to a much bigger controversy than merely discussing the rate of RH use. Of course, it remains to be seen how well these alternatives succeed in the inferences-from-memory situation for which the RH was originally proposed (but see Hilbig & Pohl, 2009). It is therefore still too early to draw any further conclusions about how well these alternatives will fare in the end. But it is quite clear that there is more, perhaps even more fundamental, debate to come in the near future (see, e.g., Glöckner & Betsch, 2010; Marewski, 2010).
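To make concrete how a single evidence-accumulation mechanism can contain TTB as a special case, here is a minimal Python sketch. The function name `sequential_accumulation` and the unit cue weights are illustrative assumptions, not Newell and Lee’s (2010) fitted model; published evidence-accumulation models differ in details such as how cues are weighted. Binary cues are inspected in order of validity, each discriminating cue shifts a running evidence total by one unit, and search stops as soon as the total reaches a threshold.

```python
def sequential_accumulation(cues_a: list[int], cues_b: list[int], threshold: int) -> tuple[str, int]:
    """Choose between options A and B from binary cue values (1 = positive, 0 = negative).

    Cues are assumed to be listed in descending order of validity. Each cue adds
    a - b (i.e., +1, 0, or -1) to the evidence total; unit weights are a
    simplifying assumption of this sketch. Returns the choice ("A", "B", or
    "guess") and the number of cues inspected.
    """
    if len(cues_a) != len(cues_b):
        raise ValueError("both options must be described on the same cues")
    if threshold < 1:
        raise ValueError("threshold must be at least 1")

    evidence = 0
    for inspected, (a, b) in enumerate(zip(cues_a, cues_b), start=1):
        evidence += a - b  # positive evidence favors A, negative favors B
        if abs(evidence) >= threshold:
            return ("A" if evidence > 0 else "B"), inspected

    # All cues inspected without reaching the threshold: follow the sign of the total.
    if evidence > 0:
        return "A", len(cues_a)
    if evidence < 0:
        return "B", len(cues_a)
    return "guess", len(cues_a)
```

With a threshold of 1, the first discriminating cue decides, which reproduces TTB’s one-reason stopping rule: `sequential_accumulation([1, 0, 0, 0], [0, 1, 1, 1], threshold=1)` returns `("A", 1)`. With a threshold larger than the number of cues, every cue is inspected and the choice follows the summed evidence, a compensatory unit-weight (tallying-like) rule: the same cue profiles with `threshold=10` yield `("B", 4)`. Intermediate thresholds fall between these two extremes, which is why no switch between separate tools is needed.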