
Growth From Uncertainty: Understanding the Replication ‘Crisis’ in Infant Cognition

Published online by Cambridge University Press:  15 November 2023

Jane Suilin Lavelle*
Affiliation:
School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, UK

Abstract

Psychology is a discipline that has a high number of failed replications, which has been characterized as a “crisis” on the assumption that failed replications are indicative of untrustworthy research. This article uses Chang’s concept of epistemic iteration to show how a research program can advance epistemic goals despite many failed replications. It illustrates this by analyzing an ongoing large-scale replication attempt of Southgate et al.’s work exploring infants’ understanding of false beliefs. It concludes that epistemic iteration offers a way of understanding the value of replications—both failed and successful—that contradicts the narrative centered around distrust.

Type
Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Philosophy of Science Association

1. The crisis

“Don’t trust everything you read in the psychology literature. In fact, two thirds of it should probably be distrusted” (Baker 2015, 1). Thus opens a report in the journal Nature, commenting on the findings of the Open Science Foundation project, which conducted replication attempts of 100 psychology experiments and reported that only “39% of effects were subjectively rated to have replicated the original result” (Open Science Collaboration 2015, 943). Such claims lie at the foundation of a crisis in confidence in the field, whereby the failure of findings to replicate is often taken to imply (tacitly or otherwise) that they are false. The characterization of this mass failure of reproducibility of psychological findings as a “crisis” rests on the assumption that “replication is one of the most important tools for the verification of facts within the empirical sciences” (Schmidt 2009, 90). Under such a characterization, those findings that can be repeated by different researchers in different laboratories can be considered verified facts, and those that cannot are dismissed as coincidental or the result of bad scientific practice (Loscalzo 2012; McNutt 2014; Nosek et al. 2022; Simons 2014). Subsequently, those fields that have a higher rate of failed replications are considered less trustworthy than those that have lower rates. Thus, the high rate of replication failure in psychology constitutes, in this diagnosis, a crisis, in that the work produced by its researchers is considered to be unreliable.

The assumption that the successful replication of experiments distinguishes “trusted” from “untrusted” science has not gone unchallenged by philosophers, many of whom have argued that a high rate of replication failure can be perfectly compatible with responsibly conducted, high-quality science (Bird 2021; Feest 2019; Fletcher 2021; Irvine 2021; Lavelle 2022; Leonelli 2018; Schickore 2011). This article offers a new addition to this counteroffensive. It argues that when researchers are working in fields that we don’t yet know very much about, failed replications are not only to be expected but are necessary to furthering our understanding. I demonstrate this by a novel application of Hasok Chang’s (2004, 2012) framework of “epistemic iteration” to a very live and controversial puzzle in infant cognition, namely, whether babies can attribute false beliefs to others. Chang’s aim is to show how progress can be made even when our starting point is shrouded in uncertainty, and I argue that the unfolding of the infant false-belief research program exemplifies this. Furthermore, Chang’s notion of scientific progress gives a front-and-center place to the idea that there are always multiple epistemic goals in play. Although this is not a new idea, its emphasis helps us to see how even though failed replications may not be informative about the hypothesis under consideration, they nevertheless contribute to other epistemic aims, such as the validation or calibration of measurements or the refinement of concepts (see sec. 3; see also Van Dongen et al. 2022). 
Finally, the article uses the case study to illustrate one of Chang’s most important contributions: that our scientific inquiries have to start somewhere. With hindsight, that starting point may look terribly bad. But in order for hindsight to occur, the starting point needs to be there. This is why failed replications are a necessary and expected part of good science: they are needed in order for the epistemic gains to be made that move us forward. A narrative of failed replications centered around “distrust” not only masks these gains but also runs the risk of losing them altogether by casting dismissive doubt on the value of those fields currently experiencing high rates of failed replication.

2. Anticipatory looking: A case study

2.1. Children, babies, and the false-belief task

The field of infant psychology is one that, I believe, is currently experiencing a large amount of uncertainty in some of its methods of measurement while also grappling with conceptual questions about how to characterize the phenomena such methods intend to measure. Nowhere is this more manifest than in research examining infants’ abilities to attribute psychological states to other agents. On the one side, there are high-stakes debates about the nature of the psychological states that infants attribute to others and, in particular, whether they can attribute false beliefs to them. On the other side, there is a growing awareness that the methods of measurement, in particular, those that rely on infants’ spontaneous looking behaviors, are not as well understood as previously thought. Much of the key work in this field concerns preverbal infants Footnote 1 who have “limited attention spans, processing capacities and fine and gross motor skills” (Kominsky et al. 2022, 1). Consequently, most experimental paradigms rely on indirect measures to explore infants’ cognitive capacities, for example, by measuring how long a baby looks at a particular event or where the baby looks. Some of the established causes of low replication rates in the psychological sciences are small sample sizes leading to low statistical power, the specialized nature of the equipment required, and a lack of standardization across measurements (Asendorpf et al. 2013; Collins 1985; Nosek et al. 2022). 
Infant psychology is a field afflicted by all these factors, plus the additional problem of incredibly sensitive and temperamental participants (Byers-Heinlein et al. 2020; Frank et al. 2017; Lavelle 2022; Peterson 2016). It is therefore unsurprising that there have been multiple studies in the field that researchers have had trouble replicating. This case study focuses on one such replication project concerning infants’ understanding of other people’s psychological states.

For decades, it was widely accepted that children could not successfully attribute false beliefs to other people until around their 4th birthday. This was due to their performance on elicited-response false-belief tasks. In the original elicited-response false-belief task (Wimmer and Perner 1983), children watch a puppet, Maxi, hide some chocolate in one of two cupboards. Maxi leaves the chocolate in cupboard X and goes out to play. In his absence, his mother enters and moves the chocolate from cupboard X to cupboard Y. She leaves and Maxi returns, and then the child is asked where Maxi will look for his chocolate. Three-year-olds overwhelmingly respond that Maxi will look in cupboard Y, that is, where the chocolate really is and not where Maxi believes the chocolate to be. Around 4 years of age, children correctly answer that he will look in cupboard X. The authors explained their result with the hypothesis that 3-year-old children are limited in their ability to attribute psychological states to other people and are unable to attribute false beliefs to others, whereas 4-year-old children have developed this ability. This task, and those like it, is an elicited-response task because it requires the child to respond to a question asked by the experimenter: “Where will Maxi look for his chocolate?”

This result for the elicited-response false-belief task has been replicated hundreds if not thousands of times. It was therefore groundbreaking when Kristine Onishi and Renée Baillargeon published an article in 2005 arguing that 15-month-olds showed evidence of attributing false beliefs to others. Because 15-month-olds cannot participate in elicited-response tasks, the researchers used a spontaneous-response paradigm that measured how long an infant looked at an event in which an agent acted in a way that matched with their (the agent’s) belief, in contrast to events in which the agent acted in a way that did not match with their belief. This is the violation-of-expectation paradigm, which works on the premise that infants look longer at events that surprise them (i.e., that violate their expectations of what they predict will happen) than they do at events that match their expectations. They reported that infants would look longer at those test trials where the actor did not act in accordance with her (the actor’s) belief about a toy’s location, regardless of whether that belief was true or false, making the following claim:

‘Whether the actor believed the toy to be hidden in the green or the yellow box and whether this belief was in fact true or false, the infants expected the actor to search on the basis of her belief about the toy’s location. These results suggest that 15-month-old infants already possess (at least in a rudimentary and implicit form) a representational theory of mind: They realize that others act on the basis of their beliefs and that these beliefs are representations that may or may not mirror reality.’ (Onishi and Baillargeon 2005, 257)

Naturally, this article caused quite a stir, disrupting the “developmental dogma” of the previous 20 years that children below the age of 4 years could not attribute false beliefs to others (Rakoczy 2017). Until this point, the dominant conceptual frameworks had been designed to explain the developmental dogma; now these theories were hastily reconfigured to explain the new “developmental gap” in performance between infants’ responses on spontaneous-response tasks and children’s performance on elicited-response tasks. Onishi and Baillargeon’s work was succeeded by a slew of research using a variety of spontaneous-response methods to test infants’ understanding of false beliefs, with a recent statement from Rose Scott and colleagues that “over thirty reports, using eleven different behavioral and neural methods, have yielded positive evidence of early false-belief understanding in non-traditional [i.e., spontaneous] tasks” (Scott et al. 2022, 258). This article follows the replication attempts of a spontaneous-response task originally created by Victoria Southgate and colleagues (2007). Footnote 2 This task uses the “anticipatory looking” (AL) paradigm, which is based on the premise that babies will look to where they expect an agent to go before they see that agent’s movements. Therefore, if babies expect agents to behave in ways that are congruent with their (the agent’s) beliefs, they should look to where an agent will look for an object based on where that agent believes the object to be. The AL paradigm forms the basis of my case study because there are multiple documented replication attempts, many of which use Southgate’s stimuli.

At this point, an important disclaimer is in order. Onishi and Baillargeon, Southgate et al., and many others take the results of spontaneous-response false-belief tasks to support the hypothesis that infants can attribute false beliefs to others. This is a controversial explanation of the data. Other hypotheses abound: that infants’ looking behavior evidences the ability to track behavioral patterns in other agents, but they do not attribute psychological states to them (Heyes 2014a, 2014b; Santiesteban et al. 2014), or that infants attribute psychological states to others that are similar to beliefs but that differ by being nonrepresentational (Apperly and Butterfill 2009; Butterfill and Apperly 2013; Low et al. 2016). This article will not evaluate these hypotheses. Footnote 3 Instead, it focuses on the existence of a phenomenon: whether infants anticipate that an actor will behave in a way that accords with her (the actor’s) psychological states. I will refer to this as the anticipation phenomenon. The anticipation phenomenon describes a certain pattern of infant looking behavior, but it remains neutral on its causes; that is, it makes no claims about whether the infant displays this looking behavior because she is attributing psychological states to the agent, because she is tracking some behavioral pattern, or for any other reason. Because the anticipation phenomenon is distinct from the diverse hypotheses evoked to explain it, should it turn out not to exist, then each of the hypotheses just mentioned would require significant revision. Whether the anticipation phenomenon exists is the central question of this replication debate.

2.2. The anticipatory looking false-belief task

In 2007, Victoria Southgate and colleagues published a study that used the AL paradigm to examine 2-year-olds’ understanding of false beliefs. In this paradigm, participants watch a video showing a puppet; two boxes, each with a window above it; and a human actor. First, the baby watches the familiarization trials: the puppet puts a ball in a box while the actor watches; a chime sounds, and two windows above the boxes flash; and then the actor reaches through the window above the box with the ball in it, placing her hand in the box. The baby watches this sequence twice (once for each box). The aims of the familiarization trials are to show the baby that the actor wants the ball and for the experimenters to check that the baby’s looking behavior demonstrates that the baby expects the actor to reach for where the ball is; that is, that when the chime sounds, the baby looks to the box where the ball is (more on this follows). Next, the babies watch one of two test conditions. In the first false-belief condition (false-belief 1), the actor watches as the puppet puts the ball in the left-side box, then moves it to the right-side box and closes the lid of the left-side box. The actor then turns away, distracted by a phone ringing. The puppet takes the ball out of the right-side box and leaves the scene, taking the ball with it. The actor turns back to the scene, the chime sounds, and the windows above the boxes flash. In this trial, babies should expect the actor to reach through the right window, with this expectation manifesting through (a) the babies looking first to the right window as soon as they perceive the chime and flashing cues (first-look measurement) and (b) their looking longer at the right window than the left window. 
The puppet’s behavior in the other test condition—false-belief 2—is the same as in false-belief 1, but the actor is distracted as soon as the puppet places the ball in the left box and does not turn back to the scene until the puppet has left, meaning that she should reach through the left window when she turns back to the scene.

Southgate and colleagues (2007) reported that 9/10 infants in false-belief 1 looked to the correct window when they perceived the cues, and 8/10 did so in false-belief 2. Regarding how long infants looked at the correct window, they write, “As the infants were familiarized to a delay of 1750ms between the onset of illumination and the opening of a window, we coded only the first 1750ms after onset of illumination on the test trial. The infants spent almost twice as long Footnote 4 focusing on the correct window as the incorrect window” (Southgate et al. 2007, 590).
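To give a rough sense of what first-look counts of this size amount to, the sketch below computes how likely 9 of 10 or 8 of 10 correct first looks would be if each infant picked a window at random. The 50% chance level, the one-sided exact binomial tail, and the helper name `binomial_tail` are assumptions made for illustration; this is not necessarily the analysis Southgate et al. ran.

```python
from math import comb


def binomial_tail(successes: int, n: int, p: float = 0.5) -> float:
    """Probability of at least `successes` hits in n independent trials with hit rate p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))


# First-look counts reported by Southgate et al. (2007).
# Assumption: chance level is 0.5 (two windows, one "correct").
p_fb1 = binomial_tail(9, 10)  # false-belief 1: 9 of 10 infants
p_fb2 = binomial_tail(8, 10)  # false-belief 2: 8 of 10 infants

print(f"false-belief 1: {p_fb1:.4f}")  # prints "false-belief 1: 0.0107"
print(f"false-belief 2: {p_fb2:.4f}")  # prints "false-belief 2: 0.0547"
```

Under these assumptions, a difference of a single infant moves the tail probability from about .01 to about .05, which illustrates how sensitive conclusions drawn from ten participants per condition can be to individual looking behavior.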

As mentioned earlier, one of the roles of the familiarization trials is to ascertain that infants show the right looking behaviors. Infants who did not look toward where the actor should reach for the ball by the end of the second familiarization trial were excluded from the study. This is because of two assumptions in the methodology:

  1. The baby’s gaze direction indicates that they anticipate something to happen at that location.

  2. The baby’s anticipation is caused by some kind of cognitive mechanism that tracks the actor’s movements and predicts what she will do next.

These assumptions should be uncontroversial. Footnote 5 If infants do not show the right pattern of gaze in the familiarization trials, this suggests either that they are not able to track simple goal-directed actions or that their ability to do so is not revealed by the methodology. Because both of these explanations for their behavior mean that the AL methodology is not appropriate for examining that infant’s understanding of false beliefs, those who showed this behavior were excluded from the study. An additional 11 babies were excluded from the study for failing to meet this criterion.

2.3. Replicating the anticipatory looking false-belief task

Southgate et al.’s (2007) AL false-belief task has faced mixed replication success. Sebastian Dörrenberg and colleagues (2018) tested 66 2-year-olds with Southgate’s stimuli and found that participants looked longer at the correct window only in false-belief 1. Similarly, infants’ first looks upon perceiving the cues were to the correct window in false-belief 1, but they more often went to the incorrect window in false-belief 2. Tobias Schuwerk and colleagues (2018) also used Southgate’s stimuli, but they had to exclude 58% of participants (28 out of 48 children) for failing to look in the correct direction at the end of the familiarization period. Of the 20 participants who remained, only 7 looked first toward the correct window, and there was no difference in how long they looked at the correct and incorrect windows across both trials. In the same year, Louisa Kulke and Hannes Rakoczy (2018) collected data on both published and unpublished attempts to replicate Southgate et al.’s experiment, showing that of the 20 researchers who responded to their call for data, only 5 managed to successfully replicate Southgate et al.’s data (see Table 1 for their criteria for evaluating replications).

Table 1. Data collected by Kulke and Rakoczy (2018). The authors coded data as follows: (1) false-belief 1 above chance, (2) false-belief 2 above chance, (3) measured by how long participants looked at the correct versus incorrect window, and (4) measured by the “first look.” A result was coded as a replication if it met criteria 1–4 and as a partial replication if it met criteria 1 or 2 and criteria 3 or 4.

Studies       Replication   Partial replication   Nonreplication
Unpublished   0             7                     5
Published     5             3                     0
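The coding rule in the caption to Table 1 can be sketched as a small function. The function name `code_result` and its parameter names are hypothetical, and treating every result that is neither a replication nor a partial replication as a nonreplication is an assumption: the caption does not define that category explicitly.

```python
def code_result(fb1_above_chance: bool, fb2_above_chance: bool,
                looking_time: bool, first_look: bool) -> str:
    """Classify one replication attempt using criteria (1)-(4) from the Table 1 caption."""
    # Replication: all four criteria are met.
    if fb1_above_chance and fb2_above_chance and looking_time and first_look:
        return "replication"
    # Partial replication: criterion 1 or 2, together with criterion 3 or 4.
    if (fb1_above_chance or fb2_above_chance) and (looking_time or first_look):
        return "partial replication"
    # Assumption: anything else counts as a nonreplication.
    return "nonreplication"


# Counts as given in Table 1, keyed by publication status.
table_1 = {
    "Unpublished": {"replication": 0, "partial replication": 7, "nonreplication": 5},
    "Published": {"replication": 5, "partial replication": 3, "nonreplication": 0},
}

total_studies = sum(sum(row.values()) for row in table_1.values())
full_replications = sum(row["replication"] for row in table_1.values())
print(total_studies, full_replications)  # prints "20 5"
```

On this reading, a result such as Dörrenberg and colleagues’, in which only false-belief 1 shows the expected looking pattern, would be coded as a partial replication.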

What can be gleaned from this collection of replication data? Taking the more upbeat news first, it appears that more participants succeed in false-belief 1 than in false-belief 2 (Baillargeon et al. 2018). If robust, this pattern is something that theories of mind reading could reasonably accommodate. For example, infants need to hold in mind the actor’s false belief for longer in false-belief 2 in contrast to false-belief 1, requiring a greater demand on their limited processing capacity and resulting in their forgetting the actor’s belief and defaulting to reality. This would be in keeping with prominent accounts of why 3-year-olds fail elicited-response tasks (Carruthers 2013, 2018, 2020; Scott and Baillargeon 2009, 2017).

More worrying, however, is the lack of a pattern in infants failing the familiarization trials, ranging from over 50% of participants being excluded at this stage (Schuwerk et al. 2018) to just 4% in other studies (Dörrenberg et al. 2018). On the basis of these data alone, one might question the AL paradigm’s suitability for measuring infants’ anticipation of another’s goal-directed movement, and this problem is made all the more pressing because we do not understand why it works for some babies and not others. These data serve to highlight lacunae in our understanding of this methodology.

In their response to this and other replication work concerning different false-belief tasks, Baillargeon et al. (2018) wrote the following:

We do not agree with claims in some of the special-issue papers that these negative findings cast doubt on the conclusion that some capacity for belief understanding is already present in infants and toddlers…. [T]he non-replications stand in contrast to a large body of positive and convergent findings: as was mentioned earlier, over 30 published reports, using 11 different methods, have now provided evidence of false belief understanding in children under 3-years of age. (123)

Notably, these authors each support theories of mind reading that predict that infants should be able to attribute false beliefs and other psychological states to other people. Yet researchers whose theoretical commitments lead them to be less confident that infants’ understanding of psychological states stretches to false belief take quite a different interpretation of the replication data, claiming that we are not yet in a position to know whether infants attribute false beliefs to others (Poulin-Dubois et al. 2018). Footnote 6

Allow me to reiterate that the focus of this article is the anticipation phenomenon (sec. 2.1), not whether infants can attribute false beliefs to others. One can reasonably reframe the debate just discussed to reflect this: one side believes that the data support the existence of the anticipation phenomenon, whereas the other does not; one side believes that a particular effect—infants looking toward where an agent will act—has been replicated, whereas the other does not. What makes the debate more intractable are new doubts, revealed by this replication work, about how the AL paradigm works. This yields a double uncertainty. First, there is uncertainty about the phenomenon: we do not know whether infants expect an agent to act in accordance with her (the agent’s) psychological states, which is why we are conducting the experiments in the first place. But additionally, there is also uncertainty about our methods of measurement: we do not know if the AL paradigm is a reliable method, so when infants’ looking behavior suggests they have not correctly anticipated the agent’s behavior, we don’t know if this is because they have not done so or if they have but it somehow has not been captured by the constraints of the AL paradigm. These uncertainties about the measurement and the phenomenon in turn fuel interpretation of the replication data in different ways, dependent on one’s prior theoretical leanings. Those who think infants can attribute psychological states to others will suggest there is something amiss with how the AL paradigm has been implemented, whereas those on the other side of the debate are more likely to accept the suitability of the AL paradigm but question the existence of the phenomenon. 
This comes out particularly fiercely in an exchange about the suitability of the violation-of-expectation method for measuring infants’ understanding of false beliefs, with Renée Baillargeon et al. (2018) suggesting that small differences in how the paradigm was implemented were responsible for the failure to replicate her work. By contrast, Paula Rubio-Fernández (2019) has expressed concerns that researchers are adjusting how they implement the paradigm until it yields results supportive of the view that infants can attribute false beliefs to others (see also Peterson 2016). And yet, if the phenomenon does exist (as many researchers believe it does), then calibrating our methods of measurement such that they can detect it could be a perfectly reasonable thing to do. The problems arise when, as here, there are doubts about the existence of the phenomenon.

This section has reviewed an ongoing debate about how to interpret attempts to replicate Southgate et al.’s (2007) experiment using the AL paradigm to ascertain if infants can discriminate between belief-congruent and belief-incongruent behaviors. Thanks to these replication endeavors, an important gap in our knowledge about the AL methodology has become apparent: we do not understand why a significant number of babies fail the familiarization trial. This leads to more pressing questions in our application of the paradigm: What needs to be in place for us to be confident that it is suited to tracking infants’ anticipations about events? And when infants’ looking behavior fails to support the anticipation hypothesis (sec. 2.1), is this because they have not made this discrimination or because it has not been detected by the AL method?

The next section turns to work by Hasok Chang (2004, 2012) that argues that even when a field faces a conundrum such as the one outlined here, it is still able to yield epistemic goods. This is due to the process of “epistemic iteration,” wherein by repeating experiments and keeping a variety of different theoretical options open, researchers are able to meet their epistemic goals and, in so doing, make progress with their discoveries. I will argue that replication is an essential part of the epistemic iterative process and that therefore, fields that experience high rates of failed replications can nevertheless be seen as producing important knowledge.

3. Epistemic iteration

3.1. Imperfect ingredients and the “principle of respect”

The structure of the puzzle outlined in section 2.3 is by no means unique to infant psychology. Every scientific field will, at various points in its history, have faced a problem where the current standards of measurement were inadequate for examining the phenomena researchers were interested in. Yet despite these uncertain foundations, the scientists involved were able to progress toward their epistemic goals: calibrating a widely agreed new standard, improving theoretical unity or explanatory power, improving quality and quantity of evidence, or some other epistemic virtue (Chang 2004, 227). This movement, argues Chang, occurs thanks to the process he calls epistemic iteration:

Epistemic iteration is a process in which successive stages of knowledge, each building on the preceding one, are created in order to enhance the achievement of certain epistemic goals. In each step, the later stage is based on the earlier stage, but cannot be deduced from it in any straightforward sense. Each link is based on the principle of respect and the imperative of progress, and the whole chain exhibits innovative progress within a continuous tradition. Iteration provides a key to understanding how knowledge can improve without the aid of an indubitable foundation. What we have is a process in which we throw very imperfect ingredients together and manufacture something just a bit less imperfect. (Chang 2004, 46)

Progress begins when a community acknowledges that its current system of knowledge is imperfect. In Chang’s example, scientists realized that our sensations of hot and cold were insufficient to permit the investigation of the phenomena they were interested in. In our case, we could say that prior to Onishi and Baillargeon’s pioneering work, we lacked a method to investigate infants’ understanding of false beliefs because the only methods available were designed for children over 36 months. Moving forward to the debate as it stands today: replications of Southgate et al.’s (2007) work have served to spotlight “imperfections” in our understanding of the AL paradigm, for example, our lack of knowledge of why performance in the familiarization trials is so variable. This is one of the most valuable functions of replications: highlighting gaps in our knowledge of which we were previously unaware (see sec. 3.3).

How do we move on from this state of uncertainty? Here, Chang (2004) argues that we should develop a new standard, whose relation to the old one is captured by the “principle of respect.” Our first iteration of thermoscopes needed to respect our folk sensations of temperature, showing that the things we reliably perceive as hot show a higher temperature than those that we reliably perceive as cold. But while guided by our sensations, the thermoscopes were not constrained by them because, in being more accurate than our sensations, they could later be used to correct judgments of temperature based on sensation alone: a hand that has been in the snow will feel a bucket of tepid water as warm, and one that has been snug in a mitten will feel it as cold, but the thermoscope will reveal that the water is a uniform temperature (Chang 2004, 43).

We see the principle of respect in action in the ongoing multilaboratory Many Babies 2 collaboration, which is conducting a large-scale replication project concerning whether babies expect an agent to look for something based on the agent’s knowledge of where that thing is (Schuwerk et al. 2022). The study uses the AL paradigm. One of the “imperfect” foundations upon which we set the AL paradigm is our acceptance that babies can attribute goals to other agents and expect them to act on these goals. There are multiple lines of support for this acceptance. First, we know that adults cannot help but see certain movements as goal directed, as was shown most famously by Heider and Simmel’s (1944) work. Second, it is a feature widely observed in the nonhuman animal kingdom, from a pride of lions hunting an impala to Sarah the chimpanzee recognizing the various outcomes her trainer’s behavior was aimed toward (Woodruff and Premack 1978). Third, there are strong evolutionary arguments for the ability to recognize goal-directed movements early in development as a critical means of enhancing survival. Fourth, there are a number of experiments, using a range of different methods (e.g., the visual habituation paradigm), yielding evidence to support the claim that by 8 months, infants reliably distinguish goal-directed from non-goal-directed movements. Footnote 7 And last, but by no means least, caregivers through the ages have treated their babies as though they can recognize goal-directed actions. 
Taken as a whole, this collection of reasons from a range of disciplinary perspectives—although imperfect—nevertheless gives a foundation against which to calibrate an instance of the AL paradigm: if babies do not respond to a particular set of stimuli in ways that indicate that they have attributed a goal to the protagonist, then those stimuli need to be reconfigured until such a response is reliably procured. The epistemic iteration framework explains why this kind of calibration is acceptable: we are calibrating to an imperfect starting point, but provided we keep an open mind about how the next iteration of measurement might change this (see following discussion), it will be good enough. From their pilot work, the researchers on the Many Babies 2 team are confident that their implementation of the AL paradigm is able to track babies’ expectations of the goal-directed movements of others, with 68% of toddlers (n = 65; 18–25 months) and 69% of adults (n = 42) looking to where a chaser (a bear) would go in order to catch a chasee (a mouse) (Schuwerk et al. 2022, 19). Footnote 8

3.2. Enrichment, correction, and contradiction

In the previous section, I loosely used the phrase “keep an open mind” about how iteration could change our imperfect starting point. I now draw on three more concepts from Chang to explain what this entails.

First, our new measurements may contradict our previous ones in some ways (see the earlier example of the tepid bucket of water). Some contradiction can be tolerated: after all, the whole point of developing a new system of measurement is that the previous one is in some way inadequate, so we should expect some differences in their outputs. But if every instantiation of the new system leads to a contradiction with the old, then this gives us good reason to abandon the new system. For example, if we could not generate any stimuli that caused babies to look to where a goal-directed agent should go, then this would raise questions about the suitability of the AL method for this age group. Such doubt would be compounded if other methods did show that babies anticipate other people’s goal-directed actions. But there is also a more subtle manifestation of this problem peculiar to infant cognition. Babies have very limited cognitive and motor abilities, and in adjusting the stimuli until participants show AL behaviors, one can end up with images and situations that are very far removed from the everyday reality that babies typically encounter. For example, the Many Babies 2 stimuli are a simple cartoon bear and mouse, an upside-down Y-shaped tunnel through which the bear chases the mouse, and a box at each end of the “Y” where the mouse hides. But generalizability is inherent to the nature of the cognitive capacity we are studying: if babies only show looking behaviors consistent with goal attribution in a very specific circumstance and no other, then this is insufficient to support the claim that they anticipate the goal-directed behaviors of others because this ability is meant to underpin all (or most) perceptions of goal-directed actions, not just a tiny subset of them. Footnote 9 If babies’ looking behavior were specific to just one set of stimuli, this would contradict the hypothesis at the center of our imperfect foundation and lend support to abandoning the AL paradigm.

The second virtue of the iterative process is “enrichment,” wherein “the initially affirmed system is not necessarily negated but refined, resulting in the enhancement of some of its epistemic virtues” (Chang 2004, 228). The researchers on the Many Babies 2 team are confident that their stimuli reliably cause babies to look where they expect the bear to chase the mouse. This places them in a position to extend their method from collecting data about a phenomenon about which we are reasonably confident (babies’ ability to anticipate goal-directed action) to one about which we are less certain: babies’ ability to anticipate what someone will do based on their epistemic states (knowledge vs. ignorance). This work is currently underway, using the same stimuli as described for the earlier study but with a minimal adjustment: whether the bear sees which box the mouse enters upon leaving the tunnel. If the babies’ looking patterns do not show that they expect another to act on their knowledge states, the researchers can be reasonably confident that this is due to the babies’ cognitive limitations rather than quirks of the stimuli or measurement window because these remain the same as in the pilot. This process instantiates the principle of respect and also illustrates how the iterative process can lead to progress in allowing methods of experimentation to extend to new domains.

The last virtue of epistemic iteration that Chang (2004) discusses is “self-correction.” This occurs when a new standard gives us reason to adjust our hypotheses that were based on data from the old standard. In this case, one could call the Many Babies 2 stimuli a step toward a new standard. However, the stimuli themselves cannot be the standard, for the reasons explained at the start of this section. Instead, we need to develop our understanding of why these stimuli are more successful at eliciting goal-directed AL behavior. Once this has been done, the principles can be applied to the creation of new stimuli that give more uniform data concerning false beliefs than Southgate et al.’s (2007) data. Whether a self-correction is required depends on how these data turn out. Another form of self-correction is evident in the calibration process described earlier as the researchers on the Many Babies 2 team developed their stimuli. The adjustment made to the stimuli to get the effect of AL behavior is itself a process of self-correction and can only occur through repeatedly testing different participants.

3.3. Multiple epistemic goals

Central to Chang’s framework is the idea that there are always multiple goals at work in scientific research, and his emphasis on this aspect is helpful for understanding the epistemic gains made in our case study and through replication work more generally. More often than not, the stated goal of an experiment is to provide data for or against a specific hypothesis. If this is one’s only goal, then failed replications are certainly problematic. Popper (1959) famously argued that replicating results is necessary for distinguishing data that support a hypothesis from “mere isolated coincidence” (45). Later, Collins (1985) articulated the problem of the “experimenters’ regress,” namely, how different research teams decide which experimental outcome is the “correct” one: that of the original or of the failed replication (see Feest [2016] for further discussion). Returning to our case study: the data from the replications are insufficient to allow us to evaluate the anticipation hypothesis; thus, they fail to meet this epistemic goal. Yet despite failure on this front, the previous analysis shows how progress has been made toward achieving other epistemic goals: improving our understanding of how the AL paradigm works and, in so doing, making it a more reliable measure of infants’ expectations. This view of progress seems to capture the epistemic gains that come from replication work better than a single-minded focus on whether the results support the hypothesis under consideration.

One worry with this characterization of progress is that it does not match up with how experimenters view their own work. Southgate and colleagues’ (2007) aim was to test their false-belief hypothesis; the aim of those conducting the replication work was to test the anticipation hypothesis; none of these parties succeeded in attaining these ends. Is it fair to argue for progress on the grounds that different epistemic goals have been achieved when it is not at all clear that anyone involved in the work has these goals in mind? Footnote 10 I think this question can be addressed by revisiting part of the quotation cited in section 3.1: “Epistemic iteration is a process in which successive stages of knowledge, each building on the preceding one, are created in order to enhance the achievement of certain epistemic goals. In each step, the later stage is based on the earlier stage, but cannot be deduced from it in any straightforward sense” (Chang 2004, 46; emphasis added). This investigation of infants’ understanding of false beliefs began with an imperfect foundation: the assumption that the AL methodology would be able to provide evidence for or against the false-belief hypothesis. From this beginning, it could not be deduced in a straightforward sense that the next step would be to disassemble the methodology. That this would be a productive step only became apparent later in the research journey, when the failed replications came in. It seems uncontroversial to say that improving our understanding of the AL methodology is an epistemic gain. But it is not one that could have been foreseen from the starting point and thus could not have been a goal. Crucially, without the imperfect starting point, these gains would not have been possible. This is a liberal view of scientific progress, but I do not think it is too liberal. It gives boundary limits for when more experiments are unhelpful: when they fail to meet any of the epistemic goals mentioned earlier. But it is nevertheless healthy, for science and philosophy, to consider the exclusion of wrong answers to be a form of progress.

3.4. Uncertainty revisited

This section has argued that epistemic iteration offers a way of understanding how infant psychology can make epistemic gains despite the dual doubts—about the reliability of the AL method and the existence of the anticipation phenomenon—at its foundation. By using the principle of respect and building out from our initial assumption that infants can attribute goals to others, we can begin to calibrate the AL methodology, which in turn increases our confidence in its reliability when applied to phenomena we are less certain of, such as anticipating another’s actions based on their knowledge states. Crucially, this iterative process can be applied to the other spontaneous methodologies that face the same double uncertainties about measurement and the existence of a phenomenon (e.g., Buttelmann et al.’s [2009] spontaneous-helping paradigm or the violation-of-expectation paradigm). Calibrating and standardizing spontaneous methodologies is a key epistemic goal for infant psychology, and the earlier discussion outlines how this is possible even when we are uncertain about the phenomena in question.

One worry about this application of epistemic iteration is that the cases of infant cognition and temperature are disanalogous. Footnote 11 Those developing the first instruments to measure temperature knew that there was a phenomenon “out there” to be measured; they were just unsure how to go about measuring it. In contrast, the central question of the infant psychology debate is whether babies expect people to act in ways that are congruent with their psychological states, and if so, what the limits of this ability might be (goal states, Footnote 12 knowledge states, belief states, etc., as well as the content of these states). In other words, it’s not clear that a phenomenon exists to be measured, unlike the case of temperature. As observed by Kenneth Kendler (2012), one cannot iterate “towards a target that isn’t there” (308).

I think this concern can be mitigated from two different angles. First, Chang himself is clear that epistemic iteration is valuable in helping us achieve our epistemic goals, even when we are unsure about whether our inquiries are targeting the phenomena we are after (see also Schaffner 2012):

It [epistemic iteration] differs crucially from mathematical iteration in that the latter is used to approach the correct answer that is known, or at least in principle knowable, by other means. In epistemic iteration that is not so clearly the case. (Chang 2004, 45)

A null result is nevertheless an epistemic gain. If, after numerous iterative attempts at calibration and standardization across all spontaneous methodologies, babies do not show looking behaviors consistent with the hypothesis that they anticipate the actions of other agents, then we accept that the research program has contradicted its core hypothesis and that babies do not have this cognitive ability. As mentioned earlier, being able to exclude a wrong answer can be useful.

A more likely scenario is that after several iterative processes aimed at improving calibration and standardization for each spontaneous methodology, there is no consensus about the nature of infants’ anticipation of other agents’ actions. This brings me to the second angle from which to address the worry because just as results from the AL methodology alone are insufficient to support the claim that babies anticipate other agents’ actions, neither would the results from all spontaneous methodologies be sufficient to support this claim. Spontaneous methodologies are but one way of exploring and investigating infant cognition. Babies and their carers have, quite literally, always been a part of human history, and there is a vast, messy, and contradictory body of folk knowledge about their abilities. I am reminded here of a passage from Jennifer Nansubuga Makumbi’s (2020) novel The First Woman where a Ugandan trainee nurse writes home with news about her first days at medical school:

We have two orphan babies. I am not lying. Real breathing human babies, donated to the school by Ssanyu Babies’ Home, to learn how to look after babies – winding and bathing them, tying nappies and diet. I said, but these Europeans know how to waste time. Who taught our mothers to bring up children?

We are not yet in a position to know what infants know about other people’s actions. But what we do know is that infants grow into preschoolers who can track false beliefs in others and recognize when someone is hiding their true emotions (Wellman 2014) and eventually into adults who can track three or four levels of deceit in Shakespearean-style plots. Caregivers do not notice a seismic change in their children when they go from failing to passing false-belief tasks, nor when they pass any other purportedly significant mind-reading milestone Footnote 13 in tracking psychological states. We assume that infants know something about the actions of others, and as such, there is a phenomenon there to be explored, no matter how crudely outlined. Footnote 14 Folk knowledge and evidence from other sources (see sec. 3.1) combined with the principle of respect are sufficient to ensure we start our investigations in broadly the right ballpark, and even if the phenomenon under investigation is even less well understood than temperature was prior to the first thermometers, this does not foreclose the prospect of epistemic iteration leading to the fulfillment of our epistemic goals.

4. Imperfection, not falsehood

This article opened with a quote from an editorial in the journal Nature stating that two-thirds of what we read in psychology journals should not be trusted. This section reviews this sentiment in the light of the discussion in section 3.

First, the aim of Chang’s framework is to show how we can make epistemic inroads in a scientific investigation, be our starting point ever so bad. From an imperfect starting point and with imperfect methodologies, we can nevertheless end up with a better understanding of a phenomenon than that with which we started. Critically, the knowledge we gain would not have been possible had we not started somewhere: the imperfect starting point is necessary to attaining the goods that follow. This position stands in contrast to those who perceive a large number of failed replications to indicate untrustworthy science. A large number of failed replications should be expected when the starting point is bad because there is so much uncertainty about the concepts under investigation and the methods used to find out about them. The problems arise when researchers fail to acknowledge their work for what it is: a process of building outward from an uncertain foundation. An important lesson being learned from the replication crisis is that this starting point needs to be made more explicit (Bringmann et al. 2022; Feest 2022; Sikorski and Andreoletti 2023).

Second, one cannot build the foundation of a scientific research program on distrust. Footnote 15 But epistemic iteration shows that one can build such a foundation upon imperfection. This is not simply an issue of petty wordplay. Inherent in the distrust narrative is the sense that one would be irrational to continue in a field where so many findings fail to be replicated. Indeed, this is expressed with some force by Tal Yarkoni (2020), who exhorts psychology graduates to go do something else with their lives. Epistemic iteration, on the contrary, shows such a starting point to be acceptable because it implies that there is considerable scope for improvement and plenty of work for scientists to do.

One may object that this is an overly Pollyanna-ish interpretation of a field with many failed replications. Sometimes, so the criticism goes, we should take a slew of failed replications to indicate that a hypothesis or research program ought to be abandoned. How can we distinguish between a foundation that is imperfect but has scope for improvement and one that is hopeless? We distinguish it through the system’s ability to achieve the epistemic goals it sets, and those that consistently face self-contradiction in the pursuit of these goals can be abandoned (see sec. 3.2). Getting the same data from the same methods is one epistemic goal, but it is not the only one; subsequently, a large number of failed replications should not be the only reason to abandon a research program.

5. Conclusions

Infant psychology is a field with a high number of failed replications. Yet it is also, as argued in this article, a field where significant epistemic gains are being made in our understanding of the methods used to investigate infant cognition. This is the case despite the high degree of uncertainty in the field regarding both the phenomena under investigation and the reliability of the methods used to examine them. This article offers an explanation for how this can be in the form of epistemic iteration. Epistemic iteration offers the tools to see how we can progress toward our epistemic goals even when our starting point, both in terms of the phenomena under examination and the methods used to examine them, is imperfect. When a field is in this stage of having relatively few affirmed foundations, it is unsurprising that it also has many instances of failed replications because there is so little to build on (Irvine 2021). Importantly, we need to start somewhere, and without the messy data generated by these imperfect concepts, we would not be in a position to work out how we might advance our epistemic goals. It is as we start building on these data that we come closer to creating experiments that can be replicated.

There are several big issues that have been skirted in this piece, which I defer to later articles. The biggest is how we should view progress within infant psychology or even psychology as a more general field. The article accepts, without much defense, Chang’s proposal of progress as characterized by meeting epistemic goals, which gives a very localized view of progress because the goals of most epistemic import will vary from field to field and from time to time within a field. Future work could offer further defense of this view of progress, and of the coherentist approach more broadly endorsed by Chang, as appropriate for psychology. Another question is that raised in section 3.2 regarding the balance between making stimuli that are appropriate for infants and concerns about ecological validity and generalizability. The concern from ecological validity is that the stimuli are so different from life as encountered in the real world that one needs to carefully justify the claim that they are tapping into the same cognitive abilities that babies use “in the wild.” The concern from generalizability is that the stimuli may be testing a very specific cognitive ability (e.g., an infant’s theory of cartoon bears and mice) rather than the indefinitely flexible ability to track goals, which is the real target of investigation (Feest 2022; Packer and Moreno-Dulcey 2013).

Through this survey of replications of Southgate et al.’s (2007) AL false-belief task and the Many Babies 2 project, we see research that, far from being untrustworthy, exemplifies progress through the iterative processes of self-correction and enrichment. Research into infants’ abilities to attribute psychological states to others has very few certain foundations, and I have shown how the progress made to date is based on the most stable, but still imperfect, of these—namely, infants’ ability to anticipate goal-directed actions. Thus, because failed replications are compatible with flourishing, progressive science, it is time to sever the connection between “does not replicate” and “untrustworthy” and instead recognize the necessity of this work for the epistemic iterative cycle of accumulating knowledge.

Acknowledgments

This research was supported by a Humboldt Experienced Researcher fellowship and by a British Academy (BA) grant, “Replication: Crisis or Opportunity” (SRG2000688), which funded a series of workshops where these ideas were developed. The author gratefully acknowledges these funders. Thanks also to participants at the BA workshops, Enno Fischer, Barry Maguire, the Consciousness and Cognition research group at Ruhr Universität Bochum, and two anonymous reviewers for their time and invaluable feedback.

Footnotes

1 I will use the terms infants and babies to refer to children aged 24 months and younger, unless otherwise specified. This captures the age range of most of the participants in this case study.

2 This work was collaborative with Atsushi Senju and Gergely Csibra.

3 See Lavelle (2019) or Rakoczy (2022) for evaluations.

4 An average of 956 ms looking at the correct window and 496 ms at the incorrect window.

5 The exact nature of the cognitive mechanisms cited in the second assumption is subject to controversy (is the anticipation caused by attributing psychological states to the agent? or by tracking some behavioral cue?), but as explained in section 2.1, this is not a question for this article.

6 These differences in opinion about whether the failed replications cast doubt on the hypothesis that infants and toddlers can understand false beliefs exemplify another problem running through replication debates, namely, the “experimenter’s regress” (Collins 1985). For further discussion of this particular problem, see Lavelle (2022).

7 See Luo (2011) for a review.

8 Although the authors do not comment on why approximately 30% of participants did not show the AL behavior, this can be explained by appeal to individual differences in attention span or motivation.

9 Addressing this issue more substantively requires a closer analysis of the fragility of experimental effects, which remains a topic for another article (although see Feest [2022], Kominsky et al. [2022], and van Bavel [2016] for contributions in this line).

10 I am grateful to an anonymous reviewer for raising this question.

11 Thanks to Fan Yichu for pushing me on this point.

12 Some authors deny that goal states are psychological (Roessler and Perner 2015), but the nuances of this particular debate are not relevant here.

13 For examples of such milestones, see Wellman (1990, 2014) and Wellman et al. (2001).

14 An evaluation of how psychologists begin their initial “ballpark” descriptions of phenomena is a topic for another article (see Adetula et al. [2022], Haig [2013], Muthukrishna and Henrich [2019], and Rozin [2001] for thoughtful contributions in this line).

15 A reviewer cites Merton’s “organized skepticism” as a counter to this claim (Merton 1973). I agree. But the skepticism urged by Merton seems a more respectful kind than the dismissive tone often given to findings that fail to replicate in the current climate.

References

Adetula, Adeyemi, Forscher, Patrick S., Basnight-Brown, Dana, Azouaghe, Soufian, and IJzerman, Hans. 2022. “Psychology Should Generalize from—Not Just to—Africa.” Nature Reviews Psychology 1 (7):370–71. https://doi.org/10.1038/s44159-022-00070-y.
Apperly, Ian A., and Butterfill, Stephen A. 2009. “Do Humans Have Two Systems to Track Beliefs and Belief-Like States?” Psychological Review 116 (4):953.
Asendorpf, Jens B., Conner, Mark, De Fruyt, Filip, De Houwer, Jan, Denissen, Jaap J. A., Fiedler, Klaus, Fiedler, Susann, Funder, David C., Kliegl, Reinhold, Nosek, Brian A., Perugini, Marco, Roberts, Brent W., Schmitt, Manfred, Van Aken, Marcel A. G., Weber, Hannelore, and Wicherts, Jelte M. 2013. “Recommendations for Increasing Replicability in Psychology.” European Journal of Personality 27 (2):108–19. https://doi.org/10.1002/per.1919.
Baillargeon, Renée, Buttelmann, David, and Southgate, Victoria. 2018. “Invited Commentary: Interpreting Failed Replications of Early False-Belief Findings: Methodological and Theoretical Considerations.” Cognitive Development 46:112–24. https://doi.org/10.1016/j.cogdev.2018.06.001.
Baker, Monya. 2015. “Over Half of Psychology Studies Fail Reproducibility Test.” Nature 27:13. https://doi.org/10.1038/NATURE.2015.18248.
Bird, Alexander. 2021. “Understanding the Replication Crisis as a Base Rate Fallacy.” British Journal for the Philosophy of Science 72 (4):965–93.
Bringmann, Laura F., Elmer, Timon, and Eronen, Markus I. 2022. “Back to Basics: The Importance of Conceptual Clarification in Psychological Science.” Current Directions in Psychological Science 31 (4):340–46. https://doi.org/10.1177/09637214221096485.
Buttelmann, David, Carpenter, Malinda, and Tomasello, Michael. 2009. “Eighteen-Month-Old Infants Show False Belief Understanding in an Active Helping Paradigm.” Cognition 112 (2):337–42. https://doi.org/10.1016/j.cognition.2009.05.006.
Butterfill, Stephen A., and Apperly, Ian A. 2013. “How to Construct a Minimal Theory of Mind.” Mind & Language 28 (5):606–37.
Byers-Heinlein, Krista, Bergmann, Christina, Davies, Catherine, Frank, Michael C., Kiley Hamlin, J., Kline, Melissa, Kominsky, Jonathan, Kosie, Jessica E., Lew-Williams, Casey, and Liu, Liquan. 2020. “Building a Collaborative Psychological Science: Lessons Learned from ManyBabies 1.” Canadian Psychology 61 (4):349–63.
Carruthers, Peter. 2013. “Mindreading in Infancy.” Mind and Language 28 (2):141–72. https://doi.org/10.1111/mila.12014.
Carruthers, Peter. 2018. “Young Children Flexibly Attribute Mental States to Others.” Proceedings of the National Academy of Sciences of the United States of America 115 (45):11351–53. https://doi.org/10.1073/pnas.1816255115.
Carruthers, Peter. 2020. “Representing the Mind as Such in Infancy.” Review of Philosophy and Psychology 11 (4):765–81. https://doi.org/10.1007/s13164-020-00491-9.
Chang, Hasok. 2004. Inventing Temperature: Measurement and Scientific Progress. Oxford: Oxford University Press. https://doi.org/10.1093/0195171276.001.0001.
Chang, Hasok. 2012. Is Water H2O? Evidence, Realism and Pluralism. Dordrecht: Springer. https://doi.org/10.1126/science.aac4716.
Collins, Harry M. 1985. Changing Order: Replication and Induction in Scientific Practice. Beverly Hills, CA: Sage.
Dörrenberg, Sebastian, Rakoczy, Hannes, and Liszkowski, Ulf. 2018. “How (Not) to Measure Infant Theory of Mind: Testing the Replicability and Validity of Four Non-Verbal Measures.” Cognitive Development 46:12–30. https://doi.org/10.1016/j.cogdev.2018.01.001.
Feest, Uljana. 2016. “The Experimenters’ Regress Reconsidered: Replication, Tacit Knowledge, and the Dynamics of Knowledge Generation.” Studies in History and Philosophy of Science Part A 58:34–45. https://doi.org/10.1016/j.shpsa.2016.04.003.
Feest, Uljana. 2019. “Why Replication Is Overrated.” Philosophy of Science 86 (5):895–905. https://doi.org/10.1086/705451.
Feest, Uljana. 2022. “Data Quality, Experimental Artifacts, and the Reactivity of the Psychological Subject Matter.” European Journal for Philosophy of Science 12 (1):1–25. https://doi.org/10.1007/S13194-021-00443-9.
Fletcher, Samuel C. 2021. “The Role of Replication in Psychological Science.” European Journal for Philosophy of Science 11 (1):1–19. https://doi.org/10.1007/s13194-020-00329-2.
Frank, Michael C., Bergelson, Elika, Bergmann, Christina, Cristia, Alejandrina, Floccia, Caroline, Judit Gervain, J. Kiley Hamlin, Erin E. Hannon, Melissa Kline, Claartje Levelt, Casey Lew-Williams, Thierry Nazzi, Robin Panneton, Hugh Rabagliati, Melanie Soderstrom, Jessica Sullivan, Sandra Waxman, and Daniel Yurovsky. 2017. “A Collaborative Approach to Infant Research: Promoting Reproducibility, Best Practices, and Theory-Building.” Infancy 22 (4):421–35. https://doi.org/10.1111/INFA.12182.CrossRefGoogle Scholar
Haig, Brian D. 2013. “Detecting Psychological Phenomena: Taking Bottom-Up Research Seriously.” American Journal of Psychology 126 (2):135–53.CrossRefGoogle ScholarPubMed
Heider, Fritz, and Simmel, Marianne. 1944. “An Experimental Study of Apparent Behavior.” American Journal of Psychology 57 (2):243–59.CrossRefGoogle Scholar
Heyes, Cecilia. 2014a. “False Belief in Infancy: A Fresh Look.” Developmental Science 17 (5):647–59.CrossRefGoogle ScholarPubMed
Heyes, Cecilia. 2014b. “Submentalizing: I Am Not Really Reading Your Mind.” Perspectives on Psychological Science 9 (2):131–43.CrossRefGoogle ScholarPubMed
Irvine, Elizabeth. 2021. “The Role of Replication Studies in Theory Building.” Perspectives on Psychological Science 16 (4):844–53. https://doi.org/10.1177/1745691620970558.CrossRefGoogle ScholarPubMed
Kendler, Kenneth S. 2012. “Epistemic Iteration as a Historical Model for Psychiatric Nosology: Promises and Limitations.” In Philosophical Issues in Psychiatry II: Nosology, edited by Kenneth, S. Kendler, and Parnas, Josef, 305–22. Oxford: Oxford University Press.CrossRefGoogle Scholar
Kominsky, Jonathan F., Lucca, Kelsey, Thomas, Ashley J., Frank, Michael C., and Hamlin, J. Kiley. 2022. “Simplicity and Validity in Infant Research.” Cognitive Development 63:113. https://doi.org/10.31234/osf.io/6j9p3.CrossRefGoogle Scholar
Kulke, Louisa, and Rakoczy, Hannes. 2018. “Implicit Theory of Mind—An Overview of Current Replications and Non-Replications.” Data in Brief 16:101–4. https://doi.org/10.1016/j.dib.2017.11.016.
Lavelle, Jane Suilin. 2019. The Social Mind: A Philosophical Introduction. Abingdon, Oxfordshire, UK: Routledge.
Lavelle, Jane Suilin. 2022. “When a Crisis Becomes an Opportunity: The Role of Replications in Making Better Theories.” British Journal for the Philosophy of Science 73 (4):965–86.
Leonelli, Sabina. 2018. “Rethinking Reproducibility as a Criterion for Research Quality.” In Including a Symposium on Mary Morgan: Curiosity, Imagination, and Surprise, edited by Fiorito, Luca, Scheall, Scott, and Suprinyak, Carlos Eduardo, 129–46. Bingley, UK: Emerald Publishing Limited. https://doi.org/10.1108/S0743-41542018000036B009.
Loscalzo, Joseph. 2012. “Irreproducible Experimental Results: Causes, (Mis)interpretations, and Consequences.” Circulation 125 (10):1211–14.
Low, Jason, Apperly, Ian A., Butterfill, Stephen A., and Rakoczy, Hannes. 2016. “Cognitive Architecture of Belief Reasoning in Children and Adults: A Primer on the Two-Systems Account.” Child Development Perspectives 10 (3):184–89.
Luo, Yuyan. 2011. “Three-Month-Old Infants Attribute Goals to a Non-Human Agent.” Developmental Science 14 (2):453–60. https://doi.org/10.1111/J.1467-7687.2010.00995.X.
Makumbi, Jennifer Nansubuga. 2020. The First Woman. London: Oneworld Publications.
McNutt, Marcia. 2014. “Reproducibility.” Science 343 (6168):229.
Merton, Robert. 1973. “The Normative Structure of Science (1942).” In The Sociology of Science: Theoretical and Empirical Investigations, edited by Norman W. Storer, 267–78. Chicago: University of Chicago Press.
Muthukrishna, Michael, and Henrich, Joseph. 2019. “A Problem in Theory.” Nature Human Behaviour 3 (3):221–29. https://doi.org/10.1038/s41562-018-0522-1.
Nosek, Brian A., Hardwicke, Tom E., Moshontz, Hannah, Allard, Aurélien, Corker, Katherine S., Dreber, Anna, Fidler, Fiona, Hilgard, Joe, Kline Struhl, Melissa, Nuijten, Michele B., Rohrer, Julia M., Romero, Felipe, Scheel, Anne M., Scherer, Laura D., Schönbrodt, Felix D., and Vazire, Simine. 2022. “Replicability, Robustness, and Reproducibility in Psychological Science.” Annual Review of Psychology 73 (1):719–48. https://doi.org/10.1146/annurev-psych-020821-114157.
Onishi, Kristine, and Baillargeon, Renée. 2005. “Do 15-Month-Old Infants Understand False Beliefs?” Science 308 (5719):255–58.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251):aac4716.
Packer, Martin J., and Moreno-Dulcey, Fernando. 2013. “This Puppet Will Play a Game with You: Is It Time to Take Child Psychology out of the Laboratory?” Journal of Chemical Information and Modeling 53 (9):1689–99.
Peterson, David. 2016. “The Baby Factory: Difficult Research Objects, Disciplinary Standards, and the Production of Statistical Significance.” Socius 2:1–10.
Popper, Karl. 1959. The Logic of Scientific Discovery. London: Hutchinson.
Poulin-Dubois, Diane, Rakoczy, Hannes, Burnside, Kimberly, Crivello, Cristina, Dorrenberg, Sebastian, Edwards, Katheryn, Krist, Horst, Kulke, Louisa, Liszkowski, Ulf, Low, Jason, Perner, Josef, Powell, Lindsey, Priewasser, Beate, Rafetseder, Eva, and Ruffman, Ted. 2018. “Do Infants Understand False Beliefs? We Don’t Know Yet—A Commentary on Baillargeon, Buttelmann and Southgate’s Commentary.” Cognitive Development 48:302–15. https://doi.org/10.1016/j.cogdev.2018.09.005.
Rakoczy, Hannes. 2017. “In Defense of a Developmental Dogma: Children Acquire Propositional Attitude Folk Psychology around Age 4.” Synthese 194 (3):689–707. https://doi.org/10.1007/s11229-015-0860-8.
Rakoczy, Hannes. 2022. “The Development of Implicit Theory of Mind.” In The Routledge Handbook of the Philosophy of Implicit Cognition, edited by Robert J. Thompson, 336–50. Abingdon, Oxfordshire, UK: Routledge.
Roessler, Johannes, and Perner, Josef. 2015. “Pro-Social Cognition: Helping, Practical Reasons, and ‘Theory of Mind.’” Phenomenology and the Cognitive Sciences 14 (4):755–67.
Rozin, Paul. 2001. “Social Psychology and Science: Some Lessons from Solomon Asch.” Personality and Social Psychology Review 5 (1):2–14.
Rubio-Fernández, Paula. 2019. “Publication Standards in Infancy Research: Three Ways to Make Violation-of-Expectation Studies More Reliable.” Infant Behavior and Development 54:177–88. https://doi.org/10.1016/j.infbeh.2018.09.009.
Santiesteban, Idalmis, Catmur, Caroline, Hopkins, Senan Coughlan, Bird, Geoffrey, and Heyes, Cecilia. 2014. “Avatars and Arrows: Implicit Mentalizing or Domain-General Processing?” Journal of Experimental Psychology: Human Perception and Performance 40 (3):929.
Schaffner, Kenneth. 2012. “Coherentist Approaches to Scientific Progress in Psychiatry: Comments on Kendler.” In Philosophical Issues in Psychiatry II: Nosology, edited by Kendler, Kenneth S. and Parnas, Josef, 323–30. Oxford: Oxford University Press.
Schickore, Jutta. 2011. “The Significance of Re-Doing Experiments: A Contribution to Historically Informed Methodology.” Erkenntnis 75 (3):325–47. https://doi.org/10.1007/s10670-011-9332-9.
Schmidt, Stefan. 2009. “Shall We Really Do It Again? The Powerful Concept of Replication Is Neglected in the Social Sciences.” Review of General Psychology 13 (2):90–100. https://doi.org/10.1037/a0015108.
Schuwerk, Tobias, Kampis, Dora, Bohn, Manuel, Fisher, Cynthia, Wiesmann, Charlotte Grosse, Hyde, Daniel C., Kulke, Louisa, Mahowald, Kyle, Mascaro, Olivier, Prein, Julia, Raz, Gal, Rebecca Saxe, Dana Schneider, Victoria Southgate, Francis Yuen, Amanda Rose Yuile, Lucie Zimmer, and Michael C. Frank. 2022. “In-Principle Acceptance of Registered Report: Action Anticipation Based on an Agent’s Epistemic State in Toddlers and Adults.” Child Development. arXiv preprint. https://doi.org/10.31234/osf.io/x4jbm.
Schuwerk, Tobias, Priewasser, Beate, Sodian, Beate, and Perner, Josef. 2018. “The Robustness and Generalizability of Findings on Spontaneous False Belief Sensitivity: A Replication Attempt.” Royal Society Open Science 5 (5):172273. https://doi.org/10.1098/rsos.172273.
Scott, Rose M., and Baillargeon, Renée. 2009. “Which Penguin Is This? Attributing False Beliefs about Object Identity at 18 Months.” Child Development 80 (4):1172–96. https://doi.org/10.1111/j.1467-8624.2009.01324.x.
Scott, Rose M., and Baillargeon, Renée. 2017. “Early False-Belief Understanding.” Trends in Cognitive Sciences 21 (4):237–49. https://doi.org/10.1016/j.tics.2017.01.012.
Scott, Rose M., Roby, Erin, and Baillargeon, Renée. 2022. “How Sophisticated Is Infants’ Theory of Mind?” In The Cambridge Handbook of Cognitive Development, edited by Olivier Houdé and Renée Baillargeon, 242–68. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108399838.015.
Sikorski, Michał, and Andreoletti, Mattia. 2023. “Epistemic Functions of Replicability in Experimental Sciences: Defending the Orthodox View.” Foundations of Science. https://doi.org/10.1007/s10699-023-09901-4.
Simons, Daniel J. 2014. “The Value of Direct Replication.” Perspectives on Psychological Science 9 (1):76–80.
Southgate, Victoria, Senju, Atsushi, and Csibra, Gergely. 2007. “Action Anticipation through Attribution of False Belief by 2-Year-Olds.” Psychological Science 18 (7):587–92. https://doi.org/10.1111/j.1467-9280.2007.01944.x.
Van Bavel, Jay J., Mende-Siedlecki, Peter, Brady, William J., and Reinero, Diego A. 2016. “Contextual Sensitivity in Scientific Reproducibility.” Proceedings of the National Academy of Sciences of the United States of America 113 (23):6454–59. https://doi.org/10.1073/pnas.1521897113.
Van Dongen, Noah, Van Bork, Riet, Finnemann, Adam, Haslbeck, Jonas M. B., Van Der Maas, Han L. J., Robinaugh, Donald, De Ron, Jill, Sprenger, Jan, Borsboom, Denny, Machery, Edouard, Chase, Henry, Makovec, Dejan, Koberinski, Adam, Hui Choi, Hong, Elber, Lotem, Krempel, Raquel, and Blanken, Tessa. 2022. “Productive Explanation: A Framework for Evaluating Explanations in Psychological Science.” arXiv preprint.
Wellman, Henry M. 1990. The Child’s Theory of Mind. Cambridge, MA: MIT Press.
Wellman, Henry M. 2014. Making Minds: How Theory of Mind Develops. New York: Oxford University Press.
Wellman, Henry M., Cross, David, and Watson, Julanne. 2001. “Meta-Analysis of Theory-of-Mind Development: The Truth about False Belief.” Child Development 72 (3):655–84. https://doi.org/10.1111/1467-8624.00304.
Wimmer, Heinz, and Perner, Josef. 1983. “Beliefs about Beliefs: Representation and Constraining Function of Wrong Beliefs in Young Children’s Understanding of Deception.” Cognition 13 (1):103–28. https://doi.org/10.1016/0010-0277(83)90004-5.
Woodruff, Guy, and Premack, David. 1978. “Does the Chimpanzee Have a Theory of Mind?” Brain and Behavior Sciences 1 (4):515–26.
Yarkoni, Tal. 2020. “The Generalizability Crisis.” Behavioral and Brain Sciences 45:1–37. https://doi.org/10.1017/S0140525X20001685.

Table 1. Data collected by Kulke and Rakoczy (2018). The authors coded data as follows: (1) false-belief 1 above chance, (2) false-belief 2 above chance, (3) measured by how long participants looked at the correct versus incorrect window, and (4) measured by the “first look.” A result was coded as a replication if it met criteria 1–4 and as a partial replication if it met criteria 1 or 2 and criteria 3 or 4.