Introduction
Research on prehistoric gender has sparked a lively debate in recent decades, whose main point of contention is whether, and to what extent, prehistoric gender complied with a binary model. There is no general agreement on how to identify gender identities in the prehistoric record and the debate is polarized: on the one hand, some theoretical approaches argue that the relevance of biological sex is overestimated in prehistoric archaeology (e.g. Claassen 1992; Ghisleni et al. 2016; Moral 2016; Voss 2008). On the other hand, more empirically grounded research contends that, as biological sex is the only component that can be positively determined in the archaeological record, it is a necessary part of the identification process (e.g. Hofmann 2009; Müller-Scheeßel 2019). More nuanced perspectives are emerging that attempt to frame the formation of binary gender norms as a historical process rather than as a universal constant (Robb & Harris 2018).
While research on prehistoric gender considers several sources of archaeological evidence, arguments very often revolve around the burial rite, as it provides a unique opportunity to observe the correlation between individuals and their formal representation (e.g. Arnold 2006; Sørensen 2019; Turek 2017). At the same time, the vast majority of archaeologists working in the field and producing integral publications of burial sites (e.g. Cardarelli 2014; Dresely 2004; Ebner-Baur 2020; Jovino 2010; Paresys et al. 2009; Weidig 2014) tend to overlook the current theoretical debate and rely on ‘traditional’ binary models. It seems that there is a disconnection between theoretical research on prehistoric gender and empirically based research on burial sites: the former produces new models but seldom tests them against quantifiable variables (or tests small samples: e.g. Müller-Scheeßel 2019; Rebay-Salisbury et al. 2022); the latter quantifies variables but largely overlooks alternative models.
In this article we attempt to bridge this gap and assess if, and to what extent, data obtained through ‘traditional’ approaches can be used to test the validity of the binary gender model. We do so at the cost of an extreme simplification of quantifiable variables, which nonetheless provides the opportunity to test hypotheses. Our model is based on sex and gender as two distinct concepts (Butler 1990; Money 1955; Rubin 1975), both of which, we assume, are potentially definable in the archaeological domain (Conkey & Spector 1984), the former through bioanthropological analyses and the latter through archaeological methods. We test the common assumption that grave goods are organized in ‘gendered categories’ that are systematically correlated to an individual's biological sex. We synthesize the main sources of analytical error and methodological bias that can potentially affect both bioanthropological and archaeological interpretations, and single out avoidable loopholes. We then introduce the concept of minority, as opposed to the more common concept of exception, as an interpretive framework for individuals potentially diverging from the binary model. In short, we consider a minority every case that escapes the statistical norm and occurs more than once. We consider the term ‘exception’ inappropriate, as we find that non-binary cases are indeed recurrent, and do not denote any exceptional treatment in the burial rite.
Based on these premises, we analyse a sample of 1252 individuals from seven burial sites in central Europe, spanning the Early Neolithic and the Late Bronze Age (c. 5500–1200 BCE), and test the hypothesis that prehistoric gender is binary. Our results suggest that the answer is complex. They show that the binary model accounts for most of the variability of the sample, but not all of it. We also find evidence of circular reasoning in the determination of sex and gender in prehistoric burials that appears to bias the data in favour of the binary model. We conclude that old data support the existence of a small but quantitatively relevant minority diverging from the binary model throughout the Neolithic and the Bronze Age. At the same time, we find that the error margin of sex determinations based on osteological analyses still leaves too much room for uncertainty.
Modelling (non-)binarity: a working framework
We consistently use the term sex (and the adjectives male/female) for biological sex determined based on the morphometric analysis of skeletal remains, and gender (and the adjectives masculine/feminine) for hypothetical identities determined based on grave goods and burial practices. Both concepts are mere classification criteria, and as such they are both imperfect (e.g. Agarwal & Wesp 2017; Alt & Röder 2009; Kranzbühler 2019). Some may even question why we need to classify sex and gender separately in the first place (e.g. Moral 2016; Turek 2017), as the development of these concepts in the sociological domain has moved towards the acknowledgement that biological sex is as much socially constructed as gender is (e.g. Butler 1993; also in the bio-medical field: Voß 2010). We argue that if our goal is to determine whether gender was binary or not in pre-literate societies, then determining biological sex is an indispensable part of the process. In many societies throughout history, including contemporary ones, the construction of gender identities is either implicitly or explicitly imposed based on phenotypical sex, with little room for self-determination. In other words, some societies accept no variance from the biologically determined binary model. Other societies, however, accept different levels of variance, ranging from high to low (Fig. 1). In a way, the non-binarity of gender is measured by the degree of socially accepted variance from the binary sex-model. It follows that, if we want to understand how much gender-variance is accepted in prehistoric societies, we first need to classify biological sex.
We refer to sex as a biologically determined aspect producing phenotypical traits and potentially identifiable through the examination of skeletal remains by means of osteology and bioanthropological techniques, such as aDNA-typing (e.g. Brown & Brown 2011; Hummel 2003; Mittnik et al. 2016; Skoglund et al. 2013) and proteomics (i.e. dimorphic enamel peptide analysis: e.g. Buonasera et al. 2020; Gowland et al. 2021; Rebay-Salisbury et al. 2022; Stewart et al. 2017). We are, of course, aware of the difficulties associated with such classifications as, for example, osteological sex only comprises secondary sexual characteristics, which can also be influenced by other factors (e.g. Garofalo & Garvin 2020, 37; Voß 2010); even chromosomal sex can have a number of non-binary variants, or aneuploidies, of which the most common are Turner syndrome (XO), Klinefelter syndrome (XXY), trisomy X (XXX), XYY and XXYY (e.g. Skuse et al. 2018). These variants, however, are extremely rare and difficult to detect in the archaeological record (e.g. Moilanen et al. 2022; based on Knüsel & Ripley 2000, 162; Nielsen & Wohlert 1991); it is even unclear how relevant and apparent these variants are to us today (e.g. Lanfranco et al. 2004, 273), let alone to prehistoric people.
Since nearly all available sex determinations are obtained through traditional osteological methods, which cannot detect these variants, we have no other option, for now, than to treat biological sex as a binary variable.
Gender, on the other hand, is socially constructed. This implies that, as a relational category, gender results from the interplay between the way individuals perceive themselves, the role that society tends to assign to them and the degree of socially accepted variance. Since gender is not necessarily dependent on sex, bioanthropological determinations can be ineffective towards its determination. In this respect, ‘traditional’ archaeological methods still hold high informative potential. Standard archaeological theory assumes that deceased individuals are granted ‘attributes’ in the burial rite in the form of objects (i.e. grave goods) and materialized practices (e.g. body positions), that somehow reflect the traits that defined their social persona (e.g. Binford 1971; Gebühr 1975; Hodder 1982, 201). As far as gender is counted among these traits, one can assume that the choice of grave goods somehow reflects the gendered role that was either chosen by the individual or imposed on them by society.
Reducing the question to its core, one can think of socially accepted gender variance as a system with inputs and outputs. Biological sex represents the input (mostly binary by definition: female/male), while gender represents the output. In a simple deterministic model, the output will always be determined by the input, i.e. a biological male will always be assigned a masculine gender. In other words, sex and gender will always match, allowing only for two possible combinations, i.e. either M-M or F-F (Fig. 2): this is what we call a binary system. If we break the direct causal link between inputs and outputs, however, the system is no longer binary, as the possible combinations become four: M-M, F-F, M-F and F-M. While such a simplification may not do justice to the complexity of an individual's identity, it provides a convenient analytical framework that allows quantification. In this perspective, analysis is only possible if sex and gender are taken as separate entities (see e.g. Hofmann 2009; Tori 2019, 19). The standard approaches used to identify them are, however, affected by systematic error and bias. While error and bias are not completely avoidable, identifying their sources can at least help in minimizing their effects.
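The input/output framing can be made concrete with a minimal Python sketch. All names here (`allowed_combinations`, `SEXES`, `GENDERS`) are illustrative assumptions and not part of the original analysis; the sketch only enumerates which sex-gender pairs each kind of system permits.

```python
from itertools import product

SEXES = ("M", "F")    # input: osteological sex (male / female)
GENDERS = ("M", "F")  # output: archaeological gender (masculine / feminine)


def allowed_combinations(deterministic: bool) -> list[tuple[str, str]]:
    """Return the (sex, gender) pairs a system permits.

    deterministic=True models the binary system, where gender follows sex.
    deterministic=False breaks the causal link, so every pair is possible.
    """
    pairs = list(product(SEXES, GENDERS))
    if deterministic:
        return [(sex, gender) for sex, gender in pairs if sex == gender]
    return pairs


binary = allowed_combinations(deterministic=True)
non_binary = allowed_combinations(deterministic=False)
print(binary)      # the two matching pairs, M-M and F-F
print(non_binary)  # all four pairs, including M-F and F-M
```

Treating both variables as two-valued is the same simplification the text makes; it is what allows the combinations to be counted at all.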
The potential errors and bias of traditional osteological and archaeological sex/gender-determination methods in burial archaeology
The margin of error of osteological sex determinations
Bias in sex determinations is manifold and can have different sources, such as the choice of sex-determination methods (morphological and/or metric traits), the targeted skeletal element, the degree of sexual dimorphism of a given population (which may vary in time and space), the state of preservation of skeletal remains and even the experience level of the anthropologist. It is impossible to quantify the relevance of each of these potential biases, as some of them are still largely under-studied. The accuracy achieved by means of specific sex-determination methods on specific skeletal elements, however, has been examined: according to many authors, an accuracy level of c. 85–99 per cent can theoretically be achieved based on the pelvis, depending on specific methods and bone preservation (e.g. Boldsen et al. 2021, 7; Brůžek et al. 2017; Đurić et al. 2005; Herrmann et al. 1990, 77; Meindl et al. 1985; Phenice 1969; Ubelaker & Volk 2002). The pelvis and more particularly the ossa coxae are considered the most reliable skeletal elements for sex determination, as, due to functional demands, they feature similar patterns of sexual dimorphism across regions and time (Brůžek et al. 2017, 441). The reliability estimates of sex determinations based on adult skulls (i.e. crania and mandibles) alone span c. 70–90 per cent, again depending on different methods and preservation state (e.g. Đurić et al. 2005; Herrmann et al. 1990, 77; Meindl et al. 1985).
Spradley and Jantz (2011) suggest that determinations using the postcranial skeleton could actually provide estimates superior to those based on the skull, i.e. up to 94 per cent with multivariate models. Finally, all authors warn against sexing non-adult individuals, as sex indicators only start developing with puberty. Depending on the age at death and the skeletal elements taken into consideration, reliability estimates for non-adults range between 50 and 85 per cent (e.g. Brown & Brown 2011, 156–7; Grupe et al. 2005, 93–4), which, as Bass puts it (2005, 19), is just ‘a little better than a guess’. Old age may also impact reliability, as masculinization processes may occur on senile, female skeletons (e.g. Burmeister & Müller-Scheeßel 2005, 93; Knüsel & Ripley 2000, 160; Krishan et al. 2016, 165e2; Walker 1995).
To complicate matters even further, most available methods are based on modern skeletal samples, whose degree of comparability with prehistoric populations is largely unknown. Inskip et al. (2019) tested the accuracy rates of different standard sex-estimation methods based on the skeletal material from the Hospital of St John the Evangelist, Cambridge (thirteenth–sixteenth century) by means of aDNA-typing, and found significant discrepancies in accuracy rates even between modern and post-medieval samples. Accuracy tests for prehistoric individuals are still very limited (e.g. Rebay-Salisbury et al. 2022; Turek 2019), and no one has ever fully tested a burial site of pre-medieval age. The lack of extensive testing is all the more relevant if one considers the highly diverse genetic framework of prehistoric European populations (e.g. Brandt et al. 2013; Cox et al. 2019; Knipper et al. 2017; Mittnik et al. 2018; Rivollat et al. 2020).
Standard publication practice of prehistoric burial sites is also a source of indeterminacy. Most of the time, bioanthropological analyses are limited to short sections or appendices, and include only synthetic indications on the methods applied, the overall preservation of skeletal remains and the results of the sex estimates (e.g. Knöpke 2009; Neugebauer 1991). As a consequence, the arguments as to why an individual was categorized as ‘certain’, ‘rather’ or ‘tendentially’ biological male/female are frequently reported insufficiently or not at all. Furthermore, many sites have been studied at different stages of research development and, as a consequence, by means of different sex-estimation methods and following different region-specific standard protocols. As a result, the sex data tend to be opaque, unverifiable and incomparable.
The potential bias in the determination of archaeological gender
In order to frame how sex and gender intersect in burial archaeology, we introduce a thought experiment based on weapons in graves. By adopting a top-down approach, we start by assuming that weapons are a masculine attribute. Independently of whether this is true or not, there is certainly widespread agreement that warfare in antiquity was mostly men's business (e.g. Cintas-Peña & García Sanjuán 2019; Gentile et al. 2018; Harding 2015; Jantzen et al. 2011). Given this premise, one way to investigate socially accepted gender variance would be to ask the following question: Were female individuals formally granted masculine attributes in the burial rite? A positive answer would support the interpretation that society accepted a certain degree of variance. In a first step, archaeologists would identify weapons and anthropologists identify female individuals (Fig. 3a). We then quantify associations between weapons and biologically female individuals in a second step. If the results do not show significant associations, we conclude that gender is binary. If they do, we conclude that gender is not necessarily binary.
The LBA burial site of Neckarsulm provides an ideal example of a ‘top-down approach’. The site has been interpreted as a ‘men's cemetery’, based on the alleged absence of any ‘standard’ feminine indicator, and of secure female osteological determinations. The criteria for archaeological gender determinations at Neckarsulm are based on the accumulation of case studies over decades (Knöpke 2009, 45), and generally fit within a near-universally accepted model (Robb & Harris 2018). These same criteria, however, cannot account for the 58 per cent of individuals that have no gendered grave goods and the 26 per cent for which biological sex is undetermined.
In a top-down approach, the interplay of sex and gender would seem quite straightforward. But what if we wanted to determine if weapons are really a masculine attribute? In this bottom-up scenario, archaeologists would first identify weapons and anthropologists would identify biologically male and female individuals, then one would quantify the associations between weapons and both sexes (Fig. 3b). If weapons are not significantly associated with one sex, we would conclude that weapons are gender neutral. If they are significantly associated with one particular sex, then we would conclude that weapons are a gender-specific attribute, i.e. ‘binary attributes’. For example, for the Neolithic inhumation burial sites of Aiterhofen-Ödmühle (Germany) and Trebur (Germany), gender was assigned from the bottom up after the quantification of the associations of specific object categories with osteologically determined male and female individuals (Nieszery 1995, 110–12; Spatz 1999, 177–98).
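The bottom-up step of quantifying whether an object class is significantly associated with one sex can be sketched as a test on a 2×2 table. The sketch below uses a two-sided Fisher's exact test written with the standard library only. The choice of test, the 0.05 threshold and the counts are assumptions made for illustration; the counts are invented and do not come from any of the sites discussed, and the source publications may have used other procedures.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: male / female individuals; columns: with / without weapon.
    Sums the probabilities of all tables sharing the observed margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x males with a weapon, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# HYPOTHETICAL counts, invented for illustration only:
# 18 of 40 males and 2 of 40 females buried with a weapon.
p_value = fisher_exact_two_sided(18, 22, 2, 38)
weapons_gender_specific = p_value < 0.05  # assumed significance threshold
print(p_value, weapons_gender_specific)
```

An exact test is used here because grave-good counts per site are often small; with larger samples a chi-squared test would be a common alternative.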
While the bottom-up approach may seem perfectly logical at first sight, it contains a dangerous pitfall: if we identify a positive correlation between e.g. biologically male individuals and weapons, we may be tempted to use weapons to identify biologically male individuals, inevitably triggering a circular argument. The burial sites of Trebur in Germany (early fifth millennium BCE) and Olmo di Nogara in Italy (mid-second millennium BCE) provide examples of this common practice. At Trebur, two individuals that were classified as male by anthropologists ended up being classified as female because their burial context contained what the archaeologist who authored the book considered feminine attributes (burials 51 and 62: see Spatz 1999, 118, table 73). At Olmo, five individuals identified as male by anthropologists are ultimately classified as female in the interpretive synthesis, based on the association with grave goods considered typically feminine (burials 85, 100, 155, 323, and 411: Salzani 2005, 467).
A certain degree of subjectivity is also noticeable, for example, in the attempt to hierarchize objects considered masculine or feminine attributes within one assemblage or in the difficulty of assigning some objects to clear use categories. At Olmo, for example, interpretation of the function of daggers is dependent on the overall composition of grave goods, and specifically on whether it fits the expectation of either a female or male individual: weapons for men, tools (i.e. kitchen knives) for women, despite the shape of the object being exactly the same (e.g. Salzani 2005, 298). Similar observations have been made, for example, in Bronze Age and Viking Age northern Europe (Bergerbrant 2007; Moen 2021).
We argue that while some of these common practices can be to some extent effective in determining broadly defined norms, in practice they produce circular arguments that hamper any possibility of positively identifying ‘mismatched’ sex/gender combinations. Such a circularity can be avoided by addressing biological sex and what is archaeologically perceived as gender as two separate concepts.
Interpretive framework: minorities versus exceptions
In some exceptional cases, ‘mismatched’ sex/gender combinations are sufficiently well documented to allow one to exclude determination error. The question is what these exceptional cases actually represent: are they exceptions or minorities? The difference is crucial, as it defines the very possibility that we will ever be able to understand what these cases actually mean. From an archaeological point of view, we will never be able to understand exceptions: by definition, an exception is something that occurs so rarely that it does not provide enough statistical evidence. By the same token, as far as the perception of a certain social phenomenon is concerned, exceptions escape classification, hence they are difficult to frame within one's world view. Minorities, on the other hand, are recurrent. No matter how small, a minority will always provide enough data to be singled out from the statistical norm and modelled consequently. Similarly, in the social domain a minority can be acknowledged by laws and explicitly assigned rights and duties.
The concept of minority is central to our argument. Following up on our thought experiment, there is little doubt that, in large part, weapons are associated with male individuals. What we do not know exactly is how large this part actually is. The point is not even whether or not weapons were actually a men's prerogative. In fact, acknowledging that weapons were considered masculine attributes does not preclude the possibility that masculine attributes were formally granted to biologically female individuals, and vice versa.
A famous example of an ‘extraordinary exception’ is the tenth-century AD Viking burial Bj 581 in Birka, Sweden. The individual was, since its excavation by H. Stolpe, believed to be of male sex as it was associated with an outstanding assemblage of warrior equipment (Arbman 1943, 188–9). The possibility that the individual was, in fact, a biological female had already been suggested in the 1970s; however, only recent aDNA analyses were able to confirm a female determination (Hedenstierna-Jonson et al. 2017, incl. S2; Price et al. 2019). The Early Holocene Andean highland site of Wilamay Patjxa provides a further case study. The osteological and proteomic analyses of burial 6 show that a young adult female individual was buried with a ‘hunting toolkit’ of stone projectile points and animal-processing tools, which are usually considered masculine attributes (Haas et al. 2020). Further exceptions to the rules have been reported in different regions of Europe, for periods spanning the Neolithic (e.g. Häusler 2012; Wiermann 1997), the Bronze Age (e.g. Vaňharová & Drozdová 2008; see also Rebay-Salisbury et al. 2022; Turek 2019) and early Middle Ages (e.g. Knüsel & Ripley 2000).
In theory, these examples would not even challenge the ‘man-the-hunter’, the ‘man-the-warrior’ or the ‘women-the-housekeepers’ hypotheses, as long as they remain exceptions in the archaeological record. But this is precisely the question: do these cases really represent unrepeatable exceptions, or are they rather the tip of the iceberg of a small, albeit quantitatively relevant minority?
Analysis: quantifying sex/gender combinations in Neolithic and Bronze Age burial sites
We have collected the data of seven published burial sites including a total of 1252 inhumed individuals from Neolithic and Bronze Age central Europe (see supplementary material, SI2). The selected burial sites include (Fig. 4): Aiterhofen-Ödmühle (Germany, Early Neolithic: Baum 1990; Nieszery 1995); Trebur (Germany, Middle Neolithic: Spatz 1999); Ostorf-Tannenwerder (Germany, Nordic Middle Neolithic: Bastian 1961; Larsson et al. 2007; Schiesberg 2013; Schuldt 1961); Lauda-Königshofen (Germany, Final Neolithic: Menninger 2008; Ortolf 2014; Trautmann 2012); Gemeinlebarn-Nekropole F (Austria, Early Bronze Age: Neugebauer 1991); Olmo di Nogara (Italy, Middle/Late Bronze Age: Pulcini 2014; Salzani 2005; Salzani et al. 2016); and Neckarsulm (Germany, Late Bronze Age: Knöpke 2009). The selection of the sites responds to pragmatic criteria. The sample includes most of the largest individual inhumation burial sites in Neolithic and Bronze Age central Europe. All sites are published integrally and therefore allow us to observe the articulation of archaeological gender estimates and osteological sex estimates. Furthermore, these sites are frequently referred to in relevant literature on prehistoric gender.
The first question is whether or not the representation of gender in the burial rite of Neolithic and Bronze Age cemeteries follows a binary pattern. To address this question, we rely on the sex and gender determinations provided by the original publications of the burial sites in our sample. Only inhumations were considered, to provide maximum accuracy for osteological sex determinations. In all our sources, sex is always determined by osteologists based on the morphometric analysis of skeletal remains, with no application of chromosomal methods. In the same way, the determination of archaeological gender follows the indications given in the source publications and relies mainly on grave goods. Since not all publications in our sample consider other potentially gendered traits of the burial rite (such as the grave structure, size and shape, as well as the position of the body), grave goods are a ‘convenient common denominator’, shared by each site publication regardless of regions and periods, which allows comparability. The assessment of archaeological gender was, however, not straightforward for all case studies, as most authors do not semantically consider sex and gender separately. Archaeological determinations are also not always applied consistently: for example, while the finds (including a stone axe, a hammerstone and a flint blade) associated with individual 47 in grave 1935/I at the site of Ostorf-Tannenwerder are explicitly considered masculine attributes (Bastian 1961, 28), an archaeological gender determination is not indicated for individual 5 of grave 1904/6, which contained a stone axe (Bastian 1961, 22). Therefore, gender determinations were reconstructed following the single authors’ indications scattered throughout the respective publications. In addition, the gender-determination methodology of our sources is not homogeneous, showing variable proportions of top-down and bottom-up approaches.
Weapons, for example, are always considered a masculine attribute a priori (e.g. Gemeinlebarn, Neckarsulm, Olmo di Nogara, Ostorf-Tannenwerder). Other objects (e.g. generic tools, ornaments, pottery) are addressed by some sources in a bottom-up fashion, by quantifying the association of different classes of grave goods with osteologically determined sexes (e.g. Aiterhofen-Ödmühle, Trebur, Ostorf, Lauda-Königshofen). Finally, we must always bear in mind that our understanding of archaeological gender is only based on what is preserved, which entails the risk of misinterpreting the original characterization of the deceased. This issue is particularly pressing for the Neolithic, as, for instance, in the absence of metals a large part of the weaponry must have been made of wood (e.g. spears, clubs, bows).
Sex-estimation methods show a similar heterogeneity when compared across sites. In addition, the skeletal collections of Olmo di Nogara (Pulcini 2014) and Ostorf-Tannenwerder (Patolla in Schiesberg 2013) have been the object of more recent osteological analyses which yielded partially different results. For Ostorf in particular, the new analyses showed the occurrence of sparse skeletal remains of secondary burials in addition to the primary burials previously documented (for further details, see supplementary file SI1). However, neither case provides enough information to evaluate whether or not an actual improvement in accuracy was achieved; therefore, we included both old and new analyses in our quantification, and compared the results.
In order to answer our question, we quantified in how many cases sex and gender determinations coincide, and in how many cases they do not. The quantification produced four categories (Fig. 5 and Table 1): (1) cases in which sex and gender-determinations match (hereafter ‘match’); (2) cases in which the determinations are opposite (‘opposite’); (3) cases in which either sex or gender is determined and the other is undetermined (‘partial’); (4) cases for which both sex and gender are indifferent or undetermined (‘indeterminate’).
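The four categories can be expressed as a simple decision rule. The following Python sketch is illustrative only: the function name, the encoding of determinations as `"M"`, `"F"` or `None`, and the six example records are assumptions and do not reproduce the data in Table 1.

```python
from collections import Counter
from typing import Optional


def classify(sex: Optional[str], gender: Optional[str]) -> str:
    """Assign one burial to one of the four categories.

    sex:    "M", "F" or None (osteologically undetermined)
    gender: "M", "F" or None (archaeologically indifferent or undetermined)
    """
    if sex is None and gender is None:
        return "indeterminate"
    if sex is None or gender is None:
        return "partial"
    return "match" if sex == gender else "opposite"


# HYPOTHETICAL records, invented for illustration only.
burials = [
    ("M", "M"), ("F", "F"),    # sex and gender agree
    ("M", "F"),                # sex and gender contradict each other
    ("F", None), (None, "M"),  # only one of the two is determined
    (None, None),              # neither is determined
]
tally = Counter(classify(sex, gender) for sex, gender in burials)
print(tally)
```

Note that only the ‘match’ and ‘opposite’ categories can speak to the binary hypothesis; ‘partial’ and ‘indeterminate’ burials carry no information on whether sex and gender coincide.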
The ‘average total’ bar of the graph shows the average percentages of all Neolithic and Bronze Age sites: The sex and gender determinations of 26.5 per cent (27.2 per cent following the new data) of the sample match, and 2.9 per cent (2.2 per cent following the new data) contradict each other. A total of 41.6 per cent (40.7 per cent) of the sample has partial determinations, and for the remaining 28.9 per cent (30.0 per cent) we have neither sex nor gender information. If we break down the data based on their chronology, we can observe that both Neolithic and Bronze Age burials follow the same trend.
The general results of our analysis seem to support traditional models: if one singles out the cases for which we have both sex and gender determinations (based roughly on one-third of the total sample, mostly adults: Fig. 5), the association pattern appears overwhelmingly binary, with 90.0 per cent (or 92.6 per cent considering the new data) of burials showing matching sex and gender indicators (Fig. 6). Finally, we can also observe that for 10.0 per cent (or c. 7.4 per cent based on the new osteological data) of this portion of deceased individuals the osteological and archaeological determinations contradict each other.
Discussion: margins of error versus non-binary minorities
The data would suggest that gender is mostly binary, with a small but noticeable non-binary component that varies slightly between old and new osteological analyses. The sample including the older osteological assessments of Ostorf and Olmo shows 36 individuals with opposite sex/gender determinations (corresponding to 10 per cent of the total number of individuals for which both osteological and archaeological determinations are available), while newer analyses bring this number to 27 individuals (c. 7.4 per cent) (Table 1). All individuals with opposite sex/gender determinations are illustrated in detail in Table 2.
There are two possible ways to interpret this portion: a minimalist approach, in line with the usual procedures, would suggest interpreting it as a product of the error margins of determination methods; as an alternative, one could acknowledge that non-binary minorities were systematically represented in the burial rite of prehistoric Europe (see e.g. Burmeister & Müller-Scheeßel 2005; Knüsel & Ripley 2000; Müller-Scheeßel 2019, 48). Other than through an extensive campaign of bioanthropological sex determinations (which is beyond the scope of this article), one can narrow down expectations by addressing potential sources of bias and quantifying their potential impact on the production of the available data. In any case, it is crucial to keep in mind that the error margin of osteological determinations, which ranges anywhere between 50 and 99 per cent, affects each single determination indiscriminately, regardless of whether or not it fits our predictions. This means that, in principle, the error on matching determinations is exactly the same as the error on opposite determinations. In the following, we will discuss arguments against or in favour of the existence of a non-binary minority in the prehistoric burial rite.
‘Opposites’ are overestimated
We start by assuming that ‘opposites’ are overestimated. A possible cause of overestimation could be found in the distribution of those age categories for which osteological sex determination is less reliable, that is, individuals younger than c. 20 years of age at death and those older than c. 60 years (see The margin of error of osteological sex-determinations, above). We quantified how many non-adult and senile individuals have matching and opposite sex/gender determinations (Table 3). We used Fisher's Exact Test to determine whether the difference between the matching and opposite categories is statistically significant. If the difference is significant, the test would support the interpretation that the category with the higher incidence of ‘problematic’ determinations is more likely to have been affected by determination bias. If the difference is not significant, there is no evidence that the two categories are affected differently by this bias.
We executed two separate tests on the total sample, one including the old data from Olmo and Ostorf-Tannenwerder and one including the new. For the sample including the old data, ‘problematic’ age categories represent c. 31 per cent of the total determinations of the opposites, while for matching determinations they represent only c. 8 per cent. The difference is significant at p <0.01, suggesting that ‘opposites’ are indeed significantly more affected by potentially biased determinations. For the sample including the new data, those same categories represent respectively c. 18 per cent and c. 9 per cent. The difference here is not significant, which leads us to exclude that ‘opposites’ are in this case significantly more affected by bias.
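For readers who wish to retrace the logic of the test, the following Python sketch implements a two-sided Fisher's Exact Test with the standard library only. It is an illustration, not a record of how the test was run: the text does not state which software was used, the function name is hypothetical, and the counts are NOT the values of Table 3. They are rough back-calculations from the rounded figures quoted in the text (36 and 27 ‘opposites’, corresponding to 10 and c. 7.4 per cent of fully determined individuals, hence roughly 324 and 338 ‘matches’), and serve only to show the order of magnitude of the result.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's Exact Test for the 2x2 table [[a, b], [c, d]].

    Rows: 'opposite' and 'match'; columns: 'problematic' and other age classes.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is no more probable than the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# ASSUMED counts, back-calculated from rounded percentages; NOT Table 3 values.
p_old = fisher_exact_two_sided(11, 25, 26, 298)  # c. 31% of 36 vs c. 8% of 324
p_new = fisher_exact_two_sided(5, 22, 30, 308)   # c. 18% of 27 vs c. 9% of 338
```

Under these assumed counts, `p_old` falls below 0.01 while `p_new` stays above 0.05, which agrees with the pattern reported above. The exact p-values depend on the true counts in Table 3.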
This leads us to ask whether one can expect that new osteological determinations will always result in a reduction of the ‘opposites’. Testing this would again require a thorough estimation of the theoretical margin of error of each single sex determination, which, however, is not possible with the available data. One can nonetheless observe that while in Olmo the new analyses resulted in the reduction of the ‘opposites’, in Ostorf-Tannenwerder the ‘opposites’ increased. A closer look at the revisions of previous sex determinations of the Ostorf population shows that the number of ‘matches’ was affected to a greater extent than that of the ‘opposites’ (supplementary file SI2): while 4 out of 14 ‘old matches’ were not confirmed and turned into 4 ‘new opposites’, the sex determinations of 3 out of 4 ‘old opposites’ were confirmed, and the remaining ‘old opposite’ became a ‘new partial’. By contrast, the osteological re-evaluation of the skeletal collection of the Bronze Age burial site of Olmo (Pulcini 2014) resulted in a drastically different picture: the revised determinations affected nearly exclusively the ‘opposites’, as 13 out of 14 ‘old opposites’ were turned into 13 ‘new matches’, and 1 out of 105 ‘old matches’ resulted in 1 ‘new partial’. Simply put, the osteological re-evaluations of the two sites seem to suggest that new determinations are as likely to turn ‘matches’ into ‘opposites’ as they are to turn ‘opposites’ into ‘matches’.
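The reclassifications at the two sites can be tallied in a few lines of Python. This is a bookkeeping sketch under stated assumptions: the reclassified counts are those quoted above, whereas the unchanged remainders (10 confirmed matches at Ostorf, 104 confirmed matches and 1 confirmed opposite at Olmo) are inferred here by subtraction. The tally covers only individuals that already had both determinations in the old analyses, so the newly identified individuals at Ostorf are not included. All names are hypothetical.

```python
from collections import Counter

# (old category, new category) -> number of individuals.
# Reclassified counts are quoted in the text; unchanged remainders are inferred.
OSTORF = {
    ("match", "match"): 10,
    ("match", "opposite"): 4,
    ("opposite", "opposite"): 3,
    ("opposite", "partial"): 1,
}
OLMO = {
    ("match", "match"): 104,
    ("match", "partial"): 1,
    ("opposite", "match"): 13,
    ("opposite", "opposite"): 1,
}


def tally(transitions: dict, position: int) -> Counter:
    """Sum individuals per category before (position 0) or after (1) revision."""
    totals = Counter()
    for categories, n in transitions.items():
        totals[categories[position]] += n
    return totals


def net_change(transitions: dict, category: str) -> int:
    """Change in the size of one category caused by the re-evaluation."""
    return tally(transitions, 1)[category] - tally(transitions, 0)[category]
```

With these inputs, `net_change(OSTORF, "opposite")` is +3 and `net_change(OLMO, "opposite")` is -13, while the corresponding changes in ‘matches’ are -4 and +12: the two revisions move in opposite directions.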
‘Opposites’ are underestimated
The evidence for ‘opposites’ being overestimated is not conclusive. By the same evidence, one cannot even exclude that ‘opposites’ can, in fact, be underestimated. One can approach the matter from a different perspective and attempt to quantify the impact of circular argumentations in archaeological gender determination on sex determination. We have shown above how the usual lack of separation between the concepts of sex and gender can lead to attributing osteological sex based on grave goods: how far does this practice affect the data, and is there any evidence that archaeological interpretations may somehow influence the outcome of osteological analyses?
In order to clarify this, we conducted a statistical test on the available data: we quantified how frequently individuals that have been archaeologically gendered have also been osteologically sexed, and how frequently individuals that have not been archaeologically gendered have been osteologically sexed (Fig. 7). Our goal is to assess whether a correlation exists between the likelihood that an individual receives a gender determination and the likelihood that the same individual receives a sex determination, and ultimately understand if the former can theoretically influence the latter.
The first graph includes the total sample (n (old) = 1221; n (new) = 1252); the second the Neolithic sample only (n (old) = 441; n (new) = 472); and the third the Bronze Age sample only (n (old/new) = 780). All three graphs show a strikingly consistent pattern, according to which the chances of achieving osteological sex determinations whenever gender determinations are absent amount to a little more than 50 per cent. However, whenever a gender determination is present, the likelihood of sexing an individual osteologically rises to c. 80 per cent. Fisher's Exact Test confirms that this difference is highly significant for all three graphs, at p <0.01. These results suggest that osteological sex estimates tend to be influenced by archaeological gender determinations, and would imply that anthropologists are more confident in determining osteological sex whenever gender indicators (independently of their ‘correctness’) are present—regardless of the methods applied at single sites and their chronological attribution to the Neolithic or the Bronze Age. While it is not possible to determine the single cases in which this bias may have occurred, logic would suggest that if the presence of archaeological gender indicators raises the likelihood of obtaining an estimation of the osteological sex, then the most affected cases would probably be those in which sex and gender determinations match—hence theoretically creating a bias towards the binary model.
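The comparison behind these graphs reduces to two conditional proportions: the share of osteologically sexed individuals among those with a gender determination, and the same share among those without one. A minimal Python sketch follows, assuming a hypothetical record layout of (is_gendered, is_sexed) pairs. The toy data are invented for illustration and are not drawn from the sample.

```python
def sexing_rate_by_gendering(records):
    """Return the share of sexed individuals among gendered and ungendered ones.

    `records` is an iterable of (is_gendered, is_sexed) boolean pairs.
    """
    counts = {True: [0, 0], False: [0, 0]}  # is_gendered -> [sexed, total]
    for is_gendered, is_sexed in records:
        counts[is_gendered][1] += 1
        if is_sexed:
            counts[is_gendered][0] += 1
    return {
        key: (sexed / total if total else float("nan"))
        for key, (sexed, total) in counts.items()
    }


# Toy data invented for illustration: 5 gendered and 4 ungendered individuals.
toy = (
    [(True, True)] * 4 + [(True, False)]
    + [(False, True)] * 2 + [(False, False)] * 2
)
rates = sexing_rate_by_gendering(toy)  # {True: 0.8, False: 0.5}
```

The four underlying counts (sexed and unsexed, for gendered and ungendered individuals) form the 2x2 table on which a test of independence such as Fisher's Exact Test operates.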
The invisible majority
Finally, while much attention is paid to sex/gender matches and opposites, the largest portion of the average population (c. 70 per cent) lacks either sex or gender determinations, or both (Fig. 5), and hence remains unaccounted for. Research on later periods has pointed out similar proportions (e.g. Müller-Scheeßel 2011, 210). We emphasize that this portion, if subjected to new archaeological and bioanthropological examinations, could potentially provide a substantially different picture. For example, a total of 94 of 508 (or 87 of 510, following the new data) ‘partial’ individuals are associated with gendered attributes, but remained unsexed based on osteological methods. Genetics and proteomics may allow us to fill the gaps, at least for cases featuring sufficient bone preservation (e.g. Krishan et al. 2016, 165e2). Also, more sex determinations may reveal a wider range of gendered grave goods, especially when using bottom-up approaches.
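The proportions involved can be checked with simple arithmetic. The sketch below uses only counts quoted in the text, assuming that the ‘partial’ individuals are counted against the total samples of 1221 (old data) and 1252 (new data) individuals. The helper name `percentage` is hypothetical.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# All counts below are quoted in the text (old data first, new data second).
partial_share_old = percentage(508, 1221)   # share of 'partial' in the sample
partial_share_new = percentage(510, 1252)
gendered_unsexed_old = percentage(94, 508)  # gendered but unsexed 'partials'
gendered_unsexed_new = percentage(87, 510)
```

The first two values come to 41.6 and 40.7 per cent, in line with the ‘partial’ percentages reported for the total sample, while gendered but unsexed individuals make up roughly 18.5 and 17.1 per cent of the ‘partial’ group.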
Conclusions
Our analysis of 1252 individuals from seven large burial sites in central Europe, spanning the Early Neolithic to the Late Bronze Age (c. 5500–1200 bce), largely supports the binary sex/gender model, but also hints at the persistence of a small but quantitatively relevant minority of individuals that escapes the model's expectations. In summary, we find that the standard binary model explains c. 90 per cent of the variability of gendered and sexed funerary evidence (both Neolithic and Bronze Age), while it does not account for the up to 10 per cent of burials that have opposite sex/gender determinations. We also find evidence that archaeological gender determinations can sometimes influence osteological sex determinations, thereby introducing circular arguments.
We assessed sex based on osteological analysis and gender based on grave goods, while entirely relying on determinations provided in the publications of each burial site. We found that in six burial sites out of seven there is a persistent minority of individuals whose determined sex does not coincide with the gender that their respective grave goods are supposed to signal. If we consider only the individuals for which both sex and gender determinations are available, the ‘opposites’ range from c. 1 per cent (Olmo new analysis) to c. 42 per cent (Ostorf-Tannenwerder new analysis) of the analysed sample, with an average of 10 per cent (or 7.4 per cent following the new data). The site of Neckarsulm is the only exception, as the authors of the publication interpret it as a male-only cemetery. However, this interpretation demands caution, as c. 70 per cent of individuals lack either sex or gender determinations, or both of them.
We did not find conclusive evidence that ‘opposites’ are substantially overestimated due to analytical error in sex determinations. While old analyses of Olmo and Ostorf-Tannenwerder show that ‘opposites’ have a higher incidence of age categories that are difficult to sex, new analyses do not. In addition, following new analyses, the opposites are as likely to decrease as they are to increase. We found strong evidence that osteological sex estimates are to some extent influenced by archaeological gender determinations. While this does not necessarily imply an underestimation of ‘opposites’, we interpret this as a potential bias towards the binary model.
We conclude that available data—despite potential biases—support the hypothesis that some degree of gender variance was formally accepted in the burial rite of prehistoric Central European societies. However, the error margins of traditional methods of sex determination cannot be accurately quantified, hence the actual size of the ‘non-binary minority’ is still largely uncertain.
The possible existence of a non-binary minority throughout Europe's late prehistory encourages a reflection on what the divergence from the binary gender model could imply for our understanding of prehistoric European societies. By ‘binary model’, we mean a system with only two inputs that can produce only one outcome each, that is, the ‘Two-Sex/Two-Gender Model’ according to Ghisleni et al. (2016, 767–9): a biological man will always be associated with a masculine gender, and a biological woman with a feminine one. Our results suggest that, on the contrary, two inputs can produce two outcomes each (Fig. 2). Even though it is true that the inputs (i.e. sex) are very good predictors of the outputs (i.e. gender), as sex seems to determine gender in c. 90 per cent of cases when complete information is available, we cannot ignore the small but significant minority that escapes predictions.
Framing this divergence from the statistical norm as a minority rather than an exception helps to understand its potential relevance. While an exception would be limited to a single person that is different from others (someone that is not included, and in a way unpredictable), a minority can be formally acknowledged, protected and even revered. If future, more accurate analyses confirm their statistical significance, it would seem that ‘opposites’ are not in any way treated differently in death: the attributes granted to them in the burial rite are entirely standard, and do not denote any aspect of exceptionality. In other words, the masculine equipment dedicated to a biological female is not different from the same equipment dedicated to a biological male, and vice versa. As these individuals were treated according to standard norms, this leads us to exclude that they were considered exceptions. On the other hand, there is no indication at all of whether such a ‘mismatched identity’ was chosen by their bearers or rather imposed on them, either in life or in death. In addition, focusing on gender should not overshadow the many different traits that influence an individual's representation in the burial rite. Burial attributes can also be correlated to age, mobility, role and/or social status, and all these traits can simply tend to be correlated to different biological sexes (e.g. Arnold 2016; Bickle 2020; Geller 2009, 70; Großmann 2021; Masclans Latorre et al. 2021; Müller-Scheeßel 2019). In this perspective, the deposition of what we perceive as gendered grave goods might be only indirectly correlated to biological sex.
Our case study also suggests caution in interpreting the available evidence, as it shows that our knowledge of prehistoric gender is largely based on insufficient, frequently unverifiable and partly biased data. Only roughly 30 per cent of all burials provide enough data to compare biological sex with archaeological gender, while the remaining part is either partially determined or completely undetermined. If our goal is to identify trends, then the available methodologies are more than effective. If, on the other hand, our goal is to push the boundaries of our knowledge and attempt to identify minorities, then these same methodologies are rather ineffective, as they entail a concrete risk of circular arguments: simply put, the error margin on sex determinations produces a bias in gender determinations which, in turn, generates further error in sex determinations. One way to escape this circularity is to encourage scientific debate between archaeologists and bioanthropologists and to promote the extensive publication of osteological data and analytical methods. Moreover, substantial investment in independent methods of sex determination is necessary. New methodologies such as aDNA and proteomics will hopefully soon become standard practice for biological sex determinations, not as a substitute for, but in addition to, traditional osteological methods.
Acknowledgements
We thank Bisserka Gaydarska, Nils Müller-Scheeßel, Lorenz Rahmstorf and Thomas Terberger for their valuable comments and critiques on earlier versions of this paper. E.P. designed research; E.P. & N.I. performed research, analysed data and wrote the paper. The authors confirm that all data generated or analysed during this study are included in this article.
Supplementary material
Online material may be found at https://doi.org/10.1017/S0959774323000082.
SI1: General description of sites included in the analysis, including details regarding the ways in which sex and gender data were generated.
SI2: Database of all single burials included in the analysis, including the sex, gender and age data, and the bibliography used.