The earliest twin studies (e.g. Merriman, 1924), conducted before World War II, were based on small samples that were studied in person by the investigator. Zygosity could be determined based on either clinical impression or blood groups. It was only when large twin registries were established in Scandinavia that it became necessary to diagnose zygosity remotely using self-report questionnaires. The first systematic approach to the problem was undertaken by the Swedish Twin Register (STR; Magnusson et al., 2013), which asked participants whether they were as ‘lika som bär’ (alike as berries). This is why the logo of the STR is a pair of cherries (‘körsbär’ in Swedish).
English versions of the Swedish questionnaire translated the expression as ‘alike as two peas in a pod’, and the peas-in-a-pod question has in the years since become the centerpiece of zygosity questionnaires, which are often known as ‘peas in a pod questionnaires’. No universal standard for such questionnaires has ever emerged, and twin researchers and registries have used various forms of this collection of items to assess zygosity, but the item about peas is usually combined with a series of questions about whether the twins were confused with one another by parents, family members and acquaintances. Many studies have demonstrated that self-report questionnaires of this kind can make accurate decisions about zygosity when validated against blood markers or genotyping (accuracy rates range from 92.4% to 98.8%; e.g. Eisen et al., 1989; Forsberg et al., 2010; Jackson et al., 2001; Jarrar et al., 2018; Magnus et al., 1983; Magnusson et al., 2013; Ohm Kyvik & Derom, 2006; Reed et al., 2005; Song et al., 2010). In this article, we refer to our particular version of the questionnaire as the Two Peas Questionnaire (TPQ).
It is somewhat surprising that no systematic examination of the psychometrics of the TPQ has ever been conducted. In fact, the questionnaire is more than just a simple list of questions that can be used with a cutoff to diagnose zygosity; it is a psychological measurement instrument, designed to measure self-reported subjective impressions of similarity and confusability. The validity of the questionnaire as a tool for classification is closely tied to its measurement properties.
There are several reasons to expect that the psychometrics of the TPQ and its application to classification would be less than perfectly straightforward. First, the questionnaire is by design administered to disparate groups of individuals, that is, monozygotic (MZ) and dizygotic (DZ) twins, who might be expected to have different reactions to questions about their similarity and confusability. Second, there is an asymmetry in the way biological differences reflect on zygosity; even small differences are sufficient to demonstrate that a pair of twins is DZ, whereas a high degree of similarity is not sufficient to demonstrate that a pair is MZ. For example, twin pairs with different eye colors are almost certainly DZ, but pairs with the same eye color are not certain to be MZ. This asymmetry leads to an expectation of a difference in the distribution of responses to the TPQ in MZ and DZ twins. Third, when the questionnaire is used as a classification instrument, it will usually be the case that prior probabilities favor a pair being MZ. Identical twins are often easier to ascertain within twin samples, but even if this is not the case in a particular sample, opposite-sex twins are necessarily DZ and can be classified without the use of the questionnaire. Finally, there is reason to expect that responses to the questionnaire will vary according to age. Both classic (Scarr & McCartney, 1983) and more recent (Beam & Turkheimer, 2013) analyses show that twins become more different as they age, and that DZ pairs do so more rapidly than MZ pairs.
We report a series of psychometric and classificatory analyses in a large sample of twins who have been administered a TPQ, and a smaller subsample who have been genotyped to provide a biological criterion for zygosity. We estimate item factor analysis (IFA) parameters for the psychometric properties of the questionnaire in the MZ and DZ groups and use them to identify differential item functioning (DIF) across groups. We then estimate the distributions of the latent similarity parameters in the two groups and explore several classification models based on the IFA model and methods based on latent class analysis (LCA).
Study 1
The primary goal of study 1 was to examine the item parameters of the TPQ among a sample of same-sex adult twin pairs with DNA-based zygosity. We used IFA models to examine potential DIF in the TPQ items between MZ and DZ twin pairs. IFA models describe the association between the latent trait level (i.e. underlying trait of being identical) and item scores (i.e. scores on the TPQ), allowing DIF analyses that are not affected by potential differences in the latent trait distributions across groups (Embretson & Reise, 2000).
Methods
Participants
The current study utilized data from 753 same-sex adult twin pairs (33.9% men, 66.1% women) enrolled in the Washington State Twin Registry (WSTR) with DNA-based zygosity (72.4% MZ, 27.6% DZ). The WSTR is a community-based registry of twin pairs primarily recruited through Washington State Department of Licensing records. Details regarding the recruitment procedures of the WSTR and additional information are reported elsewhere (Duncan et al., 2019). Participants in this study were recruited into the WSTR between 2002 and 2014.
DNA Determination of Zygosity
DNA was extracted from twins using either whole blood or saliva (buccal cells). Zygosity was determined by using either the AmpFlSTR® Identifiler® Plus PCR Amplification Kit or the PowerPlex® 16 HS System, per manufacturer’s instructions. The two methods are nearly identical (Hannelius et al., 2007; Yang et al., 2006). These kits are short tandem repeat multiplex assays that amplify 15 tetranucleotide repeat loci and the amelogenin sex-determining marker in a single PCR amplification. Thirteen of the required loci (CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51 and D21S11) for the Combined DNA Index System are included (Budowle et al., 1999). Two additional loci, D2S1338 and D19S433, are included. The combination of these 15 loci along with the amelogenin marker is consistent with zygosity tests conducted elsewhere (Yang et al., 2006). When comparing the twins with one another, DZ twins match on 25%–75% of the sites, whereas MZ twins match on 100% of the sites. Zygosity determination for twin pairs in this study was performed between 2009 and 2017.
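The matching rule described above can be sketched in a few lines. This is only an illustration of the comparison logic, not the laboratory pipeline: the function names, the dictionary layout and the example allele values are hypothetical, and genotyping error or dropout is not handled.

```python
from typing import Dict, Tuple

# Hypothetical layout: locus name -> the two allele repeat counts at that locus.
Genotype = Dict[str, Tuple[float, float]]

def str_concordance(twin_a: Genotype, twin_b: Genotype) -> float:
    """Fraction of jointly genotyped loci at which both twins carry the same allele pair."""
    shared = sorted(set(twin_a) & set(twin_b))
    if not shared:
        raise ValueError("no loci genotyped in both twins")
    # Allele order within a locus is arbitrary, so compare sorted pairs.
    matches = sum(sorted(twin_a[loc]) == sorted(twin_b[loc]) for loc in shared)
    return matches / len(shared)

def dna_zygosity(twin_a: Genotype, twin_b: Genotype) -> str:
    """Call a pair MZ only when every compared locus matches, otherwise DZ."""
    return "MZ" if str_concordance(twin_a, twin_b) == 1.0 else "DZ"

# Made-up genotypes at four of the loci named in the text.
twin_a = {"TH01": (6, 9.3), "TPOX": (8, 11), "vWA": (16, 17), "FGA": (21, 24)}
co_twin_same = {"TH01": (9.3, 6), "TPOX": (8, 11), "vWA": (17, 16), "FGA": (21, 24)}
co_twin_diff = {"TH01": (6, 7), "TPOX": (8, 11), "vWA": (16, 17), "FGA": (22, 24)}

print(dna_zygosity(twin_a, co_twin_same), dna_zygosity(twin_a, co_twin_diff))
```

In the made-up example the second co-twin matches at two of four loci, which falls inside the 25%–75% range the text gives for DZ pairs.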
Two Peas Questionnaire
Five items about childhood similarity were included in the WSTR enrollment survey. The ‘two-peas’ item, ‘When you were children, were you and your twin as alike as two peas in a pod or of ordinary family resemblance?’, has been used by twin registries for many years and is a reliable predictor of zygosity (Eisen et al., 1989; Magnus et al., 1983; Reed et al., 2005; Sarna et al., 1978). Four mistakenness items ask, ‘When you were children, did the following people (parents, other relatives, teachers, and strangers) have difficulty telling you and your twin apart?’ (Buchwald et al., 1999; Eisen et al., 1989; Magnus et al., 1983; Reed et al., 2005). There are four response categories for each of the mistakenness items (1 = never confused, 2 = rarely confused, 3 = sometimes confused, 4 = always confused). For ease of interpretation, these four mistakenness items are subsequently referred to as ‘parents’, ‘relatives’, ‘teachers’ and ‘strangers’, respectively.
Statistical Analysis
We used IFA to estimate the item parameters of the 10 items (i.e. 5 items from each twin, 10 items per twin pair) in the TPQ. The 10 items were operationalized as indicators of the underlying latent trait (θ) of being similar and easily confused (i.e. more MZ-like), with higher levels reflecting stronger endorsement of being identical and lower levels reflecting weaker endorsement. Because the TPQ items have ordinal response options, we used IFA, which is an alternative to the common linear factor model when item responses are categorical in nature (Wirth & Edwards, 2007). One factor-loading parameter was estimated for each of the five items (λ1–λ5). One threshold parameter (τ1) was estimated for the dichotomous ‘two peas’ item, and three threshold parameters (τ21, τ22, τ23, ..., τ51, τ52 and τ53) were estimated for each of the remaining four items, each with four response categories. All factor loadings and threshold parameters were constrained to be the same within twin pairs, and item covariances within twin pairs were allowed to differ between MZ and DZ twin pairs. Participants were designated as MZ and DZ using DNA-based zygosity.
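For orientation, the ordinal-response part of a model of this kind is commonly written in probit form as below. This is a generic sketch rather than the exact parameterization estimated here: the item index $j$, the category index $c = 1, \dots, C_j$, the conventions $\tau_{j0} = -\infty$ and $\tau_{jC_j} = +\infty$, and the omission of any residual-variance scaling are assumptions of the sketch.

$$P(X_j \ge c + 1 \mid \theta) = \Phi(\lambda_j \theta - \tau_{jc}), \qquad c = 1, \dots, C_j - 1$$

$$P(X_j = c \mid \theta) = \Phi(\lambda_j \theta - \tau_{j,c-1}) - \Phi(\lambda_j \theta - \tau_{jc})$$

Here $\Phi$ is the standard normal distribution function. Under this form the dichotomous ‘two peas’ item has $C_j = 2$ and a single threshold, and each mistakenness item has $C_j = 4$ and three thresholds, matching the parameter counts given above.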
First, we fit a ‘free-baseline’ model in which the factor loadings of a reference item (our selection of the reference item is described below) were fixed to 1, and its threshold parameters were constrained to be equal between MZ and DZ pairs (Stark et al., 2006). The factor loadings and threshold parameters for the remaining four items were allowed to differ between MZ and DZ pairs. In order to detect items with DIF, we fit four constrained models where, in addition to the reference item, factor loadings and threshold parameters of each item, one at a time, were simultaneously constrained to be equal between MZ and DZ twins. Items with DIF were identified by chi-square difference tests comparing each constrained model with the free-baseline model. To control for type I errors due to multiple comparisons, a Bonferroni-corrected critical p value (.05/4 = .0125) was used.
To identify the reference item(s), we fit a fully constrained model in which the factor loadings and threshold parameters of all items were constrained to be equal between MZ and DZ pairs. Next, we fit a series of augmented models by freeing the factor loadings and threshold parameters one item at a time. The item(s) that did not result in a statistically significant increase in model fit when the parameters were allowed to differ between MZ and DZ twins were identified as the reference item(s) (Stark et al., 2006). To control for type I errors due to multiple comparisons, a Bonferroni-corrected critical p value (.05/5 = .01) was used.
Model fit indices reported include the comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA) and standardized root mean square residual (SRMR). Descriptive statistics were performed using R version 3.5.3 (R Development Core Team, 2015), and IFA models were performed using Mplus version 8.1 (Muthén & Muthén, 2012).
Results
Descriptive Statistics
Of the 753 pairs of same-sex twins in this study, there were 545 (72.4%) MZ and 208 (27.6%) DZ twin pairs as determined by genotyping. Selected demographic characteristics of twin pairs in this study are presented in Table 1.
MZ, monozygotic twins; DZ, dizygotic twins.
Descriptive statistics of the five TPQ items are shown in Table 2. For the ‘two peas’ item, most of the MZ twins (93%) reported that they were ‘as alike as two peas in a pod’, whereas the majority of the DZ twins (84%) responded that they were ‘of ordinary family resemblance’ when they were children. Concordance rates of the ‘two peas’ item are presented in Supplementary Table 1. For the four mistakenness items, larger proportions of MZ twins reported being confused by teachers and strangers (68% and 91% always confused, respectively) than by parents and other relatives (12% and 49% always confused, respectively) when they were children. On the other hand, small proportions of DZ twins reported being confused by teachers and strangers (11% and 20% always confused, respectively), and even smaller proportions reported being confused by parents and other relatives (3% and 6% always confused, respectively).
MZ, monozygotic twins; DZ, dizygotic twins; unknown, twin pairs with no DNA-based zygosity.
Note: Two peas: When you were children, were you and your twin as alike as two peas in a pod or of ordinary family resemblance? Parents: When you were children, how often did your parents have difficulty telling you apart? Relatives: When you were children, how often did other relatives have difficulty telling you apart? Teachers: When you were children, how often did teachers have difficulty telling you apart? Strangers: When you were children, how often did strangers have difficulty telling you apart?
Differential Item Functioning
Identify reference item
In order to identify the reference item, we fit a fully constrained model in which the factor loadings and threshold parameters of all items were constrained to be equal between MZ and DZ twins. The model was of acceptable fit (CFI = .980, TLI = .975, RMSEA = .067, 90% CI [.055, .078], SRMR = .060). Next, we fit a series of augmented models in which, one item at a time, the factor loadings and threshold parameters were simultaneously allowed to differ between MZ and DZ twins. Chi-square tests showed that there was no statistically significant improvement in model fit when the parameters for the ‘peas’ or ‘strangers’ item were allowed to differ between MZ and DZ twin pairs (Supplementary Table 2). Considering that the change in model fit was the smallest when the parameters for the ‘strangers’ item differed between MZ and DZ twins, the ‘strangers’ item was used as the reference item in the subsequent analyses.
Test for DIF
To test for DIF among self-report zygosity items, we first fit a ‘free-baseline’ model where the factor loadings of the ‘strangers’ item (the reference item identified above) were fixed to 1, and its threshold parameters were constrained to be equal between MZ and DZ pairs. The factor loadings and threshold parameters for the remaining four items (‘two peas’, ‘parents’, ‘relatives’ and ‘teachers’) were allowed to differ between MZ and DZ pairs. As shown in Table 3, the model fit was good (CFI = .990, TLI = .985, RMSEA = .052, 90% CI [.038, .066], SRMR = .054) and was a better fit than the fully constrained model, χ²(14) = 70.107, p < .001.
MZ, monozygotic twins; DZ, dizygotic twins; SE, standard error; RMSEA, root mean square error of approximation; CFI, comparative fit index; TLI, Tucker-Lewis index; SRMR, standardized root mean square residual. Note: Only parameters of one twin are shown here, as all item parameters are constrained to be the same within twin pairs. ‘Strangers’ is used as the reference item, with the factor loadings fixed to 1 and threshold parameters constrained to be equal between MZ and DZ twins.
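As a rough arithmetic check on the reported comparison, the tail probability of a chi-square statistic with an even number of degrees of freedom has a closed form that needs only the standard library. The sketch below is not the software procedure used for the models (difference testing for categorical estimators usually needs a dedicated routine). The helper name `chi2_sf_even_df` is hypothetical, and the degrees-of-freedom bookkeeping in the comment is an interpretation that happens to agree with the reported value of 14.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x); this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the Poisson-type sum
    for j in range(1, df // 2):
        term *= half / j     # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total

# Parameters freed relative to the fully constrained model (an interpretation):
# 'two peas' has 1 loading + 1 threshold; each of the three freed four-category
# items has 1 loading + 3 thresholds.
df = (1 + 1) + 3 * (1 + 3)
p_value = chi2_sf_even_df(70.107, df)
print(df, p_value < 0.001)
```

The same helper can be compared against the Bonferroni-corrected critical values quoted in the Statistical Analysis section (.0125 and .01) when a difference test has even degrees of freedom.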
Next, we fit four constrained models in which, one item at a time, in addition to the ‘strangers’ item, factor loadings and threshold parameters of each item were simultaneously constrained to be equal between MZ and DZ pairs. The model fit of these constrained models was compared against the ‘free-baseline’ model using chi-square tests (Supplementary Table 2). With the exception of the ‘two peas’ item, there was a statistically significant decrease in model fit when the item parameters were constrained to be equal between MZ and DZ pairs, suggesting DIF between MZ and DZ twins in the ‘parents’, ‘relatives’ and ‘teachers’ items.
We illustrate the similar item functioning (i.e. no DIF) of ‘two peas’ for MZ and DZ twins using category response curves (CRCs). As shown in Figure 1, the probabilities that MZ and DZ twins responded they were ‘two peas in a pod’ or ‘of ordinary resemblance’ were similar. For example, at θ = 0 (i.e. the average latent trait level of similarity and confusability), there was a 98.8% chance that MZ twins responded they were ‘two peas in a pod’, but only 1.2% chance that they identified themselves as ‘of ordinary resemblance’. At the same latent trait level (θ = 0), DZ twins were also more likely to respond that they were ‘two peas in a pod’ (90.7%) and less likely to identify themselves as of ‘ordinary resemblance’ (9.3%).
DIF in the other three items is illustrated using CRCs (Supplementary Figure 1). Among twins with similar levels of the latent trait of being identical, DZ twins were more likely than MZ twins to respond that other people had difficulty telling them apart. For instance, at θ = 0, MZ twins were likely to respond that they were ‘rarely confused’ (38.4%) and ‘sometimes confused’ (31.6%) by parents, whereas DZ twins were more likely to respond that they were ‘always confused’ (43.1%) by parents. Likewise, MZ twins at θ = 0 were more likely to respond that they were ‘always confused’ (48.2%) or ‘sometimes confused’ (45.6%) by relatives, whereas DZ twins at θ = 0 were most likely to respond that they were ‘always confused’ by relatives.
Discussion
In study 1, we estimated the item parameters for the TPQ items using IFA models and examined whether there was DIF between MZ and DZ twin pairs. Results showed no loss of model fit when the ‘two peas’ item parameters were constrained to be equal across zygosity, suggesting the ‘two peas’ item functions similarly for MZ and DZ twins. Our analyses showed DIF in three of the mistakenness items on the TPQ, ‘parents’, ‘relatives’ and ‘teachers’. For these items, the probabilities of responses may differ not only by individuals’ underlying trait of being similar and confusable (i.e. more MZ-like or more DZ-like) but also by their actual zygosity (i.e. true MZ or true DZ twins, based on genotyping).
When twin pairs’ responses are used to classify twins with unknown zygosity into MZ or DZ pairs, it is possible that DIF in TPQ items may affect which twin pairs are assigned as MZ or DZ twin pairs. We followed up the current findings with a second study in which we explored several classification methods for zygosity assignment to establish an effective method to determine zygosity assignments among twin pairs that have not yet been genotyped.
Study 2
In study 2, we aimed to investigate three classification methods used to assign twins into MZ and DZ pairs, based on their responses on the TPQ. Zygosity of twin pairs was classified based on their unit-weighted pair zygosity sum (PZS) score, item response probabilities from an IFA model and item response probabilities from an LCA model.
Methods
Participants
Twin pairs included in this study were the 753 twin pairs with DNA-based zygosity described in study 1, as well as 6368 same-sex adult twin pairs (35.9% men, 64.1% women) enrolled in the WSTR without DNA-based zygosity (Table 1). The recruitment procedures of these twin pairs were like those described in study 1.
Two Peas Questionnaire
The TPQ described in study 1 was also used in study 2.
Statistical Analysis
Unit-weighted PZS scores
Using the twins’ responses on the TPQ, we created a unit-weighted PZS score for each twin pair. The four mistakenness questions were first rescaled to the same scale as the dichotomous two peas item (0 = 0; 1 = .33, 2 = .67, 3 = 1). The PZS scores were computed by summing the scores of the 10 items (i.e. 5 items per twin) in the TPQ. PZS scores ranged from 0 to 10, with higher scores reflecting higher degrees of similarity and confusability. For twin pairs with missing items, PZS scores were rescaled by:
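The scoring steps can be sketched as follows. The rescaling equation for missing items is not reproduced in the text, so the prorating rule used here (sum of the answered items multiplied by 10 and divided by the number answered) is an assumption, as are the function names and the convention that mistakenness responses arrive coded 0 to 3 as in the rescaling map above.

```python
from typing import List, Optional, Sequence

N_ITEMS = 10  # 5 items per twin, 2 twins per pair

def rescale_mistaken(code: int) -> float:
    """Map a mistakenness response coded 0-3 onto the 0-1 range of the two-peas item."""
    if code not in (0, 1, 2, 3):
        raise ValueError(f"unexpected response code: {code}")
    return code / 3.0

def pzs_score(peas: Sequence[Optional[int]],
              mistaken: Sequence[Optional[int]]) -> Optional[float]:
    """Pair zygosity sum from two 0/1 peas responses and eight 0-3 mistakenness responses.

    None marks a missing response.
    """
    values: List[float] = [float(p) for p in peas if p is not None]
    values += [rescale_mistaken(m) for m in mistaken if m is not None]
    if not values:
        return None
    # ASSUMPTION: prorate to the full 10-item scale when some items are missing.
    return sum(values) * N_ITEMS / len(values)

complete_high = pzs_score([1, 1], [3] * 8)
complete_low = pzs_score([0, 0], [0] * 8)
partial = pzs_score([1, None], [3, 3, 3, 3, None, None, None, None])
print(complete_high, complete_low, partial)
```

With complete data the prorating factor is 1, so the score reduces to the plain sum described in the text.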
Probabilities of zygosity (MZ/DZ) from PZS scores
We fit a logistic regression model to estimate the probabilities of zygosity (MZ/DZ) using the PZS scores among twin pairs with DNA-based zygosity. To determine the optimum PZS cutoff value to classify twin pairs into MZ and DZ twin pairs, we performed cross-validation using 75% of the data randomly sampled as the training set, and the remaining 25% of the data used as the testing set. The optimum cutoff was the PZS value with the maximum overall classification accuracy rate (i.e. real MZ/DZ pairs correctly classified as MZ/DZ pairs). This procedure was repeated 1000 times. The final PZS cutoff value was determined by taking the average of the PZS cutoffs from the 1000 cross-validations. Subsequently, twin pairs were assigned as MZ and DZ twin pairs using the final PZS cutoff value; this zygosity assignment was referred to as the ‘PZS zygosity’.
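A minimal sketch of the repeated 75/25 split procedure is given below. It makes one simplification: with a single predictor and a positive slope, thresholding the fitted logistic probability is equivalent to thresholding the PZS score itself, so the sketch searches cutoffs directly on the PZS scale instead of fitting the regression. All function names, the 0.1-step search grid, the tie-breaking rule and the simulated scores are assumptions for illustration; the simulated values are not the study data.

```python
import random
from statistics import mean

def accuracy(pairs, cutoff):
    """Share of pairs classified correctly when PZS >= cutoff is called MZ."""
    return sum((pzs >= cutoff) == is_mz for pzs, is_mz in pairs) / len(pairs)

def best_cutoff(train, grid):
    """Grid value with the highest training accuracy (lowest value wins ties)."""
    return max(grid, key=lambda c: accuracy(train, c))

def cv_cutoff(pairs, n_repeats=1000, train_frac=0.75, seed=0):
    """Average the per-split optimum cutoff and the held-out accuracy over random splits."""
    rng = random.Random(seed)
    grid = [i / 10 for i in range(0, 101)]  # candidate cutoffs 0.0, 0.1, ..., 10.0
    cutoffs, test_accuracies = [], []
    for _ in range(n_repeats):
        shuffled = pairs[:]
        rng.shuffle(shuffled)
        n_train = int(train_frac * len(shuffled))
        train, test = shuffled[:n_train], shuffled[n_train:]
        cutoff = best_cutoff(train, grid)
        cutoffs.append(cutoff)
        test_accuracies.append(accuracy(test, cutoff))
    return mean(cutoffs), mean(test_accuracies)

def simulate_pairs(n_mz, n_dz, rng):
    """Simulated (pzs, is_mz) tuples; purely illustrative."""
    clip = lambda x: min(10.0, max(0.0, x))
    mz = [(clip(rng.gauss(7.9, 1.5)), True) for _ in range(n_mz)]
    dz = [(clip(rng.gauss(2.4, 2.2)), False) for _ in range(n_dz)]
    return mz + dz

pairs = simulate_pairs(145, 55, random.Random(1))
cutoff, held_out_accuracy = cv_cutoff(pairs, n_repeats=50)  # fewer repeats to keep the demo fast
print(round(cutoff, 2), round(held_out_accuracy, 3))
```

Averaging the per-split cutoffs mirrors the description above; reporting the held-out accuracy alongside it is an addition that shows how well each training cutoff transfers to unseen pairs.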
IFA model for MZ and DZ twins
We used IFA to estimate the item parameters of the 10 items (i.e. 5 items per twin) in the TPQ, separately for MZ and DZ twin pairs with DNA-based zygosity. The 10 items were operationalized as indicators of the underlying latent trait (θ) of similarity and confusability (i.e. more identical or MZ-like), with higher levels reflecting stronger endorsement of similarity and confusability and lower levels reflecting weaker endorsement of the latent trait. One factor loading parameter was estimated for each of the five items (λ1–λ5). One threshold parameter (τ1) was estimated for the dichotomous ‘two peas’ item, and three threshold parameters (τ21, τ22, τ23, ..., τ51, τ52 and τ53) were estimated for each of the remaining four items, each with four response categories. To estimate all factor loadings and threshold parameters, the mean and variance of the latent zygosity factor were fixed to 0 and 1, respectively. All factor loadings and threshold parameters were constrained to be the same for corresponding items within twin pairs, and residual item covariances within twin pairs were estimated. The IFA model was fit separately for MZ and DZ twin pairs.
Probabilities of zygosity (MZ/DZ) from IFA model
Using the estimated item parameters from the IFA models, the probabilities of each response category for each item were computed across the latent trait distribution using the Gaussian quadrature procedure (Embretson & Reise, 2000). We computed the probability of obtaining a particular response vector ($\underline{X}_p$) in a random sample by integrating the IFA model over the range of the latent trait distribution:
$$P(\underline{X}_p) = \int P(\underline{X}_p \mid \theta)\, g(\theta)\, d\theta$$
where $g(\theta)$ is the probability density of the latent trait θ.
As the item parameters were estimated separately for MZ and DZ twin pairs, two sets of probabilities were computed, one for MZ and one for DZ twins. We computed the response pattern likelihoods for each twin pair based on their responses on the TPQ. The probability of a particular response pattern was obtained by multiplying the response likelihoods for each of the 10 items. For ease of computation, likelihoods were log-transformed into log-likelihoods. As such, the log-likelihood of a particular response pattern was the sum of the log-likelihoods of the 10 items.
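The marginalization step can be illustrated numerically. The sketch below assumes a probit graded-response form with a standard normal latent trait and no residual-variance scaling, and it uses a simple evenly spaced grid with normalized normal weights instead of Gauss-Hermite nodes, so it is a stand-in for the quadrature named in the text and not a reproduction of it. Function names and the example loading and thresholds are hypothetical.

```python
import math

def phi_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probs(theta, loading, thresholds):
    """P(X = k | theta) for ordered categories, given increasing thresholds."""
    # Cumulative P(X >= category): 1 for the lowest, 0 beyond the highest.
    cum = [1.0] + [phi_cdf(loading * theta - t) for t in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def marginal_category_probs(loading, thresholds, n_nodes=201, limit=6.0):
    """Average the conditional category probabilities over a N(0, 1) latent trait."""
    step = 2 * limit / (n_nodes - 1)
    nodes = [-limit + i * step for i in range(n_nodes)]
    weights = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so the weights sum to 1
    probs = [0.0] * (len(thresholds) + 1)
    for theta, w in zip(nodes, weights):
        for k, p in enumerate(category_probs(theta, loading, thresholds)):
            probs[k] += w * p
    return probs

# Hypothetical four-category item.
probs = marginal_category_probs(loading=1.2, thresholds=[-1.0, 0.0, 1.5])
# Known closed form for the probit case: P(X >= 2) = Phi(-tau_1 / sqrt(1 + loading^2)).
closed_form = phi_cdf(1.0 / math.sqrt(1 + 1.2 ** 2))
print([round(p, 4) for p in probs], round(closed_form, 4))
```

Running the same function once with the MZ parameter set and once with the DZ parameter set would give the two sets of item-level probabilities that the log-likelihoods are built from.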
For each twin pair, we obtained the log-likelihoods of the pair being MZ ($\ln L_{MZ}$) or DZ ($\ln L_{DZ}$) twins. The log-likelihoods are monotonic transformations of the probabilities of the pair being an MZ or DZ pair. We then took the difference between the two log-likelihoods ($\Delta \ln L = \ln L_{MZ} - \ln L_{DZ}$). Twin pairs with larger $\Delta \ln L$ (i.e. $\ln L_{MZ} > \ln L_{DZ}$) had higher probabilities of being MZ, reflecting higher levels of similarity and confusability, whereas those with smaller $\Delta \ln L$ (i.e. $\ln L_{DZ} > \ln L_{MZ}$) had higher probabilities of being DZ twins. To determine the optimum cutoff value for zygosity classification, we computed the overall classification accuracy rate at each $\Delta \ln L$. The optimum cutoff value was the $\Delta \ln L$ with the maximum accuracy rate; if the maximum accuracy rate occurred at multiple values of $\Delta \ln L$, we took their average as the final cutoff value. Twin pairs were assigned as MZ or DZ twins using the final cutoff value, and we referred to this zygosity assignment as the ‘IFA zygosity’.
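The log-likelihood difference rule can be sketched as below. The probability tables are hypothetical and cover only two items to keep the example short, the function names are invented for illustration, and treating a difference exactly equal to the cutoff as DZ is an assumption, since the text does not say how ties are resolved.

```python
import math

def pattern_loglik(responses, item_probs):
    """Sum of log probabilities of the observed category on each item."""
    return sum(math.log(item_probs[i][r]) for i, r in enumerate(responses))

def classify_pair(responses, probs_mz, probs_dz, cutoff):
    """Return (label, delta) where delta = lnL_MZ - lnL_DZ."""
    delta = pattern_loglik(responses, probs_mz) - pattern_loglik(responses, probs_dz)
    # ASSUMPTION: strictly greater than the cutoff means MZ.
    return ("MZ" if delta > cutoff else "DZ"), delta

# Hypothetical category probabilities: item 0 is dichotomous, item 1 has four categories.
probs_mz = [[0.07, 0.93], [0.10, 0.20, 0.30, 0.40]]
probs_dz = [[0.84, 0.16], [0.50, 0.30, 0.15, 0.05]]

label_similar, delta_similar = classify_pair([1, 3], probs_mz, probs_dz, cutoff=0.0)
label_different, delta_different = classify_pair([0, 0], probs_mz, probs_dz, cutoff=0.0)
print(label_similar, round(delta_similar, 2), label_different, round(delta_different, 2))
```

Summing item-level log probabilities treats the items as independent once the latent trait has been integrated out, which is how the text describes the computation; the cutoff of 0 in the demo is arbitrary and would be replaced by the empirically chosen value.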
Latent class analysis
LCA (McCutcheon, 1987) is a type of mixture modeling technique that aims to describe the heterogeneity in a population by identifying substantively meaningful subgroups. These otherwise unobserved subgroups, or latent classes, are characterized by similar patterns of responses on measured categorical indicators (Collins & Lanza, 2010). Two sets of parameters are estimated from LCA, the latent class membership probabilities and the item response probabilities. The latent class membership probabilities represent the likelihood a participant or a response pattern belongs to the latent class. The probabilities of these latent class memberships sum to 1, within rounding error. The item response probabilities refer to the likelihood of each response category to each item for each latent class. We used LCA to estimate the item parameters of the 10 items (i.e. 5 items per twin) in the TPQ among twin pairs without DNA-based zygosity.
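To show how the two parameter sets fit together, the sketch below combines class membership probabilities and item response probabilities into a posterior class probability for one response pattern, using Bayes' rule under local independence. This is the textbook LCA calculation and not necessarily the assignment rule used in this study, which works with a log-likelihood difference. The two-class setup, the parameter values and the function name are all hypothetical.

```python
def lca_posterior(responses, class_weights, item_probs):
    """Posterior probability of each latent class given one response pattern.

    class_weights[c] is the membership probability of class c;
    item_probs[c][i][k] is P(item i = category k | class c).
    """
    joint = []
    for c, weight in enumerate(class_weights):
        likelihood = weight
        for i, r in enumerate(responses):
            likelihood *= item_probs[c][i][r]  # local independence within a class
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-class solution with one dichotomous and one four-category item.
class_weights = [0.67, 0.33]
item_probs = [
    [[0.07, 0.93], [0.10, 0.20, 0.30, 0.40]],  # class 0: 'similar' response profile
    [[0.84, 0.16], [0.50, 0.30, 0.15, 0.05]],  # class 1: 'dissimilar' response profile
]

posterior = lca_posterior([1, 3], class_weights, item_probs)
print([round(p, 4) for p in posterior])
```

Because the classes are estimated without reference to genotype, labelling one class as MZ-like and the other as DZ-like is an interpretation of the estimated response profiles.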
Probabilities of zygosity (MZ/DZ) from LCA model
Using the estimated item response probabilities from the LCA models, the response pattern likelihoods for each twin pair were computed, following similar procedures outlined above for those from the IFA models. The corresponding zygosity assignments were referred to as ‘LCA zygosity’.
Classification accuracy
We evaluated the classification accuracy of the three zygosity assignments — (1) PZS zygosity, (2) IFA zygosity and (3) LCA zygosity — among the twin pairs with DNA-based zygosity. For each zygosity assignment, we computed the classification accuracies for MZ and DZ twin pairs (i.e. the proportion of true MZ/DZ twins correctly classified as MZ/DZ twins). As the item response probabilities for the IFA zygosity assignments were estimated in the same sample (twins with DNA-based zygosity), cross-validation was not possible. To obtain out-of-sample estimates of classification accuracy, the item response probabilities for the LCA zygosity assignments were estimated in the sample without DNA-based zygosity and subsequently validated in the sample with DNA-based zygosity.
Classification consistency
As the true zygosity of twin pairs without DNA zygosity was unknown, we evaluated the extent to which the three zygosity assignments were consistent across one another. We computed the proportion of twin pairs that was consistently assigned as MZ or DZ twin pairs, as well as the proportion of twin pairs that did not have consistent zygosity assignment across the three methods. Reliability across zygosity assignment was evaluated using Fleiss’ kappa (Fleiss, 1971).
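For reference, Fleiss' kappa can be computed directly from its definition. In the sketch below each twin pair is a "subject" and each of the three classification methods is a "rater"; the function name and the four example pairs are hypothetical.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-subject label lists with equal numbers of raters."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    categories = sorted({label for r in ratings for label in r})
    counts = [Counter(r) for r in ratings]
    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects
    # Chance agreement from the overall category proportions.
    proportions = [sum(c[k] for c in counts) / (n_subjects * n_raters) for k in categories]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)

# Each row: assignments for one pair from the PZS, IFA and LCA methods (made up).
assignments = [
    ["MZ", "MZ", "MZ"],
    ["DZ", "DZ", "DZ"],
    ["MZ", "MZ", "DZ"],
    ["DZ", "DZ", "DZ"],
]
kappa = fleiss_kappa(assignments)
print(round(kappa, 3))
```

In this toy example one of four pairs receives inconsistent assignments, which is enough to pull kappa well below 1 because chance agreement is already high with only two categories.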
Results
Descriptive Statistics
Selected demographic characteristics and descriptive statistics of the five self-report zygosity items are shown in Tables 1 and 2, respectively.
Zygosity assignments from PZS scores
Among twin pairs with DNA-based zygosity, the average PZS scores were substantially higher in MZ pairs, and the variance of PZS scores was higher in DZ pairs: 7.9 (SD = 1.5) and 2.4 (SD = 2.2) for MZ and DZ pairs, respectively. The average PZS score among twin pairs without DNA-based zygosity was 5.9 (SD = 3.4). The distribution of PZS scores among twin pairs with and without DNA-based zygosity is illustrated in Figure 2 in which the scores of the DZ pairs are more variable and more skewed in the direction of similarity.
Using data from twin pairs with DNA-based zygosity, the optimum cutoff PZS value at which the highest classification accuracy rate was obtained for each of the 1000 cross-validated logistic regression models is illustrated in Supplementary Figure 2. The average optimum cutoff value was at PZS = 4.7 (SE = .03). PZS zygosity was obtained by assigning twin pairs with PZS ≥ 4.7 as MZ twins and those with PZS < 4.7 as DZ twins (Tables 4 and 5).
MZ, monozygotic twins; DZ, dizygotic twins; PZS zygosity, zygosity assignment based on twin pairs’ pair zygosity sum (PZS) scores; IFA zygosity, zygosity assignment based on item factor analysis (IFA) model; LCA zygosity, zygosity assignment based on latent class analysis (LCA) model.
Note: a. The percentages did not add up to 100% as 164 (2.6%) twin pairs were not consistently classified as MZ or DZ twins.
Zygosity assignments from IFA and LCA models
IFA item response probabilities were obtained from the sample with DNA-based zygosity (Supplementary Table 5). For each twin pair, the difference between the response pattern likelihoods of being MZ and DZ was computed. The overall maximum classification accuracy rate (93.5%) was obtained at $\Delta \ln L_{\mathrm{IFA}}$ = −4.5 and −4.4; thus, we took the average of these values and determined the optimum cutoff value at $\Delta \ln L_{\mathrm{IFA}}$ = −4.45 (Supplementary Figure 3). The distribution of $\Delta \ln L_{\mathrm{IFA}}$ obtained from the IFA model is illustrated in Figure 3. Descriptive statistics of the zygosity assignment based on IFA item response probabilities (‘IFA zygosity’) are presented in Tables 4 and 5.
Similarly, LCA item response probabilities were obtained from the sample without DNA-based zygosity (Supplementary Table 6). The difference between the response pattern likelihoods of being MZ and DZ was computed. The overall maximum classification accuracy rate (93.8%) was obtained at $\Delta \ln L_{\mathrm{LCA}}$ = 1.0 and 1.4; thus, we took the average of these values and determined the optimum cutoff value at $\Delta \ln L_{\mathrm{LCA}}$ = 1.2 (Supplementary Figure 4). The distribution of $\Delta \ln L_{\mathrm{LCA}}$ obtained from the LCA model is illustrated in Figure 4. Descriptive statistics of the zygosity assignment based on LCA item response probabilities (‘LCA zygosity’) are presented in Tables 4 and 5.
Classification accuracy
We compared the three zygosity assignments against the DNA-based zygosity among the twin pairs with DNA-based zygosity. The overall accuracy ranged from 92.7% (PZS zygosity) to 93.6% (LCA zygosity). Among MZ twins, 94.7% (PZS zygosity) to 95.6% (IFA zygosity) were correctly assigned as MZ. The classification accuracy was lower among DZ twins, with 87.5% (PZS zygosity) to 90.4% (LCA zygosity) correctly assigned as DZ. Fleiss’ kappa = .947 indicated excellent consistency across the three zygosity assignments; 512 (93.9%) MZ pairs were consistently correctly classified as MZ, and 178 (85.6%) DZ pairs were consistently correctly classified as DZ.
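The overall accuracy is a weighted average of the two group-specific accuracies, with weights given by the group sizes. The short check below applies that relation to the PZS figures, using the pair counts from study 1 (545 MZ, 208 DZ); the function name is hypothetical, and the rounded percentages are taken as reported, so the result is approximate.

```python
def overall_accuracy(n_mz: int, acc_mz: float, n_dz: int, acc_dz: float) -> float:
    """Size-weighted average of the MZ and DZ classification accuracies."""
    return (n_mz * acc_mz + n_dz * acc_dz) / (n_mz + n_dz)

pzs_overall = overall_accuracy(545, 0.947, 208, 0.875)
print(round(pzs_overall, 3))
```

Because MZ pairs make up about 72% of the genotyped sample, the overall figure sits closer to the MZ accuracy than to the DZ accuracy.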
Classification consistency
Among the twin pairs without DNA-based zygosity, 66.8% (LCA zygosity) to 68.4% (IFA zygosity) were classified as MZ twins, and 31.6% (IFA zygosity) to 33.2% (LCA zygosity) were classified as DZ twins. The three zygosity assignments were highly consistent, with 6203 (98.5%) twin pairs consistently assigned as MZ (4212 pairs; 66.2%) and DZ (1991 pairs; 32.3%) twins, respectively. Fleiss’ kappa = .961 indicated excellent consistency across the three zygosity assignments.
Discussion
In study 2, we examined zygosity assignments predicated on three classification methods. Among twin pairs with DNA-based zygosity, classification accuracies were consistently high. Zygosity assignments were highly consistent among twin pairs with and without DNA-based zygosity. Although the accuracies of zygosity assignment were improved when using more sophisticated classification methods (i.e. IFA and LCA), the difference was minimal (<1%; <10 twin pairs) as compared to zygosity assignment from a simple logistic regression model.
We noted that among the 753 twin pairs with DNA-based zygosity, 39 pairs (23 MZ, 16 DZ) were consistently misclassified by all three methods. As illustrated in Supplementary Figure 5, the response patterns of these twin pairs were not consistent with those of their respective zygosity. Our classification methods depended on participants’ response patterns to assign zygosity; twin pairs who indicated high levels of similarity and confusability were more likely to be MZ, and those who indicated the opposite were more likely to be DZ. Thus, MZ twins who reported low levels of similarity and confusability (e.g. of ordinary resemblance, never or rarely confused by parents and relatives) would be classified as DZ twins, and DZ twins who reported high levels of similarity and confusability (e.g. two peas in a pod, sometimes or always confused by others) would be classified as MZ twins. Ultimately, for MZ pairs describing themselves as dissimilar or DZ pairs describing themselves as similar, the misclassification that results is inherent in the data and not a modifiable consequence of the psychometric or classificatory models. Although MZ pairs are, typically, more alike, it is possible that some pairs have distinct differences (e.g. birthmarks or a different haircut) that make them less easily confused.
Among twin pairs with no DNA-based zygosity, 164 pairs (2.6% of the current sample) were assigned different zygosity by the three classification methods. The response patterns of these twin pairs did not reflect those typical of MZ or DZ pairs (Supplementary Figure 6), rendering it difficult to assign consistent zygosity across methods. We plotted the estimated parameters from the three classification methods in Supplementary Figure 7. For twin pairs who were consistently assigned as MZ or DZ pairs, the estimated parameters were highly correlated (all rs > .90). However, the associations among the three methods ranged from none to strong for twin pairs who received inconsistent zygosity assignments. Because items on the TPQ reflect twin pairs’ subjective perception of their similarity and confusability, the questionnaire is not an infallible method of assigning zygosity in twin pairs who have not yet been genotyped.
As studies have suggested that twin pairs, especially DZ pairs, become more different as they age (Beam & Turkheimer, Reference Beam and Turkheimer2013; Scarr & McCartney, Reference Scarr and McCartney1983), we further explored the extent to which similarity and confusability differ as a function of age among the sample of twins with DNA-based zygosity. To estimate the association between age at the time the questionnaire was completed and the underlying latent trait (θ) of similarity and confusability, age was regressed onto the latent variable of similarity and confusability in the IFA model. Results showed a negative relation between age and the latent trait of confusability (r = −.013 and −.139; p = .758 and .046 for MZ and DZ pairs, respectively). Although the correlation coefficients were not significantly different (Wald test: χ2(1) = 3.239, p = .072), our findings suggested that similarity decreases with age, and more so among DZ pairs than MZ pairs. This will be an interesting finding to pursue in future studies, especially in younger twins whose appearance may be changing more rapidly.
Overall Discussion
In this article, we examined the item properties of the TPQ items among a sample of same-sex twin pairs with DNA-based zygosity, and a larger sample of pairs for which DNA-based zygosity was unknown. We evaluated the TPQ both as a psychometric instrument for the measurement of the construct of ‘confusability’ and as a classification tool for the identification of MZ and DZ pairs. With the exception of the dichotomous ‘two peas’ item, three of the mistakenness items showed DIF. MZ and DZ twin pairs may differ in their response patterns on these items, even if they endorse similar latent traits of similarity and confusability. Upon examining three methods to determine zygosity of same-sex twin pairs, we found that the use of unit-weighted PZS scores was sufficient to provide zygosity assignment with high (>90%) overall classification accuracy. The distributions of PZS scores were markedly different in MZ and DZ pairs, not only in their mean but also in their variability and skew. Finally, we conclude that despite the possibilities of misclassification, the TPQ can be regarded as a generally accurate method to determine zygosity among twin pairs who have not been genotyped. The TPQ is somewhat more accurate in the identification of MZ than DZ pairs, for reasons that are inherent in the nature of twin-pair similarity; there are strong limits on the dissimilarity of MZ pairs, whereas DZ pairs can often be highly similar.
A few limitations of this study should be noted. First, the majority of participants in the WSTR self-identified as Caucasian, which may limit the extent to which our findings can be generalized to other racial and ethnic groups. We urge researchers to replicate our findings using data from twin registries with more racial and ethnic diversity. Second, the TPQ was administered upon participants’ enrollment in the WSTR. Given the cross-sectional nature of the data, we were unable to examine potential changes in TPQ responses over time (e.g. whether twin pairs are more or less likely to claim similarity as they age), or whether such age-related changes may be larger among DZ than MZ twins. Third, as twin pairs are registered with the WSTR on a volunteer basis, twin pairs who consider themselves to be more similar to one another may be more likely to self-select to participate in twin research. It is possible that DZ twins in the current sample are those who identify as being more alike (i.e. more like twins), which might have biased the likelihood estimates in the current study.
In summary, the TPQ is a generally accurate but by no means infallible method of diagnosing zygosity in twins who have not been genotyped. Even in an era when easier access to DNA has made it possible to diagnose twin zygosity directly without resorting to self-report questionnaires, the ongoing use of large population twin datasets will continue to necessitate the peas questionnaire. Understanding its psychometric and predictive properties will help researchers use this well-worn, yet still useful, tool more effectively.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/thg.2020.64.