We examined developmental programming studies published between 2012 and 2014 that reported sex-specific effects, and assessed whether the authors described a statistical approach that explicitly tested whether the effect of treatment differed between the sexes, for example, a sex-by-treatment interaction term. Fewer than half of the studies reporting sex-specific effects described such an explicit test; in most cases, an effect was considered ‘sex-specific’ if it was significant in one sex but not the other. This approach is not robust, because significance in one sex combined with non-significance in the other does not imply a significant difference between the sexes. However, sample size often limits the statistical power to detect interactions. We suggest that when an effect is significant in only one sex but the interaction term is not significant, alternative solutions are to present confidence intervals for the effect size in each sex, or to use Bayesian approaches to calculate the probability that the effect sizes differ between the sexes. We present a simple example of a Bayesian analysis to illustrate that this approach is reasonably easy to implement and interpret.
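
To make the explicit test concrete, the sketch below fits a linear model with a sex-by-treatment interaction term. It is a minimal illustration on simulated data, not an analysis from any of the reviewed studies; the column names (outcome, treatment, sex) and the simulated effect sizes are hypothetical placeholders.

```python
# Minimal sketch: explicitly testing a sex-by-treatment interaction.
# The data frame and its columns ('outcome', 'treatment', 'sex') are
# hypothetical; the outcome is simulated so that the treatment effect
# is larger in males than in females.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 100  # individuals per sex
df = pd.DataFrame({
    "treatment": np.repeat(["control", "treated"], n // 2).tolist() * 2,
    "sex": np.repeat(["female", "male"], n),
})
effect = np.where(df["treatment"] == "treated",
                  np.where(df["sex"] == "male", 1.0, 0.3), 0.0)
df["outcome"] = effect + rng.normal(0, 1, size=len(df))

# 'treatment * sex' expands to both main effects plus the interaction;
# the p-value on the interaction term is the explicit test of whether
# the treatment effect differs between the sexes.
fit = smf.ols("outcome ~ treatment * sex", data=df).fit()
print(fit.summary().tables[1])
```

Testing significance within each sex separately and comparing the two verdicts is not equivalent to this interaction test, which is precisely the distinction drawn above.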
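
One simple way such a Bayesian analysis could look is sketched below, assuming flat priors and normal approximations: the posterior for each sex's effect is then approximately Normal(estimate, SE), so the posterior probability that the effects differ follows directly from the distribution of their difference. The per-sex estimates and standard errors are hypothetical numbers chosen so that the effect is significant in males only, not data from the reviewed studies.

```python
# Hypothetical Bayesian sketch under flat priors and normal
# approximations: each sex's posterior is roughly Normal(estimate, se),
# so the male-female difference is Normal(d_m - d_f, sqrt(se_m^2 + se_f^2)).
from scipy.stats import norm

d_male, se_male = 0.9, 0.35      # significant in males
d_female, se_female = 0.3, 0.35  # not significant in females

diff_mean = d_male - d_female
diff_se = (se_male**2 + se_female**2) ** 0.5

# Posterior probability that the treatment effect is larger in males
p_male_larger = 1 - norm.cdf(0, loc=diff_mean, scale=diff_se)
print(f"P(effect larger in males) = {p_male_larger:.2f}")

# The other suggested alternative: report a 95% interval for each sex
for label, d, se in [("male", d_male, se_male),
                     ("female", d_female, se_female)]:
    lo, hi = norm.interval(0.95, loc=d, scale=se)
    print(f"{label}: effect {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With these placeholder numbers the probability that the effect is larger in males is about 0.89: suggestive, but well short of the near-certainty that a ‘significant in one sex only’ reading might imply.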