References

Michael P. Fay; Erica H. Brittain

doi:10.1017/9781108528825.023

Published online by Cambridge University Press: 17 April 2022

Michael P. Fay: National Institute of Allergy and Infectious Diseases
Erica H. Brittain: National Institute of Allergy and Infectious Diseases

Type: Chapter
Information: Statistical Hypothesis Testing in Context: Reproducibility, Inference, and Science, pp. 404–419

DOI: https://doi.org/10.1017/9781108528825.023

Publisher: Cambridge University Press

Print publication year: 2022

References

Aalen, O., Borgan, O., and Gjessing, H. (2008), Survival and Event History Analysis, New York: Springer.

Aalen, O. O., Cook, R. J., and Røysland, K. (2015), “Does Cox analysis of a randomized survival study yield a causal treatment effect?” Lifetime Data Analysis, 21, 579–593.

Agresti, A. (2013), Categorical Data Analysis, 3rd ed., Hoboken, NJ: John Wiley & Sons.

Agresti, A. and Caffo, B. (2000), “Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures,” The American Statistician, 54, 280–288. [66]

Agresti, A. and Min, Y. (2001), “On small-sample confidence intervals for parameters in discrete distributions,” Biometrics, 57, 963–971. [114, 121]

Agresti, A. and Min, Y. (2002), “Unconditional small-sample confidence intervals for the odds ratio,” Biostatistics, 3, 379–386. [114, 121]

Ali, M. M. and Sharma, S. C. (1996), “Robustness to nonnormality of regression F-tests,” Journal of Econometrics, 71, 175–205. [274]

Andersen, P., Borgan, O., Gill, R., and Keiding, N. (1993), Statistical Models Based on Counting Processes, New York: Springer. [304, 306, 308, 315, 316, 317, 323, 325]

Andersen, P. K. (2005), “Censored data,” Encyclopedia of Biostatistics, 2nd ed., 1, 722–727. [324]

Anderson, J. M., Samake, S., Jaramillo-Gutierrez, , et al. (2011), “Seasonality and prevalence of Leishmania major infection in Phlebotomus duboscqi Neveu-Lemaire from two neighboring villages in central Mali,” PLoS Neglected Tropical Diseases, 5, e1139. [64]

Anderson-Bergman, C. (2017), “icenReg: regression models for interval censored data in R,” Journal of Statistical Software, 81, 1–23. [324]

Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996), “Identification of causal effects using instrumental variables,” Journal of the American Statistical Association, 91, 444–455. [287, 297, 300]

Baggerly, K. A., Morris, J. S., and Coombes, K. R. (2004), “Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments,” Bioinformatics, 20, 777–785. [39]

Baiocchi, M., Cheng, J., and Small, D. S. (2014), “Instrumental variable methods for causal inference,” Statistics in Medicine, 33, 2297–2340. [297, 298, 300]

Baker, S. G. (1994), “The multinomial-Poisson transformation,” The Statistician, 43, 495–504. [198]

Banerjee, M. and Wellner, J. A. (2005), “Confidence intervals for current status data,” Scandinavian Journal of Statistics, 32, 405–424. [320]

Barber, R. F. and Candès, E. J. (2015), “Controlling the false discovery rate via knockoffs,” The Annals of Statistics, 43, 2055–2085. [273]

Barnhart, H. X., Haber, M. J., and Lin, L. I. (2007), “An overview on assessing agreement with continuous measurements,” Journal of Biopharmaceutical Statistics, 17, 529–569. [101]

Basu, D. (1980), “Randomization analysis of experimental data, the Fisher randomization test (with discussion),” Journal of the American Statistical Association, 75, 575–595. [47]

Bauer, P., Bretz, F., Dragalin, V., König, F., and Wassmer, G. (2016), “Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls,” Statistics in Medicine, 35, 325–347. [352, 354, 356]

Bauer, P. and Köhne, K. (1994), “Evaluation of experiments with adaptive interim analyses,” Biometrics, 1029–1041. [352, 355, 356]

Begg, C. B. (1990), “On inferences from Wei’s biased coin design for clinical trials,” Biometrika, 77, 467–473. [34]

Benjamini, Y. (2010), “Simultaneous and selective inference: current successes and future challenges,” Biometrical Journal, 52, 708–721. [239]

Benjamini, Y. (2016), “It’s not the P-values’ fault,” The American Statistician, 70, 1–2. [xi]

Benjamini, Y. and Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society: Series B (Methodological), 57, 289–300. [241]

Benjamini, Y. and Yekutieli, D. (2001), “The control of the false discovery rate in multiple testing under dependency,” Annals of Statistics, 1165–1188. [241, 250]

Beran, R. (1997), “Diagnosing bootstrap success,” Annals of the Institute of Statistical Mathematics, 49, 1–24. [192]

Berger, J. O., Bernardo, J. M., and Sun, D. (2009), “The formal definition of reference priors,” The Annals of Statistics, 905–938. [392, 401]

Berger, J. O. and Sellke, T. (1987), “Testing a point null hypothesis: the irreconcilability of P values and evidence,” Journal of the American Statistical Association, 82, 112–122. [402]

Berger, R. L. and Boos, D. D. (1994), “P values maximized over a confidence set for the nuisance parameter,” Journal of the American Statistical Association, 89, 1012–1016. [114]

Bernardo, J. M. (1979), “Reference posterior distributions for Bayesian inference,” Journal of the Royal Statistical Society: Series B (Methodological), 41, 113–147. [392, 401]

Bernardo, J. M. (2011), “Integrated objective Bayesian estimation and hypothesis testing,” Bayesian Statistics, 9, 1–68. [395, 401, 402]

Bickel, P. and Freedman, D. (1981), “Some asymptotic theory for the bootstrap,” Annals of Statistics, 9, 1196–1217. [192]

Bishop, Y., Fienberg, S., and Holland, P. (1975), Discrete Multivariate Analysis: Theory and Practice, Cambridge, MA: MIT Press. [211]

Blaker, H. (2000), “Confidence curves and improved exact confidence intervals for discrete distributions,” Canadian Journal of Statistics, 28, 783–798. [55, 56, 63, 75, 111]

Bland, J. M. and Altman, D. G. (1999), “Measuring agreement in method comparison studies,” Statistical Methods in Medical Research, 8, 135–160. [101]

Blyth, C. and Still, H. (1983), “Binomial confidence intervals,” Journal of the American Statistical Association, 78, 108–116. [63]

Boos, D. D. and Stefanski, L. (2013), Essential Statistical Inference, New York: Springer. [170, 171, 173, 175, 183, 190, 191, 192, 193]

Boschloo, R. (1970), “Raised conditional level of significance for the 2 × 2-table when testing the equality of two probabilities,” Statistica Neerlandica, 24, 1–9. [112]

Box, G. E. and Cox, D. R. (1964), “An analysis of transformations,” Journal of the Royal Statistical Society: Series B (Methodological), 26, 211–252. [274]

Box, G. E., Hunter, J. S., and Hunter, W. G. (2005), Statistics for Experimenters: Design, Innovation, and Discovery, vol. 2, New York: Wiley-Interscience. [274]

Box, G. E. and Watson, G. S. (1962), “Robustness to non-normality of regression tests,” Biometrika, 49, 93–106. [274]

Brazzale, A. R., Davison, A. C., and Reid, N. (2007), Applied Asymptotics: Case Studies in Small-Sample Statistics, vol. 23, Cambridge: Cambridge University Press. [256]

Breslow, N. (1972), “Contribution to the discussion of Cox (1972),” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 34, 216–217. [308]

Breslow, N. and Chatterjee, N. (1999), “Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis,” Journal of the Royal Statistical Society: Series C (Applied Statistics), 48, 457–468. [307]

Breslow, N. E. (1996), “Statistics in epidemiology: the case-control study,” Journal of the American Statistical Association, 91, 14–28. [122]

Bretz, F., Hothorn, T., and Westfall, P. (2011), Multiple Comparisons using R, Boca Raton, FL: CRC Press. [208, 209, 210, 211, 212, 244, 250, 251]

Bretz, F., Maurer, W., Brannath, W., and Posch, M. (2009), “A graphical approach to sequentially rejective multiple test procedures,” Statistics in Medicine, 28, 586–604. [246, 247]

Brillinger, D. R. (1986), “The natural variability of vital rates and associated statistics (with discussion),” Biometrics, 42, 693–734. [229]

Brittain, E. and Lin, D. (2005), “A comparison of intent-to-treat and per-protocol results in antibiotic non-inferiority trials,” Statistics in Medicine, 24, 1–10. [372]

Brittain, E. H., Fay, M. P., and Follmann, D. A. (2012), “A valid formulation of the analysis of noninferiority trials under random effects meta-analysis,” Biostatistics, 13, 637–649. [227, 369, 370]

Brown, B. M. and Hettmansperger, T. P. (2002), “Kruskal–Wallis, multiple comparisons and Efron dice,” Australian & New Zealand Journal of Statistics, 44, 427–438. [157, 159]

Brown, L. D., Cai, T. T., and DasGupta, A. (2001), “Interval estimation for a binomial proportion (with discussion),” Statistical Science, 16, 101–133. [60]

Brown, M. B. and Forsythe, A. B. (1974a), “372: the ANOVA and multiple comparisons for data with heterogeneous variances,” Biometrics, 30, 719–724. [200, 211]

Brown, M. B. and Forsythe, A. B. (1974b), “The small sample behavior of some statistics which test the equality of several means,” Technometrics, 16, 129–132. [200]

Brunner, E., Konietschke, F., Pauly, M., and Puri, M. L. (2017), “Rank-based procedures in factorial designs: hypotheses about non-parametric treatment effects,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1463–1485. [221]

Brunner, E. and Munzel, U. (2000), “The nonparametric Behrens-Fisher problem: asymptotic theory and a small-sample approximation,” Biometrical Journal, 42, 17–25. [95, 147, 157, 361]

Bühlmann, P., Kalisch, M., and Meier, L. (2014), “High-dimensional statistics with a view toward applications in biology,” Annual Review of Statistics and its Application, 1, 255–278. [271]

Burnham, K. P. and Anderson, D. R. (2002), Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed., New York: Springer. [273]

Candès, E., Fan, Y., Janson, L., and Lv, J. (2018), “Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 551–577. [273]

Carlin, B. P. and Louis, T. A. (2009), Bayesian Methods for Data Analysis, 3rd ed., Boca Raton, FL: CRC Press. [40, 402]

Carpenter, J. and Kenward, M. (2013), Multiple Imputation and its Application, Chichester: John Wiley & Sons. [339, 340]

Caruso, J. C. and Cliff, N. (1997), “Empirical size, coverage, and power of confidence intervals for Spearman’s Rho,” Educational and Psychological Measurement, 57, 637–654. [97]

Casella, G. (1986), “Refining binomial confidence intervals,” The Canadian Journal of Statistics/La Revue Canadienne de Statistique, 14, 113–129. [63]

Casella, G. (1989), “Refining Poisson confidence intervals,” The Canadian Journal of Statistics/La Revue Canadienne de Statistique, 17, 45–57. [75]

Casella, G. and Berger, R. L. (1987), “Reconciling Bayesian and frequentist evidence in the one-sided testing problem,” Journal of the American Statistical Association, 82, 106–111. [402]

Casella, G. and Berger, R. L. (2002), Statistical Inference, 2nd ed., Pacific Grove, CA: Duxbury Press. [xi, 102, 123, 306, 374, 397, 402]

Chen, Y. J., DeMets, D. L., and Lan, K. G. (2004), “Increasing the sample size when the unblinded interim result is promising,” Statistics in Medicine, 23, 1023–1038. [354, 355]

Cheng, S., Wei, L., and Ying, Z. (1995), “Analysis of transformation models with censored data,” Biometrika, 82, 835–845. [261]

Chow, S.-C., Shao, J., Wang, H., and Lokhnygina, Y. (2018), Sample Size Calculations in Clinical Research, 3rd ed., Boca Raton, FL: Chapman and Hall/CRC. [382, 386]

Chung, E. and Romano, J. P. (2016), “Asymptotically valid and exact permutation tests based on two-sample U-statistics,” Journal of Statistical Planning and Inference, 168, 97–105. [157, 160]

Ciarleglio, M. M., Arendt, C. D., and Peduzzi, P. N. (2016), “Selection of the effect size for sample size determination for a continuous response in a superiority clinical trial using a hybrid classical and Bayesian procedure,” Clinical Trials, 13, 275–285. [385]

Cole, S. R. and Hernán, M. A. (2008), “Constructing inverse probability weights for marginal structural models,” American Journal of Epidemiology, 168, 656–664. [292, 338]

Coulibaly, Y. I., Dembele, B., Diallo, A. A., et al. (2009), “A randomized trial of doxycycline for Mansonella perstans infection,” New England Journal of Medicine, 361, 1448–1458. [43, 104]

Cox, D. (1972), “Regression models and life-tables (with discussion),” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 34, 187–220. [315]

Cox, D. and Hinkley, D. (1974), Theoretical Statistics, London: Chapman and Hall. [364]

Cox, D. R. (1975), “Partial likelihood,” Biometrika, 62, 269–276. [45]

Crow, E. (1956), “Confidence intervals for a proportion,” Biometrika, 43, 423–435. [63]

Cui, Y. and Hannig, J. (2019), “Nonparametric generalized fiducial inference for survival functions under censoring,” Biometrika, 106, 501–518. [306]

D’Agostino Sr., R. B., Massaro, J. M., and Sullivan, L. M. (2003), “Non-inferiority trials: design concepts and issues–the encounters of academic consultants in statistics,” Statistics in Medicine, 22, 169–186. [369, 373]

Dagum, C. (2006), “Income inequality measures,” Encyclopedia of Statistical Sciences, DOI:10.1002/0471667196.ess6030.pub2. [80]

Davison, A. and Hinkley, D. (1997), Bootstrap Methods and Their Application, New York: Cambridge University Press. [181, 184, 190]

De Neve, J., Thas, O., and Gerds, T. A. (2019), “Semiparametric linear transformation models: Effect measures, estimators, and applications,” Statistics in Medicine, 38, 1484–1501. [274]

De Veaux, R. D. and Hand, D. J. (2005), “How to lie with bad data,” Statistical Science, 20, 231–238. [47]

DeMets, D. L. and Lan, K. G. (1994), “Interim analysis: the alpha spending function approach,” Statistics in Medicine, 13, 1341–1352. [356]

DerSimonian, R. and Kacker, R. (2007), “Random-effects model for meta-analysis of clinical trials: an update,” Contemporary Clinical Trials, 28, 105–114. [227]

DerSimonian, R. and Laird, N. (1986), “Meta-analysis in clinical trials,” Controlled Clinical Trials, 7, 177–188. [227]

Diaconis, P. and Efron, B. (1985), “Testing for independence in a two-way table: new interpretations of the chi-square statistic,” The Annals of Statistics, 13, 845–874. [198]

DiCiccio, T. J. and Efron, B. (1996), “Bootstrap confidence intervals,” Statistical Science, 11, 189–212. [185]

Ding, P., Feller, A., and Miratrix, L. (2016), “Randomization inference for treatment effect variation,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 655–671. [300]

Druesne-Pecollo, N., Latino-Martel, P., Norat, T., et al. (2010), “Beta-carotene supplementation and cancer risk: a systematic review and metaanalysis of randomized controlled trials,” International Journal of Cancer, 127, 172–184. [28, 47]

Dudewicz, E. and Mishra, S. (1988), Modern Mathematical Statistics, New York: Wiley. [82, 139]

Dudoit, S. and Van Der Laan, M. (2008), Multiple Testing Procedures with Applications to Genomics, New York: Springer. [245, 250]

Dunnett, C. W. (1955), “A multiple comparison procedure for comparing several treatments with a control,” Journal of the American Statistical Association, 50, 1096–1121. [208]

Edgington, E. S. (1987), Randomization Tests, 2nd ed., New York: Marcel Dekker. [34]

Efron, B. and Hinkley, D. V. (1978), “Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher information,” Biometrika, 65, 457–483. [171]

Efron, B. and Tibshirani, R. (1993), An Introduction to the Bootstrap, vol. 57, Boca Raton, FL: CRC Press. [183, 184, 190]

Ellenberg, J. (2014), How Not to Be Wrong: The Power of Mathematical Thinking, New York: Penguin Press. [37]

Fagerland, M. W. and Hosmer, D. W. (2013), “A goodness-of-fit test for the proportional odds regression model,” Statistics in Medicine, 32, 2235–2249. [365]

Farrington, C. P. and Manning, G. (1990), “Test statistics and sample size formulae for comparative binomial trials with null hypothesis of non-zero risk difference or non-unity relative risk,” Statistics in Medicine, 9, 1447–1454. [116, 121]

Fay, M. P. (2010a), “Confidence intervals that match Fisher’s exact or Blaker’s exact tests,” Biostatistics, 11, 373–374. [22, 122]

Fay, M. P. (1999a), “Approximate confidence intervals for rate ratios from directly standardized rates with sparse data,” Communications in Statistics-Theory and Methods, 28, 2141–2160. [230, 234]

Fay, M. P. (1999b), “Comparing several score tests for interval censored data (Corr: 1999V18 p2681),” Statistics in Medicine, 18, 273–285. [321]

Fay, M. P. (2005), “Random marginal agreement coefficients: rethinking the adjustment for chance when measuring agreement,” Biostatistics, 6, 171–180. [99, 100, 103]

Fay, M. P. (2010b), “Two-sided exact tests and matching confidence intervals for discrete data,” R Journal, 2, 53–58. [22, 75, 80]

Fay, M. P. and Brittain, E. H. (2016), “Finite sample pointwise confidence intervals for a survival distribution with right-censored data,” Statistics in Medicine, 35, 2726–2740. [115, 305, 306, 325]

Fay, M. P., Brittain, E. H., and Proschan, M. A. (2013), “Pointwise confidence intervals for a survival distribution for right censored data with small samples or heavy censoring,” Biostatistics, 14, 723–736. [305, 306, 307, 322, 325]

Fay, M. P., Brittain, E. H., Shih, J. H., Follmann, D. A., and Gabriel, E. E. (2018a), “Causal estimands and confidence intervals associated with Wilcoxon-Mann-Whitney tests in randomized experiments,” Statistics in Medicine, 37, 2923–2937. [146, 157, 159, 160, 301]

Fay, M. P. and Feuer, E. J. (1997), “Confidence intervals for directly standardized rates: a method based on the gamma distribution,” Statistics in Medicine, 16, 791–801. [230]

Fay, M. P. and Follmann, D. A. (2002), “Designing Monte Carlo implementations of permutation or bootstrap hypothesis tests,” The American Statistician, 56, 63–70. [181]

Fay, M. P., Follmann, D. A., Lynn, F., et al. (2012), “Anthrax vaccine–induced antibodies provide cross-species prediction of survival to aerosol challenge,” Science Translational Medicine, 4, 151ra126. [47]

Fay, M. P., Freedman, L. S., Clifford, C. K., and Midthune, D. N. (1997), “Effect of different types and amounts of fat on the development of mammary tumors in rodents: a review,” Cancer Research, 57, 3979–3988. [175]

Fay, M. P. and Graubard, B. I. (2001), “Small-sample adjustments for Wald-type tests using sandwich estimators,” Biometrics, 57, 1198–1206. [175, 315]

Fay, M. P., Graubard, B. I., Freedman, L. S., and Midthune, D. N. (1998), “Conditional logistic regression with sandwich estimators: application to a meta-analysis,” Biometrics, 54, 195–208. [273]

Fay, M. P., Halloran, M. E., and Follmann, D. A. (2007), “Accounting for variability in sample size estimation with applications to nonadherence and estimation of variance and effect size,” Biometrics, 63, 465–474. [384, 385, 386, 387]

Fay, M. P. and Hunsberger, S. A. (2021), “Practical valid inferences for the two-sample binomial problem,” Statistics Surveys, 15, 72–110. [16, 22, 121, 122]

Fay, M. P. and Kim, S. (2017), “Confidence intervals for directly standardized rates using mid-p gamma intervals,” Biometrical Journal, 59, 377–387. [230, 232]

Fay, M. P. and Lumbard, K. (2021), “Confidence intervals for difference in proportions for matched pairs compatible with exact McNemar’s or sign tests,” Statistics in Medicine, 40, 1147–1159. [102]

Fay, M. P. and Malinovsky, Y. (2018), “Confidence intervals of the Mann-Whitney parameter that are compatible with the Wilcoxon-Mann-Whitney test,” Statistics in Medicine, 37, 3991–4006. [5, 128, 147, 149, 157, 160, 180, 234, 259, 316, 326, 387]

Fay, M. P. and Proschan, M. A. (2010), “Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules,” Statistics Surveys, 4, 1–39. [82, 128, 157, 158, 160]

Fay, M. P., Proschan, M. A., and Brittain, E. (2015), “Combining one-sample confidence procedures for inference in the two-sample case,” Biometrics, 146–156. [81, 143, 190, 400, 401]

Fay, M. P., Proschan, M. A., Brittain, E. H., and Tiwari, R. (2021), “Interpreting p-values and confidence intervals using well-calibrated null preference priors,” Statistical Science. https://imstat.org/journals-and-publications/statistical-science/statistical-science-future-papers/ [397, 399, 400, 401, 402]

Fay, M. P., Sachs, M. C., and Miura, K. (2018b), “Measuring precision in bioassays: rethinking assay validation,” Statistics in Medicine, 37, 519–529. [79]

Fay, M. P. and Shaw, P. A. (2010), “Exact and asymptotic weighted logrank tests for interval censored data: the interval R package,” Journal of Statistical Software, 36, 1–34. [321, 325]

Fay, M. P. and Shih, J. H. (1998), “Permutation tests using estimated distribution functions,” Journal of the American Statistical Association, 93, 387–396. [214, 221]

Fay, M. P. and Shih, J. H. (2012), “Weighted logrank tests for interval censored data when assessment times depend on treatment,” Statistics in Medicine, 31, 3760–3772. [321]

Fay, M. P., Tiwari, R. C., Feuer, E. J., and Zou, Z. (2006), “Estimating average annual percent change for disease rates without assuming constant change,” Biometrics, 62, 847–854. [234]

FDA (2016), “Guidance for industry: non-inferiority clinical trials to establish effectiveness,” US Department of Health and Human Services and US Food and Drug Administration, Washington, DC. [369]

Finkelstein, D. M., Goggins, W. B., and Schoenfeld, D. A. (2002), “Analysis of failure time data with dependent interval censoring,” Biometrics, 58, 298–304. [323]

Finniss, D. G., Kaptchuk, T. J., Miller, F., and Benedetti, F. (2010), “Biological, clinical, and ethical advances of placebo effects,” The Lancet, 375, 686–695. [47]

Firth, D. (1993), “Bias reduction of maximum likelihood estimates,” Biometrika, 80, 27–38. [258, 273]

Fitzmaurice, G. M., Laird, N. M., and Ware, J. H. (2004), Applied Longitudinal Analysis, Hoboken, NJ: John Wiley & Sons. [275]

Fleiss, J. L., Levin, B., and Paik, M. C. (2003), Statistical Methods for Rates and Proportions, 3rd ed., Hoboken, NJ: John Wiley & Sons. [76]

Fleming, T. and Harrington, D. (1991), Counting Processes and Survival Analysis, New York: Wiley. [316, 323]

Follmann, D., Brittain, E., and Powers, J. H. (2013), “Discordant minimum inhibitory concentration analysis: a new path to licensure for anti-infective drugs,” Clinical Trials, 10, 876–885. [369, 375]

Follmann, D. and Fay, M. (2010), “Exact inference for complex clustered data using within-cluster resampling,” Journal of Biopharmaceutical Statistics, 20, 850–869. [188]

Follmann, D., Proschan, M., and Leifer, E. (2003), “Multiple outputation: inference for complex clustered data by averaging analyses from independent data,” Biometrics, 59, 420–429. [188]

Freedman, L. S. (2008), “An analysis of the controversy over classical one-sided tests,” Clinical Trials, 5, 635–640. [395, 396]

Freeman, G. and Halton, J. H. (1951), “Note on an exact treatment of contingency, goodness of fit and other problems of significance,” Biometrika, 38, 141–149. [197]

Freidlin, B., Korn, E. L., Hunsberger, S., et al. (2007), “Proposal for the use of progression-free survival in unblinded randomized trials,” Journal of Clinical Oncology, 25, 2122–2126. [324]

Freireich, E. J., Gehan, E., Frei, E., et al. (1963), “The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukemia: a model for evaluation of other potentially useful therapy,” Blood, 21, 699–716. [310]

Friedman, J., Hastie, T., and Tibshirani, R. (2010), “Regularization paths for generalized linear models via coordinate descent,” Journal of Statistical Software, 33, 1. [273]

Gail, M. H. (1974), “Power computations for designing comparative Poisson trials,” Biometrics, 30, 2, 231–237. [388]

Gail, M. H., Lubin, J. H., and Rubinstein, L. V. (1981), “Likelihood calculations for matched case-control studies and survival studies with tied death times,” Biometrika, 68, 703–707. [273]

Galton, F. (1886), “Regression towards mediocrity in hereditary stature,” Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263. [33]

Gautret, P., Lagier, J.-C., Parola, P., et al. (2020), “Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial,” International Journal of Antimicrobial Agents, 56, 105949. [1, 2]

Gehan, E. A. (1965), “A generalized Wilcoxon test for comparing arbitrarily singly-censored samples,” Biometrika, 52, 203–224. [310, 311, 317]

Gelman, A., Carlin, J., Stern, H., et al. (2013), Bayesian Data Analysis, 3rd ed., New York: CRC Press. [40, 392, 400, 401]

Gelman, A. (2005), “Analysis of variance–why it is more important than ever (with discussion),” The Annals of Statistics, 33, 1–53. [212]

Gentleman, R. and Geyer, C. (1994), “Maximum likelihood for interval censored data: consistency and computation,” Biometrika, 81, 618–623. [320]

Gentleman, R. and Vandal, A. (2001), “Computational algorithms for censored-data problems using intersection graphs,” Journal of Computational and Graphical Statistics, 10, 403–421. [324]

Gentleman, R. and Vandal, A. (2002), “Nonparametric estimation of the bivariate CDF for arbitrarily censored data,” Canadian Journal of Statistics, 30, 557–571. [320]

Ghosh, B. (2006), “Sequential analysis,” in Encyclopedia of Statistical Sciences, eds. Kotz, S., Read, C. B., Balakrishnan, N., et al., Wiley Online Library. DOI:10.1002/0471667196.ess2398.pub2 [356]

Ghosh, M. (2011), “Objective priors: an introduction for frequentists,” Statistical Science, 26, 187–202. [401]

Gibbons, J. (1971), Nonparametric Statistical Inference, New York: McGraw-Hill Book Company. [95, 96, 97]

Goeman, J. J. and Solari, A. (2014), “Multiple hypothesis testing in genomics,” Statistics in Medicine, 33, 1946–1978. [245, 246]

Gould, S. and Norris, S. L. (2021), “Contested effects and chaotic policies: the 2020 story of (hydroxy) chloroquine for treating COVID-19,” Cochrane Database of Systematic Reviews, 2021, 3 (ED000151), 1–5. [2]

Graybill, F. A. (1976), Theory and Application of the Linear Model, Pacific Grove, CA: Wadsworth Publishing Company. [190, 254]

Greenland, S. (2017), “Invited commentary: the need for cognitive science in methodology,” American Journal of Epidemiology, 186, 639–645. [6, 21]

Greenland, S. (2019), “Valid p-values behave exactly as they should: some misleading criticisms of p-values and their resolution with s-values,” The American Statistician, 73, 106–114. [9]

Grimes, D. A. and Schulz, K. F. (2002), “Uses and abuses of screening tests,” The Lancet, 359, 881–884. [32]

Groeneboom, P., Jongbloed, G., and Wellner, J. (2008), “The support reduction algorithm for computing nonparametric function estimates in mixture models,” Scandinavian Journal of Statistics, 35, 385–399. [324]

Guo, X., Pan, W., Connett, J. E., Hannan, P. J., and French, S. A. (2005), “Small-sample performance of the robust score test and its modifications in generalized estimating equations,” Statistics in Medicine, 24, 3479–3495. [175]

Hall, W. J. and Wellner, J. A. (1980), “Confidence bands for a survival curve from censored data,” Biometrika, 67, 133–143. [306]

Halloran, M. E., Longini, I. M., and Struchiner, C. J. (2010), Design and Analysis of Vaccine Studies, New York: Springer. [41, 279, 281]

Hampel, F. R., Ronchetti, E. M., and Rousseeuw, P. J. (1986), Robust Statistics: The Approach Based on Influence Functions, New York: John Wiley & Sons. [153]

Hand, D. J. (1992), “On comparing two treatments,” The American Statistician, 46, 190–192. [160]

Hand, D. J. (1994), “Deconstructing statistical questions,” Journal of the Royal Statistical Society: Series A (Statistics in Society), 157, 317–338. [47, 128]

Hanley, J. A. and McNeil, B. J. (1982), “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, 143, 29–36. [130]

Harrell, F., Lee, K. L., and Mark, D. B. (1996), “Tutorial in biostatistics multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors,” Statistics in Medicine, 15, 361–387. [130]

Harrington, D. P. and Fleming, T. R. (1982), “A class of rank test procedures for censored survival data,” Biometrika, 69, 553–566. [317]

Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., New York: Springer. [268, 273]

Haybittle, J. (1971), “Repeated assessment of results in clinical trials of cancer treatment,” The British Journal of Radiology, 44, 793–797. [348]Google Scholar

Hayes, R. J. and Moulton, L. H. (2009), Cluster Randomised Trials, Boca Raton, FL: Chapman and Hall/CRC. [216, 232]Google Scholar

Heinze, G. (2006), “A comparative investigation of methods for logistic regression with separated or nearly separated data,” Statistics in Medicine, 25, 4216–4226. [258]Google Scholar

Hennekens, C. H., Buring, J. E., Manson, J. E., et al. (1996), “Lack of effect of long-term supplementation with beta carotene on the incidence of malignant neoplasms and cardiovascular disease,” New England Journal of Medicine, 334, 1145–1149. [47]Google Scholar

Hepworth, G. (1996), “Exact confidence intervals for proportions estimated by group testing,” Biometrics, 1134–1146. [64, 65]Google Scholar

Hepworth, G. (2005), “Confidence intervals for proportions estimated by group testing with groups of unequal size,” Journal of Agricultural, Biological, and Environmental Statistics, 10, 478–497. [64, 65]

Hernán, M. A., Alonso, A., Logan, R., et al. (2008), “Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease,” Epidemiology, 19, 766–779. [47]

Hernán, M. A. and Hernández-Díaz, S. (2012), “Beyond the intention-to-treat in comparative effectiveness research,” Clinical Trials, 9, 48–55. [36]

Hernán, M. A. and Robins, J. M. (2020), Causal Inference: What If, Boca Raton, FL: Chapman and Hall/CRC. [292, 296, 300, 371]

Hirji, K. (2006), Exact Analysis of Discrete Data, New York: Chapman and Hall/CRC. [121]

Hochberg, Y. and Tamhane, A. C. (1987), Multiple Comparison Procedures, New York: Wiley. [201, 206, 207, 211, 250]

Hoffman, E. B., Sen, P. K., and Weinberg, C. R. (2001), “Within-cluster resampling,” Biometrika, 88, 420–429. [187]

Hollander, M., Wolfe, D. A., and Chicken, E. (2014), Nonparametric Statistical Methods, 3rd ed., Hoboken, NJ: John Wiley & Sons. [91, 97, 101, 102, 204, 363]

Hommel, G. (1988), “A stagewise rejective multiple test procedure based on a modified Bonferroni test,” Biometrika, 75, 383–386. [240]

Hosmer, D. and Lemeshow, S. (1980), “Goodness of fit statistics tests for the multiple regression model,” Communications in Statistics A, 9, 1043–1069. [365]

Hu, X., Jung, A., and Qin, G. (2020), “Interval estimation for the correlation coefficient,” The American Statistician, 74, 29–36. [97, 101]

Huang, J., Lee, C., and Yu, Q. (2008), “A generalized log-rank test for interval-censored failure time data via multiple imputation,” Statistics in Medicine, 27, 3217–3226. [321]

Hudgens, M. G. (2005), “On nonparametric maximum likelihood estimation with interval censoring and left truncation,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 573–587. [321]

Hudgens, M. G. and Halloran, M. E. (2008), “Toward causal inference with interference,” Journal of the American Statistical Association, 103, 832–842. [289]

Hyndman, R. J. and Fan, Y. (1996), “Sample quantiles in statistical packages,” The American Statistician, 50, 361–365. [361]

Ignatova, I., Deutsch, R. C., and Edwards, D. (2012), “Closed sequential and multistage inference on binary responses with or without replacement,” The American Statistician, 66, 163–172. [349]

Imbens, G. W. and Rubin, D. B. (2015), Causal Inference in Statistics, Social, and Biomedical Sciences, New York: Cambridge University Press. [277, 278, 287, 288, 300]

Ioannidis, J. P. (2005), “Why most published research findings are false,” PLoS Medicine, 2, e124. [7]

Irwin, J. (1935), “Tests of significance for differences between percentages based on small numbers,” Metron, 12, 84–94. [110]

Jefferys, W. H. (1990), “Bayesian analysis of random event generator data,” Journal of Scientific Exploration, 4, 153–169. [394]

Jennison, C. and Turnbull, B. W. (2000), Group Sequential Methods with Applications to Clinical Trials, Boca Raton, FL: Chapman and Hall/CRC. [349, 356]

Jennison, C. and Turnbull, B. W. (2007), “Adaptive seamless designs: selection and prospective testing of hypotheses,” Journal of Biopharmaceutical Statistics, 17, 1135–1161. [354]

Johnson, N. L., Kemp, A. W., and Kotz, S. (2005), Univariate Discrete Distributions, 3rd ed., New York: John Wiley & Sons. [123]

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995), Continuous Univariate Distributions, vol. 2, New York: John Wiley & Sons. [97]

Kahneman, D. (2011), Thinking, Fast and Slow, New York: Farrar, Straus, and Giroux. [32]

Kalbfleisch, J. and Prentice, R. (2002), The Statistical Analysis of Failure Time Data, 2nd ed., New York: Wiley. [260, 306, 311, 315, 316, 323, 324]

Kang, J. D. and Schafer, J. L. (2007), “Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data (with discussion),” Statistical Science, 22, 523–539 (discussion: 540–580). [338]

Kaplan, E. and Meier, P. (1958), “Nonparametric estimation from incomplete observations,” Journal of the American Statistical Association, 53, 457–481. [305]

Karlin, S. and Taylor, H. (1975), A First Course in Stochastic Processes, 2nd ed., New York: Academic Press. [82, 357]

Kass, R. E. and Raftery, A. E. (1995), “Bayes factors,” Journal of the American Statistical Association, 90, 773–795. [393, 394, 401]

Kauermann, G. and Carroll, R. J. (2001), “A note on the efficiency of sandwich covariance matrix estimation,” Journal of the American Statistical Association, 96, 1387–1396. [175]

Kawaguchi, A. and Koch, G. G. (2015), “sanon: an R package for stratified analysis with nonparametric covariable adjustment,” Journal of Statistical Software, 67, 9. [226]

Kim, S., Fay, M. P., and Proschan, M. A. (2021), “Valid and approximately valid confidence intervals for current status data,” Journal of the Royal Statistical Society: Series B, DOI:10.1111/rssb.12422 [320]

Kirk, J. L. and Fay, M. P. (2014), “An introduction to practical sequential inferences via single-arm binary response studies using the binseqtest R package,” The American Statistician, 68, 230–242. [349]

Konietschke, F., Hothorn, L. A., Brunner, E., et al. (2012), “Rank-based multiple test procedures and simultaneous confidence intervals,” Electronic Journal of Statistics, 6, 738–759. [210, 212, 244]

Konietschke, F., Placzek, M., Schaarschmidt, F., and Hothorn, L. A. (2015), “nparcomp: an R software package for nonparametric multiple comparisons and simultaneous confidence intervals,” Journal of Statistical Software, 64, 9, 1–17. [212]

Koopmans, L. H., Owen, D. B., and Rosenblatt, J. (1964), “Confidence intervals for the coefficient of variation for the normal and log normal distributions,” Biometrika, 51, 25–32. [80]

Korn, E. L. and Graubard, B. I. (1999), Analysis of Health Surveys, vol. 323, New York: John Wiley & Sons. [47, 64]

Koziol, J. A. and Jia, Z. (2009), “The concordance index C and the Mann–Whitney parameter Pr(X > Y) with randomly censored data,” Biometrical Journal, 51, 467–474. [130]

Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2017), “lmerTest package: tests in linear mixed effects models,” Journal of Statistical Software, 82, 13, 1–26. [219]

Lachin, J. M. (1981), “Introduction to sample size determination and power analysis for clinical trials,” Controlled Clinical Trials, 2, 93–113. [386]

Lan, K. G. and DeMets, D. L. (1983), “Discrete sequential boundaries for clinical trials,” Biometrika, 70, 659–663. [346]

Lan, K. G. and Wittes, J. (1988), “The B-value: a tool for monitoring data,” Biometrics, 579–585. [347]

Lan, K. G. and Wittes, J. T. (2012), “Some thoughts on sample size: a Bayesian-frequentist hybrid approach,” Clinical Trials, 9, 561–569. [384]

Lang, Z. and Reiczigel, J. (2014), “Confidence limits for prevalence of disease adjusted for estimated sensitivity and specificity,” Preventive Veterinary Medicine, 113, 13–22. [64]

Lehmann, E. (1975), Nonparametrics: Statistical Methods Based on Ranks, Oakland, CA: Holden-Day. [203, 226]

Lehmann, E. (1999), Elements of Large Sample Theory, New York: Springer. [83, 101, 153, 166, 172, 190]

Lehmann, E. and Romano, J. (2005), Testing Statistical Hypotheses, 3rd ed., New York: Springer. [xii, xiii, 9, 20, 21, 71, 77, 81, 82, 151, 158, 190, 191, 363, 374]

Lilliefors, H. W. (1967), “On the Kolmogorov-Smirnov test for normality with mean and variance unknown,” Journal of the American Statistical Association, 62, 399–402. [374]

Lin, D. Y. and Wei, L.-J. (1989), “The robust inference for the Cox proportional hazards model,” Journal of the American Statistical Association, 84, 1074–1078. [315]

Lin, L. I.-K. (1989), “A concordance correlation coefficient to evaluate reproducibility,” Biometrics, 45, 255–268. [99]

Lindley, D. V. and Phillips, L. (1976), “Inference for a Bernoulli process (a Bayesian view),” The American Statistician, 30, 112–119. [47]

Little, R. J. and Rubin, D. B. (2020), Statistical Analysis with Missing Data, 3rd ed., New York: John Wiley & Sons. [340]

Little, R. J., Wang, J., Sun, X., et al. (2016), “The treatment of missing data in a large cardiovascular clinical outcomes study,” Clinical Trials, 13, 344–351. [339]

Liublinska, V. and Rubin, D. B. (2014), “Sensitivity analysis for a partially missing binary outcome in a two-arm randomized clinical trial,” Statistics in Medicine, 33, 4170–4185. [332]

Lloyd, C. J. (2008), “Exact p-values for discrete models obtained by estimation and maximization,” Australian & New Zealand Journal of Statistics, 50, 329–345. [114]

Loughin, T. M. (2004), “A systematic comparison of methods for combining p-values from independent tests,” Computational Statistics & Data Analysis, 47, 467–485. [352]

Lunceford, J. K. and Davidian, M. (2004), “Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study,” Statistics in Medicine, 23, 2937–2960. [39, 292]

Lydersen, S., Pradhan, V., Senchaudhuri, P., and Laake, P. (2007), “Choice of test for association in small sample unordered r × c tables,” Statistics in Medicine, 26, 4328–4343. [365]

Mann, H. B. and Whitney, D. R. (1947), “On a test of whether one of two random variables is stochastically larger than the other,” The Annals of Mathematical Statistics, 18, 50–60. [143]

Mantel, N. (1966), “Evaluation of survival data and two new rank order statistics arising in its consideration,” Cancer Chemotherapy Reports, 50, 163–170. [316]

Marcus, R., Peritz, E., and Gabriel, K. R. (1976), “On closed testing procedures with special reference to ordered analysis of variance,” Biometrika, 63, 655–660. [248]

Martín Andrés, A., Sánchez Quevedo, M., and Silva Mato, A. (1998), “Fisher’s mid-p-value arrangement in 2 × 2 comparative trials,” Computational Statistics & Data Analysis, 29, 107–115. [112]

Mayo, D. G. (1996), Error and the Growth of Experimental Knowledge, Chicago, IL: University of Chicago Press. [46]

McCullagh, P. (1980), “Regression models for ordinal data,” Journal of the Royal Statistical Society: Series B (Methodological), 42, 109–142. [151]

McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models, 2nd ed., London: Chapman and Hall. [133, 212, 217, 257, 259, 273]

Mee, R. W. (1990), “Confidence intervals for probabilities and tolerance regions based on a generalization of the Mann-Whitney statistic,” Journal of the American Statistical Association, 85, 793–800. [130]

Mehta, C. R. and Patel, N. R. (1995), “Exact logistic regression: theory and examples,” Statistics in Medicine, 14, 2143–2160. [258, 273]

Mehta, C. R., Patel, N. R., and Gray, R. (1985), “Computing an exact confidence interval for the common odds ratio in several 2 × 2 contingency tables,” Journal of the American Statistical Association, 80, 969–973. [123]

Mehta, J. and Srinivasan, R. (1970), “On the Behrens–Fisher problem,” Biometrika, 57, 649–655. [139]

Meng, X.-L. (1994), “Posterior predictive p-values,” The Annals of Statistics, 22, 1142–1160. [402]

Michael, H., Thornton, S., Xie, M., and Tian, L. (2019), “Exact inference on the random-effects model for meta-analyses with few studies,” Biometrics, 75, 485–493. [227, 232, 233]

Miettinen, O. and Nurminen, M. (1985), “Comparative analysis of two rates,” Statistics in Medicine, 4, 213–226. [121]

Morgan, S. L. and Winship, C. (2015), Counterfactuals and Causal Inference, 2nd ed., New York: Cambridge University Press. [298, 300, 301]

Moser, B. K., Stevens, G. R., and Watts, C. L. (1989), “The two-sample t test versus Satterthwaite’s approximate F test,” Communications in Statistics – Theory and Methods, 18, 3963–3975. [139]

Mullen, G. E., Ellis, R. D., Miura, K., et al. (2008), “Phase 1 trial of AMA1-C1/Alhydrogel plus CPG 7909: an asexual blood-stage vaccine for Plasmodium falciparum malaria,” PLoS One, 3, e2940. [131]

Murphy, S., Rossini, A., and van der Vaart, A. W. (1997), “Maximum likelihood estimation in the proportional odds model,” Journal of the American Statistical Association, 92, 968–976. [261, 315]

Murphy, S. A. and van der Vaart, A. W. (2000), “On profile likelihood,” Journal of the American Statistical Association, 95, 449–465. [45, 188, 315]

National Research Council. (2010), The Prevention and Treatment of Missing Data in Clinical Trials, Washington, DC: National Academies Press. [327, 329, 339, 340]

Nel, D. d., van der Merwe, C. A., and Moser, B. (1990), “The exact distributions of the univariate and multivariate Behrens-Fisher statistics with a comparison of several solutions in the univariate case,” Communications in Statistics – Theory and Methods, 19, 279–298. [139]

Neubert, K. and Brunner, E. (2007), “A studentized permutation test for the non-parametric Behrens–Fisher problem,” Computational Statistics & Data Analysis, 51, 5192–5204. [157]

Newcombe, R. G. (2006), “Confidence intervals for an effect size measure based on the Mann–Whitney statistic. Part 2: asymptotic methods and evaluation,” Statistics in Medicine, 25, 559–573. [157]

Neyman, J. and Scott, E. L. (1948), “Consistent estimates based on partially consistent observations,” Econometrica, 16, 1–32. [263]

Ng, H. K. T., Filardo, G., and Zheng, G. (2008), “Confidence interval estimating procedures for standardized incidence rates,” Computational Statistics & Data Analysis, 52, 3501–3516. [230]

Oakes, D. (2016), “On the win-ratio statistic in clinical trials with multiple types of event,” Biometrika, 103, 742–745. [260]

O’Brien, P. C. and Fleming, T. R. (1987), “A paired Prentice-Wilcoxon test for censored paired data,” Biometrics, 43, 169–180. [103]

Oller, R., Gómez, G., and Calle, M. L. (2007), “Interval censoring: identifiability and the constant-sum property,” Biometrika, 94, 61–70. [319]

Owen, A. B. (2001), Empirical Likelihood, Boca Raton, FL: Chapman and Hall/CRC. [188]

Park, M. Y. and Hastie, T. (2007), “L1-regularization path algorithm for generalized linear models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69, 659–677. [273]

Paule, R. C. and Mandel, J. (1982), “Consensus values and weighting factors,” Journal of Research of the National Bureau of Standards, 87, 377–385. [227]

Pauly, M., Asendorf, T., and Konietschke, F. (2016), “Permutation-based inference for the AUC: a unified approach for continuous and discontinuous data,” Biometrical Journal, 58, 1319–1337. [157, 160]

Pearl, J. (2009a), “Causal inference in statistics: an overview,” Statistics Surveys, 3, 96–146. [300]

Pearl, J. (2009b), Causality: Models, Reasoning, and Inference, 2nd ed., New York: Cambridge University Press. [24, 46, 222, 232, 277, 296, 300]

Pearl, J., Glymour, M., and Jewell, N. P. (2016), Causal Inference in Statistics: A Primer, Chichester: John Wiley & Sons. [270, 295, 296, 300]

Peikes, D. N., Moreno, L., and Orzol, S. M. (2008), “Propensity score matching: a note of caution for evaluators of social programs,” The American Statistician, 62, 222–231. [26]

Perlman, M. and Wu, L. (1999), “The emperor’s new tests (with discussion),” Statistical Science, 14, 355–381. [21]

Peto, R. and Peto, J. (1972), “Asymptotically efficient rank invariant test procedures,” Journal of the Royal Statistical Society A, 135, 185–207. [316, 317, 325]

Peto, R., Pike, M., Armitage, P., et al. (1976), “Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design,” British Journal of Cancer, 34, 585. [348]

Plesser, H. E. (2018), “Reproducibility vs. replicability: a brief history of a confused terminology,” Frontiers in Neuroinformatics, 11, 76. [xi]

Popper, K. (1963), Conjectures and Refutations: The Growth of Scientific Knowledge, London: Routledge. [28]

Posch, M. and Bauer, P. (1999), “Adaptive two stage designs and the conditional error function,” Biometrical Journal: Journal of Mathematical Methods in Biosciences, 41, 689–696. [356]

Pratt, J. W. (1959), “Remarks on zeros and ties in the Wilcoxon signed rank procedures,” Journal of the American Statistical Association, 54, 655–667. [87, 89, 90]

Pratt, J. W. (1964), “Robustness of some procedures for the two-sample location problem,” Journal of the American Statistical Association, 59, 665–680. [157]

Prentice, R. L. (1978), “Linear rank tests with right censored data,” Biometrika, 65, 167–179. [317]

Prentice, R. L., Langer, R., Stefanick, M. L., et al. (2005), “Combined postmenopausal hormone therapy and cardiovascular disease: toward resolving the discrepancy between observational studies and the Women’s Health Initiative clinical trial,” American Journal of Epidemiology, 162, 404–414. [25, 47]

Prentice, R. L. and Marek, P. (1979), “A qualitative discrepancy between censored data rank tests,” Biometrics, 35, 861–867. [317, 325]

PREVAIL II Writing Group. (2016), “A randomized, controlled trial of ZMapp for Ebola virus infection,” The New England Journal of Medicine, 375, 1448. [111]

Proschan, M. and Brittain, E. (2020), “A primer on strong versus weak control of familywise error rate,” Statistics in Medicine, 39, 1407–1413. [213]

Proschan, M., Brittain, E., and Kammerman, L. (2011), “Minimize the use of minimization with unequal allocation,” Biometrics, 67, 1135–1141. [34]

Proschan, M. and Follmann, D. (2008), “Cluster without fluster: the effect of correlated outcomes on inference in randomized clinical trials,” Statistics in Medicine, 27, 795–809. [41]

Proschan, M. A. (1999), “Miscellanea. Properties of spending function boundaries,” Biometrika, 86, 466–473. [357]

Proschan, M. A., Follmann, D. A., and Waclawiw, M. A. (1992), “Effects of assumption violations on type I error rate in group sequential monitoring,” Biometrics, 1131–1143. [348]

Proschan, M. A. and Hunsberger, S. A. (1995), “Designed extension of studies based on conditional power,” Biometrics, 51, 1315–1324. [352, 353, 355, 356]

Proschan, M. A., Lan, K. G., and Wittes, J. T. (2006), Statistical Monitoring of Clinical Trials: A Unified Approach, New York: Springer. [346, 347, 348, 349, 351, 356, 357]

Proschan, M. A., McMahon, R. P., Shih, J. H., et al. (2001), “Sensitivity analysis using an imputation method for missing binary data in clinical trials,” Journal of Statistical Planning and Inference, 96, 155–165. [330]

Reiczigel, J., Földi, J., and Ózsvári, L. (2010), “Exact confidence limits for prevalence of a disease with an imperfect diagnostic test,” Epidemiology and Infection, 138, 1674–1678. [64, 65]

Robins, J., Breslow, N., and Greenland, S. (1986), “Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models,” Biometrics, 42, 311–323. [225, 233]

Röhmel, J. (2005), “Problems with existing procedures to calculate exact unconditional p-values for non-inferiority/superiority and confidence intervals for two binomials and how to resolve them,” Biometrical Journal, 47, 37–47. [16]

Röhmel, J. and Mansmann, U. (1999), “Unconditional non-asymptotic one-sided tests for independent binomial proportions when the interest lies in showing non-inferiority and/or superiority,” Biometrical Journal, 41, 149–170. [113]

Rosenbaum, P. R. (2002), Observational Studies, 2nd ed., New York: Springer. [47, 292]

Rosenbaum, P. R. (2010), Design of Observational Studies, New York: Springer. [47, 298, 300]

Rosendaal, F. R. (2020), “Review of: ‘hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial Gautret et al 2010’,” International Journal of Antimicrobial Agents, 56, 106063. [2]

Rothmann, M. D., Wiens, B. L., and Chan, I. S. (2012), Design and Analysis of Non-Inferiority Trials, Boca Raton, FL: Chapman and Hall/CRC. [370, 373]

Rubin, D. (2006), Matched Sampling for Causal Effects, New York: Cambridge University Press. [47]

Rubin, D. B. (1997), “Estimating causal effects from large data sets using propensity scores,” Annals of Internal Medicine, 127, 757–763. [38]

Rubin, D. B. (1984), “Bayesianly justifiable and relevant frequency calculations for the applied statistician,” The Annals of Statistics, 12, 1151–1172. [402]

Sadoff, J., Gray, G., Vandebosch, A., et al. (2021), “Safety and efficacy of single-dose Ad26.COV2.S vaccine against Covid-19,” New England Journal of Medicine, 384, 2187–2201. [120]

Sagara, I., Ellis, R. D., Dicko, A., et al. (2009), “A randomized and controlled Phase 1 study of the safety and immunogenicity of the AMA1-C1/Alhydrogel® + CPG 7909 vaccine for Plasmodium falciparum malaria in semi-immune Malian adults,” Vaccine, 27, 7292–7298. [91, 92]

Samara, B. and Randles, R. H. (1988), “A test for correlation based on Kendall’s tau,” Communications in Statistics – Theory and Methods, 17, 3191–3205. [97]

Samuelsen, S. O. (2003), “Exact inference in the proportional hazard model: possibilities and limitations,” Lifetime Data Analysis, 9, 239–260. [315]

Sarkar, S. K. and Chang, C.-K. (1997), “The Simes method for multiple hypothesis testing with positively dependent test statistics,” Journal of the American Statistical Association, 92, 1601–1608. [250]

Schenker, N. and Gentleman, J. F. (2001), “On judging the significance of differences by examining the overlap between confidence intervals,” The American Statistician, 55, 182–186. [194]

Schilling, M. and Doi, J. (2014), “A coverage probability approach to finding an optimal binomial confidence procedure,” The American Statistician, 68, 133–145. [63]

Schoenfeld, D. (1981), “The asymptotic properties of nonparametric tests for comparing survival distributions,” Biometrika, 68, 316–319. [382]

Schouten, H. J. (1999), “Sample size formula with a continuous outcome for unequal group sizes and unequal variances,” Statistics in Medicine, 18, 87–91. [381]

Schweder, T. and Hjort, N. L. (2016), Confidence, Likelihood, Probability: Statistical Inference with Confidence Distributions, New York: Cambridge University Press. [190]

Seaman, S. R. and Vansteelandt, S. (2018), “Introduction to double robust methods for incomplete data,” Statistical Science, 33, 184–197. [338, 340]

Seber, G. A. (1984), Multivariate Observations, Hoboken, NJ: John Wiley & Sons. [192]

Self, S. G. and Liang, K.-Y. (1987), “Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions,” Journal of the American Statistical Association, 82, 605–610. [170, 273]

Sen, B. and Banerjee, M. (2007), “A pseudolikelihood method for analyzing interval censored data,” Biometrika, 94, 71–86. [320]

Sen, B. and Xu, G. (2015), “Model based bootstrap methods for interval censored data,” Computational Statistics & Data Analysis, 81, 121–129. [320]

Sen, P. (1985), “Permutational Central Limit Theorems,” in Encyclopedia of Statistics, eds. Kotz, S. and Johnson, N. L., Hoboken, NJ: Wiley, vol. 6, pp. 683–687. [192]

Serfling, R. and Mazumder, S. (2009), “Exponential probability inequality and convergence results for the median absolute deviation and its modifications,” Statistics & Probability Letters, 79, 1767–1773. [80]

Shao, J. and Tu, D. (1995), The Jackknife and Bootstrap, New York: Springer. [184]

Shapiro, S. S. and Wilk, M. B. (1965), “An analysis of variance test for normality (complete samples),” Biometrika, 52, 591–611. [374]

Shaw, P. A. (2018), “Use of composite outcomes to assess risk–benefit in clinical trials,” Clinical Trials, 15, 352–358. [327]

Simmons, J. P., Nelson, L. D., and Simonsohn, U. (2011), “False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant,” Psychological Science, 22, 1359–1366. [238]

Singal, A. G., Higgins, P. D., and Waljee, A. K. (2014), “A primer on effectiveness and efficacy trials,” Clinical and Translational Gastroenterology, 5, e45. [371]

Singh, B., Ryan, H., Kredo, T., Chaplin, M., and Fletcher, T. (2021), “Chloroquine or hydroxychloroquine for prevention and treatment of COVID-19,” The Cochrane Database of Systematic Reviews, 2, CD013587. [3]

Skou, S. T., Roos, E. M., Laursen, M. B., et al. (2015), “A randomized, controlled trial of total knee replacement,” New England Journal of Medicine, 373, 1597–1606. [237]

Snee, R. D. (1974), “Graphical display of two-way contingency tables,” The American Statistician, 28, 9–12. [198]

Sommer, A. and Zeger, S. L. (1991), “On estimating efficacy from clinical trials,” Statistics in Medicine, 10, 45–52. [286, 288]

Steering Committee for PHS. (1989), “Final report on the aspirin component of the ongoing Physicians’ Health Study,” New England Journal of Medicine, 321, 129–135. [30]

Sterne, T. E. (1954), “Some remarks on confidence or fiducial limits,” Biometrika, 41, 1–2, 275–278. [55, 63, 64]

Strassburger, K. and Bretz, F. (2008), “Compatible simultaneous lower confidence bounds for the Holm procedure and other Bonferroni-based closed tests,” Statistics in Medicine, 27, 4914–4927. [240]

Stuart, E. A. (2010), “Matching methods for causal inference: A review and a look forward,” Statistical Science: A Review Journal of the Institute of Mathematical Statistics, 25, 1. [291]

Tamhane, A. C. and Gou, J. (2017), “Advances in p-value based multiple test procedures,” Journal of Biopharmaceutical Statistics, 1–18. [250]

Tan, W. (1982), “Sampling distributions and robustness of t, F and variance-ratio in two samples and ANOVA models with respect to departure from normality,” Communications in Statistics – Theory and Methods, 11, 2485–2511. [201]

Tang, R., Banerjee, M., Kosorok, M. R., et al. (2012), “Likelihood based inference for current status data on a grid: A boundary phenomenon and an adaptive inference procedure,” The Annals of Statistics, 40, 45–72. [320]

Tarone, R. E. and Gart, J. J. (1980), “On the robustness of combined tests for trends in proportions,” Journal of the American Statistical Association, 75, 110–116. [203]

Tchetgen Tchetgen, E. J. and VanderWeele, T. J. (2012), “On causal inference in the presence of interference,” Statistical Methods in Medical Research, 21, 55–75. [289]

Thangavelu, K. and Brunner, E. (2007), “Wilcoxon–Mann–Whitney test for stratified samples and Efron’s paradox dice,” Journal of Statistical Planning and Inference, 137, 720–737. [210]

Thas, O., Neve, J. D., Clement, L., and Ottoy, J.-P. (2012), “Probabilistic index models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74, 623–671. [130, 261]

The Open Science Collaboration. (2015), “Estimating the reproducibility of psychological science,” Science, 349, aac4716. [3]

Therneau, T. (2015), A Package for Survival Analysis in S, R package version 2.38. https://CRAN.R-project.org/package=survival [315]

Therneau, T. M. and Grambsch, P. M. (2000), Modeling Survival Data: Extending the Cox Model, New York: Springer. [308, 323]

Tibshirani, R. (1996), “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Methodological), 58, 267–288. [267]

Tibshirani, R. (2011), “Regression shrinkage and selection via the lasso: a retrospective,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 273–282. [273]

Tsiatis, A. (2006), Semiparametric Theory and Missing Data, New York: Springer. [340]

Tsiatis, A. A., Davidian, M., Zhang, M., and Lu, X. (2008), “Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: a principled yet flexible approach,” Statistics in Medicine, 27, 4658–4677. [289]

Væth, M. (1985), “On the use of Wald’s test in exponential families,” International Statistical Review, 53, 199–214. [172]

Vandal, A. C., Gentleman, R., and Liu, X. (2005), “Constrained estimation and likelihood intervals for censored data,” Canadian Journal of Statistics, 33, 71–83. [320]

VanderWeele, T. (2015), Explanation in Causal Inference: Methods for Mediation and Interaction, New York: Oxford University Press. [296]

Veronese, P. and Melilli, E. (2015), “Fiducial and confidence distributions for real exponential families,” Scandinavian Journal of Statistics, 42, 471–484. [399, 402]

Vonesh, E. and Chinchilli, V. M. (1997), Linear and Nonlinear Models for the Analysis of Repeated Measurements, New York: Marcel Dekker. [190]

Vos, P. and Hudson, S. (2008), “Problems with binomial two-sided tests and the associated confidence intervals,” Australian and New Zealand Journal of Statistics, 50, 81–89. [56]Google Scholar

Wacholder, S., McLaughlin, J. K., Silverman, D. T., and Mandel, J. S. (1992), “Selection of controls in case-control studies: I. Principles,” American Journal of Epidemiology, 135, 1019–1028. [122]

Wald, A. (1947), Sequential Analysis, New York: Dover. [343, 356]

Wang, R., Lagakos, S., and Gray, R. (2010), “Testing and interval estimation for two-sample survival comparisons with small sample sizes and unequal censoring,” Biostatistics, 11, 676–692. [122]

Wang, W. (2010), “On construction of the smallest one-sided confidence interval for the difference of two proportions,” The Annals of Statistics, 38, 1227–1243. [114]

Wang, W. and Shan, G. (2015), “Exact confidence intervals for the relative risk and the odds ratio,” Biometrics, 71, 985–995. [114]

Wasserstein, R. L. and Lazar, N. A. (2016), “The ASA’s statement on p-values: context, process, and purpose,” The American Statistician, 70, 129–133. [xi]

Wasserstein, R. L., Schirm, A. L., and Lazar, N. A. (2019), “Moving to a World Beyond ‘p < 0.05’,” The American Statistician, 73, 1–19. [xi, 6]

Webster, W., Walsh, D., McEwen, S. E., and Lipson, A. (1983), “Some teratogenic properties of ethanol and acetaldehyde in C57BL/6J mice: implications for the study of the fetal alcohol syndrome,” Teratology, 27, 231–243. [215]

Welch, B. and Peers, H. (1963), “On formulae for confidence points based on integrals of weighted likelihoods,” Journal of the Royal Statistical Society: Series B (Methodological), 25, 318–329. [402]

Westfall, P. H. (1997), “Multiple testing of general contrasts using logical constraints and correlations,” Journal of the American Statistical Association, 92, 299–306. [248]

Westfall, P. H., Tobias, R. D., Rom, D., Wolfinger, R. D., and Hochberg, Y. (1999), Multiple Comparisons and Multiple Tests Using the SAS System, Cary, NC: SAS Institute. [206]

Westfall, P. H. and Troendle, J. F. (2008), “Multiple testing with minimal assumptions,” Biometrical Journal, 50, 745–755. [245]

Westfall, P. H. and Young, S. S. (1993), Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment, New York: John Wiley & Sons. [245, 250]

Whidden, C., Treleaven, E., Liu, J., et al. (2019), “Proactive community case management and child survival: protocol for a cluster randomised controlled trial,” BMJ Open, 9, e027487. [379]

Wilcoxon, F. (1945), “Individual comparisons by ranking methods,” Biometrics Bulletin, 1, 80–83. [143]

Wittes, J. (2002), “Sample size calculations for randomized controlled trials,” Epidemiologic Reviews, 24, 39–53. [382, 386]

Wittes, J., Barrett-Connor, E., Braunwald, E., et al. (2007), “Monitoring the randomized trials of the Women’s Health Initiative: the experience of the Data and Safety Monitoring Board,” Clinical Trials, 4, 218–234. [25]

Wittes, J. and Brittain, E. (1990), “The role of internal pilot studies in increasing the efficiency of clinical trials,” Statistics in Medicine, 9, 65–72. [386]

Wu, C. J. (1985), “Efficient sequential designs with binary data,” Journal of the American Statistical Association, 80, 974–984. [387]

Xie, M.-g. and Singh, K. (2013), “Confidence distribution, the frequentist distribution estimator of a parameter: a review (with discussion),” International Statistical Review, 81, 3–77. [190]

Yan, X., Lee, S., and Li, N. (2009), “Missing data handling methods in medical device clinical trials,” Journal of Biopharmaceutical Statistics, 19, 1085–1098. [331]

Yates, F. (1984), “Tests of significance for 2 × 2 contingency tables,” Journal of the Royal Statistical Society: Series A (General), 147, 426–463. [110, 111, 121]

Zeger, S. L., Liang, K.-Y., and Albert, P. S. (1988), “Models for longitudinal data: a generalized estimating equation approach,” Biometrics, 44, 1049–1060. [266, 275]

Zeileis, A. (2004), “Econometric computing with HC and HAC covariance matrix estimators,” Journal of Statistical Software, 11. [255, 273]

Zeileis, A., Kleiber, C., and Jackman, S. (2008), “Regression models for count data in R,” Journal of Statistical Software, 27, 1–25. [259]

Zhang, H., Lu, N., Feng, C., et al. (2011), “On fitting generalized linear mixed-effects models for binary responses using different statistical packages,” Statistics in Medicine, 30, 2562–2572. [218]
