This Element highlights the use within archaeology of classification methods developed in the fields of chemometrics, artificial intelligence, and Bayesian statistics. These methods operate in both high- and low-dimensional settings and often outperform traditional approaches. Rather than taking a theoretical approach, it shows how to apply these methods to real data, using lithic and ceramic archaeological materials as case studies. A detailed explanation of how to process the data in R (The R Project for Statistical Computing), together with the corresponding code, is also provided in this Element.
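As a hedged illustration of the kind of classification workflow the Element walks through (not the Element's own code), the following R sketch trains a random forest, one of the machine-learning classifiers in question, on the built-in iris data as a stand-in for ceramic or lithic compositional measurements; the randomForest package, the 70/30 split, and all variable names are illustrative assumptions.

# Illustrative classification workflow in R (stand-in data, not the Element's code)
library(randomForest)  # assumed to be installed from CRAN

set.seed(42)
data(iris)  # stand-in for a table of compositional measurements with known classes

# Hold out 30% of the rows as a test set
test_idx <- sample(nrow(iris), size = round(0.3 * nrow(iris)))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit a random-forest classifier to the training rows
fit <- randomForest(Species ~ ., data = train, ntree = 500)

# Evaluate on the held-out rows
pred <- predict(fit, newdata = test)
table(predicted = pred, observed = test$Species)
mean(pred == test$Species)  # held-out classification accuracy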
The design of gas turbine combustors for optimal operation at different power ratings is a multifaceted engineering task, as it requires the consideration of several objectives that must be evaluated under different test conditions. We address this challenge by presenting a data-driven approach that uses multiple probabilistic surrogate models derived from Gaussian process regression to automatically select optimal combustor designs from a large parameter space, requiring only a few experimental data points. We present two strategies for surrogate model training that differ in terms of required experimental and computational efforts. Depending on the measurement time and cost for a target, one of the strategies may be preferred. We apply the methodology to train three surrogate models under operating conditions where the corresponding design objectives are critical: reduction of NOx emissions, prevention of lean flame extinction, and mitigation of thermoacoustic oscillations. Once trained, the models can be flexibly used for different forms of a posteriori design optimization, as we demonstrate in this study.
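As a minimal sketch of the surrogate-modelling idea (not the authors' implementation), the R code below fits a hand-rolled Gaussian process regression to a few simulated measurements of a single hypothetical objective over a one-dimensional design parameter and then selects the design with the best predicted value; the kernel, its hyperparameters, and the toy objective are all illustrative assumptions.

# Minimal Gaussian-process surrogate sketch (illustrative, not the paper's code)
set.seed(1)

# Hypothetical 1-D design parameter and a few noisy "measurements" of an objective
x_obs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
y_obs <- sin(2 * pi * x_obs) + rnorm(length(x_obs), sd = 0.05)

# Squared-exponential kernel with assumed hyperparameters
sq_exp <- function(a, b, ell = 0.2, sigma_f = 1) {
  sigma_f^2 * exp(-0.5 * outer(a, b, "-")^2 / ell^2)
}

sigma_n <- 0.05                                        # assumed noise level
K       <- sq_exp(x_obs, x_obs) + sigma_n^2 * diag(length(x_obs))
K_inv_y <- solve(K, y_obs)

# Gaussian-process posterior mean and variance over a dense grid of candidate designs
x_new    <- seq(0, 1, length.out = 200)
K_star   <- sq_exp(x_new, x_obs)
mu_post  <- as.vector(K_star %*% K_inv_y)
var_post <- diag(sq_exp(x_new, x_new)) - rowSums((K_star %*% solve(K)) * K_star)

# Select the candidate design with the lowest predicted objective (e.g. emissions)
x_new[which.min(mu_post)]

In a real application the posterior variance would also feed into the selection, for example through an acquisition function, so that designs are only trusted where the surrogate is confident.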
In this Element, the authors introduce Bayesian probability and inference for social science students and practitioners, starting from the absolute beginning and walking readers steadily through the Element. No previous knowledge is required beyond a basic statistics course. By the end of the process, readers will understand the core tenets of Bayesian theory and practice well enough to specify, implement, and understand models using practical social science data. Chapters cover theoretical principles and real-world applications that provide motivation and intuition. Because Bayesian methods are intricately tied to software, code in both R and Python is provided throughout.
This Element highlights the use within archaeology of classification methods developed in the fields of chemometrics, artificial intelligence, and Bayesian statistics. These methods operate in both high- and low-dimensional settings and often outperform traditional approaches. The basic principles and main methods are introduced, with recommendations for when to use them.
Bayesian statistical inference is one of the best methods for estimating a quantity of interest from limited available data; it has recently entered the field of nuclear astrophysics, where it can be used to evaluate astrophysical S-factors, cross sections and, as a result, the nuclear reaction rates of Big Bang nucleosynthesis. This study calculates the astrophysical S-factor and the rate of the reaction T(d,n)4He, an important astrophysical reaction, at energies below the repulsive Coulomb barrier. The calculation is carried out with the R software and yields improved results for this reaction rate compared with non-Bayesian methods.
Gaussian graphical models are useful tools for conditional independence structure inference of multivariate random variables. Unfortunately, Bayesian inference of latent graph structures is challenging due to the exponential growth of $\mathcal{G}_n$, the set of all graphs on $n$ vertices. One approach that has been proposed to tackle this problem is to limit the search to subsets of $\mathcal{G}_n$. In this paper we study subsets that are vector subspaces, with the cycle space $\mathcal{C}_n$ as the main example. We propose a novel prior on $\mathcal{C}_n$ based on linear combinations of cycle basis elements and present its theoretical properties. Using this prior, we implement a Markov chain Monte Carlo algorithm, and show that (i) posterior edge inclusion estimates computed with our technique are comparable to estimates from the standard technique despite searching a smaller graph space, and (ii) the vector space perspective enables straightforward implementation of MCMC algorithms.
Under mild assumptions, we show that the exact convergence rate in total variation is also exact in weaker Wasserstein distances for the Metropolis–Hastings independence sampler. We develop new upper and lower bounds on the worst-case Wasserstein distance when the sampler is initialized from a point, and show that for an arbitrary point initialization the convergence rate matches the convergence rate in total variation. We derive exact convergence expressions for more general Wasserstein distances when initialization is at a specific point. Using optimization, we construct a novel centered independent proposal to develop exact convergence rates in Bayesian quantile regression and many generalized linear model settings. We show that the exact convergence rate can be upper bounded in Bayesian binary response regression (e.g. logistic and probit) when the sample size and dimension grow together.
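For context, the classical background result for this sampler (not the paper's new contribution) is that when the importance weight of the target $\pi$ against the independence proposal $q$ is bounded, convergence in total variation is geometric at a rate governed by the worst-case weight, and this geometric rate is known to be sharp, which is what makes exact convergence-rate statements possible:

\[
  \bigl\| P^{n}(x,\cdot) - \pi \bigr\|_{\mathrm{TV}} \;\le\; \Bigl(1 - \tfrac{1}{M}\Bigr)^{n},
  \qquad M = \sup_{x} \frac{\pi(x)}{q(x)} < \infty .
\]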
The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently through the application of one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, nonparametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability—specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
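One plausible way to write down the superposition step (a generic formulation in my own notation, not necessarily the authors' exact objective): let $\mathcal{M}$ be the Gaussian process model learned from a reference data set, and let $\theta_j$ collect the coordinate-transformation parameters (for example horizontal and vertical shifts) applied to data set $\mathcal{D}_j$ by a map $T_{\theta_j}$; the maximum a posteriori transformation is then

\[
  \hat{\theta}_j \;=\; \arg\max_{\theta_j} \Bigl[ \log p\bigl( T_{\theta_j}(\mathcal{D}_j) \,\bigm|\, \mathcal{M} \bigr) \;+\; \log p(\theta_j) \Bigr],
\]

where the first term is the Gaussian process predictive log-likelihood of the transformed data and the second is a prior on the transformation. The curvature of such an objective around its optimum is what supplies uncertainty estimates for the learned shifts.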
This new graduate textbook adopts a pedagogical approach to contemporary cosmology that enables readers to build an intuitive understanding of theory and data, and of how they interact, which is where the greatest advances in the field are currently being made. Using analogies, intuitive explanations of complex topics, worked examples, and computational problems, the book begins with the physics of the early universe and goes on to cover key concepts such as inflation, dark matter and dark energy, large-scale structure, and the cosmic microwave background. Computational and data analysis techniques, and statistics, are integrated throughout the text, particularly in the chapters on late-universe cosmology, while another chapter is devoted entirely to the basics of statistical methods. A solutions manual for end-of-chapter problems is available to instructors, and suggested syllabi, based on different course lengths and emphases, can be found in the Preface. Online computer code and datasets enhance the student learning experience.
This chapter reviews statistics and data-analysis tools. Starting from basic statistical concepts such as mean, variance, and the Gaussian distribution, we introduce the principal tools required for data analysis. We discuss both Bayesian and frequentist statistical approaches, with emphasis on the former. This leads us to describe how to calculate the goodness of fit of data to theory, and how to constrain the parameters of a model. Finally, we introduce and explain, both intuitively and mathematically, two important statistical tools: Markov chain Monte Carlo (MCMC) and the Fisher information matrix.
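Because MCMC is a centerpiece of the chapter, here is a minimal random-walk Metropolis sketch in R; the standard-normal target, the proposal scale, and the iteration counts are illustrative assumptions standing in for a real posterior, not the book's own code.

# Minimal random-walk Metropolis sampler (illustrative sketch)
set.seed(123)

# Unnormalised log-posterior: a standard normal target stands in for a real model
log_post <- function(theta) dnorm(theta, mean = 0, sd = 1, log = TRUE)

n_iter   <- 5000
prop_sd  <- 1.0               # assumed proposal scale
chain    <- numeric(n_iter)
chain[1] <- 3                 # deliberately poor starting point

for (i in 2:n_iter) {
  proposal  <- rnorm(1, mean = chain[i - 1], sd = prop_sd)
  log_alpha <- log_post(proposal) - log_post(chain[i - 1])
  if (log(runif(1)) < log_alpha) {
    chain[i] <- proposal          # accept the proposed move
  } else {
    chain[i] <- chain[i - 1]      # reject and keep the current value
  }
}

# Discard burn-in and summarise the posterior draws
posterior <- chain[-(1:1000)]
mean(posterior); quantile(posterior, c(0.025, 0.975))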
The Bayesian approach is a way to interpret a single study within the context of the entire literature. It is an important method because it keeps any one study from being given undue weight, and it allows clinical experience to be incorporated as prior input.
Standard methods for measuring latent traits from categorical data assume that response functions are monotonic. This assumption is violated when individuals from both extremes respond identically, but for conflicting reasons. Two survey respondents may “disagree” with a statement for opposing motivations, liberal and conservative justices may dissent from the same Supreme Court decision but provide ideologically contradictory rationales, and in legislative settings, ideological opposites may join together to oppose moderate legislation in pursuit of antithetical goals. In this article, we introduce a scaling model that accommodates ends against the middle responses and provide a novel estimation approach that improves upon existing routines. We apply this method to survey data, voting data from the U.S. Supreme Court, and the 116th Congress, and show that it outperforms standard methods in terms of both congruence with qualitative insights and model fit. This suggests that our proposed method may offer improved one-dimensional estimates of latent traits in many important settings.
A link is made between epistemology – that is to say, the philosophy of knowledge – and statistics. Hume's criticism of induction is covered, as is Popper's. Various philosophies of statistics are described.
Archaeologists frequently use probability distributions and null hypothesis significance testing (NHST) to assess how well survey, excavation, or experimental data align with their hypotheses about the past. Bayesian inference is increasingly used as an alternative to NHST and, in archaeology, is most commonly applied to radiocarbon date estimation and chronology building. This article demonstrates that Bayesian statistics has broader applications. It begins by contrasting NHST and Bayesian statistical frameworks, before introducing and applying Bayes's theorem. In order to guide the reader through an elementary step-by-step Bayesian analysis, this article uses a fictional archaeological faunal assemblage from a single site. The fictional example is then expanded to demonstrate how Bayesian analyses can be applied to data with a range of properties, formally incorporating expert prior knowledge into the hypothesis evaluation process.
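In the same spirit as the article's fictional faunal example, but with made-up numbers of my own rather than the article's, the short R sketch below applies Bayes's theorem to two competing hypotheses about the proportion of wild taxa in an assemblage, using a binomial likelihood for the identified specimen counts.

# Bayes's theorem for two competing hypotheses (illustrative numbers only)
# H1: wild taxa make up 30% of the assemblage; H2: they make up 60%
p_wild <- c(H1 = 0.3, H2 = 0.6)
prior  <- c(H1 = 0.5, H2 = 0.5)            # assumed equal prior belief

# Fictional data: 14 wild specimens identified out of 40
wild  <- 14
total <- 40

likelihood <- dbinom(wild, size = total, prob = p_wild)

# Posterior = prior x likelihood, renormalised over the two hypotheses
posterior <- prior * likelihood / sum(prior * likelihood)
round(posterior, 3)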
I propose a new model, ordered Beta regression, for continuous distributions with both lower and upper bounds, such as data arising from survey slider scales, visual analog scales, and dose–response relationships. This model employs the cut point technique popularized by ordered logit to fit a single linear model to both continuous (0,1) and degenerate [0,1] responses. The model can be estimated with or without observations at the bounds, and as such is a general solution for these types of data. Employing a Monte Carlo simulation, I show that the model is noticeably more efficient than ordinary least squares regression, zero-and-one-inflated Beta regression, rescaled Beta regression, and fractional logit while fully capturing nuances in the outcome. I apply the model to a replication of the Aidt and Jensen (2014, European Economic Review 72, 52–75) study of suffrage extensions in Europe. The model can be fit with the R package ordbetareg to facilitate hierarchical, dynamic, and multivariate modeling.
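A hedged sketch of what fitting this model might look like in practice: the simulated slider-scale data below are invented, and I am assuming the package's main fitting function follows the usual formula/data interface (it is built on brms, so Stan must be available and fitting takes a while).

# Illustrative ordered Beta regression fit (invented data; assumes the
# ordbetareg() formula/data interface)
library(ordbetareg)

set.seed(7)
n <- 300
x <- rnorm(n)
# Simulated slider-scale outcome on [0, 1] with some responses exactly at the bounds
y <- plogis(0.8 * x + rnorm(n, sd = 0.7))
y[y < 0.05] <- 0
y[y > 0.95] <- 1
dat <- data.frame(y = y, x = x)

fit <- ordbetareg(formula = y ~ x, data = dat)
summary(fit)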
In this appendix, we review the major concepts, notation, and results from probability and statistics that are used in this book. We start with univariate random variables, their distributions, moments, and quantiles. We consider dependent random variables through conditional probabilities and joint density and distribution functions. We review some of the distributions that are most important in the text, including the normal, lognormal, Pareto, uniform, binomial, and Poisson distributions. We outline the maximum likelihood (ML) estimation process, and summarize key properties of ML estimators. We review Bayesian statistics, including the prior, posterior, and predictive distributions. We discuss Monte Carlo simulation, with a particular focus on estimation and uncertainty.
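To make the Monte Carlo point concrete, here is a small R sketch (a toy example of my own, not taken from the appendix) that estimates an expectation by simulation and attaches a Monte Carlo standard error to the estimate.

# Monte Carlo estimation with an uncertainty estimate (toy example)
set.seed(2024)
n_sims <- 100000

# Example target: E[max(Z - 1, 0)] for Z ~ N(0, 1), estimated by simulation
z      <- rnorm(n_sims)
payoff <- pmax(z - 1, 0)

estimate <- mean(payoff)
mc_se    <- sd(payoff) / sqrt(n_sims)   # Monte Carlo standard error of the estimate

c(estimate = estimate, std_error = mc_se)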
We present a methodological strategy for placing pottery styles in time that combines contextual analysis with statistical treatment of the available set of radiocarbon dates. The analysis addressed three pottery groups with broad regional expression and strong stylistic affinities: San José, Hualfín, and Molinos, representative of the beginning of the Late Intermediate period in the valley region of the Argentine Northwest. We worked with an exhaustive sample of 28 dates, evaluating different degrees of certainty in the sample–event association and discriminating the association between the dated events and the ceramic assemblages. This treatment of the dates made it possible to rank the most reliable ones and to select a qualitatively superior subsample. The refined data were used to evaluate the hypothesis of a late chronology for these styles through Bayesian statistical modelling. As a result, anomalous data could be detected, and the approximate span of production of the stylistic assemblages could be placed between the eleventh and fourteenth centuries AD, a period associated with pre-Inca local developments. Differential trends were also established among the three stylistic groups considered, with greater antiquity observed for Hualfín and Molinos.
This paper proposes a Bayesian alternative to the synthetic control method for comparative case studies with one or more treated units. We adopt a Bayesian posterior predictive approach to Rubin’s causal model, which allows researchers to make inferences about both individual and average treatment effects on treated observations based on the empirical posterior distributions of their counterfactuals. The prediction model we develop is a dynamic multilevel model with a latent factor term to correct biases induced by unit-specific time trends. It also accommodates heterogeneous and dynamic relationships between covariates and the outcome, thus improving the precision of the causal estimates. To reduce model dependency, we adopt a Bayesian shrinkage method for model searching and factor selection. Monte Carlo exercises demonstrate that our method produces more precise causal estimates than existing approaches and achieves correct frequentist coverage rates even when sample sizes are small and rich heterogeneities are present in the data. We illustrate the method with two empirical examples from political economy.