Model-Based Clustering and Classification for Data Science: With Applications in R

Charles Bouveyron; Gilles Celeux; T. Brendan Murphy; Adrian E. Raftery

doi:10.1017/9781108644181


Charles Bouveyron (Université Côte d’Azur); Gilles Celeux (Inria Saclay Île-de-France); T. Brendan Murphy (University College Dublin); Adrian E. Raftery (University of Washington)

Publisher: Cambridge University Press
Online publication date: June 2019
Print publication year: 2019
Online ISBN: 9781108644181
DOI: https://doi.org/10.1017/9781108644181

Subjects: Computer Science, Pattern Recognition and Machine Learning, General Statistics and Probability, Statistical Theory and Methods, Statistics and Probability
Series: Cambridge Series in Statistical and Probabilistic Mathematics (50)

Cluster analysis finds groups in data automatically. Most methods have been heuristic and leave open such central questions as: how many clusters are there? Which method should I use? How should I handle outliers? Classification assigns new observations to groups given previously classified observations, and also has open questions about parameter tuning, robustness and uncertainty assessment. This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions. It builds the basic ideas in an accessible but rigorous way, with extensive data examples and R code; describes modern approaches to high-dimensional data and networks; and explains such recent advances as Bayesian regularization, non-Gaussian model-based clustering, cluster merging, variable selection, semi-supervised and robust classification, clustering of functional data, text and images, and co-clustering. Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.

'Bouveyron, Celeux, Murphy, and Raftery pioneered the theory, computation, and application of modern model-based clustering and discriminant analysis. Here they have produced an exhaustive yet accessible text, covering both the field's state of the art as well as its intellectual development. The authors develop a unified vision of cluster analysis, rooted in the theory and computation of mixture models. Embedded R code points the way for applied readers, while graphical displays develop intuition about both model construction and the critical but often-neglected estimation process. Building on a series of running examples, the authors gradually and methodically extend their core insights into a variety of exciting data structures, including networks and functional data. This text will serve as a backbone for graduate study as well as an important reference for applied data scientists interested in working with cutting-edge tools in semi- and unsupervised machine learning.'

John S. Ahlquist - University of California, San Diego

'This book, written by authoritative experts in the field, gives a comprehensive and thorough introduction to model-based clustering and classification. The authors not only explain the statistical theory and methods, but also provide hands-on applications illustrating their use with the open-source statistical software R. The book also covers recent advances made for specific data structures (e.g. network data) or modeling strategies (e.g. variable selection techniques), making it a fantastic resource as an overview of the state of the field today.'

Bettina Grün - Johannes Kepler Universität Linz, Austria

'Four authors with diverse strengths nicely integrate their specialties to illustrate how clustering and classification methods are implemented in a wide selection of real-world applications. Their inclusion of how to use available software is an added benefit for students. The book covers foundations, challenging aspects, and some essential details of applications of clustering and classification. It is a fun and informative read!'

Naisyin Wang - University of Michigan

'This is a beautifully written book on a topic of fundamental importance in modern statistical science, by some of the leading researchers in the field. It is particularly effective in being an applied presentation - the reader will learn how to work with real data and at the same time clearly presenting the underlying statistical thinking. Fundamental statistical issues like model and variable selection are clearly covered as well as crucial issues in applied work such as outliers and ordinal data. The R code and graphics are particularly effective. The R code is there so you know how to do things, but it is presented in a way that does not disrupt the underlying narrative. This is not easy to do. The graphics are 'sophisticatedly simple' in that they convey complex messages without being too complex. For me, this is a 'must have' book.'

Rob McCulloch - Arizona State University

'This advanced text explains the underlying concepts clearly and is strong on theory … I congratulate the authors on the theoretical aspects of their book, it’s a fine achievement.'

Antony Unwin - Source: International Statistical Review

‘In my opinion, the overall quality of this impactful and intriguing book can be expressed by concluding that it is a perfect fit to the Cambridge Series in Statistical and Probabilistic Mathematics, characterized as a series of high-quality upper-division textbooks and expository monographs containing applications and discussions of new techniques while emphasizing rigorous treatment of theoretical methods.’

Zdenek Hlavka - Source: MathSciNet

‘… this book not only gives the big picture of the analysis of clustering and classification but also explains recent methodological advances. Extensive real-world data examples and R code for many methods are also well summarized. This book is highly recommended to students in data science, as well as researchers and data analysts.’

Li-Pang Chen - Source: Biometrical Journal

‘Model-Based Clustering and Classification for Data Science: With Applications in R, written by leading statisticians in the field, provides academics and practitioners with a solid theoretical and practical foundation on the use of model-based clustering methods … this book will serve as an excellent resource for quantitative practitioners and theoreticians seeking to learn the current state of the field.’

C. M. Foley - Source: Quarterly Review of Biology

‘This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions … Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.’

Hans-Jürgen Schmidt - Source: zbMATH

Contents

Frontmatter
pp i-iv

Dedication
pp v-vi

Contents
pp vii-ix

Contents
pp x-xiv

Preface
pp xv-xviii

1 - Introduction
pp 1-14

2 - Model-based Clustering: Basic Ideas
pp 15-78

3 - Dealing with Difficulties
pp 79-108

4 - Model-based Classification
pp 109-133

5 - Semi-supervised Clustering and Classification
pp 134-162

6 - Discrete Data Clustering
pp 163-198

7 - Variable Selection
pp 199-216

8 - High-dimensional Data
pp 217-258

9 - Non-Gaussian Model-based Clustering
pp 259-291

10 - Network Data
pp 292-330

11 - Model-based Clustering with Covariates
pp 331-350

12 - Other Topics
pp 351-383

List of R Packages
pp 384-385

Bibliography
pp 386-414

Author Index
pp 415-422

Subject Index
pp 423-427
