Benjamin S. Baumer is an assistant professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of The Sabermetric Revolution and won the 2016 Contemporary Baseball Analysis Award from the Society for American Baseball Research.
Daniel T. Kaplan is the DeWitt Wallace professor of mathematics and computer science at Macalester College. He is the author of several textbooks on statistical modeling and statistical computing, and received the 2006 Macalester Excellence in Teaching award.
Nicholas J. Horton is a professor of statistics at Amherst College. He is a Fellow of the American Statistical Association (ASA), member of the NRC Committee on Applied and Theoretical Statistics, recipient of a number of national teaching awards, author of a series of books on statistical computing, and actively involved in curricular reform to help students think with data.
Only about 60 of the book's 551 pages address the questions of uncertainty and inference that constitute the core of the statistics tradition. The remaining pages attend to the other components of working with data (the import, wrangling, tidying, visualization, and storage) that are often the more prominent barriers to understanding modern datasets ... Modern Data Science with R is a landmark: the first full textbook in data science. (It can serve) as the backbone of a semester-long course targeted at students with little background in statistics or computing. It is rich with examples and is guided by a strong narrative voice. What's more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics ... By using the tidyverse, the textbook authors are able to seamlessly interweave a conceptual framework for data science with the corresponding implementation in R code ... Even though this book is heavily dependent on R, readers come away with a more general natural language with which to talk and think about data. Indeed, if R were to cease to exist tomorrow, these readers would still be well-situated to be data scientists. In a nutshell, that approach is what makes this such a successful textbook. ~ The American Statistician
Baumer, Kaplan, and Horton have managed to write a book that will serve a huge variety of educators while being endlessly interesting and useful to students of a modern era. Modern Data Science in R is a compilation of ideas from both ends of the data science and statistics spectrum: tools for setting up databases and working with regular expressions are intermixed with fundamentals like regression analysis. Additionally, the authors pull together fantastic examples from the scientific community as well as the media at large. Their examples will engage today's students into understanding why data wrangling, reproducibility, and ethics are a fundamental part of any data analysis.
Good visualization skills (Tukey) and ethical analyses (Huff, How to Lie with Statistics) are not new ideas. However, they have recently been lost in the drive for more sophisticated mathematical and computational methods for working with data. Baumer et al. modernize the need for good visualization and communication in ways that will resonate with today's practitioners. Like Wickham's ggplot2 and The Elements of Statistical Learning by Hastie et al., Modern Data Science in R promises to be a staple on every data analyst's bookshelf. Accessible to students and a valuable resource for those who have been in the field for many years, this book promises to be a treasure you will want to discover. ~ Jo Hardin, Pomona College
This book would be an excellent textbook for an introductory data science course. Many academic institutions are now trying to open data science programs. But, there is not a good textbook available for data science courses. ~ Mahbubul Majumder, U. of Nebraska Omaha
The book is unique. It is an encyclopedia of Data Science, and it covers a wide variety of modern topics; another positive aspect is that it contains lots of examples and code, and the layout is quite catchy. One can learn (and teach) subjects as diverse as: How to give talks, administrating databases, how to model spatial data, and even ethics, all in one book. ~ Miguel de Carvalho, The University of Edinburgh
It would undoubtedly be useful to many postgraduate students of applied statistics. The handbook style will also be of use to statisticians who want to keep up to date in this area. In particular the book utilizes functions from many different R packages, and will be helpful for data analysts to keep their R skills up to date. Although one of the appendices covers an introduction to R (R Core Team 2017) and RStudio (RStudio Team 2017), realistically it is expected that the reader has some experience with R.
Existing R users with no experience of RStudio might find the appendix useful, but RStudio is not required to work through this book. Overall the book is well written, well structured and the general writing style is both objective and entertaining ... The book is divided into three major parts, Introduction to Data Science, Statistics and Modeling, and Topics in Data Science, followed by six appendices ... In conclusion, I recommend this book as a course companion to a master's level course in data analysis and to statisticians who want to keep their skills in the field of data science up to date. ~ Tim Downie, Journal of Statistical Software
Modern Data Science with R is different ... as it presents an abundance of R codes, functions and packages clearly with several useful examples. For people with a statistical background, the book covers computational topics like simulation and also includes appropriate computer science topics such as Data Wrangling, Database Querying using SQL and Text as Data. The book is well-structured and is presented in an easy-to-understand manner, making it suitable for a wide range of readers ... This book is unique because it incorporates theoretical fundamentals such as statistical learning and regression modelling with the modern, practical elements of data science, including setting up databases and debugging ... This book is a valuable resource to all those studying and interested in data science. ~ Shuangzhe Liu, University of Canberra