A practical and innovative textbook detailing how to build real-world software products with machine learning components, not just models.
Traditional machine learning texts focus on how to train and evaluate models, while MLOps books focus on how to streamline model development and deployment. Neither, however, focuses on how to build actual products that deliver value to users. This practical textbook, by contrast, details how to responsibly build products with machine learning components, covering the entire development lifecycle from requirements and design to quality assurance and operations. Machine Learning in Production brings an engineering mindset to the challenge of building systems that are usable, reliable, scalable, and safe under real-world conditions of uncertainty, incomplete information, and resource constraints. Based on the author's popular class at Carnegie Mellon, this pioneering book integrates foundational knowledge in software engineering and machine learning to provide the holistic view needed to create not only prototype models but production-ready systems.
Integrates coverage of cutting-edge research, existing tools, and real-world applications
Provides students and professionals with an engineering view for production-ready machine learning systems
Proven in the classroom
Offers supplemental resources including slides, videos, exams, and further readings
By: Christian Kästner
Imprint: MIT Press
Country of Publication: United States
Dimensions: Height 229mm, Width 178mm
Weight: 567g
ISBN: 9780262049726
ISBN 10: 0262049724
Pages: 624
Publication Date: 06 May 2025
Audience: General/trade, ELT Advanced
Format: Hardback
Publisher's Status: Active
Table of Contents:
I SETTING THE STAGE 2
1 Introduction 3
1.1 Motivating Example: An Automated Transcription Startup 4
1.2 Data Scientists and Software Engineers 6
1.3 Machine-Learning Challenges in Software Projects 8
1.4 A Foundation for MLOps and Responsible Engineering 13
1.5 Summary 15
1.6 Further Readings 16
2 From Models to Systems 19
2.1 ML and Non-ML Components in a System 19
2.2 Beyond the Model 24
2.3 On Terminology 29
2.4 Summary 30
2.5 Further Readings 30
3 Machine Learning for Software Engineers, in a Nutshell 33
3.1 Basic Terms: Machine Learning, Models, Predictions 33
3.2 Technical Concepts: Model Parameters, Hyperparameters, Model Storage 34
3.3 Machine Learning Pipelines 35
3.4 Foundation Models and Prompting 36
3.5 On Terminology 37
3.6 Summary 38
3.7 Further Readings 38
II REQUIREMENTS ENGINEERING 39
4 When to use Machine Learning 41
4.1 Problems that Benefit from Machine Learning 41
4.2 Tolerating Mistakes and ML Risk 42
4.3 Continuous Learning 43
4.4 Costs and Benefits 43
4.5 The Business Case: Machine Learning as Predictions 44
4.6 Summary 45
4.7 Further Readings 45
5 Setting and Measuring Goals 47
5.1 Scenario: Self-help Legal Chatbot 47
5.2 Setting Goals 48
5.3 Measurement in a Nutshell 51
5.4 Summary 57
5.5 Further Readings 57
6 Gathering Requirements 59
6.1 Scenario: Fall Detection with a Smart Watch 60
6.2 Untangling Requirements 60
6.3 Eliciting Requirements 66
6.4 How Much Requirements Engineering and When? 71
6.5 Summary 72
6.6 Further Readings 73
7 Planning for Mistakes 75
7.1 Mistakes Will Happen 76
7.2 Designing for Failures 78
7.3 Hazard Analysis and Risk Analysis 84
7.4 Summary 89
7.5 Further Readings 90
III ARCHITECTURE AND DESIGN 92
8 Thinking like a Software Architect 93
8.1 Quality Requirements Drive Architecture Design 94
8.2 The Role of Abstraction 97
8.3 Common Architectural Design Challenges for ML-Enabled Systems 97
8.4 Codifying Design Knowledge 100
8.5 Summary 105
8.6 Further Readings 105
9 Quality Attributes of ML Components 109
9.1 Scenario: Detecting Credit Card Fraud 109
9.2 From System Quality to Model and Pipeline Quality 109
9.3 Common Quality Attributes 111
9.4 Constraints and Tradeoffs 115
9.5 Summary 117
9.6 Further Readings 118
10 Deploying a Model 119
10.1 Scenario: Augmented Reality Translation 119
10.2 Model Inference Function 120
10.3 Feature Encoding 120
10.4 Model Serving Infrastructure 123
10.5 Deployment Architecture Tradeoffs 126
10.6 Model Inference in a System 131
10.7 Documenting Model-Inference Interfaces 135
10.8 Summary 136
10.9 Further Readings 138
11 Automating the Pipeline 141
11.1 Scenario: Home Value Prediction 141
11.2 Supporting Evolution and Experimentation by Designing for Change 142
11.3 Pipeline Thinking 143
11.4 Stages of Machine-Learning Pipelines 144
11.5 Automation and Infrastructure Design 149
11.6 Summary 151
11.7 Further Readings 152
12 Scaling the System 155
12.1 Scenario: Google-Scale Photo Hosting and Search 155
12.2 Scaling by Distributing Work 156
12.3 Data Storage at Scale 157
12.4 Distributed Data Processing 166
12.5 Distributed Machine-Learning Algorithms 176
12.6 Performance Planning and Monitoring 178
12.7 Summary 178
12.8 Further Readings 179
13 Planning for Operations 181
13.1 Scenario: Blogging Platform with Spam Filter 182
13.2 Service Level Objectives 182
13.3 Observability 183
13.4 Automating Deployments 185
13.5 Infrastructure as Code and Virtualization 186
13.6 Orchestrating and Scaling Deployments 188
13.7 Elevating Data Engineering 189
13.8 Incident Response Planning 190
13.9 DevOps and MLOps Principles 191
13.10 DevOps and MLOps Tooling 192
13.11 Summary 195
13.12 Further Readings 195
IV QUALITY ASSURANCE 197
14 Quality Assurance Basics 199
14.1 Testing 200
14.2 Code Review 204
14.3 Static Analysis 205
14.4 Other Quality Assurance Approaches 206
14.5 Planning and Process Integration 207
14.6 Summary 209
14.7 Further Readings 209
15 Model Quality 211
15.1 Scenario: Cancer Prognosis 211
15.2 Defining Correctness and Fit 212
15.3 Measuring Prediction Accuracy 217
15.4 Model Evaluation Beyond Accuracy 231
15.5 Test Data Adequacy 244
15.6 Model Inspection 245
15.7 Summary 245
15.8 Further Readings 246
16 Data Quality 251
16.1 Scenario: Inventory Management 251
16.2 Data Quality Challenges 252
16.3 Data Quality Checks 255
16.4 Drift and Data Quality Monitoring 260
16.5 Data Quality is a System-Wide Concern 264
16.6 Summary 268
16.7 Further Readings 269
17 Pipeline Quality 273
17.1 Silent Mistakes in ML Pipelines 273
17.2 Code Review for ML Pipelines 274
17.3 Testing Pipeline Components 275
17.4 Static Analysis of ML Pipelines 284
17.5 Process Integration and Test Maturity 284
17.6 Summary 285
17.7 Further Readings 286
18 System Quality 287
18.1 Limits of Modular Reasoning 287
18.2 System Testing 289
18.3 Testing Component Interactions and Safeguards 291
18.4 Testing Operations (Deployment, Monitoring) 293
18.5 Summary 293
18.6 Further Readings 294
19 Testing and Experimenting in Production 295
19.1 A Brief History of Testing in Production 295
19.2 Scenario: Meeting Minutes for Video Calls 297
19.3 Measuring System Success in Production 297
19.4 Measuring Model Quality in Production 298
19.5 Designing and Implementing Quality Measures with Telemetry 302
19.6 Experimenting in Production 306
19.7 Summary 311
19.8 Further Readings 312
V PROCESS AND TEAMS 314
20 Data Science and Software Engineering Process Models 315
20.1 Data-Science Process 315
20.2 Software-Engineering Process 318
20.3 Tensions between Data Science and Software Engineering Processes 321
20.4 Integrated Processes for AI-Enabled Systems 323
20.5 Summary 327
20.6 Further Readings 327
21 Interdisciplinary Teams 329
21.1 Scenario: Fighting Depression on Social Media 329
21.2 Unicorns are not Enough 330
21.3 Conflicts Within and Between Teams are Common 331
21.4 Coordination Costs 332
21.5 Conflicting Goals and T-Shaped People 337
21.6 Groupthink 339
21.7 Team Structure and Allocating Experts 340
21.8 Learning from DevOps and MLOps Culture 342
21.9 Summary 345
21.10 Further Readings 346
22 Technical Debt 349
22.1 Scenario: Automated Delivery Robots 349
22.2 Deliberate and Prudent Technical Debt 349
22.3 Technical Debt in Machine Learning Projects 351
22.4 Managing Technical Debt 353
22.5 Summary 354
22.6 Further Readings 355
VI RESPONSIBLE ML ENGINEERING 356
23 Responsible Engineering 357
23.1 Legal and Ethical Responsibilities 357
23.2 Why Responsible Engineering Matters for ML-Enabled Systems 359
23.3 Facets of Responsible ML Engineering 362
23.4 Regulation is Coming 363
23.5 Summary 366
23.6 Further Readings 366
24 Versioning, Provenance, and Reproducibility 369
24.1 Scenario: Debugging a Loan Decision 370
24.2 Versioning 370
24.3 Data Provenance and Lineage 375
24.4 Reproducibility 378
24.5 Putting the Pieces Together 380
24.6 Summary 381
24.7 Further Readings 382
25 Explainability 385
25.1 Scenario: Proprietary Opaque Models for Recidivism Risk Assessment 385
25.2 Defining Explainability 386
25.3 Explaining a Model 389
25.4 Explaining a Prediction 392
25.5 Explaining Data and Training 397
25.6 The Dark Side of Explanations 397
25.7 Summary 398
25.8 Further Readings 398
26 Fairness 401
26.1 Scenario: Mortgage Applications 402
26.2 Fairness Concepts 403
26.3 Measuring and Improving Fairness at the Model Level 410
26.4 Fairness is a System-Wide Concern 416
26.5 Summary 428
26.6 Further Readings 429
27 Safety 433
27.1 Safety and Reliability 433
27.2 Improving Model Reliability 434
27.3 Building Safer Systems 438
27.4 The AI Alignment Problem 442
27.5 Summary 444
27.6 Further Readings 444
28 Security and Privacy 447
28.1 Scenario: Content Moderation 447
28.2 Security Requirements 448
28.3 Attacks and Defenses 449
28.4 ML-Specific Attacks 450
28.5 Threat Modeling 459
28.6 Designing for Security 462
28.7 Data Privacy 466
28.8 Summary 470
28.9 Further Readings 470
29 Transparency and Accountability 473
29.1 Transparency of the Model’s Existence 473
29.2 Transparency of How the Model Works 474
29.3 Human Oversight and Appeals 477
29.4 Accountability and Culpability 478
29.5 Summary 479
29.6 Further Readings 479
Christian Kästner is associate professor of computer science at Carnegie Mellon University.