$165

Paperback

Not in-store but you can order this
How long will it take?

QTY:

A description of perturbation-based methods developed in machine learning to augment novel optimization methods with strong statistical guarantees.

In nearly all machine learning, decisions must be made given current knowledge. Surprisingly, making what is believed to be the best decision is not always the best strategy, even in a supervised learning setting. An emerging body of work on learning under different rules instead applies perturbations to the decision and learning procedures. These methods provide simple and highly efficient learning rules with improved theoretical guarantees. This book describes perturbation-based methods developed in machine learning to augment novel optimization methods with strong statistical guarantees, offering readers a state-of-the-art overview.
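To make the perturbation idea concrete, here is a minimal illustrative sketch (our example, not taken from the book) of the Gumbel-max trick that underlies Perturb-and-MAP: perturbing each configuration's score with independent Gumbel noise and returning the maximizer yields an exact sample from the corresponding Gibbs distribution. The function name and toy scores below are our own choices.

    import numpy as np

    def perturb_and_map_sample(theta, rng):
        # Perturb each score with i.i.d. Gumbel(0, 1) noise, then solve the
        # (here trivial) MAP problem; the argmax is distributed as softmax(theta).
        noise = rng.gumbel(loc=0.0, scale=1.0, size=theta.shape)
        return int(np.argmax(theta + noise))

    # Sanity check: empirical frequencies of the perturbed argmax should
    # match the Gibbs distribution p(i) proportional to exp(theta[i]).
    rng = np.random.default_rng(0)
    theta = np.array([1.0, 0.0, -1.0])
    counts = np.bincount(
        [perturb_and_map_sample(theta, rng) for _ in range(100_000)],
        minlength=3,
    )
    print(counts / counts.sum())                # empirical distribution
    print(np.exp(theta) / np.exp(theta).sum())  # target softmax(theta)

In high-dimensional structured models the maximization is carried out by an efficient MAP solver rather than by enumeration, which is what makes this style of sampling practical.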

Chapters address recent modeling ideas that have arisen within the perturbations framework, including Perturb & MAP, herding, and the use of neural networks to map generic noise to distributions over highly structured data. They describe new learning procedures for perturbation models, including an improved EM algorithm and a learning algorithm that aims to match moments of model samples to moments of data (see the herding sketch below). They examine how perturbation models relate to their traditional counterparts, with one chapter showing that the perturbations viewpoint can lead to new algorithms in the traditional setting. And they consider perturbation-based regularization in neural networks, offering a more complete understanding of dropout and studying perturbations in the context of deep neural networks.
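As a hypothetical illustration of the moment-matching idea behind herding (again our sketch, not code from the book): herding deterministically generates pseudo-samples whose feature averages converge to the target moments, with no explicit randomness at all.

    import numpy as np

    def herd(features, target_moments, num_steps):
        # Herding update: repeatedly pick the state maximizing <w, phi(s)>,
        # then move w toward the still-unmatched moments. The running average
        # of phi over the chosen states approaches target_moments at rate O(1/T).
        w = np.array(target_moments, dtype=float)
        chosen = []
        for _ in range(num_steps):
            s = int(np.argmax(features @ w))   # most "under-represented" state
            chosen.append(s)
            w += target_moments - features[s]  # weight update
        return chosen

    # Toy usage: with one-hot features the target moments are just a target
    # distribution, and the herded sample frequencies should match it closely.
    features = np.eye(3)                       # phi(s) = one-hot(s)
    target = np.array([0.5, 0.3, 0.2])
    chosen = herd(features, target, 10_000)
    print(np.bincount(chosen, minlength=3) / len(chosen))  # approx [0.5, 0.3, 0.2]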

Edited by:   Tamir Hazan, George Papandreou, Daniel Tarlow
Imprint:   MIT Press
Country of Publication:   United States
Dimensions:   Height: 254mm, Width: 203mm
Weight:   369g
ISBN:   9780262549943
ISBN 10:   0262549948
Series:   Neural Information Processing series
Pages:   412
Publication Date:   05 December 2023
Language:   English
Recommended Age:   From 18 years
Audience:   Professional and scholarly, Undergraduate
Format:   Paperback
Publisher's Status:   Active
Table of Contents

Preface
1 Introduction (Tamir Hazan, George Papandreou, and Daniel Tarlow)
  1.1 Scope; 1.2 Regularization; 1.3 Modeling; 1.4 Roadmap; 1.5 References
2 Perturb-and-MAP Random Fields (George Papandreou and Alan L. Yuille)
  2.1 Energy-Based Models: Deterministic vs. Probabilistic Approaches; 2.2 Perturb-and-MAP for Gaussian and Sparse Continuous MRFs; 2.3 Perturb-and-MAP for MRFs with Discrete Labels; 2.4 On the Representation Power of the Perturb-and-MAP Model; 2.5 Related Work and Recent Developments; 2.6 Discussion; 2.7 References
3 Factorizing Shortest Paths with Randomized Optimum Models (Daniel Tarlow, Alexander Gaunt, Ryan Adams, and Richard S. Zemel)
  3.1 Introduction; 3.2 Building Structured Models: Design Considerations; 3.3 Randomized Optimum Models (RandOMs); 3.4 Learning RandOMs; 3.5 RandOMs for Image Registration; 3.6 Shortest Path Factorization; 3.7 Shortest Path Factorization with RandOMs; 3.8 Experiments; 3.9 Related Work; 3.10 Discussion; 3.11 References
4 Herding as a Learning System with Edge-of-Chaos Dynamics (Yutian Chen and Max Welling)
  4.1 Introduction; 4.2 Herding Model Parameters; 4.3 Generalized Herding; 4.4 Experiments; 4.5 Summary; 4.6 Conclusion; 4.8 References
5 Learning Maximum A-Posteriori Perturbation Models (Andreea Gane, Tamir Hazan, and Tommi Jaakkola)
  5.1 Introduction; 5.2 Background and Notation; 5.3 Expressive Power of Perturbation Models; 5.4 Higher Order Dependencies; 5.5 Markov Properties and Perturbation Models; 5.6 Conditional Distributions; 5.7 Learning Perturbation Models; 5.8 Empirical Results; 5.9 Perturbation Models and Stability; 5.10 Related Work; 5.11 References
6 On the Expected Value of Random Maximum A-Posteriori Perturbations (Tamir Hazan and Tommi Jaakkola)
  6.1 Introduction; 6.2 Inference and Random Perturbations; 6.3 Low-Dimensional Perturbations; 6.4 Empirical Evaluation; 6.5 References
7 A Poisson Process Model for Monte Carlo (Chris J. Maddison)
  7.1 Introduction; 7.2 Poisson Processes; 7.3 Exponential Races; 7.4 Gumbel Processes; 7.5 Monte Carlo Methods That Use Bounds; 7.6 Conclusion; 7.9 References
8 Perturbation Techniques in Online Learning and Optimization (Jacob Abernethy, Chansoo Lee, and Ambuj Tewari)
  8.1 Introduction; 8.2 Preliminaries; 8.3 Gradient-Based Prediction Algorithm; 8.4 Generic Bounds; 8.5 Experts Setting; 8.6 Euclidean Balls Setting; 8.7 The Multi-Armed Bandit Setting; 8.9 References
9 Probabilistic Inference by Hashing and Optimization (Stefano Ermon)
  9.1 Introduction; 9.2 Problem Statement and Assumptions; 9.3 Approximate Model Counting via Randomized Hashing; 9.4 Probabilistic Models and Approximate Inference: The WISH Algorithm; 9.5 Optimization Subject to Parity Constraints; 9.6 Applications; 9.7 Open Problems and Research Challenges; 9.8 Conclusion; 9.9 References
10 Perturbation Models and PAC-Bayesian Generalization Bounds (Joseph Keshet, Subhransu Maji, Tamir Hazan, and Tommi Jaakkola)
  10.1 Introduction; 10.2 Background; 10.3 PAC-Bayesian Generalization Bounds; 10.4 Algorithms; 10.5 The Bayesian Perspective; 10.6 Approximate Inference; 10.7 Empirical Evaluation; 10.8 Discussion; 10.9 References
11 Adversarial Perturbations of Deep Neural Networks (David Warde-Farley and Ian Goodfellow)
  11.1 Introduction; 11.2 Adversarial Examples; 11.3 Adversarial Training; 11.4 Generative Adversarial Networks; 11.5 Discussion; 11.6 References
12 Data Augmentation via Lévy Processes (Stefan Wager, William Fithian, and Percy Liang)
  12.1 Introduction; 12.2 Lévy Thinning; 12.3 Examples; 12.4 Simulation Experiments; 12.5 Discussion; 12.6 Appendix: Proof of Theorem 12.4; 12.7 References
13 Bilu-Linial Stability (Konstantin Makarychev and Yury Makarychev)
  13.1 Introduction; 13.2 Stable Instances of Graph Partitioning Problems; 13.3 Stable Instances of Clustering Problems; 13.4 References

Tamir Hazan is Assistant Professor at the Technion, Israel Institute of Technology. George Papandreou is a Research Scientist at Google, Inc. Daniel Tarlow is a Researcher at Microsoft Research Cambridge, UK.
