Reconciling modern machine learning practice and the bias-variance trade-off
About this video
It turns out that the classic view of generalization and overfitting is incomplete! If you add parameters beyond the number of points in your dataset, generalization performance might increase again due to the increased smoothness of overparameterized functions.
Abstract:
The question of generalization in machine learning---how algorithms are able to learn predictors from a training sample to make accurate predictions out-of-sample---is revisited in light of the recent breakthroughs in modern machine learning technology.
The classical approach to understanding generalization is based on bias-variance trade-offs, where model complexity is carefully calibrated so that the fit on the training sample reflects performance out-of-sample.
However, it is now common practice to fit highly complex models like deep neural networks to data with (nearly) zero training error, and yet these interpolating predictors are observed to have good out-of-sample accuracy even for noisy data.
How can the classical understanding of generalization be reconciled with these observations from modern machine learning practice?
In this paper, we bridge the two regimes by exhibiting a new "double descent" risk curve that extends the traditional U-shaped bias-variance curve beyond the point of interpolation.
Specifically, the curve shows that as soon as the model complexity is high enough to achieve interpolation on the training sample---a point that we call the "interpolation threshold"---the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models.
The double descent risk curve is demonstrated for a broad range of models, including neural networks and random forests, and a mechanism for producing this behavior is posited.
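For intuition, here is a minimal, hypothetical sketch of the kind of experiment the abstract describes: a random Fourier features model fit by minimum-norm least squares on a toy 1-D regression task, with the number of features swept past the interpolation threshold. The problem sizes, target function, and frequency scale below are illustrative assumptions, not the authors' exact setup; with a typical random seed the test error peaks near the interpolation threshold and falls again as the model grows.

```python
# Minimal sketch (assumed toy setup, not the paper's experiments): double descent
# with random Fourier features. The feature count N is swept past the interpolation
# threshold (N = number of training points); the minimum-norm least-squares fit is
# used throughout, as in the paper.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, noise = 40, 500, 0.1        # assumed toy problem sizes
f = lambda x: np.sin(2 * np.pi * x)          # assumed target function

x_tr = rng.uniform(-1, 1, n_train)
y_tr = f(x_tr) + noise * rng.standard_normal(n_train)
x_te = rng.uniform(-1, 1, n_test)
y_te = f(x_te)

def rff(x, w, b):
    """Random Fourier features: phi_k(x) = cos(w_k * x + b_k)."""
    return np.cos(np.outer(x, w) + b)

for n_feat in [5, 10, 20, 40, 80, 160, 640, 2560]:
    w = rng.normal(0, 5, n_feat)             # random frequencies (assumed scale)
    b = rng.uniform(0, 2 * np.pi, n_feat)
    Phi_tr, Phi_te = rff(x_tr, w, b), rff(x_te, w, b)
    # lstsq returns the minimum-norm solution when the system is underdetermined,
    # i.e. beyond the interpolation threshold n_feat > n_train.
    beta = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)[0]
    train_mse = np.mean((Phi_tr @ beta - y_tr) ** 2)
    test_mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"N={n_feat:5d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

Plotting test MSE against N typically traces the double descent curve: the classical U-shape up to N = n_train, a spike at the interpolation threshold, and a second descent as the model becomes heavily overparameterized.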
Authors: Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal
https://arxiv.org/abs/1812.11118
Video Information
Views: 13.5K (total views since publication)
Likes: 528 (user likes and reactions)
Duration: 18:54 (video length)
Published: Aug 5, 2019 (release date)
Quality: HD (video definition)
Tags and Topics
machine learning, bias, variance, tradeoff, generalization, overfitting, interpolation, parameters, model class, complexity, deep learning, neural networks, overparameterization, ERM, random Fourier features