Reconciling modern machine learning and the bias-variance trade-off

It turns out that the classic view of generalization and overfitting is incomplete! If you add parameters beyond the number of points in your dataset, genera...

Yannic Kilcher13.5K views18:54

🔥 Related Trending Topics

LIVE TRENDS

This video may be related to current global trending topics. Click any trend to explore more videos about what's hot right now!

THIS VIDEO IS TRENDING!

This video is currently trending in Thailand under the topic 'สภาพอากาศ'.

About this video

It turns out that the classic view of generalization and overfitting is incomplete! If you add parameters beyond the number of points in your dataset, generalization performance might increase again due to the increased smoothness of overparameterized functions. Abstract: The question of generalization in machine learning---how algorithms are able to learn predictors from a training sample to make accurate predictions out-of-sample---is revisited in light of the recent breakthroughs in modern machine learning technology. The classical approach to understanding generalization is based on bias-variance trade-offs, where model complexity is carefully calibrated so that the fit on the training sample reflects performance out-of-sample. However, it is now common practice to fit highly complex models like deep neural networks to data with (nearly) zero training error, and yet these interpolating predictors are observed to have good out-of-sample accuracy even for noisy data. How can the classical understanding of generalization be reconciled with these observations from modern machine learning practice? In this paper, we bridge the two regimes by exhibiting a new "double descent" risk curve that extends the traditional U-shaped bias-variance curve beyond the point of interpolation. Specifically, the curve shows that as soon as the model complexity is high enough to achieve interpolation on the training sample---a point that we call the "interpolation threshold"---the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models. The double descent risk curve is demonstrated for a broad range of models, including neural networks and random forests, and a mechanism for producing this behavior is posited. Authors: Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal https://arxiv.org/abs/1812.11118

Video Information

Views
13.5K

Total views since publication

Likes
528

User likes and reactions

Duration
18:54

Video length

Published
Aug 5, 2019

Release date

Quality
hd

Video definition

Tags and Topics

This video is tagged with the following topics. Click any tag to explore more related content and discover similar videos:

Tags help categorize content and make it easier to find related videos. Browse our collection to discover more content in these categories.