Data Augmentation Techniques for Text Classification in NLP: A Research Paper Overview
This overview explores various data augmentation strategies used in NLP for text classification, highlighting their effectiveness and implementation insights.

TechViz - The Data Science Guy
3.5K views • Aug 2, 2020

About this video
#dataaugmentation #textclassification #researchpaperwalkthrough #nlp
Data augmentation is a strategy that lets practitioners significantly increase the diversity of the data available for training models without actually collecting new data. In this video, I discuss data augmentation techniques for text classification in NLP.
⏩ Abstract: We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.
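The four operations named in the abstract can be sketched in a few lines of Python. This is only a rough illustration, not the official code. The tiny SYNONYMS table is a made-up stand-in for the WordNet lookup used in the paper. The function names, the defaults alpha=0.1 and num_aug=4, and the choice to pick one random operation per augmented sentence are all assumptions of this sketch.

```python
import random

# Toy synonym table: an assumption so the sketch runs without any downloads.
# The paper looks synonyms up in WordNet instead.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "joyful"],
    "movie": ["film"],
    "great": ["excellent", "superb"],
}


def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words, n, rng):
    """n times: insert a synonym of a random word at a random position."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(words, n, rng):
    """n times: swap the words at two distinct random positions."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def eda(sentence, alpha=0.1, num_aug=4, seed=0):
    """Return num_aug augmented copies of sentence (hypothetical helper)."""
    rng = random.Random(seed)
    words = sentence.split()
    # Number of words to change scales with sentence length.
    n = max(1, int(alpha * len(words)))
    ops = [
        lambda: synonym_replacement(words, n, rng),
        lambda: random_insertion(words, n, rng),
        lambda: random_swap(words, n, rng),
        lambda: random_deletion(words, alpha, rng),
    ]
    return [" ".join(rng.choice(ops)()) for _ in range(num_aug)]
```

Calling eda("the quick movie was great") returns four augmented variants of the sentence. Each variant would keep the label of the original sentence when added to the training set.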
⏩ OUTLINE:
0:00 - Intro and Overview
3:00 - Data Augmentation Rules
7:35 - Results
10:35 - Ablation Study
13:29 - My thoughts and takeaways
⏩ Paper Title: EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
⏩ Paper: https://arxiv.org/abs/1901.11196
⏩ Authors: Jason Wei, Kai Zou
⏩ Official Code: https://github.com/jasonwei20/eda_nlp
⏩ Organisations: Protago Labs Research, Tysons Corner, Virginia, USA; Department of Computer Science, Dartmouth College; Department of Mathematics and Statistics, Georgetown University
Enjoy reading articles? Then consider subscribing to a Medium membership. It's just $5 a month for unlimited access to all free and paid content. Subscribe now - https://prakhar-mishra.medium.com/membership
*********************************************
If you want to support me financially, which is totally optional and voluntary :) ❤️
You can consider buying me chai (because I don't drink coffee :)) at https://www.buymeacoffee.com/TechvizCoffee
*********************************************
⏩ YouTube - https://youtube.com/channel/UCoz8NrwgL7U9535VNc0mRPA
⏩ Blog - https://prakhartechviz.blogspot.com
⏩ LinkedIn - https://linkedin.com/in/prakhar21
⏩ Medium - https://medium.com/@prakhar.mishra
⏩ GitHub - https://github.com/prakhar21
*********************************************
Please feel free to share the content and subscribe to my channel :)
⏩ Subscribe - https://youtube.com/channel/UCoz8NrwgL7U9535VNc0mRPA?sub_confirmation=1
Tools I use for making videos :)
⏩ iPad - https://tinyurl.com/y39p6pwc
⏩ Apple Pencil - https://tinyurl.com/y5rk8txn
⏩ GoodNotes - https://tinyurl.com/y627cfsa
#techviz #datascienceguy #nlp #textclassification #naturallanguageprocessing
Video Information
Views: 3.5K
Likes: 61
Duration: 14:33
Published: Aug 2, 2020
User Reviews: 4.6 (3)