Why One-Hot Encoding Can Be Problematic in Data Science
One-hot encoding of categorical data is not always ideal, especially for cyclical features such as week of the year or for features with high cardinality.

Data Science Garage
1.4K views • May 24, 2021

About this video
One-hot encoding (dummy variables) of categorical features in data science is sometimes not a good solution. What if you have features such as the day of the week, or timestamps that repeat with a fixed frequency: cyclically, by season, or following a specific time pattern (as in time series data)?
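To make the problem concrete, here is a minimal sketch in plain Python (standard library only; the day labels and the `one_hot_day` helper are illustrative, not taken from the video). It one-hot encodes days of the week and measures the Euclidean distance between encoded days. Every pair comes out exactly sqrt(2) apart, so the encoding carries no notion that Monday is next to Tuesday, or that Sunday wraps around to Monday.

```python
import math

# Illustrative day labels (Monday first); not taken from the video's notebook
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def one_hot_day(day):
    """Return a 7-element 0/1 indicator vector for the given day label."""
    return [1 if d == day else 0 for d in DAYS]

mon, tue, thu, sun = (one_hot_day(d) for d in ("Mon", "Tue", "Thu", "Sun"))

# Euclidean distances between encoded days are all sqrt(2), so adjacent days
# (Mon/Tue, Sun/Mon) look no closer than distant ones (Mon/Thu)
for label, other in (("Tue", tue), ("Thu", thu), ("Sun", sun)):
    print(f"Mon vs {label}: {math.dist(mon, other):.4f}")
```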
One-hot encoding works well when a feature has a relatively small number of categorical values. What if your feature has many distinct values, for example hundreds? Then the dummy-variable table gets one column per category: hundreds of columns of zeros and ones for every row, almost all of them zero. In this machine learning (ML) modelling scenario, computation becomes inefficient.
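A minimal sketch of how the one-hot table grows with cardinality, assuming a hypothetical feature with 300 distinct IDs spread over 1,000 rows. The `one_hot` helper is written here for illustration; in practice a library such as pandas or scikit-learn would do this step. The table is as wide as the number of distinct categories, and each row contains a single 1.

```python
def one_hot(values):
    """Encode categorical values as rows of 0/1 indicators, one column per distinct category."""
    categories = sorted(set(values))
    position = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[position[v]] = 1          # exactly one 1 per row
        rows.append(row)
    return categories, rows

# Hypothetical high-cardinality feature: 1,000 rows drawn from 300 distinct IDs
values = [f"id_{i % 300}" for i in range(1000)]
categories, rows = one_hot(values)

n_cells = len(rows) * len(categories)
n_ones = sum(sum(row) for row in rows)
print(f"{len(rows)} rows x {len(categories)} columns")   # 1000 rows x 300 columns
print(f"non-zero share: {n_ones / n_cells:.4f}")          # 0.0033
```

Only about 0.33% of the 300,000 cells carry information, which is why sparse representations or a different encoding are usually preferred at this cardinality.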
For cyclical features, a better alternative to one-hot encoding is to convert the categorical values into numerical values using trigonometric functions. In this video we convert the day of the week from a categorical value into numerical values with a sine and cosine transformation.
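The sine/cosine transformation maps each day number d (0 to 6) to the point (sin(2πd/7), cos(2πd/7)) on the unit circle. The sketch below assumes Monday = 0 through Sunday = 6; the video's notebook may use a different ordering or library, and `DAYS` and `encode_day` are hypothetical names. Two numeric columns replace the seven that one-hot encoding would need.

```python
import math

# Assumed ordering: Monday = 0 ... Sunday = 6
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def encode_day(day_name):
    """Map a day name to (sin, cos) coordinates on the unit circle with period 7."""
    d = DAYS.index(day_name)
    angle = 2 * math.pi * d / len(DAYS)
    return math.sin(angle), math.cos(angle)

for day in DAYS:
    s, c = encode_day(day)
    print(f"{day:<9} sin={s:+.3f} cos={c:+.3f}")
```

Both columns are needed: sine alone maps different values to the same number (for a 24-hour cycle, hours 3 and 9 both give sin = 1/sqrt(2)), and the cosine disambiguates them.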
These transformations can be applied to any cyclical, time-based feature in your data science and machine learning projects. The technique belongs to the feature engineering stage of the data science project lifecycle and can also be considered part of machine learning engineering. I suggest paying extra attention to this stage in real-life data science or machine learning projects.
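The same idea generalizes to any cycle by dividing by its period, for example 24 for hour of day or 12 for month. This sketch (hypothetical helper `cyclical_encode`; months numbered 0 to 11 as an assumption) checks that 23:00 and 00:00 end up close together after encoding, while 00:00 and 12:00 sit on opposite sides of the circle. A raw integer encoding would instead put 23 and 0 as far apart as the range allows.

```python
import math

def cyclical_encode(value, period):
    """Map a value on a cycle of length `period` to (sin, cos) on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

# Hour of day, period 24
h0, h12, h23 = (cyclical_encode(h, 24) for h in (0, 12, 23))
print(round(math.dist(h23, h0), 3))   # 0.261: one hour apart, close together
print(round(math.dist(h0, h12), 3))   # 2.0: opposite sides of the circle

# Month of year, period 12 (months numbered 0-11 here, an assumption)
december, january = cyclical_encode(11, 12), cyclical_encode(0, 12)
print(round(math.dist(december, january), 3))   # 0.518
```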
The content of the video:
0:00 - Intro and Problem Definition
1:54 - Real Example with Python (in Jupyter Notebook)
Have fun and happy learning!
#onehotencoding #machinelearningengineering #featureengineering
Video Information
Likes
55
Duration
5:52