Reinforce Algorithm Explained: The Foundation of Reinforcement Learning 🧠
Discover the basics of the REINFORCE algorithm, a key policy gradient method in reinforcement learning. Perfect for beginners eager to understand how agents learn through rewards and actions. Support me on Patreon for more tutorials!

Andriy Drozdyuk
15.2K views • Aug 16, 2021

About this video
If you would like to see more videos like this, please consider supporting me on Patreon: https://www.patreon.com/andriydrozdyuk
Reinforcement Learning: An Introduction, 2nd Ed, Sutton & Barto
For the REINFORCE algorithm, see Section "13.3 REINFORCE: Monte Carlo Policy Gradient":
http://incompleteideas.net/book/the-book-2nd.html
The complete code used in the video can be found here:
https://github.com/drozzy/reinforce
0:00 - Introduction
0:15 - Intro to RL
0:38 - Problem with Environment
1:02 - Why is this a problem for RL?
1:41 - Puppy treats (low level of abstraction)
2:14 - Good actions (middle level of abstraction)
3:22 - Reward as a signal (high level of abstraction)
4:04 - REINFORCE Algorithm Overview
5:11 - Collected Trajectory
6:01 - Product of G and Policy Gradient
6:34 - Two key concepts: sample and evaluate
6:48 - Sampling an action
7:22 - Sampling in REINFORCE
7:38 - Evaluating an action
8:24 - Sampling vs. Evaluating
8:41 - Sampling using torch.distributions.Categorical
9:12 - Evaluating using torch.distributions.Categorical
9:50 - Env/NN/Optim
10:07 - Collect One Episode of Experience
10:53 - Compute Discounted Returns
11:44 - Update the Policy
12:41 - Executing Trained Policy
13:04 - Demo Cart Pole Balancing
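As a rough illustration of the "Compute Discounted Returns" step listed above, here is a minimal sketch in plain Python. It is not taken from the video's repository. The function name `discounted_returns`, the default `gamma=0.99`, and the convention that `rewards[t]` is the reward received after the action at step `t` are all assumptions made for this example.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every step t of one episode.

    Assumption for this sketch: rewards[t] is the reward received after
    the action taken at step t, so G_t = rewards[t] + gamma * G_{t+1}.
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    # Restore chronological order: returns[t] now corresponds to step t.
    returns.reverse()
    return returns


if __name__ == "__main__":
    # Three steps with reward 1.0 each and gamma = 0.5.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

In REINFORCE, each of these returns would then weight the log-probability of the action taken at the same step when the policy is updated. The backward pass keeps the computation linear in the episode length.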
Video Information
Views
15.2K
Likes
738
Duration
13:42
Published
Aug 16, 2021