REINFORCE: Reinforcement Learning's Most Fundamental Algorithm
About this video
If you would like to see more videos like this, please consider supporting me on Patreon: https://www.patreon.com/andriydrozdyuk
Reinforcement Learning: An Introduction, 2nd Ed, Sutton & Barto
For the REINFORCE algorithm, see Section 13.3, "REINFORCE: Monte Carlo Policy Gradient":
http://incompleteideas.net/book/the-book-2nd.html
Complete code used in the video can be found here:
https://github.com/drozzy/reinforce
0:00 - Introduction
0:15 - Intro to RL
0:38 - Problem with Environment
1:02 - Why is this a problem for RL?
1:41 - Puppy treats (low level of abstraction)
2:14 - Good actions (middle level of abstraction)
3:22 - Reward as a signal (high level of abstraction)
4:04 - REINFORCE Algorithm Overview
5:11 - Collected Trajectory
6:01 - Product of G and Policy Gradient
6:34 - Two key concepts: sample and evaluate
6:48 - Sampling an action
7:22 - Sampling in REINFORCE
7:38 - Evaluating an action
8:24 - Sampling vs. Evaluating
8:41 - Sampling using torch.distributions.Categorical
9:12 - Evaluating using torch.distributions.Categorical
9:50 - Env/NN/Optim
10:07 - Collect One Episode of Experience
10:53 - Compute Discounted Returns
11:44 - Update the Policy
12:41 - Executing Trained Policy
13:04 - Demo Cart Pole Balancing
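
The chapters above walk through sampling an action, evaluating it, computing discounted returns, and updating the policy. Below is a minimal sketch of those steps, not the code from the video. The video uses torch.distributions.Categorical for sampling and evaluating; this sketch assumes plain Python instead so it runs without dependencies. All function names here (discounted_returns, softmax, sample_action, log_prob, reinforce_loss) and the default gamma are hypothetical choices made for illustration.

```python
import math
import random


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(logits):
    """Turn raw policy outputs into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng=random):
    """Sampling: draw an action index according to the policy's probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def log_prob(probs, action):
    """Evaluating: log-probability the policy assigns to an action already taken."""
    return math.log(probs[action])


def reinforce_loss(log_probs, returns):
    """Negative sum of G_t * log pi(a_t | s_t); minimizing it ascends the policy gradient."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    probs = softmax([0.2, -0.1])       # stand-in for a policy network's output
    action = sample_action(probs)      # sample while collecting the episode
    lp = log_prob(probs, action)       # evaluate when computing the update
    returns = discounted_returns([1.0, 1.0, 1.0])
    print(action, lp, returns)
```

In an actual training loop the log-probabilities would come from a differentiable policy network, so an optimizer could backpropagate through the loss. The sketch also leaves out the extra gamma^t factor that appears in the book's pseudocode for the update.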
Video Information
Views: 15.2K
Likes: 738
Duration: 13:42
Published: Aug 16, 2021
Quality: HD