Deep Reinforcement Learning

Name: Deep Reinforcement Learning Nanodegree Program
Rating: 4.6 (328 reviews)

Nanodegree Program

The Deep Reinforcement Learning Nanodegree has four courses: Introduction to Deep Reinforcement Learning, Value-Based Methods, Policy-Based Methods, and Multi-Agent RL. Students learn to define Markov decision processes, policies, and value functions, derive Bellman equations, and implement classical solution methods. They study dynamic programming, Monte Carlo methods, temporal-difference methods, and deep RL, and apply these techniques to real-world problems: training agents to navigate virtual worlds, generating optimal financial trading strategies, and applying RL to multiple interacting agents.

Advanced

2 months

Real-world Projects

Completion Certificate

Last Updated July 12, 2024

Skills you'll learn:

Value-based reinforcement learning • Stochastic policy gradients • REINFORCE algorithm • Exploration-exploitation dilemma

Prerequisites:

Intermediate Python • Deep learning framework proficiency • Neural network basics

Courses In This Program

Course 1 • 1 day

Introduction to Deep Reinforcement Learning

Lesson 1

Welcome to Deep Reinforcement Learning

Welcome to the Deep Reinforcement Learning Nanodegree program!

Lesson 2

Getting Help

You are starting a challenging but rewarding journey! Take 5 minutes to read how to get help with projects and content.

Lesson 3

Get Help with Your Account

What to do if you have questions about your account or general questions about the program.

Lesson 4

Learning Plan

Obtain helpful resources to accelerate your learning in this first part of the Nanodegree program.

Lesson 5

Introduction to RL

Reinforcement learning is a type of machine learning in which a software agent learns, by interacting with its environment, how to maximize the cumulative reward it receives for a task.

Lesson 6

The RL Framework: The Problem

Learn how to mathematically formulate tasks as Markov Decision Processes.

Lesson 7

The RL Framework: The Solution

In reinforcement learning, agents learn to prioritize different decisions based on the rewards and punishments associated with different outcomes.
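
The "solution" to an MDP is a policy together with its value function, which satisfies the Bellman expectation equation v_π(s) = Σ_a π(a|s) Σ_{s',r} p(s', r | s, a) [r + γ v_π(s')]. As a rough illustration, not the course's own code, the sketch below runs iterative policy evaluation (a dynamic programming method) on a hypothetical two-state MDP. The function name, the two-state MDP, and its numbers are assumptions; the (prob, next_state, reward, done) transition format mirrors the tables Gym's FrozenLake exposes as env.P.

```python
def policy_evaluation(transitions, policy, gamma=0.9, theta=1e-10):
    """Iterative policy evaluation: sweep the Bellman expectation equation until V converges.

    transitions[s][a] -> list of (prob, next_state, reward, done) tuples.
    policy[s][a]      -> probability of choosing action a in state s.
    """
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            new_v = 0.0
            for a, action_prob in policy[s].items():
                for prob, s_next, reward, done in transitions[s][a]:
                    # No bootstrapped future value after a terminal transition.
                    future = 0.0 if done else gamma * V[s_next]
                    new_v += action_prob * prob * (reward + future)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place update within the sweep
        if delta < theta:
            return V


# Hypothetical MDP: from "A" you can "stay" (reward 0) or "go" to "B" (reward 1);
# from "B" the only action, "exit", ends the episode with reward 10.
transitions = {
    "A": {"stay": [(1.0, "A", 0.0, False)], "go": [(1.0, "B", 1.0, False)]},
    "B": {"exit": [(1.0, "B", 10.0, True)]},
}
policy = {"A": {"stay": 0.5, "go": 0.5}, "B": {"exit": 1.0}}  # uniform random at "A"

V = policy_evaluation(transitions, policy)
print(V)  # V["B"] is 10.0; V["A"] converges to 5 / 0.55, about 9.09
```

Switching the policy at "A" to always "go" should raise V["A"] to 1 + 0.9 × 10 = 10, the kind of comparison that policy improvement builds on.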

Lesson 8

Monte Carlo Methods

Write your own implementation of Monte Carlo control to teach an agent to play Blackjack!
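
Monte Carlo control estimates action values from the returns of complete episodes and then acts ε-greedily with respect to those estimates. The sketch below is one hedged version (every-visit, constant-α updates); the lesson's exact variant may differ. It assumes Blackjack states are (player_sum, dealer_showing, usable_ace) tuples and that actions are encoded as 0 = stick and 1 = hit, as in Gym's Blackjack environment. The function names and the sample episode are hypothetical.

```python
import random
from collections import defaultdict

N_ACTIONS = 2  # assumed encoding: 0 = stick, 1 = hit


def update_q_from_episode(Q, episode, alpha=0.1, gamma=1.0):
    """Every-visit, constant-alpha Monte Carlo update of action values.

    episode: list of (state, action, reward) tuples in time order, where reward
    is the reward received after taking that action in that state.
    """
    G = 0.0
    # Walk the episode backwards so each return G_t = r_{t+1} + gamma * G_{t+1}
    # is accumulated in a single pass.
    for state, action, reward in reversed(episode):
        G = reward + gamma * G
        Q[state][action] += alpha * (G - Q[state][action])
    return Q


def epsilon_greedy(Q, state, epsilon):
    """Random action with probability epsilon, otherwise the greedy action."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    values = Q[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])


Q = defaultdict(lambda: [0.0] * N_ACTIONS)
# Hypothetical episode: hit on 13 against a dealer 10, then stick on 19 and win (+1).
episode = [((13, 10, False), 1, 0.0), ((19, 10, False), 0, 1.0)]
update_q_from_episode(Q, episode)
print(epsilon_greedy(Q, (13, 10, False), epsilon=0.0))  # greedy choice is now 1 (hit)
```

A constant step size α weights recent episodes more heavily than a running average would, which suits control, where the policy (and therefore the returns) keeps changing.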

Lesson 9

Temporal-Difference Methods

Learn how to apply temporal-difference methods such as SARSA, Q-Learning, and Expected SARSA to solve both episodic and continuing tasks.
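
SARSA, Q-Learning, and Expected SARSA share the same update, Q(s, a) ← Q(s, a) + α [target − Q(s, a)], and differ only in the target they bootstrap toward. A minimal NumPy sketch, assuming a tabular Q stored as a 2-D array indexed Q[state, action] and an ε-greedy policy for Expected SARSA; the function names are illustrative. For a continuing task, done simply stays False (and γ should be below 1).

```python
import numpy as np


def td_target(Q, reward, next_state, next_action, done, gamma, method, epsilon=0.1):
    """TD target for SARSA, Q-learning, or Expected SARSA (tabular Q[state, action])."""
    if done:
        return reward  # episodic task: nothing to bootstrap from after a terminal state
    q_next = Q[next_state]
    if method == "sarsa":
        # On-policy: value of the action the agent will actually take next.
        return reward + gamma * q_next[next_action]
    if method == "q_learning":
        # Off-policy: value of the greedy action in the next state.
        return reward + gamma * np.max(q_next)
    if method == "expected_sarsa":
        # Expected value of the next action under an epsilon-greedy policy.
        n_actions = len(q_next)
        probs = np.full(n_actions, epsilon / n_actions)
        probs[np.argmax(q_next)] += 1.0 - epsilon
        return reward + gamma * np.dot(probs, q_next)
    raise ValueError(f"unknown method: {method}")


def td_update(Q, state, action, target, alpha):
    """Move Q[state, action] a step of size alpha toward the TD target."""
    Q[state, action] += alpha * (target - Q[state, action])


Q = np.zeros((2, 2))
Q[1] = [1.0, 3.0]  # hypothetical action values for next state 1
for method in ("sarsa", "q_learning", "expected_sarsa"):
    target = td_target(Q, reward=0.5, next_state=1, next_action=0,
                       done=False, gamma=0.9, method=method)
    print(method, round(float(target), 2))  # prints 1.4, then 3.2, then 3.11
```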

Lesson 10

Solve OpenAI Gym's Taxi-v2 Task

With reinforcement learning now in your toolbox, you're ready to explore a mini project using OpenAI Gym!

Lesson 11

RL in Continuous Spaces

Learn how to adapt traditional algorithms to work with continuous spaces.
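
One simple way to reuse tabular algorithms in a continuous state space is to discretize it, mapping each continuous state to a grid cell; tile coding and function approximation are common refinements. The sketch below shows uniform grid discretization with NumPy. The state bounds and bin counts are hypothetical, and the function names are illustrative.

```python
import numpy as np


def create_uniform_grid(low, high, bins):
    """Interior bin edges that split each dimension of [low, high] into equal-width bins."""
    return [np.linspace(lo, hi, n + 1)[1:-1] for lo, hi, n in zip(low, high, bins)]


def discretize(sample, grid):
    """Map a continuous sample to a tuple of integer bin indices, one per dimension."""
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(sample, grid))


# Hypothetical 2-D state space: position in [-1, 1] (4 bins), velocity in [-5, 5] (10 bins).
grid = create_uniform_grid(low=[-1.0, -5.0], high=[1.0, 5.0], bins=[4, 10])
print(discretize([-0.7, 0.3], grid))  # (0, 5)
print(discretize([0.99, 4.5], grid))  # (3, 9)
```

Values outside [low, high] fall into the first or last bin, and the resulting tuple can index a Q array of shape (4, 10, n_actions), so tabular methods such as Q-Learning apply unchanged. The trade-off is resolution: coarse bins lose detail, while fine bins multiply the number of states to learn.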

Lesson 12

What's Next?

In the next parts of the Nanodegree program, you'll learn how to use neural networks as powerful function approximators in reinforcement learning.

Course 2 • 4 weeks

Value-Based Methods

Apply deep learning architectures to reinforcement learning tasks. Train your own agent that navigates a virtual world from sensory data.

Lesson 1

Study Plan

This lesson covers the study plan and prerequisites for this course.

Lesson 2

Deep Q-Networks

Extend value-based reinforcement learning methods to complex problems using deep neural networks.
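
Two ingredients that make deep Q-learning stable in practice are experience replay and a separate, slowly updated target network. The sketch below shows both in framework-agnostic form (standard library plus NumPy), assuming the target network's Q-values for the next states have already been computed as a (batch, n_actions) array. The names ReplayBuffer and dqn_targets are illustrative, not taken from the course's code.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of past transitions; the oldest are discarded first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # A uniform random minibatch breaks the correlation between consecutive steps.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps.

    rewards, dones: shape (batch,); next_q_target: shape (batch, n_actions).
    """
    return rewards + gamma * next_q_target.max(axis=1) * (1.0 - dones)


rewards = np.array([1.0, 0.0])
next_q_target = np.array([[0.5, 2.0], [3.0, 1.0]])  # hypothetical target-network outputs
dones = np.array([0.0, 1.0])
print(dqn_targets(rewards, next_q_target, dones, gamma=0.9))  # targets 2.8 and 0.0
```

The online network is then trained to regress Q(s, a) toward these targets, while the target network's weights are copied or softly blended from the online network only periodically.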

Lesson 3 • Project

Project: Navigation

Train an agent to navigate a large world and collect yellow bananas, while avoiding blue bananas.

Course 3 • 4 weeks

Policy-Based Methods

Lesson 1

Study Plan

Obtain helpful resources to accelerate your learning in the third part of the Nanodegree program.

Lesson 2

Introduction to Policy-Based Methods

Policy-based methods optimize the policy directly, rather than deriving it from an estimated value function.

Lesson 3

Policy Gradient Methods

Policy gradient methods search for the optimal policy through gradient ascent.
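
REINFORCE, the Monte Carlo policy gradient method listed among the program's skills, performs gradient ascent by moving the policy parameters θ along Σ_t G_t ∇θ log π_θ(a_t | s_t). To stay self-contained, the sketch below assumes a single-state softmax policy whose parameters are simply action preferences (a bandit-style simplification); the function names are hypothetical, and real implementations usually compute the gradient with a deep learning framework's autograd.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """G_t = r_{t+1} + gamma * G_{t+1}, computed for every step t of one episode."""
    returns = np.zeros(len(rewards))
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        returns[t] = G
    return returns


def softmax(prefs):
    z = np.exp(prefs - np.max(prefs))  # shift by the max for numerical stability
    return z / z.sum()


def reinforce_update(theta, actions, rewards, lr=0.1, gamma=0.99):
    """One REINFORCE step for a single-state softmax policy with preferences theta."""
    probs = softmax(theta)  # the policy is held fixed while the episode's gradient is summed
    grad = np.zeros_like(theta)
    for a, G in zip(actions, discounted_returns(rewards, gamma)):
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0  # gradient of log softmax(theta)[a] is one_hot(a) - probs
        grad += G * grad_log_pi
    return theta + lr * grad  # gradient *ascent* on expected return


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # returns 1.75, 1.5, 1.0
theta = reinforce_update(np.zeros(2), actions=[0], rewards=[1.0], lr=1.0)
print(theta)  # action 0 was rewarded, so preferences become 0.5 and -0.5
```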

Lesson 4

Proximal Policy Optimization

Learn what Proximal Policy Optimization (PPO) is and how it improves on vanilla policy gradients, then implement the algorithm by training an agent to play Atari Pong.
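
The core of PPO-Clip is its clipped surrogate objective, L = E[min(r_t A_t, clip(r_t, 1 − ε, 1 + ε) A_t)], where r_t = π_θ(a_t | s_t) / π_θ_old(a_t | s_t) and A_t is an advantage estimate. A minimal NumPy sketch of that objective, assuming log-probabilities and advantages are already available as arrays; ε = 0.2 is a commonly used default, and the function name is illustrative. In practice the objective is maximized with a framework's autograd rather than evaluated in NumPy.

```python
import numpy as np


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """PPO-Clip surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = np.exp(new_log_probs - old_log_probs)  # r = pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # The elementwise minimum is a pessimistic bound: the objective gains nothing
    # from pushing r outside [1 - eps, 1 + eps].
    return np.mean(np.minimum(unclipped, clipped))


old = np.log(np.array([0.5, 0.5]))
new = np.log(np.array([0.8, 0.2]))  # ratios 1.6 and 0.4
adv = np.array([1.0, 1.0])
print(ppo_clipped_objective(new, old, adv))  # about 0.8 = mean(min(1.6, 1.2), min(0.4, 0.8))
```

Because large policy changes earn no extra objective, PPO can safely reuse each batch of trajectories for several gradient steps, which is where much of its sample-efficiency gain over vanilla policy gradients comes from.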

Lesson 5

Actor-Critic Methods

Miguel Morales explains how to combine value-based and policy-based methods, bringing together the best of both worlds, to solve challenging reinforcement learning problems.

Lesson 6

Deep RL for Finance (Optional)

Learn how to apply deep reinforcement learning techniques for optimal execution of portfolio transactions.

Lesson 7 • Project

Continuous Control

Train a double-jointed arm to reach target locations.

Course 4 • 3 weeks

Multi-Agent Reinforcement Learning

Lesson 1

Study Plan

Obtain helpful resources to accelerate your learning in the fourth part of the Nanodegree program.

Lesson 2

Introduction to Multi-Agent RL

Lesson 3

Case Study: AlphaZero

Lesson 4 • Project

Collaboration and Competition

Train a pair of agents to play tennis.

Taught By The Best

Mat Leonard

Content Developer

Mat is a former physicist, research neuroscientist, and data scientist. He did his PhD and Postdoctoral Fellowship at the University of California, Berkeley.

Miguel Morales

Content Developer

Miguel is a software engineer at Lockheed Martin. He earned a Master's in Computer Science at Georgia Tech and is an Instructional Associate for the Reinforcement Learning and Decision Making course. He is the author of Grokking Deep Reinforcement Learning.

Chhavi Yadav

Content Developer

Chhavi is a Computer Science graduate student at New York University, where she researches machine learning algorithms. She is also an electronics engineer and has worked on wireless systems.

Dana Sheahan

Content Developer

Dana is an electrical engineer with a Master's in Computer Science from Georgia Tech. Her work experience includes software development for embedded systems in the Automotive Group at Motorola, where she was awarded a patent for an onboard operating system.

Cezanne Camacho

Curriculum Lead

Cezanne is an expert in computer vision with a Master's in Electrical Engineering from Stanford University. As a former researcher in genomics and biomedical imaging, she has applied computer vision and deep learning to medical diagnostic applications.

Alexis Cook

Curriculum Lead

Alexis is an applied mathematician with a Master's in Computer Science from Brown University and a Master's in Applied Mathematics from the University of Michigan. She was formerly a National Science Foundation Graduate Research Fellow.

Arpan Chakraborty

Instructor

Arpan is a computer scientist with a PhD from North Carolina State University. He teaches in Georgia Tech's Master's in Computer Science program and is a coauthor of the book Practical Graph Mining with R.

Luis Serrano

Instructor

Luis was formerly a Machine Learning Engineer at Google. He holds a PhD in mathematics from the University of Michigan and completed a postdoctoral fellowship at the University of Quebec at Montreal.

Juan Delgado

Content Developer

Juan is a computational physicist with a Master's in Astronomy. He is finishing his PhD in Biophysics. He previously worked at NASA developing space instruments and writing software to analyze large amounts of scientific data using machine learning techniques.

Ratings & Reviews

Average Rating: 4.6 Stars

328 Reviews

Lucas Sabbatini de Barros F.

April 11, 2023

Even though the content is great, I enrolled in this program 5 years ago and it was the same content. They should’ve updated it and made the videos and lessons better.

Manjeet Singh N.

March 17, 2023

It covers the topic in sufficient detail and is not just a cursory introduction. Coding exercises and projects are pretty intensive.

Jairo M.

March 5, 2023

The program was much better than I expected.

Anthony Leonardo S.

November 18, 2022

It's Perfect

Greg N.

October 19, 2022

the jump between tuition and projects is quite large but it is hugely rewarding! I think RL is the only way to train agents

The Udacity Difference

Combine technology training for employees with industry experts, mentors, and projects, for critical thinking that pushes innovation. Our proven upskilling system goes after success—relentlessly.

Demonstrate proficiency with practical projects

Projects are based on real-world scenarios and challenges, allowing you to apply the skills you learn to practical situations, while giving you real hands-on experience.

Gain proven experience
Retain knowledge longer
Apply new skills immediately

Top-tier services to ensure learner success

Reviewers provide timely and constructive feedback on your project submissions, highlighting areas of improvement and offering practical tips to enhance your work.