Fine-Tuning AI Agents with Reinforcement Learning

This course teaches you to build adaptive AI agents. You'll learn to transform static, "frozen" LLMs into dynamic systems that can learn, reason, and act. We cover two key fine-tuning methods: Supervised Fine-Tuning (SFT) for reliable, structured outputs and Parameter-Efficient Fine-Tuning (PEFT) for building specialized models efficiently. You'll design agent "brains" using ReAct reasoning loops and learn to generate training data using a "Teacher-Student" workflow. Finally, you'll tackle advanced AI alignment, learning to prevent "specification gaming" and use Direct Preference Optimization (DPO) to teach agents complex human preferences.

  • Course
  • Intermediate
  • 15 hours
  • Updated: Nov 18, 2025

Subscription · Monthly

  • Cancel Anytime
  • Unlimited access to hundreds of top-rated courses
  • Hands-on projects with expert feedback
  • Personalized career coaching and interview prep
  • Program Certificates

Skills you'll learn

5 skills

  • Reinforcement Learning
  • Supervised Fine-Tuning (SFT) with PEFT Basics
  • Foundation Model Alignment
  • Agentic Reasoning Frameworks
  • Parameter-Efficient Fine-Tuning

Prerequisites

5 prerequisites

Prior to enrolling, you should have the following knowledge:

  • Intermediate Python
  • Agentic AI Awareness
  • Generative AI Fluency
  • Basic supervised machine learning
  • Basic PyTorch

You will also need to be able to communicate fluently and professionally in written and spoken English.

Course Outline

  • 16 lessons
  • 1 project

Program Instructors

1 instructor

Unlike typical professors, our instructors come from Fortune 500 and Global 2000 companies and have demonstrated leadership and expertise in their professions:

Christopher Agostino

Founder and Research Scientist at NPC Worldwide

About this program

Go beyond prompting. Learn to train and align LLM agents using reinforcement learning and DPO for real-world, multi-step tasks.

© 2011-2026 Udacity, Inc. "Nanodegree" is a registered trademark of Udacity.