Udacity, part of Accenture

Multimodal AI Applications

Learn how computers process and understand image data, then harness the power of the latest Generative AI models to create new images.

  • Course
  • Intermediate
  • 18 hours
  • Updated: Nov 19, 2025

Subscription · Monthly

  • Cancel Anytime
  • Unlimited access to hundreds of top-rated courses
  • Hands-on projects with expert feedback
  • Personalized career coaching and interview prep
  • Program Certificates

Skills you'll learn

17 skills, including:

  • Image pre-processing
  • Word embeddings
  • Ethical AI
  • AI Audio and Speech Analysis
  • YOLO algorithm

Prerequisites

8 prerequisites

Prior to enrolling, you should have knowledge of topics including:

  • Transformer neural networks
  • Hugging Face
  • Deep learning
  • Prompt Engineering
  • PyTorch

You will also need to be able to communicate fluently and professionally in written and spoken English.

Course Outline

  • 45 lessons
  • 1 project

Program Instructors

1 instructor

Unlike typical professors, our instructors come from Fortune 500 and Global 2000 companies and have demonstrated leadership and expertise in their professions:

Giacomo Vianello

Director, Machine Learning Engineer

About this program

Learn how AI creates and interprets speech, images, and video. Build AI systems that create and understand content in multiple modalities.

© 2011-2026 Udacity, Inc. "Nanodegree" is a registered trademark of Udacity.