Advanced Computer Vision & Deep Learning


Apply deep learning architectures to computer vision tasks.

  • Estimated time
    1 month

  • Enroll by
    May 31, 2023

    Get access to classroom immediately on enrollment

  • Skills acquired
    Recurrent Neural Networks, Image Caption generation

What You Will Learn

  1. Advanced Computer Vision and Deep Learning

    1 month to complete

    Apply deep learning architectures to computer vision tasks. Discover how to combine CNNs and RNNs to build an automatic image captioning application.

    Prerequisite knowledge

    Intermediate Python, Object-Oriented Programming Basics

    1. Advanced CNN Architectures

      Describe advances in CNN architectures. Understand region-based CNNs (R-CNN), Fast R-CNN, and Faster R-CNN, which allow for fast, localized object recognition in images.

      • YOLO

        Understand grids, sliding windows, and bounding boxes in object detection. Implement YOLO, a real-time object detection algorithm.
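As a rough illustration of the grid and bounding-box ideas in this lesson, the sketch below computes intersection over union (IoU) for two boxes and finds which grid cell contains a box's center. It is not the course's YOLO implementation: the function names `iou` and `grid_cell`, the `(x_min, y_min, x_max, y_max)` box format, and the 448-pixel image with a 7x7 grid are all assumptions chosen for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if there is one).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give an empty intersection.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def grid_cell(box, image_size, grid_size):
    """Return (row, col) of the grid cell that contains the box center.

    Assumes a square image of image_size pixels split into grid_size x grid_size cells.
    """
    center_x = (box[0] + box[2]) / 2
    center_y = (box[1] + box[3]) / 2
    col = min(int(center_x * grid_size / image_size), grid_size - 1)
    row = min(int(center_y * grid_size / image_size), grid_size - 1)
    return row, col


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(grid_cell((50, 250, 150, 350), image_size=448, grid_size=7))
```

In grid-based detectors, the cell containing an object's center is typically the one responsible for predicting that object, and IoU is commonly used to compare a predicted box with a ground-truth box.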

        • RNNs

          Understand how recurrent neural networks learn from ordered sequences of data. Identify RNN applications in deep learning. Understand how the feedforward pass and backpropagation through time work, as well as the unfolded RNN model.
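To make the unfolded model concrete, here is a minimal sketch of the forward pass of a simple (Elman-style) RNN, unrolled over a sequence in plain Python. The helper names `matvec` and `rnn_forward` and the weight names are assumptions for illustration; the course itself works with PyTorch, which this sketch deliberately avoids so it can run with the standard library alone.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, w_xh, w_hh, bias, h0):
    """Unroll a simple RNN over a sequence and return the hidden state at each step.

    Each step computes h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), so the same
    weights are reused at every time step and h_t depends on all earlier inputs.
    """
    hidden = h0
    states = []
    for x_t in inputs:
        from_input = matvec(w_xh, x_t)
        from_memory = matvec(w_hh, hidden)
        hidden = [
            math.tanh(a + b + c)
            for a, b, c in zip(from_input, from_memory, bias)
        ]
        states.append(hidden)
    return states


if __name__ == "__main__":
    # One input feature and one hidden unit keep the numbers easy to follow.
    states = rnn_forward(
        inputs=[[1.0], [0.0]],
        w_xh=[[1.0]],
        w_hh=[[0.5]],
        bias=[0.0],
        h0=[0.0],
    )
    print(states)
```

Feeding the same inputs in a different order produces a different final hidden state, which is what lets the network model ordered data. Backpropagation through time applies the chain rule backward through this same unrolled loop.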

          • Long Short-Term Memory Networks (LSTMs)

            Explore how memory can be incorporated into a deep learning model and implement long short-term memory networks.
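One way to picture how an LSTM carries memory is a single time step of a cell with scalar inputs and states, sketched below. This is a simplified illustration under stated assumptions rather than the course's code: real LSTMs use vectors and weight matrices, and the names `lstm_step` and `params` (a dict mapping each gate name to a hypothetical `(w_x, w_h, b)` triple) are invented for the example.

```python
import math


def sigmoid(x):
    """Logistic function, squashing x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """Run one step of a scalar LSTM cell and return (h, c).

    params maps "forget", "input", "candidate", and "output" to
    (w_x, w_h, b) triples used as activation(w_x * x + w_h * h_prev + b).
    """
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    forget = gate("forget", sigmoid)        # how much old memory to keep
    write = gate("input", sigmoid)          # how much new content to store
    candidate = gate("candidate", math.tanh)  # the new content itself
    output = gate("output", sigmoid)        # how much memory to expose

    c = forget * c_prev + write * candidate  # updated cell (long-term) state
    h = output * math.tanh(c)                # updated hidden (short-term) state
    return h, c


if __name__ == "__main__":
    # A strongly positive forget bias and a strongly negative input bias
    # make the cell hold on to its previous memory almost unchanged.
    keep_memory = {
        "forget": (0.0, 0.0, 100.0),
        "input": (0.0, 0.0, -100.0),
        "candidate": (1.0, 0.0, 0.0),
        "output": (0.0, 0.0, 0.0),
    }
    print(lstm_step(x=0.5, h_prev=0.0, c_prev=0.8, params=keep_memory))
```

The forget and input gates decide, at every step, how much of the old cell state survives and how much new information is written, which is what allows information to persist across many time steps.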

            • Hyperparameters

              Review important hyperparameters such as learning rate, epochs, and layers, and understand the hyperparameters specific to RNNs.

              • Optional: Attention Mechanisms

                Understand how attention allows models to focus on a specific piece of input data. Describe where attention is useful in natural language and computer vision applications. Describe how attention works with an encoder and decoder to power applications like text translation. Understand basic attention methods such as additive (Bahdanau) attention and multiplicative (Luong) attention.
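A minimal sketch of additive attention, the scoring scheme associated with Bahdanau, is shown below in plain Python. Each encoder state is scored against the decoder query as v · tanh(W_q q + W_k k_i); the scores are turned into weights with a softmax, and the context vector is the weighted sum of the encoder states. The names `additive_attention`, `w_q`, `w_k`, and `v` are assumptions for illustration, and the weights here are fixed by hand rather than learned.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(query, keys, w_q, w_k, v):
    """Return (weights, context) for one decoder query over the encoder states.

    keys doubles as the values: the context is a weighted sum of the keys.
    """
    projected_query = matvec(w_q, query)
    scores = []
    for key in keys:
        projected_key = matvec(w_k, key)
        hidden = [math.tanh(q + k) for q, k in zip(projected_query, projected_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    weights = softmax(scores)
    context = [
        sum(weight * key[d] for weight, key in zip(weights, keys))
        for d in range(len(keys[0]))
    ]
    return weights, context


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    weights, context = additive_attention(
        query=[1.0, 0.0],
        keys=[[1.0, 0.0], [-1.0, 0.0]],
        w_q=identity,
        w_k=identity,
        v=[1.0, 1.0],
    )
    print(weights, context)
```

Multiplicative (Luong-style) attention follows the same softmax-and-weighted-sum pattern but replaces the tanh scoring layer with a dot product between the query and each key, optionally through a weight matrix.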

                • Image Captioning

                  Understand image captioning and learn to tokenize captions into words. Combine CNNs and RNNs to build a complex captioning model.
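The tokenization step can be sketched as follows: split each caption into lowercase word tokens, build a vocabulary that maps words to integer ids, and encode a caption as a list of ids wrapped in start and end markers. The special tokens `<start>`, `<end>`, and `<unk>`, the function names, and the sample captions are assumptions for illustration and are not taken from the course materials.

```python
import re
from collections import Counter

START, END, UNK = "<start>", "<end>", "<unk>"


def tokenize(caption):
    """Lowercase a caption and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", caption.lower())


def build_vocab(captions, min_count=1):
    """Map every word seen at least min_count times to an integer id."""
    counts = Counter(token for caption in captions for token in tokenize(caption))
    vocab = {START: 0, END: 1, UNK: 2}
    for word in sorted(counts):  # sorted so ids are reproducible
        if counts[word] >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(caption, vocab):
    """Turn a caption into a list of ids; unknown words map to the UNK id."""
    tokens = [START] + tokenize(caption) + [END]
    return [vocab.get(token, vocab[UNK]) for token in tokens]


if __name__ == "__main__":
    vocab = build_vocab(["A dog runs.", "A cat sleeps."])
    print(vocab)
    print(encode("A dog jumps", vocab))
```

In a captioning model, id sequences like these are what the RNN decoder is trained to predict one token at a time, conditioned on the image features produced by the CNN.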

                  • Course Project: Image Captioning

                    Combine CNN and RNN knowledge to build a deep learning model that produces captions given an input image. Image captioning requires that learners create a complex deep learning model with two components: a CNN that transforms an input image into a set of features, and an RNN that turns those features into rich, descriptive language. In this project, you will implement these cutting-edge deep learning architectures.

                  All Our Courses Include

                  • Real-world projects from industry experts

                    With real-world projects and immersive content built in partnership with top-tier companies, you’ll master the tech skills companies want.

                  • Real-time support

                    On-demand help. Receive instant help with your learning directly in the classroom. Stay on track and get unstuck.

                  • Workspaces

                    Validate your understanding of concepts learned by checking the output and quality of your code in real-time.

                  • Flexible learning program

                    Tailor a learning plan that fits your busy life. Learn at your own pace and reach your personal goals on the schedule that works best for you.

                  Course offerings

                  • Class content

                    • Real-world projects
                    • Project reviews
                    • Project feedback from experienced reviewers
                  • Student services

                    • Student community
                    • Real-time support

                  Succeed with personalized services.

                  We provide services customized for your needs at every step of your learning journey to ensure your success.

                  Get timely feedback on your projects.

                  • Personalized feedback
                  • Unlimited submissions and feedback loops
                  • Practical tips and industry best practices
                  • Additional suggested resources to improve
                  • 1,400+

                    project reviewers

                  • 2.7M

                    projects reviewed

                  • 88/100

                    reviewer rating

                  • 1.1 hours

                    avg project review turnaround time

                  Learn with the best.

                  • Cezanne Camacho

                    Curriculum Lead

                    Cezanne is an expert in computer vision with a Master's in Electrical Engineering from Stanford University. As a former researcher in genomics and biomedical imaging, she’s applied computer vision and deep learning to medical diagnostic applications.

                  • Luis Serrano


                    Luis was formerly a Machine Learning Engineer at Google. He holds a PhD in mathematics from the University of Michigan and held a Postdoctoral Fellowship at the University of Quebec at Montreal.

                  • Jay Alammar


                    Jay has a degree in computer science, loves visualizing machine learning concepts, and is the Investment Principal at STV, a $500 million venture capital fund focused on high-technology startups.

                  • Ortal Arel

                    Curriculum Lead

                    Ortal Arel is a former computer engineering professor. She holds a Ph.D. in Computer Engineering from the University of Tennessee. Her doctoral research work was in the area of applied cryptography.

                  • Kelvin Lwin

                    AI | Knowledge Architect

                    Kelvin taught highly technical CS and AI/DL subjects in US academia and industry for a decade. He then spent 3 years building full-stack AI in China to gain a broader global perspective. He is now combining AI, empathy, and ethics, informed by his 18 years of meditation, to build new educational AI for all.

                  Advanced Computer Vision and Deep Learning

                  Get started today

                    • Learn

                      Apply deep learning architectures to computer vision tasks.

                    • Average Time

                      On average, successful students take 1 month to complete this program.

                    • Benefits include

                      • Real-world projects from industry experts
                      • Real-time support

                    Program Details

                    • Do I need to apply? What are the admission criteria?

                      No. This course accepts all applicants regardless of experience or specific background.

                    • What are the prerequisites for enrollment?

                      To be successful in this program, learners should have knowledge of probability basics, deep learning frameworks, intermediate Python, neural network basics, and object-oriented programming basics.

                    • How is this course structured?

                      This course consists of content and curriculum to support one project. We estimate that students can complete the program in one month.

                      The project will be reviewed by the Udacity reviewer network and platform. Feedback will be provided, and if you do not pass, you will be asked to resubmit the project until it passes.

                    • How long is this course?

                      Access to this course runs for the length of time specified in the payment card above. If you do not graduate within that time period, you will continue learning with month-to-month payments. See the Terms of Use and FAQs for other policies regarding the terms of access to our programs.

                    • Can I switch my start date? Can I get a refund?

                      Please see the Udacity Program Terms of Use and FAQs for policies on enrollment in our programs.

                    • What software and versions will I need in this course?

                      Learners will need Python and the following Python libraries: numpy==1.12.1, torch==0.4.0, matplotlib==2.1.0, and torchvision==0.2.1. They will also need a computer running a 64-bit operating system with at least 8 GB of RAM and administrator account permissions.
