Real-world projects from industry experts
With real-world projects and immersive content built in partnership with top-tier companies, you’ll master the tech skills companies want.
Apply deep learning architectures to computer vision tasks.
Get access to classroom immediately on enrollment
Apply deep learning architectures to computer vision tasks. Discover how to combine CNN and RNN networks to build an automatic image captioning application.
Prerequisites: Intermediate Python, Object-Oriented Programming Basics
Describe advances in CNN architectures. Understand region-based CNNs (R-CNN), Fast R-CNN, and Faster R-CNN, which allow for fast, localized object recognition in images.
Understand grid, sliding windows, and bounding boxes in object detection. Implement YOLO, a real-time object detection algorithm.
Understand how recurrent neural networks learn from ordered sequences of data. Identify RNN applications in deep learning. Understand how the feedforward pass and backpropagation through time work, as well as the unfolded RNN model.
Explore how memory can be incorporated into a deep learning model and implement long short-term memory networks.
Review important hyperparameters such as learning rate, number of epochs, and number of layers, and understand the hyperparameters specific to RNNs.
Understand how attention allows models to focus on a specific piece of input data. Describe where attention is useful in natural language and computer vision applications. Describe how attention fits into the encoder-decoder models that power applications like text translation. Understand basic attention methods such as additive (Bahdanau) attention and multiplicative (Luong) attention.
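As a rough illustration of the additive attention mentioned above, here is a minimal pure-Python sketch. It is not course material: the function name `additive_attention`, the parameter names `W_q`, `W_k`, and `v`, and the toy inputs are all assumptions chosen for this example. Each key is scored against the query with a small tanh layer, the scores are normalized with a softmax, and the resulting weights are used to average the values.

```python
import math

def additive_attention(query, keys, values, W_q, W_k, v):
    """Hypothetical helper: score_i = v . tanh(W_q q + W_k k_i),
    weights = softmax(scores), context = sum_i weights_i * values_i."""
    def matvec(M, x):
        # Multiply a matrix (list of rows) by a vector.
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    q_proj = matvec(W_q, query)
    scores = []
    for k in keys:
        k_proj = matvec(W_k, k)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * h for vi, h in zip(v, hidden)))

    # Numerically stable softmax over the scores.
    max_s = max(scores)
    exps = [math.exp(s - max_s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the value vectors.
    dim = len(values[0])
    context = [sum(w * val[d] for w, val in zip(weights, values))
               for d in range(dim)]
    return weights, context
```

In a trained model the projection matrices and `v` would be learned; here they are plain lists so the arithmetic is easy to follow. The weights always sum to 1, so the context vector is a convex combination of the values.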
Understand image captioning and learn to tokenize captions into words. Combine CNNs and RNNs to build a complex captioning model.
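To make caption tokenization concrete, the sketch below builds a small vocabulary and converts a caption into a list of integer ids. It is an illustrative assumption, not the course's own code: the helper names `build_vocab` and `encode`, the whitespace-based splitting, and the special tokens `<start>`, `<end>`, and `<unk>` are all hypothetical choices.

```python
from collections import Counter

# Hypothetical special tokens marking caption boundaries and unknown words.
START, END, UNK = "<start>", "<end>", "<unk>"

def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent lowercase word to an integer id."""
    counts = Counter(word for c in captions for word in c.lower().split())
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {tok: i for i, tok in enumerate([START, END, UNK] + words)}

def encode(caption, vocab):
    """Turn a caption into ids, wrapping it in start and end markers."""
    tokens = [START] + caption.lower().split() + [END]
    return [vocab.get(t, vocab[UNK]) for t in tokens]

vocab = build_vocab(["A dog runs", "a cat sits"])
ids = encode("a dog flies", vocab)  # "flies" is unseen, so it maps to <unk>
```

A sequence of ids like this is the kind of input an RNN decoder would consume, typically after an embedding layer; real pipelines usually also handle punctuation, which this sketch ignores.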
Combine CNN and RNN knowledge to build a deep learning model that produces captions given an input image. Image captioning requires that learners create a complex deep learning model with two components: a CNN that transforms an input image into a set of features, and an RNN that turns those features into rich, descriptive language. In this project, you will implement these cutting-edge deep learning architectures.
On demand help. Receive instant help with your learning directly in the classroom. Stay on track and get unstuck.
Validate your understanding of concepts learned by checking the output and quality of your code in real-time.
Tailor a learning plan that fits your busy life. Learn at your own pace and reach your personal goals on the schedule that works best for you.
We provide services customized for your needs at every step of your learning journey to ensure your success.
Cezanne is an expert in computer vision with a Master's in Electrical Engineering from Stanford University. As a former researcher in genomics and biomedical imaging, she’s applied computer vision and deep learning to medical diagnostic applications.
Luis was formerly a Machine Learning Engineer at Google. He holds a PhD in mathematics from the University of Michigan and held a postdoctoral fellowship at the University of Quebec at Montreal.
Jay has a degree in computer science, loves visualizing machine learning concepts, and is the Investment Principal at STV, a $500 million venture capital fund focused on high-technology startups.
Ortal Arel is a former computer engineering professor. She holds a Ph.D. in Computer Engineering from the University of Tennessee. Her doctoral research work was in the area of applied cryptography.
Kelvin taught highly technical CS and AI/DL subjects in US academia and industry for a decade. He then spent 3 years building full-stack AI in China to gain a broader global perspective. He now combines AI, empathy, and ethics, informed by his 18 years of meditation, to build new educational AI for all.
On average, successful students take 1 month to complete this program.
No. This course accepts all applicants regardless of experience or specific background.
To be successful in this program, learners should have knowledge of probability basics, deep learning frameworks, intermediate Python, neural network basics, and object-oriented programming basics.
This course consists of content and curriculum to support one project. We estimate that students can complete the program in one month.
The project will be reviewed by the Udacity reviewer network and platform, and you will receive feedback. If you do not pass, you will be asked to resubmit the project until it passes.
Access to this course runs for the length of time specified in the payment card above. If you do not graduate within that time period, you will continue learning with month-to-month payments. See the Terms of Use and FAQs for other policies regarding the terms of access to our programs.
Please see the Udacity Program Terms of Use and FAQs for policies on enrollment in our programs.
Learners will need Python and the following Python libraries: numpy==1.12.1, torch==0.4.0, matplotlib==2.1.0, and torchvision==0.2.1. Plus, they’ll need a computer running a 64-bit operating system with at least 8GB of RAM and with administrator account permissions.