
How would our world look if computers could see and understand the objects around them? Now, what if they were able to do so better than we could?

That is precisely what computer vision attempts to achieve. Read on to find out why many of the field’s experts use Python and its machine learning libraries to bring about the widespread adoption of computer vision applications.

What Is Computer Vision?

Computer vision as a field focuses on giving machines a way to visually perceive real-life objects and make decisions based on what they “see.” The field’s end goal is to automate burdensome tasks, from navigating a car on a busy highway to categorizing medical imagery, all while achieving higher accuracy and speed of execution compared to humans.

Functionality powered by computer vision is already present in many devices and systems that we use daily. For example, a smartphone’s camera app uses a computer vision algorithm to detect faces and adjust the camera’s settings to focus on them.

Internally, the algorithm finds locations in the camera’s view that may indicate a human face by scanning the image and finding groups of pixels that match a human-face pattern. If a face is found, the algorithm draws a visible box around it — so that the user knows that the human in the picture is now in focus — and adjusts the camera’s settings to get the best possible shot.
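The scan-and-match idea can be illustrated with a toy sketch. The `find_pattern` helper below is hypothetical: it slides a small pixel pattern across a NumPy "image" and records exact matches. Real face detectors rely on learned features (Haar cascades, neural networks) rather than exact pixel equality, but the scanning loop has the same shape.

```python
import numpy as np

def find_pattern(image, pattern):
    """Scan a grayscale image for exact matches of a small pixel pattern.

    Returns the (row, col) of the top-left corner of each match. Real
    detectors match learned features, not exact pixels, but they scan
    the image in the same sliding-window fashion.
    """
    ih, iw = image.shape
    ph, pw = pattern.shape
    matches = []
    for r in range(ih - ph + 1):
        for c in range(iw - pw + 1):
            if np.array_equal(image[r:r + ph, c:c + pw], pattern):
                matches.append((r, c))
    return matches

# A tiny 6x6 "image" with a 2x2 bright patch standing in for a face.
image = np.zeros((6, 6), dtype=np.uint8)
image[2:4, 3:5] = 255
pattern = np.full((2, 2), 255, dtype=np.uint8)

print(find_pattern(image, pattern))  # [(2, 3)]
```

Once the detector has these coordinates, drawing the focus box is just a matter of outlining the matched region.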

While finding faces in a camera’s view can seem trivial, the computer vision algorithm also has to do a good job filtering out unrelated objects like vacuum cleaners (easy), pets (harder), or paintings of people hung on walls (quite difficult). In a camera app, mistaking a corgi for a human is tolerable. But in an autonomous driving system, wrongly recognizing a person as a leaf on the road can be life-threatening.

Let’s look at some more concrete examples of how computer vision is taking the world by storm.

Computer Vision in Practice

Computer vision is crucial to augmented reality (AR), where the barrier between the physical and digital worlds blurs.

AR is how you can “bring to life” your own mythical creatures and watch them jump around your kitchen cabinets. In a computer vision-powered AR application, algorithms can recognize objects such as tables, floors, and other solid surfaces, and place virtual models on top of those surfaces. Inside a game, each surface might have its own properties based on color or texture. For example, an algorithm could check for categories of objects that are perceived as solid, non-solid, or even slippery, and prevent you from seeing a dragon walk across water or slide across a carpet.
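As a rough illustration of such a check, a game might encode surface properties in a simple lookup table. The categories and the `can_walk_on` helper below are hypothetical, not taken from any real AR framework:

```python
# Hypothetical surface categories that an AR game might assign after a
# computer vision pass labels each detected surface.
SURFACE_RULES = {
    "table":  {"solid": True,  "slippery": False},
    "floor":  {"solid": True,  "slippery": False},
    "carpet": {"solid": True,  "slippery": False},
    "water":  {"solid": False, "slippery": False},
    "ice":    {"solid": True,  "slippery": True},
}

def can_walk_on(surface_label):
    """Allow a virtual creature to walk only on solid, non-slippery surfaces."""
    rules = SURFACE_RULES.get(surface_label, {"solid": False, "slippery": False})
    return rules["solid"] and not rules["slippery"]

print(can_walk_on("table"))  # True
print(can_walk_on("water"))  # False: the dragon can't walk across water
print(can_walk_on("ice"))    # False: it would slide instead
```

The hard part, of course, is the vision step that produces the labels; the game logic that consumes them can stay this simple.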

Similarly, developers use computer vision for facial recognition. This could work to improve photo and video quality, as in our example at the beginning of the article, or it could serve more advanced use cases. Facebook famously used facial recognition to let you review pictures in which your friends might have forgotten to tag you, though it has since retired the feature. Likewise, Apple’s Face ID feature tracks facial features to let you seamlessly unlock your iPhone.

Closer to home, one device that’s starting to leverage computer vision is the robot vacuum. Older models map a room by using bump sensors to detect collisions; some newer models instead incorporate object detection to classify furniture, pets, and people. Not only does this prevent damage to fragile furniture or harm to smaller pets, but it also lets the vacuum notice changes in its environment. A person or pet is unlikely to stay in one spot for long, so by detecting that something has moved, a new-generation robot vacuum can adjust its route and reach a place that was previously inaccessible.

Let’s have a look at how you can use the Python language to dive into the applications of computer vision and learn more about the field.

Using Python in Computer Vision

Python is a mainstay when it comes to computer vision, and to artificial intelligence in general. This is mainly thanks to its readability and its extensive collection of community-maintained libraries, covering everything from simple tasks like reading CSV files to complex deep learning methods. We recommend getting started with Python if you’re new to computer vision. If you take the leap, here are the pros and cons you can expect.


Pros

The official Python documentation contains a vast amount of resources and guides that can help you with almost any Python task you might have.

Additionally, a lot of effort has been put into creating machine learning-oriented packages and libraries to ease the product development process. Free access to these libraries also reduces the time and need to develop new solutions.

Overall, Python is quite simple when it comes to understanding the code, even if you’re a beginner, or even if a Python application you’re reviewing was written by a beginner. Most computer vision code is quite complex, so more readable code means that developers are free to focus on the aspects that require more attention.


Cons

One of the biggest disadvantages of Python is its execution speed. Python’s ease of use comes at the price of performance: its interpreted nature makes it slower than compiled languages such as C++.

To mitigate these bottlenecks, some of Python’s libraries, such as NumPy, implement their performance-critical routines in compiled, lower-level code and expose them through a Python API. If you decide to work on a production-level computer vision system, review the documentation for any frameworks or libraries you choose and understand their performance implications.
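As a quick, informal illustration of the gap, the snippet below times the same arithmetic done in a pure-Python loop and in NumPy. Exact timings vary by machine, but the vectorized version is typically far faster:

```python
import time
import numpy as np

n = 1_000_000
data = list(range(n))
arr = np.arange(n, dtype=np.float64)

# Pure-Python loop: every multiplication and addition goes through
# the interpreter, one element at a time.
start = time.perf_counter()
total_py = sum(x * 2.0 for x in data)
py_time = time.perf_counter() - start

# NumPy: the same arithmetic runs in compiled code over the whole array.
start = time.perf_counter()
total_np = (arr * 2.0).sum()
np_time = time.perf_counter() - start

print(f"Python loop: {py_time:.4f}s, NumPy: {np_time:.4f}s")
assert total_py == total_np  # same result either way
```

The same principle applies to image processing, where a single frame can contain millions of pixels: looping over them in Python is rarely viable, while a vectorized operation handles the whole frame at once.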

Computer Vision Libraries

As we’ve mentioned, one of Python’s strong suits is its library availability. Let’s look at a few useful libraries for computer vision tasks.

PyTorch and TensorFlow

PyTorch and TensorFlow are very popular general deep learning libraries. While they aren’t specific to computer vision, you can still learn a lot by accomplishing computer vision tasks with these tools. For example, the torchvision package distributed with PyTorch offers a catalog of pre-trained neural networks, some quite advanced, that are used in production computer vision systems.


Pytesseract

Pytesseract is a tool that recognizes text within images. Developers can choose to either send the output to a file or simply print the text to the console. Pytesseract supports numerous file types, including common ones like JPEG, PNG, and GIF. You can even detect text in different languages.


Face-recognition

Face-recognition is a library that allows developers to process images and videos containing faces. If you’re getting started with facial tracking, for instance, this library can help you determine the location of specific facial landmarks within an image.

The library can also perform more advanced tasks like matching a person’s face to the list of known faces that you provide ahead of time, for a simple identity detection system.


imutils

Compared to the previous libraries, imutils provides simple yet useful functions for processing images, such as resizing, rotating, and edge detection. In practice, it’s a handy utility library for computer vision work in Python.

The Future of Computer Vision

We can only speculate about what the future holds. Still, as computer vision continues to advance, its key objectives inch closer to reality. Chief among them is processing information from images and other visuals at a level comparable to human perception.

Realizing that potential could save countless lives through anomaly detection in medical settings. It could also help stop armed individuals before they enter a crowded public space with intent to cause harm.

Another anticipated advancement is a steady fall in the difficulty of training and maintaining computer vision systems. At the moment, larger computer vision applications can be expensive to run because they require substantial compute power for optimal performance. As algorithms become more efficient and hardware more capable, we’ll likely see more computer vision applications at lower cost and lower latency. That would enable more real-time systems, such as autonomous driving and navigation, and let computer vision take on broader sets of tasks currently done by humans, like sorting fruits and vegetables at a supermarket.

Want To Become a Computer Vision Expert?

In this article, we explored just a fraction of the capabilities that computer vision has to offer. From recognizing objects and faces, to tracking and manipulating images, the field aims to reach human-like visual processing abilities.

With Udacity’s specialized Computer Vision Nanodegree program, you too can start leveraging your Python skills to develop computer vision applications to add to your portfolio.

Enroll in our Computer Vision Nanodegree program today! 

Start Learning