Chances are you’ve encountered deep learning in your everyday life. Be it driverless cars that seemingly use actual vision, browser applications that translate your texts into near-perfect French, or silly yet impressive mobile apps that age you by decades in a matter of seconds: neural networks and deep learning are ubiquitous. In this article, we’ll cover all the important aspects to get you started with neural networks and deep learning.
What Is Deep Learning?
Given all the newfound hype surrounding deep learning, you’d be forgiven for thinking that neural networks are a recent innovation. As a matter of fact, we can trace them back over sixty years. In 1957, pioneering psychologist Frank Rosenblatt built a classifier that could distinguish between pictures of different shapes and letters. That monster of a machine was called “the perceptron.”
The perceptron implemented a mathematical model that mimics the activity of the human brain’s nerve cells (i.e., neurons). A biological neuron is activated whenever the accumulated activity of its neighboring cells reaches a certain threshold. In a neural network, this is mimicked through the use of activation functions and weighted connections between cells.
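To make that idea concrete, here’s a minimal sketch of a perceptron-style neuron in Python. The weights, bias and the AND example are hand-picked for illustration, not learned from data:

```python
# A minimal sketch of perceptron-style inference: a weighted sum of inputs
# passed through a step-function activation. Weights here are hand-picked.
import numpy as np

def perceptron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> int:
    """Fire (return 1) if the weighted input activity crosses the threshold."""
    activation = np.dot(inputs, weights) + bias  # accumulated activity
    return 1 if activation > 0 else 0            # step activation function

# Example: a perceptron that computes the logical AND of two binary inputs.
weights = np.array([1.0, 1.0])
bias = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x), weights, bias))
```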
Unlike the human brain, however, the perceptron consisted of only one layer of neurons and therefore had limited abilities. It wasn’t until the 1970s that researchers started building neural nets with several layers that fed information to each other. This stacking of layers is what gives these networks their “depth.” We now know that deep networks are far more powerful than shallow ones.
However, AI enthusiasts had to wait until the new millennium to truly harness these deep networks’ abilities. The advent of cheap computing power and the huge data repository that is the internet provided fertile ground for neural networks to finally deliver what they had promised for so long. And that’s when we entered the golden age of deep learning.
What’s So Great About Deep Learning?
Our little history lesson aside, what’s actually so great about neural networks? Put simply, they solve problems that other AI models cannot.
Natural language processing (NLP) is a prime example. Producing and understanding language comes naturally to us humans. It’s only when we try to devise a formal model of our own language that we realize how complex linguistic systems really are. How could a rule-based AI account for all those sentences and words that we have never used before?
Moreover, most classical machine-learning models, such as tree-based classifiers or support vector machines, are really only fit to deal with tabular data. Language data, being unstructured, is an entirely different beast.
Enter deep learning language models. Given ample training data, sophisticated learning algorithms and hardware capable of handling millions of computations per second, machines are now able to produce language that sounds, well, natural. Today, neural nets produce translations accurate enough to cause a massive restructuring of the translation industry.
NLP is just one of many areas where neural networks have raised the bar considerably. Today, areas as disparate as cancer diagnostics, stock market analysis and the entertainment industry rely heavily on deep learning algorithms. While neural networks and deep learning are not a one-size-fits-all solution, they offer a pretty decent catalogue of architectures for a broad range of applications.
What Kinds of Neural Network Architectures Are There?
In an area that’s so promising for various industries, it should come as no surprise that research is evolving at light speed, coming up with smarter, faster implementations by the minute. All the large tech companies now employ deep learning experts who seem to be in never-ending (albeit highly productive) competition. Let’s look at a few neural network architectures that have made it to the deep learning hall of fame.
But why do we even refer to deep learning models as “architectures”? The reason is that we conceptualize these models as graphs: layers of interconnected neurons that are stacked upon each other. Different types of neural network models employ certain patterns of layers and connections. We can think of them as different architectural paradigms.
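For instance, here’s what one such stack of layers might look like in PyTorch. The layer sizes are arbitrary placeholders, not a recipe for any particular task:

```python
# A hypothetical fully connected "architecture": layers stacked on top of
# each other, with each layer feeding its output to the next one.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),  # input layer: 784 features in, 128 out
    nn.ReLU(),            # activation function between layers
    nn.Linear(128, 64),   # hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer: one score per class
)
print(model)
```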
Convolutional Neural Networks
The convolutional neural network (CNN) is the prototypical network for computer vision with deep learning. It was conceived by Yann LeCun et al. in 1998, towards the end of “the second winter of AI,” an era in which both trust in neural networks and funding for research in the field were scarce. Today, CNNs feature in virtually every beginner’s class on neural networks. In fact, most newcomers to deep learning start out by implementing some sort of convolutional neural net.
CNNs work by scanning an image (or another type of two-dimensional input) using various filters. These filters detect patterns in the data, which they pass on to the network’s higher-level layers. The hidden layers then modify and aggregate the information fed to them, with the goal of reducing the complexity of the original input.
For example, in image recognition, complex input consisting of lots of different pixels needs to be mapped to a single output, such as the class that an image belongs to. Such a task requires a great deal of abstraction — precisely what CNNs excel at.
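The sketch below shows what such a network might look like in PyTorch. The filter counts and the 28x28 grayscale input size are assumptions made purely for illustration:

```python
# A minimal CNN sketch: convolutional filters scan the image, pooling
# reduces its complexity, and a final linear layer maps the abstracted
# features to class scores. Sizes assume 28x28 grayscale inputs.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 16 filters detect local patterns
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsample: 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level filters
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # map features to 10 classes
)

image = torch.randn(1, 1, 28, 28)  # a dummy batch containing one image
print(cnn(image).shape)            # -> torch.Size([1, 10])
```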
Recurrent Neural Networks
Recurrent neural networks (RNNs) are typically applied to process sequences, such as written language or DNA strings. In an RNN, a single recurrent cell processes the entire sequence one element at a time: word by word, letter by letter, or nucleotide by nucleotide. What’s important is that the cell remembers its previous state and includes it in its computation of the current state.
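Here’s a bare-bones sketch of that loop in PyTorch, using dummy data and arbitrary sizes:

```python
# Recurrent processing in a nutshell: the same cell is applied to each
# element of the sequence, carrying a hidden state from step to step.
import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=8, hidden_size=16)

sequence = torch.randn(5, 1, 8)  # 5 time steps, batch of 1, 8 features each
hidden = torch.zeros(1, 16)      # the initial state remembers nothing yet
for element in sequence:
    # the new state depends on the current element AND the previous state
    hidden = cell(element, hidden)
print(hidden.shape)  # torch.Size([1, 16]): a summary of the whole sequence
```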
This sequential nature of RNNs brings about several problems. For example, during training, the gradients that drive learning can shrink towards zero or blow up as they are propagated back through many time steps (known as the vanishing and exploding gradient problems), leaving the optimization unable to make progress.
Another problem common with recurrent nets is a condition akin to amnesia. As the net processes each element of a long sequence, previously processed elements lose their impact over time. Once the network reads the final element of the sequence, it tends to forget the elements that it processed at the beginning.
Over the years, deep learning researchers introduced several updated versions of RNNs, such as the long short-term memory (LSTM) network and the gated recurrent unit (GRU), to make up for those problems. Nowadays, however, recurrent neural nets are often replaced by attention-based networks like the transformer.
The Transformer
It sounds like an invincible superhero, and some people would argue that this is an accurate description of the transformer. This architecture is based on a technique called “attention.” Rather than processing a string sequentially, the attention mechanism focuses only on those aspects of the input that actually carry information. This is analogous to how humans process data: Instead of taking in all the information at once, we’ve learned to place our attention on select parts, and ignore the rest.
In 2017, Google’s AI research team published a now legendary paper titled “Attention Is All You Need.” True to its title, the transformer architecture does away with computationally expensive RNN cells, replacing them with a mechanism called “self-attention.” This results in a neural net that’s both faster to train and more accurate in its predictions. Nowadays, transformers power most NLP applications.
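At its core, self-attention is a short formula: each position’s output is a weighted mix of all the values, with weights given by softmax(QK^T / sqrt(d_k)). Here’s a bare-bones sketch with dummy data; in a full transformer, the queries, keys and values would be learned linear projections of the input:

```python
# Scaled dot-product attention, as described in "Attention Is All You Need":
# Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
import math
import torch

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    d_k = q.size(-1)
    # similarity of every query with every key, scaled for numerical stability
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = scores.softmax(dim=-1)  # how strongly each position attends to the others
    return weights @ v                # weighted mix of the values

# Self-attention: queries, keys and values all come from the same sequence.
x = torch.randn(5, 64)           # 5 tokens, 64 dimensions each (dummy data)
print(attention(x, x, x).shape)  # -> torch.Size([5, 64])
```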
Generative Adversarial Networks
What if you could build two networks that train each other and, instead of only being fed data, would generate it? That’s the idea behind adversarial learning and the corresponding neural architecture, generative adversarial networks (GANs). One network, the generator, produces fake samples, while the other, the discriminator, tries to tell them apart from real data. During a Quora session, Yann LeCun, then head of Facebook’s AI research group, called adversarial learning “the most interesting idea in the last 10 years in ML.”
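A compressed sketch of that adversarial game might look as follows; the network sizes and the stand-in “real” data are placeholders for illustration:

```python
# A minimal GAN training loop: the generator learns to turn random noise
# into fake samples, while the discriminator learns to tell real from fake.
import torch
import torch.nn as nn

# Generator: turns 16-dimensional noise into a 2-dimensional "sample".
generator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
# Discriminator: outputs the probability that a sample is real.
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(1000):
    real = torch.randn(64, 2) + 3.0        # stand-in for a "real" data distribution
    fake = generator(torch.randn(64, 16))  # the generator invents its own data

    # 1. Train the discriminator to label real samples 1 and fakes 0.
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2. Train the generator to fool the discriminator into answering "real".
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```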
How Does a Neural Network Learn?
Let’s say we’ve got our data and our architecture. But how do neural networks actually learn? By using math, of course. Every machine learning application learns through the combination of a loss (or error) function and an optimization algorithm. The loss function tells the model just how far off its predictions were from the true values, while the optimization algorithm nudges it towards learning a more accurate representation of the data’s underlying distribution.
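As a toy illustration with made-up numbers, here’s mean squared error, one common loss function:

```python
# Mean squared error: the average squared gap between the model's
# predictions and the true values. The numbers here are made up.
predictions = [2.5, 0.0, 2.1]
targets     = [3.0, -0.5, 2.0]

mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
print(mse)  # (0.25 + 0.25 + 0.01) / 3 = 0.17 -- smaller is better
```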
Now remember that a deep neural network consists of many layers of interconnected nodes. If our network misclassifies a given picture, how do we tell the network’s various nodes and edges — all of which might be complicit in the error — that they have collectively messed up? The answer to this question lies in the backpropagation algorithm (often called “backprop” for short).
Like other gradient-based methods, backprop uses differential calculus to minimize a model’s error and update its weights. Its ingenuity lies in the fact that it proceeds backwards through the network, using each layer’s error signal to compute the updates for the layer that lies beneath it.
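In modern frameworks, backprop hides behind a single call. A minimal training step in PyTorch might look like this, where the model and data are placeholders:

```python
# One training step: a forward pass computes the loss, backprop computes
# a gradient for every weight, and the optimizer updates the weights.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(32, 4)          # dummy batch of 32 examples
labels = torch.randint(0, 2, (32,))  # dummy class labels

optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)  # how far off were we?
loss.backward()                        # backprop: walk the graph backwards
optimizer.step()                       # nudge every weight downhill
```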
How Do I Build My Own Neural Network?
To implement a neural net architecture, you’ll need just a few things: a framework, computing power and data. Once you want to train neural networks that go beyond simple tasks like handwritten digit recognition, you’ll need access to a graphics processing unit (GPU). A free alternative is Google’s Colab, a cloud-based platform that lets you use a GPU (or an even more powerful TPU) to train your models for as long as 12 hours at a time.
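Inside a Colab notebook, checking whether a GPU is actually available takes two lines (shown here with PyTorch; TensorFlow offers an equivalent check):

```python
# Pick the GPU if one is available, otherwise fall back to the CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")
```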
Data sets, on the other hand, are widely available, for instance on the Kaggle platform. Remember that when it comes to neural networks, more is more. For building your networks, you most likely won’t want to design your own neurons and layers from scratch; that’s what frameworks are for. Check out our blog post on PyTorch vs. TensorFlow to help you decide which library better fits your purposes.
Start Your Deep Learning Journey
Mastery of neural networks and deep learning is indispensable to becoming a machine learning engineer. This exciting field will continue producing applications to make our lives even easier. Enroll in our Deep Learning Nanodegree to start making history.