“Machine Learning is everywhere.” This is a phrase we see often these days, and it’s close to being a truism. Netflix, Amazon, Siri, Pandora: the list goes on. But it’s not just entertainment and media. It’s also everything from the post office to healthcare to traffic to security. Look closely and, for a great many of us, nearly every part of daily life is touched at some point by Machine Learning.
Is this a good thing?
There was a time when anything that conjured the spectre of “Big Brother” would be understood as a clear negative. The mantra “Big Brother is watching you” was meant to suggest a tyrannical power watching over and controlling every aspect of our lives. But in today’s world of Big Data, Machine Learning, Data Science, and Artificial Intelligence, it is clear that the rise of these technologies also brings incredible positives.
Let’s look at some examples of Machine Learning in action, to get a sense of just what the impact is, and what the future might hold. These examples run a wide gamut, and together they will hopefully paint a clearer picture of where we’re heading with these technologies.
We’ll begin with Netflix, for two reasons. One, it’s an obvious example of Machine Learning in action, and one that nearly all of us are familiar with. And two, Netflix has had a really important impact on the development of Machine Learning technology. The story begins back in 2006, with the announcement of the Netflix Prize:
“In 2006 we announced the Netflix Prize, a machine learning and data mining competition for movie rating prediction. We offered $1 million to whoever improved the accuracy of our existing system called Cinematch by 10%. We conducted this competition to find new ways to improve the recommendations we provide to our members, which is a key part of our business.”
The $1M prize was awarded to a team called “BellKor’s Pragmatic Chaos” on September 21, 2009. Interestingly enough, however, the new system was never implemented, because the Netflix business model was already changing too fast—DVDs were on the way out, and streaming was fast taking over.
Nowadays, Netflix most definitely uses Machine Learning technology, but their approach today is far more complex, nuanced, and layered, primarily because of the vast increases in available data. Here is just a partial list (excerpted from an article on The Netflix Tech Blog entitled “Netflix Recommendations: Beyond the 5 stars”) of the Machine Learning methods Netflix employs to continually improve their personalization technology:
- Linear regression
- Logistic regression
- Elastic nets
- Singular Value Decomposition
- Restricted Boltzmann Machines
- Markov Chains
- Latent Dirichlet Allocation
- Association Rules
- Gradient Boosted Decision Trees
- Random Forests
- Clustering techniques from the simple k-means to novel graphical approaches such as Affinity Propagation
- Matrix factorization
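To make one item from this list concrete, here is a minimal sketch of matrix factorization for rating prediction. It is not Netflix’s implementation: the ratings are made up, and the function names, learning rate, and regularization values are assumptions chosen for illustration. Each user and each item gets a short vector of latent factors, and the predicted rating is their dot product.

```python
import random

# Made-up (user, item, rating) triples on a 1-5 scale; not Netflix data.
RATINGS = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 1, 1), (3, 2, 5), (3, 3, 4),
]
N_USERS, N_ITEMS = 4, 4


def predict(user_vec, item_vec):
    """Predicted rating is the dot product of the two latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def rmse(ratings, users, items):
    """Root-mean-square error over the observed ratings."""
    total = sum((r - predict(users[u], items[i])) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, n_factors=2, lr=0.01, reg=0.02,
              epochs=2000, seed=0):
    """Learn user and item factors by stochastic gradient descent."""
    rng = random.Random(seed)
    users = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)]
             for _ in range(n_users)]
    items = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)]
             for _ in range(n_items)]
    start_error = rmse(ratings, users, items)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(users[u], items[i])
            for f in range(n_factors):
                pu, qi = users[u][f], items[i][f]
                # Move both vectors to shrink the error, with L2 shrinkage.
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items, start_error


users, items, start_error = factorize(RATINGS, N_USERS, N_ITEMS)
final_error = rmse(RATINGS, users, items)
print(f"RMSE before training: {start_error:.3f}, after: {final_error:.3f}")
# Estimate a rating that was never observed (user 1, item 1).
print(f"Predicted rating for user 1 on item 1: {predict(users[1], items[1]):.2f}")
```

The useful part is the last line: once the factors are learned, the same dot product gives an estimate for user and item pairs that have no rating yet, which is the basis of a recommendation.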
The various Machine Learning workflows that Netflix runs every day are now managed by something called Meson:
“Meson is a general purpose workflow orchestration and scheduling framework that we built to manage ML pipelines that execute workloads across heterogeneous systems. It manages the lifecycle of several ML pipelines that build, train and validate personalization algorithms that drive video recommendations.”
Incredible technology, right? And all in the service of achieving Netflix’s stated goal of predicting “what you want to watch before you watch it!”
Personalized Medicine
The rise of what is termed “Personalized Medicine” owes a great deal to new advances in data management. Drawing on a report by The Academy of Medical Sciences, Wikipedia gives us the following definition of Personalized Medicine:
“Personalized medicine is a medical model that separates patients into different groups—with medical decisions, practices, interventions and/or products being tailored to the individual patient based on their predicted response or risk of disease.”
And an abstract from a recent conference on the subject—the JSM 2015 conference—lays out a vision of the role Machine Learning plays:
“The overall goal is to target treatment specifically to each individual so that clinical outcomes for that individual are optimized. One direction of attack is to use patient data to discover decision rules which specify the treatment to use as a function of a vector of features from the patient. Regression and classification are important statistical tools for estimating such rules based on either observational data or data from a randomized trial, and machine learning can help with this because of its ability to artfully handle high dimensional feature spaces with potentially complex interactions.”
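As a rough illustration of what “decision rules which specify the treatment to use as a function of a vector of features” can look like, here is a minimal sketch. It is an assumption-laden toy, not the method from the abstract: it uses a single hypothetical biomarker instead of a feature vector, invented trial data, and plain least-squares regression. One outcome model is fitted per treatment arm, and the rule picks the arm with the better predicted outcome.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical trial data: biomarker levels and outcome scores per arm.
TRIAL = {
    "A": ([0, 1, 2, 3, 4], [1.1, 1.4, 2.1, 2.4, 3.1]),
    "B": ([0, 1, 2, 3, 4], [3.0, 2.4, 2.1, 1.6, 0.9]),
}

# One regression model (intercept, slope) per treatment arm.
MODELS = {arm: fit_line(xs, ys) for arm, (xs, ys) in TRIAL.items()}


def predicted_outcome(arm, biomarker):
    """Expected outcome for a patient with this biomarker under this arm."""
    intercept, slope = MODELS[arm]
    return intercept + slope * biomarker


def decision_rule(biomarker):
    """Recommend the arm with the highest predicted outcome."""
    return max(MODELS, key=lambda arm: predicted_outcome(arm, biomarker))


for level in (0.5, 3.5):
    print(f"biomarker={level}: recommend treatment {decision_rule(level)}")
```

With this made-up data the two fitted lines cross near a biomarker level of 2, so the rule recommends treatment B below that point and treatment A above it. Real applications involve many features and far more careful statistics, which is where the machine learning methods the abstract mentions come in.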
The potential implications for modern healthcare are staggering. A look at some of the talks from another recent conference, “Machine Learning for Personalized Medicine” (held as a satellite meeting of the European Human Genetics Conference, and which took place last week in Barcelona, Spain), gives a window into the kinds of work already underway:
- Identifying drug-targetable key drivers of disease
- Integrative and quantitative analysis of disease mutations
- A Network Biology Approach to Epigenetic Regulation
The MLPM (Machine Learning for Personalized Medicine) organization is committed to growing a new generation of Machine Learning scientists who will “develop and employ the computational and statistical tools that are necessary to enable personalized medical treatment of patients according to their genetic and molecular properties and who are aware of the scientific, clinical and industrial implications of this research.” The ultimate goal they are working towards is an advanced union of Machine Learning and Statistical Genetics.
One of the talks at this conference focused specifically on key technical challenges associated with employing Machine Learning techniques in the service of Personalized Medicine. The talk—entitled “Removing Unwanted Variation in Machine Learning for Personalized Medicine”—was delivered by Terry Speed, from the Bioinformatics Division at the Walter and Eliza Hall Institute of Medical Research, in Australia.
“Unwanted variation can reduce precision and add bias (via confounding), leading to false positives and false negatives, poor classifiers and artificial clusters.”
Given that we’re talking about healthcare and human lives, it’s obviously critical that we get the technology right if we’re going to deploy it. This requires a deep understanding of the full measure of Machine Learning’s implications. Pedro Domingos, a professor at the University of Washington, and a leading researcher in Machine Learning, is the author of The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. In that book, Domingos speaks directly to this need to deeply understand Machine Learning:
“When a new technology is as pervasive and game changing as machine learning, it’s not wise to let it remain a black box. Opacity opens the door to error and misuse. Amazon’s algorithm, more than any one person, determines what books are read in the world today. The NSA’s algorithms decide whether you’re a potential terrorist. Climate models decide what’s a safe level of carbon dioxide in the atmosphere. Stock-picking models drive the economy more than most of us do. You can’t control what you don’t understand, and that’s why you need to understand machine learning — as a citizen, a professional, and a human being engaged in the pursuit of happiness.”
Fraud Prevention and Enterprise Security
It follows that if those of us on the right side of the law are rapidly advancing our Machine Learning capabilities, the criminal element is doing the same, and probably a bit faster than we are, given that they’re not fettered by things like the law! Accordingly, the application of technology to the problems of fraud and security is of critical importance, and Machine Learning is playing an increasingly important role.
PayPal is one high-profile example of a financial company that is on the true bleeding edge of fraud prevention technology, and they continue to refine and advance their use of complex algorithms to manage extraordinary amounts of data. In a 2015 article from InfoWorld, Dr. Hui Wang, Senior Director of Risk Sciences for PayPal, offered a detailed explanation of how neural net algorithms have given way to Deep Learning techniques:
“A neural net tries to mimic a human’s way of processing information. We take ABC and try to create a relationship among them, and we take a CDE and create another relationship, and then on a higher level abstract the intermediate mini-model. So it’s kind of mimicking the human thought process. But in deep learning you’re basically taking it to many, many layers. It’s not just ABCDE, there are like 3,000 features out there and then within that 3,000 there are a lot of mini-classes of features. They have all kinds of relationships and we’re just adding layers and layers of these intermediary mini-models or mini-abstractions of the information — and in the end come up with the top level.”
Using Deep Learning in combination with linear and neural network Machine Learning algorithms, PayPal is able to collect and analyze “gargantuan amounts of data about buyers and sellers, including their network information, machine information, and financial data.” This in turn allows the company to pursue its ultimate brand objective: trust.
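The layering Dr. Wang describes can be sketched in a few lines. This is only a toy under stated assumptions: five input features standing in for “ABCDE”, two small hidden layers, and random weights rather than trained ones. It has nothing to do with PayPal’s actual models. Each layer turns the previous layer’s outputs into a new set of intermediate “mini-abstractions”, and a final step combines them into one top-level score.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """One layer: each output is a ReLU of a weighted sum of the inputs."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def make_layer(n_in, n_out, rng):
    """Random, untrained weights; a real model would learn these from data."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers, out_weights, out_bias=0.0):
    """Pass the features up through every layer, then squash to (0, 1)."""
    activations = features
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    z = sum(w * a for w, a in zip(out_weights, activations)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
# Five raw features (the "ABCDE" of the quote) -> 4 -> 3 -> one score.
layers = [make_layer(5, 4, rng), make_layer(4, 3, rng)]
out_weights = [rng.uniform(-1, 1) for _ in range(3)]

features = [0.2, 0.9, 0.1, 0.5, 0.7]  # hypothetical, already scaled to 0-1
score = forward(features, layers, out_weights)
print(f"top-level score: {score:.3f}")
```

Going “deep” means adding more entries to `layers`; the code stays the same while the number of intermediate abstractions grows.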
Credit card companies face similar challenges, and are pursuing similar strategies. As noted in a recent article on The Conversation by Jungwoo Ryoo, Associate Professor of Information Sciences and Technology at Pennsylvania State University, “Americans used credit cards to pay for 26.2 billion purchases in 2012. The estimated loss due to unauthorized transactions that year was US $6.1 billion.” Ryoo goes on to note that because the federal Fair Credit Billing Act limits the maximum liability of a credit card owner to $50 for unauthorized transactions, credit card companies are responsible for the balance. Needless to say, these companies have a pressing need to prevent fraud at every opportunity. They are now doing so by employing Machine Learning algorithms that can assign fraud probability scores in real time to vast numbers of transactions, drawing on real-time analysis of equally vast amounts of data.
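To give a feel for what assigning a fraud probability score to each transaction might involve, here is a minimal sketch using a logistic model. Everything specific in it is an assumption made for illustration: the feature names, the weights, the bias, and the 0.5 threshold are invented, whereas a real system would learn its parameters from historical transactions and use far more signals.

```python
import math

# Hypothetical weights; in practice these would be learned from labeled data.
WEIGHTS = {"amount_usd": 0.004, "foreign_country": 1.5, "night_time": 0.8}
BIAS = -4.0


def fraud_probability(txn):
    """Logistic score in (0, 1): higher means more likely to be fraud."""
    z = BIAS + sum(weight * txn[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def flag_suspicious(transactions, threshold=0.5):
    """Score transactions one at a time, yielding those at or above threshold."""
    for txn in transactions:
        score = fraud_probability(txn)
        if score >= threshold:
            yield txn["id"], score


# Made-up transactions standing in for a live stream.
stream = [
    {"id": "t1", "amount_usd": 25, "foreign_country": 0, "night_time": 0},
    {"id": "t2", "amount_usd": 900, "foreign_country": 1, "night_time": 1},
]

flagged = list(flag_suspicious(stream))
for txn_id, score in flagged:
    print(f"{txn_id}: fraud probability {score:.2f}")
```

Because `flag_suspicious` is a generator, each transaction is scored as it arrives rather than in a batch, which is the shape a real-time scoring pipeline would need.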
Similar initiatives are ongoing in the field of Enterprise Security, where companies are using increasingly sophisticated behavioral analytics tools to prevent security breaches. Karl Motey, writing recently on his BitNavi blog, says “companies today are developing tools that incorporate 5 key technologies.” He describes them as follows:
- Data analytics: machine learning and statistical analysis is increasingly being incorporated into the latest tools
- Data integration: some of the latest tools enable the detection and analysis of both structured and unstructured data
- Data presentation/visualization: trend analysis and other presentation tools are being introduced as visual tools for the end user
- Source system analysis: can be deployed on premise or could be cloud-based, with emphasis on the vendor’s knowledge of the source systems (e.g. SIEM or DLP)
- Service delivery method: either on premise or cloud-based, many vendors require the installation of an appliance into the company network
A recent article on the Hewlett Packard Enterprise blog nicely sums up the brave new world of Machine Learning’s burgeoning impact on Enterprise Security:
“Without development of the existing Big Data analytics stack over the last several years, this degree of automation would likely not have been possible. Coupled with these innovations in data ingestion, transformation, and analysis are several more recent developments in monitoring and automation. Combined with the power of machine learning algorithms (many of which aren’t fundamentally new but have been tailored to suit different needs) it’s finally time for true application security, unmanaged and automated, to emerge for companies of all sizes.”
As these examples make clear, Machine Learning really IS nearly everywhere. It informs everything from how we entertain ourselves, to how we heal ourselves, to how we protect ourselves. What is encouraging in all of this is the depth of concern for the human condition. Whether it’s something as pleasurable as movie watching, or as serious as healthcare, the motivating question is the same: how can we improve things for people? There is a flip side to this, of course. “The Ethics of Artificial Intelligence” is a topic currently igniting tremendous passions throughout the field, and there are justifiable concerns being raised.
But, as teachers of Machine Learning, we know firsthand that the collective hearts of our students are in the right place. And as they are the Machine Learning Engineers of the future, we are confident that the technology is in safe hands!
Look around you. Can you see an example of Machine Learning in action?