While a single data point doesn’t have much significance, a collection of data may reveal trends and patterns. Those with the ability to read data can make sense of the past and predict the future. That, in short, is the job of data scientists. Equipped with a diverse set of skills, these professionals find and tell stories hidden in data. In this article, we’ll show you how to learn data science.
What is Data Science?
Every day, humanity produces more and more data — billions of byproducts of online activity are stored in large databases. Anyone with a business has data to collect. It’s now accepted wisdom that, in the right hands, data generates value, which is why it’s often called “the new oil.”
Making sense of (big) data is the purpose of data science. Being a young discipline, its area of application isn’t clearly defined. However, we’re increasingly seeing the separation of data science from related disciplines such as data analysis and machine learning. And what’s certain: Data scientists are now in high demand, a trend that shows no signs of slowing down.
Who Can Be a Data Scientist?
As we’ve mentioned, data scientists need interdisciplinary competence. They should have some knowledge of an SQL flavor — to query relational databases — as well as one of the major programming languages for data science, such as Python.
In addition, a data scientist should know how to use tools from statistics and machine learning. However, a programmer, statistician and machine learning engineer all rolled into one still doesn’t make a data scientist. Data scientists typically exhibit two key traits that set them apart from their peers: a natural curiosity for understanding data, and a talent for relating findings to a larger audience.
Data scientists often work in close collaboration with corporate decision-makers, who frequently lack the technical background needed to interpret the numbers. Therefore, it’s important to represent data as clearly as possible with appealing and expressive visualizations. That’s the purpose of “data viz,” another cornerstone skill of the data scientist.
Where Do Data Scientists Work?
Try searching the term “data science” on a job platform, and you’ll find a wide range of different work areas — and not only at the big tech companies. It’s not surprising that data scientists are busy refining Netflix’s recommendation algorithm, or hacking social networks at Facebook. But did you know that comparatively traditional companies employ data scientists, too?
Many companies have come to understand their data’s value and are eager to turn that knowledge into profit. Nowadays, everyone from large supermarket chains to the BBC is looking to employ data science experts.
How to Learn Data Science
Learn the Basics
We’ve already established a few prerequisites to being a data scientist. In fact, brushing up on at least some of your basics is the best way to learn data science.
Don’t fret: you don’t need to be a massive Python nerd to become a data scientist. But you should know the tools to get the job done. If you’re new to programming, there’s a multitude of free online resources to help you learn.
If you’re already comfortable with functions and classes, you can acquaint yourself with the relevant libraries. Data-crunching packages like pandas and numpy come to mind, as well as Python’s machine learning library scikit-learn.
We’ve mentioned the importance of visualizing your data-driven insights. Luckily, Python offers some great tools in that arena, too. Check out the matplotlib and seaborn libraries for a few examples of such tooling.
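To give you a feel for it, here’s a minimal sketch of a bar chart in matplotlib. The categories and counts are made up for illustration, and the "Agg" backend is only selected so the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt

# Made-up numbers, purely for illustration.
categories = ["1-2 stars", "3 stars", "4-5 stars"]
review_counts = [120, 80, 300]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(categories, review_counts, color="steelblue")

# A title that states the finding, plus labeled axes, helps non-technical readers.
ax.set_title("Most reviews are positive")
ax.set_xlabel("Rating")
ax.set_ylabel("Number of reviews")
fig.tight_layout()
```

Calling fig.savefig with a file name of your choice would then write the chart to disk.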
Another cornerstone of data science is Structured Query Language, or SQL for short. SQL statements are short and expressive, and allow you to pull data from one or more tables with a single statement. As with anything in programming, the easiest way to learn SQL is to use it. Install a free database system, join a course and start querying!
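If you’d like to try SQL without installing anything, one option is Python’s built-in sqlite3 module. The sketch below is only an illustration: the reviews table, its columns and its rows are invented for this example.

```python
import sqlite3

# In-memory database; the table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (business TEXT, city TEXT, stars INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [
        ("Cafe Uno", "Chicago", 5),
        ("Cafe Uno", "Chicago", 4),
        ("Taco Hut", "Chicago", 2),
        ("Taco Hut", "Chicago", 3),
        ("Sushi Go", "Boston", 5),
    ],
)

# A single statement filters, groups, aggregates and sorts.
query = """
    SELECT business, AVG(stars) AS avg_stars, COUNT(*) AS n_reviews
    FROM reviews
    WHERE city = 'Chicago'
    GROUP BY business
    ORDER BY avg_stars DESC
"""
rows = conn.execute(query).fetchall()
for business, avg_stars, n_reviews in rows:
    print(f"{business}: {avg_stars:.1f} stars over {n_reviews} reviews")
conn.close()
```

The same SELECT statement would work largely unchanged in other SQL flavors, which is part of what makes the language worth learning.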
While it’s important to know your way around scikit-learn’s implementations of support vector machines and decision-tree-based algorithms, you should also study the theory of machine learning. When you know a bit about the underlying mathematical models, it’s much easier to pick the right algorithm for your applications — and understand why one algorithm might exhibit a certain behavior.
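As a rough sketch of what that looks like in practice, the snippet below fits a support vector machine and a decision tree on the iris dataset that ships with scikit-learn. The split size, random seed and hyperparameters are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small educational dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Two model families with different underlying assumptions.
models = {
    "support vector machine": SVC(kernel="rbf", C=1.0),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns the accuracy on the held-out test set.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy {scores[name]:.2f}")
```

Both models share the same fit and score interface, so swapping algorithms is easy. Knowing the theory is what tells you which one to reach for and why their results differ.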
Of course, statistics is important, too. Most of us have encountered stats in our formal education. Check if you still have the basics, such as probability distributions and aggregates. Even more important is honing your statistical instinct. Seeing a statistical observation in a larger context is something we don’t really learn at school, but it’s a vital skill in data science.
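Here’s a small refresher using only Python’s standard statistics module. The delivery times are hypothetical, with one deliberate outlier to show why a single aggregate can mislead.

```python
from statistics import NormalDist, mean, median, stdev

# Hypothetical delivery times in minutes; the last value is an outlier.
delivery_minutes = [28, 31, 30, 29, 32, 30, 120]

avg = mean(delivery_minutes)
mid = median(delivery_minutes)
spread = stdev(delivery_minutes)
print(f"mean={avg:.1f}, median={mid}, stdev={spread:.1f}")

# A probability distribution as a simple model: the share of values
# that fall within one standard deviation of the mean of a normal curve.
standard_normal = NormalDist(mu=0, sigma=1)
within_one_sigma = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(f"within one sigma: {within_one_sigma:.3f}")
```

The mean lands well above the median because of the single late delivery. Noticing that kind of gap, and asking what caused it, is the statistical instinct in action.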
Work with an Interesting Dataset
What’s that, you’re bored by theory and ready to get your hands dirty? To start practicing, find a freely available dataset to answer a question that seems interesting to you personally. Perhaps you want to investigate crime in Chicago? Or find out which factors lead to a good Yelp review? If you’re feeling ambitious, you could even join a Kaggle competition.
Once you’ve collected your data, you’ve gotten to one of the most enjoyable phases of any data science project. During exploratory data analysis (EDA), you’ll investigate the data: Does it have any missing values? Is it possible to impute those? What do the columns mean and how are they related? While you might be itching to jump to the next phase and try out some fancy machine learning models, it’s crucial that you spend some time on EDA to understand your data.
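A first pass at those EDA questions might look like the pandas sketch below. The tiny table is invented for illustration, and median imputation is just one simple strategy among many, not necessarily the right one for your data.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataset; a real one would typically be loaded from a file.
df = pd.DataFrame(
    {
        "district": ["North", "South", "North", "West", "South"],
        "incidents": [12, np.nan, 7, 9, np.nan],
        "population": [1000, 1500, 1000, np.nan, 1500],
    }
)

# Does the data have missing values? Count them per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Summary statistics for the numeric columns.
print(df.describe())

# Is it possible to impute them? One simple option: the column median.
numeric_cols = ["incidents", "population"]
imputed = df.copy()
imputed[numeric_cols] = imputed[numeric_cols].fillna(df[numeric_cols].median())

# How are the numeric columns related?
print(imputed[numeric_cols].corr())
```

Keeping the original df untouched and imputing into a copy makes it easy to compare results with and without the filled-in values.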
This might be a good time to talk about data quality. If you’ve only worked with educational datasets in the past, like those included in scikit-learn, you might be shocked to see that data in the wild is a lot different. Real-world data is full of missing and wrong values. Sometimes all you have is a bunch of obscurely named columns. Here, you’ll learn a very important lesson: never trust your data too much.
Practice Communicating Your Findings
Since we’ve established that communication is essential to your job as a data scientist, practicing this skill can’t hurt. You could write up a blog post and illustrate it with data visualizations of your most interesting findings. Be very clear about who your intended readers are. Are they techies or not? Do they know how to interpret data? You might find it difficult at first to convey what makes your results relevant. Try placing your findings within a larger context that your audience can relate to.
Look at Data-Driven Projects Critically
A great way to train your own data literacy is by looking at other projects and taking note of what works and what doesn’t. For example, you could look at some data-driven articles that cover the ongoing coronavirus pandemic. Ask yourself: What story does this article tell? Do the visualizations support it? How intuitive are the graphs and are the axis labels correct? You might be surprised by the number of stories that are unclear or even misleading.
Become an Insider
There’s a lively community out there that dispenses valuable knowledge on a regular basis. Consider joining a mailing list or a Slack channel related to data science and data viz. You could also follow data science blogs such as the Nightingale journal on Medium, or subscribe to a data science podcast where professionals discuss their work.
Do an Online Course
While a lot of programmers are self-taught, data science is an area in which you’d definitely benefit from a professionally taught course. As we’ve outlined earlier, data science is an extremely rich area requiring you to combine many different skills and talents. A typical data science course teaches you how to analyze and clean your data and build a machine learning pipeline. Look for courses that let you practice your communication skills and provide feedback on your work.
Start Your Data Science Journey
Data science is one of the most versatile and exciting areas in tech today. If you’re truly after the best way to learn data science, join our specialized and highly acclaimed Data Science nanodegree. There, you’ll go through all the steps of a data science project, from EDA to dealing with stakeholders, to gain hands-on experience and build your portfolio.