This blog post was last updated on July 27, 2021.
Regardless of your previous experience or skills, there is a path for you to pursue a career in data science. I’m here to help you figure out which skills you need to develop and where you can learn them.
Specifically, my team and I have worked with industry leaders to identify a core set of eight data science competencies you should develop. I’ve outlined them below, and you can find additional detail and learning resources in the Ultimate Data Skills Checklist at the conclusion of this post.
Let’s get started!
The 8 Data Science Skills That Will Get You Hired
Programming Skills
No matter what type of company or role you’re interviewing for, you’ll likely be expected to know how to use the tools of the trade, and that includes several programming languages: a statistical programming language like R or Python, and a database querying language like SQL.
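To make the pairing of SQL and a general-purpose language concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table and its rows are hypothetical, invented purely for illustration; in a real role the query would run against your company's database.

```python
import sqlite3

# Hypothetical data: a tiny "orders" table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ana", 35.0), ("ben", 12.5), ("cy", 40.0), ("ben", 7.5)],
)

# SQL does the aggregation: total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

# Python takes over for whatever comes next (here, a simple lookup table).
totals = {customer: total for customer, total in rows}
top_customer = rows[0][0]
print(totals, top_customer)
```

The split shown here is a common one, though not the only one: let the database filter and aggregate, then do the analysis or modeling in Python or R.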
Statistics
A good understanding of statistics is vital for a data scientist. You should be familiar with statistical tests, distributions, maximum likelihood estimators, etc. One of the more important aspects of your statistics knowledge will be understanding when different techniques are (or aren’t) a valid approach.
Statistics is important at all company types, but especially at data-driven companies where stakeholders will depend on your help to make decisions and to design and evaluate experiments.
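As one example of evaluating an experiment, the sketch below runs a two-sided z-test for a difference between two conversion rates, using only the Python standard library. The function name `two_proportion_z_test` and the experiment counts are hypothetical, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment counts, invented for illustration.
z, p_value = two_proportion_z_test(success_a=100, n_a=1000, success_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

This test relies on a large-sample normal approximation, so it is a reasonable choice with thousands of users per group and a poor one with a few dozen. Knowing that kind of boundary is exactly the "when is this technique valid" judgment described above.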
Machine Learning
If you’re at a large company with huge amounts of data, or working at a company where the product itself is especially data-driven (e.g. Netflix, Google Maps, Uber), you’ll likely want to be familiar with machine learning methods.
This can mean techniques like k-nearest neighbors, random forests and other ensemble methods, and more. A lot of these techniques can be implemented using R or Python libraries, so it’s not necessary to become an expert on how the algorithms work. Your goal is to understand the broad strokes and when it’s appropriate to use different techniques.
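To show what "the broad strokes" of one of these methods look like, here is a minimal k-nearest neighbors classifier in plain Python. It is a teaching sketch, not a replacement for a library implementation; the function name `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # most_common(1) returns the label with the highest vote count.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two small clusters, invented for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (4.9, 5.0)))
```

Even this short version surfaces the questions you should be able to discuss in an interview: how to choose k, why features usually need to be on comparable scales, and why sorting every training point per query becomes slow on large datasets.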
Multivariable Calculus & Linear Algebra
Understanding these concepts is most important at companies where the product is defined by the data, and small improvements in predictive performance or algorithm optimization can lead to huge wins for the company.
In an interview for a data science role, you may be asked to derive some of the machine learning or statistics results you employ elsewhere. Or, your interviewer may ask you some basic multivariable calculus or linear algebra questions, since they form the basis of a lot of these techniques.
You may wonder why a data scientist would need to understand this when there are so many out-of-the-box implementations in Python or R. The answer is that at a certain point, it can become worth it for a data science team to build out their own implementations in-house.
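As a small illustration of the math behind an out-of-the-box result, consider fitting a straight line by least squares. Setting the partial derivatives of the summed squared error to zero yields closed-form expressions for the slope and intercept, which the sketch below implements directly. The function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Setting the partial derivatives of the summed squared error to zero gives
    slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = 1 + 2x.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

Being able to walk from the error function to these formulas is the kind of derivation an interviewer might ask for.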
Data Wrangling
Often, the data you’re analyzing is going to be messy and difficult to work with. Because of this, it’s really important to know how to deal with imperfections in data, a skill known as data wrangling.
Some examples of data imperfections include missing values, inconsistent string formatting (e.g., ‘New York’ versus ‘new york’ versus ‘ny’), and inconsistent date formatting (‘2021-01-01’ vs. ‘01/01/2021’, Unix time vs. timestamps, etc.).
Data wrangling matters most at small companies where you’re an early data hire, and at data-driven companies where the product itself is not data-related (the latter have often grown quickly without much attention to data cleanliness). That said, it is an important skill for everyone to have.
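The imperfections mentioned above (missing values, inconsistent city spellings, and mixed date formats) can be handled with a few small helper functions. This is a rough sketch using only the Python standard library; the helper names, the `CITY_ALIASES` lookup, and the assumption that slash-separated dates are month/day/year are all illustrative choices, not rules that hold for every dataset.

```python
from datetime import datetime, timezone

# Hypothetical lookup of known spellings; a real project would need a fuller map.
CITY_ALIASES = {"ny": "New York", "nyc": "New York", "new york": "New York"}

def clean_city(raw):
    """Return a canonical city name, or None when the value is missing."""
    if raw is None or not raw.strip():
        return None
    key = raw.strip().lower()
    return CITY_ALIASES.get(key, raw.strip().title())

def parse_date(raw):
    """Parse ISO dates, MM/DD/YYYY dates, or Unix timestamps into a date."""
    if isinstance(raw, (int, float)):
        # Assumption: numeric values are Unix timestamps in seconds, in UTC.
        return datetime.fromtimestamp(raw, tz=timezone.utc).date()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

cities = [clean_city(c) for c in ["New York", "new york", " ny ", None, ""]]
dates = [parse_date(d) for d in ["2021-01-01", "01/01/2021", 1609459200]]
print(cities)
print(dates)
```

In practice you would likely reach for a library such as pandas for this work, but the decisions stay the same: what counts as missing, which spellings map to which value, and how ambiguous dates should be interpreted.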
Data Visualization & Communication
Visualizing and communicating data is incredibly important, especially at young companies that are making data-driven decisions for the first time, or at companies where data scientists are viewed as people who help others make data-driven decisions.
When it comes to communicating, this means describing your findings, or the way techniques work, to both technical and non-technical audiences. On the visualization side, it can be immensely helpful to be familiar with data visualization tools like matplotlib, ggplot, or d3.js. Tableau has become a popular data visualization and dashboarding tool as well.
It is important to be familiar not just with the tools used to visualize data, but also with the principles behind visually encoding data and communicating information.
Software Engineering
If you’re interviewing at a smaller company and are one of the first data science hires, it can be important to have a strong software engineering background. You’ll be responsible for handling a lot of data logging, and potentially the development of data-driven products.
Data-Driven Problem Solving
Companies want to see that you’re a data-driven problem-solver. At some point during the interview process, you’ll probably be asked about some high-level problem, possibly about a test the company may want to run, or a data-driven product it may want to develop.
It’s important to think about what things are key to the process and what things aren’t. How should you, as the data scientist, interact with the engineers and product managers? What methods should you use? When do approximations make sense?
Learning Data Science Skills, Landing The Job
Depending on the type of data science job you’re looking to land, some of these skills may be more necessary than others. However, being well-rounded in your field is always a desirable attribute when interviewing for positions, so expanding your suite of data science skills is well worth the time invested.
Udacity’s Nanodegree programs offer an excellent way to learn all the data science skills discussed above.
For beginners, the Business Analytics Nanodegree program is a great place to start learning Excel, SQL, and Tableau.
If you have some experience, you could start with the Data Analyst Nanodegree program, where you’ll use Python, R, and SQL to tackle data projects.
The Ultimate Data Skills Checklist
Here is another valuable resource you can utilize to ensure you’re learning the skills that will lead to a successful data science career. It’s an amazing time to advance in this field. Here’s to your future in Data Science!