What do Data Analysts do?

A Data Analyst's main responsibilities include finding, retrieving, wrangling, and delivering insights from data. Data Analysts also help report on and uncover meaningful insights from the data underlying products. Specifically, they are responsible for obtaining, analyzing, and reporting on data ranging from business metrics to user behavior and product performance.

For example, responsibilities may entail:

  • Writing queries to retrieve data from a database and sharing the results with the right stakeholders (see the short sketch after this list)
  • Looking through user behavior to find insights or trends that can be used to improve the performance of their company's product
  • Interpreting the results of A/B tests and making product recommendations based on those results
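As a small illustration of the first bullet, here is a minimal Python sketch using the built-in sqlite3 module and Pandas. The database file, the events table, and its columns are hypothetical, stand-ins for whatever your company's database actually contains.

```python
import sqlite3
import pandas as pd

# Hypothetical example: pull daily signups from a SQLite database.
# "product.db" and the "events" table are assumptions for illustration only.
conn = sqlite3.connect("product.db")

query = """
SELECT date(created_at) AS day, COUNT(*) AS signups
FROM events
WHERE event_type = 'signup'
GROUP BY day
ORDER BY day;
"""

daily_signups = pd.read_sql_query(query, conn)
conn.close()

# Save a clean CSV that can be shared with stakeholders.
daily_signups.to_csv("daily_signups.csv", index=False)
print(daily_signups.head())
```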

How to become one with or without formal education

As a Data Analyst, it's important to have a strong combination of analytical skills (math/stats and programming), communication skills (presentation/data visualization), a systematic approach to problem solving with high attention to detail, and the ability to apply all of these in a business context. Below we've outlined a few ways you can learn some new skills.

There are a number of publicly available data sets on the web; they can be a great resource and provide you with opportunities to build up a portfolio of interesting independent projects. Our friends at Mortar have curated a master list of interesting data sets found by some of the best-known data scientists in the field today.

If machine learning is more your style, Kaggle competitions can be a great arena to hone your skills and prove yourself (some companies search the Kaggle leaderboards when hiring!).

If you want to present your findings through data visualization, you can create and share interesting visualizations with others on sites like Many Eyes, Plot.ly, or Blocks.io.

To showcase your new skills and projects you can create your own website through GitHub pages, WordPress, Medium, or other webpage or personal blog platforms.

The portfolio that gets you an interview

A good portfolio should showcase a series of projects that demonstrate the range of skills you've learned.

Ideally these projects will demonstrate your:

  • Hands-on experience with R, Pandas, Numpy, Scipy, Scikit-Learn or related data analysis tools
  • Experience working with and wrangling very large (too big to fit into one spreadsheet), disparate and/or unstructured data sets
  • Knowledge of machine learning and data-mining techniques
  • Strong problem solving, math, statistics and quantitative reasoning skills

Most importantly, these projects should demonstrate your communication skills. Specifically, they should show that you can analyze complex data sets, find interesting insights, and present them clearly and simply in the right business context.

Concepts for Data Analysts

What is a Data Scientist

Learn about the skills a Data Scientist should have.

Machine Learning and Self-Driving Cars

Learn how Google's self-driving car uses Machine Learning.

How Memes Spread Through Facebook

Learn about memes and how they spread in social media.

What is Data Wrangling

Learn what it means and how it fits into data analysis.

What to learn in what order?


If you are interested in becoming a Data Analyst, you should be competent in, and able to apply, the following skills in your day-to-day work.


Programming

It is important to have programming skills as a Data Analyst. There are times when you may use a non-programming tool like Excel, but some of the best and most common tools, like Pandas, Numpy, and other libraries, are programming-based. With these tools you'll be able to do much more in-depth analysis, and do it far more efficiently. Both Python and R are good programming languages to start with because of their popularity.
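To give a feel for what programming-based analysis looks like, here is a minimal Pandas sketch. The CSV file and column names are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sales data; the file and columns are illustrative only.
df = pd.read_csv("sales.csv")  # columns: region, product, units, price

# A few lines of code replace a lot of manual spreadsheet work.
df["revenue"] = df["units"] * df["price"]
summary = (
    df.groupby("region")["revenue"]
      .agg(["sum", "mean", "count"])
      .sort_values("sum", ascending=False)
)
print(summary)
```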


Statistics

At a minimum, you should understand the fundamentals of descriptive and inferential statistics. You should understand the different types of distributions, know which statistical tests are applicable in which contexts, and be able to explain the basics of linear regression in an interview.
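Here is a rough sketch of what those fundamentals look like in code, using Numpy and Scipy on simulated (not real) data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated metric for two hypothetical groups (e.g. an A/B test).
group_a = rng.normal(loc=10.0, scale=2.0, size=500)
group_b = rng.normal(loc=10.3, scale=2.0, size=500)

# Descriptive statistics: summarize each sample.
print(group_a.mean(), group_a.std(ddof=1))

# Inferential statistics: two-sample t-test for a difference in means.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Basics of linear regression: fit a line to noisy data.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + rng.normal(scale=2.0, size=200)
fit = stats.linregress(x, y)
print(f"slope = {fit.slope:.2f}, r^2 = {fit.rvalue ** 2:.2f}")
```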

Machine Learning

The techniques in machine learning are incredibly powerful, especially if you have huge amounts of data and need to use it to predict the future or make calculated suggestions. You should know a few of the most common algorithms from supervised and unsupervised learning (the two main classes of machine learning algorithms), such as k-nearest neighbors, support vector machines, and k-means clustering. You may not need to know the theory and implementation details behind these algorithms, but it's important to understand when to use them.
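To make that concrete, here is a minimal scikit-learn sketch on a built-in toy dataset. It illustrates the APIs for one supervised and one unsupervised algorithm; it is not a complete modeling workflow.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Supervised learning: k-nearest neighbors classification.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("KNN accuracy:", knn.score(X_test, y_test))

# Unsupervised learning: k-means clustering (labels are never used).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)
print("Cluster sizes:", [int((clusters == k).sum()) for k in range(3)])
```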

Data Munging

In an ideal world, the data sets you work with will be clean and ready to be analyzed. In the real world, that's rarely the case. It's very likely that your data sets will have missing values, be ill-formatted, or contain incorrectly entered values. Take dates, for example: some systems will represent September 1st, 2014 as 9.1.2014, while others may represent it as 09/01/2014. In situations like this, your data munging skills will come in handy.
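For instance, a few lines of Pandas can normalize those two hypothetical date formats into a single consistent type (the sample values below are made up):

```python
import pandas as pd

# Two hypothetical systems exporting the same dates in different formats.
system_a = pd.Series(["9.1.2014", "9.2.2014"])
system_b = pd.Series(["09/01/2014", "09/02/2014"])

# Parse each format explicitly so both end up as proper datetime values.
dates_a = pd.to_datetime(system_a, format="%m.%d.%Y")
dates_b = pd.to_datetime(system_b, format="%m/%d/%Y")

print(dates_a.equals(dates_b))  # True: both are September 1st and 2nd, 2014
```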

Communications and Data Visualization

As a Data Analyst, your job is not only to interpret the data but also to effectively communicate your findings to other stakeholders, so you can help them make data-informed decisions. Many stakeholders will not be interested in the technical details behind your analysis, which is why it's very important for you to be able to communicate and present your findings in a way that is easy to understand.


To get you started, here are some of the most popular programming languages and tools to become acquainted with.

  • Python or R: Not only are these programming languages easy to learn (compared to, say, C), but some of the most popular data science libraries, covering everything from data analysis to data visualization, are built on top of them.
  • Pandas/Numpy/Scipy: This trifecta of Python data science libraries works really well together. Pandas structures numerical or time series data in a way that makes it easy to analyze and manipulate. Numpy implements many commonly used scientific and mathematical operations, like matrix multiplication, so you don't have to reinvent the wheel. Scipy builds on Numpy and contains more fully-featured versions of many of the mathematical operations found in Numpy.
  • Scikit-Learn: Machine learning algorithms are hard to implement efficiently and correctly. Scikit-Learn is a battle-tested Python library that implements common machine learning algorithms for you, from ensemble methods to k-means to SVMs; it has it all.
  • Matplotlib/Ggplot2: When you're ready to create a scatter plot with log scales and tens of thousands of data points, Matplotlib and Ggplot2 should be your go-to libraries. They are the de facto plotting and visualization libraries for Python and R, respectively (see the short sketch after this list).
  • D3.js: This is the only JavaScript library on this list. Matplotlib and Ggplot2 are great if you want to create static visualizations or graphs, but if you want to create interactive visualizations, e.g. something that pops up or changes shape when your mouse hovers over the graph, D3.js is your library. You will need some HTML, CSS, and JavaScript, so make sure to brush up on your front-end web development skills before trying out D3.js.
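As a taste of the Matplotlib bullet above, here is a minimal sketch of a log-scale scatter plot. The data is simulated, standing in for the tens of thousands of real observations you would actually plot.

```python
import matplotlib.pyplot as plt
import numpy as np

# Simulated, heavy-tailed data standing in for real observations.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=3.0, sigma=1.0, size=20_000)
y = x * rng.lognormal(mean=0.0, sigma=0.5, size=20_000)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, s=2, alpha=0.2)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("x (log scale)")
ax.set_ylabel("y (log scale)")
ax.set_title("20,000 simulated points on log-log axes")
plt.show()
```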


Our data analyst Nanodegree program will help you learn all of the skills listed, but there are other great resources as well. Here are some of our favorites from our friends: