How To Build A Data Analyst Portfolio

How do you prepare for a data analyst interview?

At Udacity, we strive to be as responsive as possible to student queries of all kinds, and virtually every member of every team gets the opportunity to speak directly with students at one time or another. One subject that has come up a great deal lately is how to prepare for a data analyst interview. To speak to this matter, our own Mat Leonard, a Udacity course developer, is here to share some thoughts and experience to help you nail your data analyst interview. His first tip? Prepare a data analyst portfolio by getting your projects online for all to see.

First, a bit of “official” background on Mat:

Mat Leonard earned a PhD in Physics from UC Berkeley, where he wrote his dissertation on neural activity related to short-term memory. When it came time to make sense of his data, he turned to Python and the scientific Python stack, including NumPy, scikit-learn, and pandas. He created his personal blog, Matatat.org, to publish small data projects online. For example, he explored linear regression models for predicting body fat percentage and a Bayesian approach to A/B testing.

And with all that said, here is Mat on our subject for today!

Putting Your Small Data Projects Online

At our recent Intersect summit, a student asked me how to gain the data analysis experience needed to land a job. My suggestion was to work on small data analysis projects and then put them online as a data analyst portfolio, as I did with my blog. Small projects let you deepen your understanding of analysis methods or learn new techniques. Publishing them online builds a portfolio of your work, showing potential employers that you can successfully answer questions with data.

A few pieces of technology make getting your projects online a simple process. First, Jupyter notebooks are an excellent tool for combining text, code, and images. Second, notebooks can be converted to Markdown files for use with static site generators such as Pelican. Finally, you can host your blog for free on GitHub. For the information to follow, you’ll need to know basic usage of git and GitHub: how to stage, commit, and push changes. You can learn about git and GitHub in our excellent course on version control. You’ll also need to be comfortable working from the command line, which you can learn about here.

Building Your Blog

To build the blog itself, you can use Pelican, a static site generator written in Python. Pelican uses the Markdown files as blog posts and automatically creates an archive, categories, and tags. There are multiple themes to use with Pelican or you can make your own for a unique and personal touch.

[Image: Mat's sample]

Also, since Pelican creates a static site, you can host it on GitHub for free.

I’ll lead you through setting up your own blog and give you some suggestions for interesting projects. Here, I’m assuming you have experience with Git and GitHub, as well as Python and shell commands.

First, install Jupyter and Pelican by following the installation instructions for both packages. I suggest installing these in a virtualenv or Anaconda environment. In your Pelican site folder, files in the content folder are used as blog posts, so this is where you’ll place the notebook Markdown files. To create the Markdown file, run this in your terminal:

$ jupyter nbconvert --to markdown /path/to/notebook.ipynb --output-dir ~/projects/yoursite/content

This will create a file notebook.md in the content folder. If there are images in the notebook, a folder called notebook_files will be created containing the image files. These files need to be moved to the images folder, where Pelican expects to find image files. The image links in notebook.md also need to be changed to point to the new location. To do this, I wrote a short script which you can find in this gist. Simply copy the script to the content folder and run it, passing the Markdown file as an argument:

$ ./process_notebook.sh notebook.md
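If you would rather see the idea in Python, here is a minimal sketch of the same two steps: move the images, then rewrite the links. It is not the script from the gist. The function name relocate_notebook_images and the link_prefix parameter are hypothetical, and it assumes the links in the Markdown file look like ](notebook_files/...) and that your site serves images from /images. Adjust the prefix to whatever your Pelican configuration expects.

```python
import re
import shutil
from pathlib import Path


def relocate_notebook_images(markdown_path, images_dir="images", link_prefix="/images"):
    """Move <name>_files/* next to the post into images_dir and fix the links.

    Hypothetical helper: images_dir and link_prefix are assumptions about
    the site layout, not values taken from Pelican itself.
    """
    md_path = Path(markdown_path)
    content_dir = md_path.parent
    files_dir = content_dir / f"{md_path.stem}_files"  # e.g. notebook_files
    target_dir = content_dir / images_dir
    target_dir.mkdir(exist_ok=True)

    moved = []
    if files_dir.is_dir():
        for image in sorted(files_dir.iterdir()):
            shutil.move(str(image), str(target_dir / image.name))
            moved.append(image.name)
        files_dir.rmdir()  # the folder is empty once everything has moved

    # Rewrite links such as ](notebook_files/plot.png) to ](/images/plot.png)
    pattern = re.compile(r"\]\(" + re.escape(files_dir.name) + "/")
    text = md_path.read_text(encoding="utf-8")
    text = pattern.sub(lambda match: f"]({link_prefix}/", text)
    md_path.write_text(text, encoding="utf-8")
    return moved
```

Calling relocate_notebook_images("content/notebook.md") returns the names of the files it moved, so you can check that nothing was missed.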

You’ll also want to edit notebook.md and add metadata to the beginning of the file, something like:

Title: My First Project
Date: 2016-02-03 10:20
Category: Regression
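If you publish often, adding these lines by hand gets tedious. Here is a small sketch that prepends the same three fields to a post. The function name add_metadata is hypothetical, and it assumes the file does not already start with a metadata block.

```python
from pathlib import Path


def add_metadata(markdown_path, title, date, category):
    """Prepend Title/Date/Category lines, then a blank line, to a Markdown post."""
    path = Path(markdown_path)
    body = path.read_text(encoding="utf-8")
    header = f"Title: {title}\nDate: {date}\nCategory: {category}\n\n"
    # Drop leading blank lines so the header sits directly above the content
    path.write_text(header + body.lstrip("\n"), encoding="utf-8")
    return header
```

For the example above, the call would be add_metadata("notebook.md", "My First Project", "2016-02-03 10:20", "Regression").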

You can learn more about file metadata here. Now that you have the notebook file in the content folder, and everything is in its right place, you can create the site files with:

$ pelican content

The site files are written to the output directory, which holds all the HTML and CSS files needed to view your site. Time to get your site out there for the world to see!

Hosting the site on GitHub is quite simple with GitHub Pages. Create a repository named username.github.io, where username is your GitHub username. After creating it, clone the repository to your computer. Your projects folder is a good place to keep it. Now copy the files from the output directory of your site folder to the username.github.io repository folder. You can do this with:

$ cp -r ~/projects/yoursite/output/* ~/projects/username.github.io
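The same copy step can be scripted in Python if you want to run it after every build. This is only a sketch: the function name copy_site is hypothetical, and it relies on the dirs_exist_ok argument of shutil.copytree, which needs Python 3.8 or later. Unlike the shell glob, it also copies any hidden files at the top level of the output directory.

```python
import shutil
from pathlib import Path


def copy_site(output_dir, repo_dir):
    """Copy the generated site into the cloned repository folder.

    Existing files in repo_dir (including the .git folder) are left alone
    unless the site contains a file with the same relative path.
    """
    output_dir = Path(output_dir).expanduser()
    repo_dir = Path(repo_dir).expanduser()
    shutil.copytree(output_dir, repo_dir, dirs_exist_ok=True)
    # Report what was copied, as paths relative to the site root
    return sorted(
        path.relative_to(output_dir).as_posix()
        for path in output_dir.rglob("*")
        if path.is_file()
    )
```

Pass it your own output directory and repository folder, then stage and commit as usual.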

Then, stage and commit all the files in ~/projects/username.github.io. To publish your website, push the repository to GitHub. Check out your new blog at http://username.github.io! (And make sure to include this link on your resume before you start looking for that first data analyst interview.)

Your Data Projects: Where To Start

To flesh out your new website with content, you’ll want to do some small data projects. Kaggle hosts data science competitions that are a great place to start. The data is typically already cleaned and formatted, so you can focus on building the model. Kaggle also hosts a bunch of data sets not associated with competitions for you to explore, such as Hillary Clinton’s emails. Other resources include the datasets subreddit, Data.gov, and many more that you can find in this Quora thread. Many cities also have open data sets, such as San Francisco and New York City. You can also go web scraping to collect data like I did with Yelp.

The most important thing is to find data you are interested in that raises questions you want to answer. Take this time to learn new techniques and dive deeper into methods you already know. Working on small data projects and putting them online can really support your career objectives, especially since it shows employers that you love working with data and want to keep improving. You’ll have lots of interesting things to talk about when you land your data analyst interview. Hopefully, with the guidance above, you can create your own online portfolio and get your excellent work out there for the world to see!

Thanks to Mat for taking the time to offer his insights on this subject, and thanks especially to our students for raising all the right questions, all the time. Keep ’em coming!
