Intro to Data Analysis: Data Analysis using NumPy and Pandas

Thank you for signing up for the course! We look forward to working with you and hearing your feedback in our forums. Let's get started!


Course Resources

Necessary packages

To take this course, you will need the following Python packages:

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • unicodecsv or csv (csv is part of the Python standard library, so it needs no installation)
  • IPython notebook

We recommend installing Anaconda, which comes with all of these packages except for seaborn. Once you have Anaconda, you can install seaborn using the command conda install seaborn.


You can find the datasets used in the course in the Downloadables section for each lesson.

Course Syllabus

Prerequisite Knowledge

To take this course, you need to be comfortable programming in Python.

  • You should be familiar with if statements, loops, functions, lists, sets, and dictionaries. To learn about any of these topics, take the course Intro to Computer Science.
  • You should also be familiar with classes, objects, and modules. To learn about these topics, take the course Programming Foundations with Python.

Lesson 1: Data Analysis Process

In this lesson, you will learn about the data analysis process, which includes posing a question, wrangling and exploring your data, drawing conclusions and/or making predictions, and communicating your findings. You will complete an analysis of Udacity student data using pure Python, with minimal reliance on additional libraries.
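To give a sense of what analysis in pure Python looks like, here is a minimal sketch using only the standard library. The sample data, the column names (account_key, status, days_enrolled), and the helper functions are hypothetical and made up for illustration; they are not the course's actual dataset or code.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file (not the course data).
SAMPLE_CSV = """account_key,status,days_enrolled
448,canceled,65
448,canceled,5
700,current,30
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def average_days_enrolled(rows):
    """Convert the string field to int (wrangling), then average it."""
    days = [int(row["days_enrolled"]) for row in rows]
    return sum(days) / len(days)


rows = read_rows(SAMPLE_CSV)

# A set keeps one entry per student, even if a student has several rows.
unique_students = {row["account_key"] for row in rows}

print(len(rows), "rows for", len(unique_students), "students")
print("Average days enrolled:", average_days_enrolled(rows))
```

Note that csv.DictReader returns every field as a string, which is why the conversion step is needed before any arithmetic.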

Lesson 2: NumPy and Pandas for 1D Data

In this lesson, you will start learning to use NumPy and Pandas to make the data analysis process easier. This lesson focuses on features that apply to one-dimensional data. You'll learn to use NumPy arrays, Pandas Series, and vectorized operations.
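As a rough preview, the sketch below shows a NumPy array, a Pandas Series, and a few vectorized operations. The values and index labels are made up for illustration and do not come from the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical values, made up for illustration.
minutes = np.array([30.0, 45.0, 0.0, 90.0])

# Vectorized arithmetic applies to every element without an explicit loop.
hours = minutes / 60.0

# Comparisons are vectorized too and produce a boolean array,
# which can then be used to select matching elements.
active = minutes[minutes > 0]

# A Series is similar to an array, with an index labeling each value.
series = pd.Series(minutes, index=["mon", "tue", "wed", "thu"])
above_mean = series[series > series.mean()]

print(hours)
print(active)
print(above_mean)
```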

Lesson 3: NumPy and Pandas for 2D Data

In this lesson, you'll continue learning about NumPy and Pandas, this time focusing on two-dimensional data. You'll learn to use two-dimensional NumPy arrays and Pandas DataFrames. You'll also learn to group your data and to combine data from multiple files.
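The sketch below previews these ideas under some assumptions: the two small tables stand in for data loaded from two separate files, and the column names (account_key, status, minutes) are hypothetical.

```python
import numpy as np
import pandas as pd

# Two-dimensional array: each row is a student, each column a lesson.
scores = np.array([[80, 90], [70, 60], [100, 95]])
column_means = scores.mean(axis=0)  # one mean per column

# Hypothetical tables standing in for data from two separate files.
enrollments = pd.DataFrame({
    "account_key": [1, 2, 3],
    "status": ["current", "canceled", "current"],
})
engagement = pd.DataFrame({
    "account_key": [1, 1, 2, 3],
    "minutes": [30, 20, 10, 60],
})

# Grouping: total minutes for each student.
total_minutes = engagement.groupby("account_key")["minutes"].sum()

# Combining: match rows from both tables on the shared key column.
combined = enrollments.merge(engagement, on="account_key", how="inner")
minutes_by_status = combined.groupby("status")["minutes"].sum()

print(column_means)
print(total_minutes)
print(minutes_by_status)
```

An inner merge keeps only the keys present in both tables; other values of how (such as "left" or "outer") keep unmatched rows as well.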

Final Project: Investigate a Dataset

In the project, you will use NumPy and Pandas to go through the data analysis process on one of a list of recommended datasets.


I'd like to thank Trish McCallister for producing and editing the course, and Nathaly Machatius and Liz Keleher for managing the project and keeping everything running on time. I couldn't have made this course without them.

-- Caroline