Data Analyst Nanodegree

Thank you for signing up for this Nanodegree! For important program information, refer to the Nanodegree Student Handbook, which is also available for download in your Nanodegree Portal. You're part of a cohort - a community of students who will work at about the same pace and interact in Udacity Discussions, our forum system. We look forward to working with you and hearing your feedback in the forum!


Need help getting started?


Course Resources

What is a Nanodegree?

A Nanodegree is a new type of credential, designed to prepare you for a job. It is built with industry for you to master skills that employers truly seek in a Data Analyst. The Nanodegree is project-based: you'll complete several projects, with guidance and code reviews from our Coaches, to learn and show off your skills. It offers a personalized learning roadmap: take only the courses you need to ace projects! We'll customize your path to be as efficient and effective as possible. See how it works.

Downloadable Materials

You can download Supplemental Materials, Lesson Videos and Transcripts from Downloadables (bottom right corner of the Classroom) or from the Dashboard (first option on the navigation bar on the left-hand side).

Nanodegree Projects

Project 0: Analyze Chopstick Length (Optional)

This project is connected to Lesson 1 of Statistics, but depending on your background knowledge of statistics, you may not need to take the entire lesson to complete this project.

How do I complete this project?

  1. Navigate to http://continuum.io/downloads in your browser.
  2. Install Python using the Anaconda distribution, which comes with IPython. We recommend this installation since it comes with many useful packages and is easy to install. Do not install Python 3.4, because the Nanodegree program uses Python 2.7 throughout the courses and projects.
    • On a PC, click the Windows icon and select "Windows 64-Bit Python 2.7 Graphical Installer". You can also select the 32-bit installer if you have a 32-bit machine. Then run the installer and follow the instructions on the screen.
    • On Mac or Linux, follow the same process but select the appropriate installer for your platform.
  3. Download the IPython notebook from either the Downloadables section or this link.
  4. Open your Command Prompt (PC) or terminal (Mac or Linux).
    • On a PC click the Start button and search for "Command Prompt".
    • On a Mac, press Command + Spacebar, then type "terminal" in the Spotlight Search. You can also search for "terminal" in Finder.
  5. Navigate to the directory where you downloaded the IPython notebook file.
  6. Run the command ipython notebook Data_Analyst_ND_Project0.ipynb in your terminal.
  7. Read through the notebook and answer the questions. Keep in mind you can refer to the Statistics course, search on Google, or head to the discussion forums if you get stuck or have a question.
  8. Once you are finished, download the notebook as an HTML file: click File -> Download as -> HTML (.html) in the IPython notebook. If you get an error such as "No module named <module_name>", open a terminal and try installing the missing module using pip install <module_name> (don't include the "<" or ">" or any words following a period in the module name).
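
If the export in step 8 fails with a missing-module error, it can help to check which modules are importable before retrying. The following is a minimal sketch and not part of the official project instructions; the helper name find_missing_modules and the example module names are assumptions chosen for illustration.

```python
import importlib


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or failed to import).
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list; replace it with the module named in your error.
    for name in find_missing_modules(["json", "numpy", "pandas"]):
        print("Missing module, try: pip install " + name)
```

Each name the script reports can then be installed with pip install as described in step 8.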

Project 1: Test a Perceptual Phenomenon

This project is connected to the Statistics course, but depending on your background knowledge of statistics, you may not need to take the whole course to complete this project. If you would like, you can use the Statistics Placement Advisor to determine what material you need to review.

How do I complete this project?
Follow these instructions and create a pdf or html document answering the questions. These document formats are compatible across a broad range of computers and browsers, so they are among the surest ways of making sure that your reviewer sees your work as you intended. If you are using a word processing program such as Microsoft Word or LibreOffice, make sure that you save your document as a pdf and include the pdf in your project submission.

There is no need for specific software for this project, but it will be useful to have at least a spreadsheet program for reading in the data. You are free to select an alternative software program you are more familiar with or are more comfortable using to analyze the data.

Project 2: Analyzing the NYC Subway Dataset

This project is connected to the Intro to Data Science course, but depending on your background knowledge of data science, you may not need to take the whole thing to complete this project.

How do I complete this project?
If you would like, you can download the data set used for the Intro to Data Science course and explore Problem Sets 2 through 5 on your own computer.
The download links are below:

  1. Complete all of the questions in Problem Sets 2 through 5 in the Intro to Data Science course.

  2. Answer these short questions in a pdf or html document. Please do not use doc or docx format because there are compatibility issues across browsers. If you are using a word processing program such as Microsoft Word or LibreOffice, once you are done, save the file as pdf and include it in your submission.

Project 3: Wrangle OpenStreetMap Data

This project is connected to the Data Wrangling with MongoDB course, but depending on your background knowledge of data wrangling, you may not need to take the whole thing to complete this project.

How do I complete this project?

  1. Finish lesson 6: Make sure all Lesson 6 programming exercises are solved correctly.
  2. Review the rubric and sample project: The Project Rubric will be used to evaluate your project. Your project will need to Meet Specifications for all the criteria listed. The Sample Project is an example of what your final report could look like.
  3. Choose your map area: Choose any area of the world from https://www.openstreetmap.org, and download an XML OSM dataset. The dataset should be at least 50MB in size (uncompressed). We recommend using one of the following methods of downloading a dataset:
    • Download a preselected metro area from Map Zen.
    • Use the Overpass API to download a custom square area. An explanation of the syntax can be found in the wiki. In general you will want to use the following query:
      (node(minimum_latitude, minimum_longitude, maximum_latitude, maximum_longitude);<;);out;
      e.g. (node(51.249,7.148,51.251,7.152);<;);out;
      You can use the Open Street Map Export Tool to find the coordinates of your bounding box. Note: You will not be able to use the Export Tool to actually download the data, because the area required for this project is too large.
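
To avoid typos when filling in the bounding box by hand, the query string can be assembled programmatically. This is a minimal sketch that only builds the query text shown above; the function name build_overpass_query is an assumption for illustration, and the range checks are a simplification that does not handle boxes crossing the 180 degree meridian.

```python
def build_overpass_query(min_lat, min_lon, max_lat, max_lon):
    """Build the node query string for a bounding box given in degrees."""
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("need -90 <= min_lat < max_lat <= 90")
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError("need -180 <= min_lon < max_lon <= 180")
    # Same order as in the instructions: south, west, north, east.
    return "(node({},{},{},{});<;);out;".format(
        min_lat, min_lon, max_lat, max_lon
    )


# Reproduces the example query from the instructions.
print(build_overpass_query(51.249, 7.148, 51.251, 7.152))
# prints (node(51.249,7.148,51.251,7.152);<;);out;
```

The example box is far too small for the project's 50MB requirement; it only demonstrates the syntax.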

Project 4: Explore and Summarize Data

This project is connected to the Data Analysis With R course, but depending on your background knowledge of exploratory data analysis, you may not need to take the whole class to complete this project.

How do I complete this project?

  1. Choose a data set from the Data set options document. You should choose a data set based on your prior experiences in programming and working with data. The data set you choose will not increase or decrease your chances of passing the final project.  In general, tidy data sets are easier to work with since each variable is a column and each row is an observation; there’s no data cleaning or wrangling involved. We offer guidance below for choosing your data set. Time estimates include reading all of the project instructions and rubric, conducting the analysis, and submitting the final project.
  2. Get organized by creating a single folder on your desktop that will eventually contain:
    • The RMD file that contains the analysis, final plots and summary, and reflection (in that order)
    • The HTML file that will be knitted from your RMD file
    • The data set you used (which you will only submit if you found your own data set)
  3. Explore your data and keep track of your thoughts as you go (in an RMD file). Please refer to the Example Project that we have provided. Your report should look similar!
  4. Document your exploration and analysis in an RMD file which you will submit. That file should be formatted in markdown and should contain (in order):

  • A stream-of-consciousness analysis and exploration of the data.
    a. Headings and text should organize your thoughts and reflect your analysis as you explored the data.
    b. Plots in this analysis do not need to be polished with labels, units, and titles; these plots are exploratory (quick and dirty). They should, however, be of the appropriate type and effectively convey the information you glean from them.
    c. You can iterate on a plot in the same R chunk, but you don’t need to show every plot iteration in your analysis.

  • A section at the end called “Final Plots and Summary”
    You will select three plots from your analysis to polish and share in this section. The three plots should show different trends and should be polished with appropriate labels, units, and titles (see the Project Rubric for more information).

  • A final section called “Reflection”
    This should contain a few sentences about your struggles, successes, and ideas for future exploration on the data set (see the Project Rubric for more information).

  5. Knit your RMD file. Your file should not be one long chunk of R code. It should contain text and plots interspersed throughout. The goal is to give the person reading the file insight into what you were thinking as you explored your data.
  6. Document your data (if you chose your own data set). The data set you submit (only if you chose your own) should include a text file, like those in the R documentation (e.g. ?diamonds), that describes the source of your data and explains the variables in the data set (definition of any variables, units, levels of categorical variables, and the data generating process, such as how data was collected if possible).

Project 5: Identifying Fraud from Enron Email

This project is connected to the Intro to Machine Learning course, but depending on your background knowledge of machine learning, you may not need to take the whole thing to complete this project.

How do I complete this project?
A note before you begin: the projects in the Intro to Machine Learning class were mostly designed to have lots of data points, give intuitive results, and otherwise behave nicely.  This project is significantly tougher in that we're now using the real data, which can be messy and doesn't have as many data points as we usually hope for when doing machine learning.  Don't get discouraged--imperfect data is something you need to be used to as a data analyst! If you encounter something you haven't seen before, take a step back and think about a smart way around.  You can do it!

  1. Using the starter code you're given, engineer the features, pick and tune an algorithm, test, and evaluate your identifier. Several of the mini-projects in the course were designed with this final project in mind, so be on the lookout for ways to use the work you’ve already done. The features in the data fall into three major types: financial features, email features, and POI labels.
    • financial features: ['salary', 'deferral_payments', 'total_payments', 'loan_advances', 'bonus', 'restricted_stock_deferred', 'deferred_income', 'total_stock_value', 'expenses', 'exercised_stock_options', 'other', 'long_term_incentive', 'restricted_stock', 'director_fees'] (all units are in US dollars)
    • email features: ['to_messages', 'email_address', 'from_poi_to_this_person', 'from_messages', 'from_this_person_to_poi', 'poi', 'shared_receipt_with_poi'] (units are generally number of email messages; the notable exception is 'email_address', which is a text string)
    • POI label: ['poi'] (boolean, represented as integer)
  2. Consider creating new features by combining, transforming, or rescaling the starter features. If you do this, store the new feature in my_dataset, and if you use the new feature in the final algorithm, also add the feature name to my_feature_list so your coach can access it during testing. For a concrete example of a new feature that you could add to the dataset, refer to the lesson on Feature Selection.
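
As an illustration of storing a new feature in my_dataset and registering it in my_feature_list, here is a minimal sketch. It is not the starter code or the example from the Feature Selection lesson. It assumes that my_dataset is a dictionary mapping each person's name to a dictionary of features and that missing values are stored as the string 'NaN'; the feature name fraction_from_poi, the helper functions, and the sample records are hypothetical.

```python
def is_number(value):
    """True for real numeric values; False for 'NaN' strings, None, or NaN."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value == value  # float('nan') is not equal to itself


def add_fraction_feature(dataset, numerator_key, denominator_key, new_key):
    """Add new_key = numerator / denominator to every record in dataset.

    Records with missing or zero values get 0.0, so every record
    ends up with the new feature.
    """
    for record in dataset.values():
        numerator = record.get(numerator_key)
        denominator = record.get(denominator_key)
        if is_number(numerator) and is_number(denominator) and denominator != 0:
            record[new_key] = float(numerator) / denominator
        else:
            record[new_key] = 0.0
    return dataset


# Hypothetical sample records, not taken from the real data set.
my_dataset = {
    "PERSON A": {"poi": False, "from_poi_to_this_person": 10, "to_messages": 200},
    "PERSON B": {"poi": True, "from_poi_to_this_person": "NaN", "to_messages": "NaN"},
}
my_feature_list = ["poi", "salary"]

add_fraction_feature(
    my_dataset, "from_poi_to_this_person", "to_messages", "fraction_from_poi"
)
my_feature_list.append("fraction_from_poi")
```

Defaulting to 0.0 for missing values is one design choice among several; whether it suits your analysis depends on how you treat missing data elsewhere.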

Project 6: Make Effective Data Visualization

This project is connected to the Data Visualization course, but depending on your background knowledge of data visualization, dimple.js, and d3.js, you may not need to take the whole course to complete this project.

There are three difficulty levels to this project, and you should choose an appropriate level depending on your experience with data munging and exploratory data analysis. The difficulty level you choose will not affect the evaluation of the project.

Find more information about the difficulty levels and the rest of the details about this project in this Project Description document.

Project 7: Design an A/B Test

This project is connected to the A/B Testing course, but depending on your background knowledge of data analysis, you may not need to take the whole course to complete this project.

How do I complete this project?

Complete the final project as described in the final project lesson of the A/B Testing course.

You can also find the final project instructions, submission template, and rubric here for reference.

Nanodegree Cohorts and Community

Every month, a new cohort will start in the Data Analyst Nanodegree program. Each of these cohorts will have its own Udacity Discussions forum, where students can connect with one another and Udacity coaches to ask questions, share experiences, and celebrate each other's success. Students will be encouraged to form smaller study groups and also connect with one another over the group chat feature built into Udacity's website.