If you’re interested in a career in data, and you’re familiar with the set of skills you’ll need to master, you know that Python and R are two of the most popular languages for data analysis. If you’re weighing Python vs. R for your first language, read on for some tips.
When it comes to data analysis, both Python and R are simple (and free) to install and relatively easy to get started with. If you’re a newcomer to the world of data science and don’t have experience in either language, or with programming in general, it makes sense to be unsure whether to learn R or Python first.
Luckily, you can’t really go wrong with either.
The Case for R
R has a long and trusted history and a robust supporting community in the data industry. Together, those facts mean that you can rely on online support from others in the field if you need assistance or have questions about using the language. Plus, there are plenty of publicly released packages, more than 5,000 in fact, that you can download to use in tandem with R to extend its capabilities to new heights. That makes R great for conducting complex exploratory data analysis. R also integrates well with other computer languages like C++, Java, and C.
When you need to do heavy statistical analysis or graphing, R’s your go-to. Common mathematical operations like matrix multiplication work straight out of the box, and the language’s array-oriented syntax makes it easier to translate from math to code, especially for someone with no or minimal programming background.
So you want to be a data analyst? Congrats! You’ve chosen a lucrative, geographically flexible, and super-secure career in a field that’s only going to continue to blossom in the years to come. Of course, you’ve got to do the up-front work of learning and sharpening the necessary skills before you can reap the benefits. Follow this step-by-step, from-the-ground-up guide to acquiring the tools to become an ultra-hireable data analyst.
To start, you need to know what skills are required for a data analytics career. The main areas of expertise needed are:
Programming
Statistics and Mathematics
Machine Learning
Data Wrangling
Data Visualization
Intuition and problem solving
No matter where you are on your path to a career in data, it probably seems daunting to consider all the skills you still need to be recruiter-ready. Typically, data workers come from three different backgrounds, and the path to becoming a data analyst depends on where you are coming from.
Given your starting point, what is your best path to your first data science job? What skills can you use to build your foundations in the most efficient and effective way?
That’s where we come in. It’s helpful to examine each of those three scenarios (zero experience, programming but no math, math but no programming) in terms of the building blocks of your ultimate data skill set.
How to Become a Data Analyst with No Experience
Programming is an integral aspect of data analysis. It’s the core skill that sets data analysts apart from business analysts. You’ll need to be able to program well in one or more programming languages—start with Python or R—and to have a good grasp of the landscape of the most commonly used data science libraries and packages (such as ggplot2, reshape2, numpy, pandas, and scipy).
What good is all that programming prowess without the ability to interpret the data? An understanding of statistics, including statistical tests, distributions, and maximum likelihood estimators, is essential in data analysis.
Acquaint yourself with both descriptive and inferential statistics. The former refers to quantitative measures that describe the properties of a sample; the latter, to methods that infer properties of the larger population from the sample. You’ll need to know the basics, many of which will sound familiar from high school or college (mean, median, mode; standard deviation and variance; hypothesis testing), onto which you will layer more complex statistical skills (different types of distribution: standard normal, exponential/Poisson, binomial, chi-squared; and tests for significance: Z-test, t-test, Mann-Whitney U, chi-squared, ANOVA).
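To make the distinction concrete, here is a minimal Python sketch that uses only the standard-library `statistics` module. The sample values are made up. The one-sample z-test assumes the population standard deviation is known, which is a textbook simplification; when the standard deviation is unknown, you would normally use a t-test instead.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample: daily session lengths in minutes (made-up numbers).
sample = [12, 15, 15, 18, 21, 22, 25, 30, 15, 27]

# Descriptive statistics: summarize the sample itself.
summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "mode": statistics.mode(sample),
    "stdev": statistics.stdev(sample),        # sample standard deviation (n - 1)
    "variance": statistics.variance(sample),  # sample variance (n - 1)
}

def one_sample_z_test(data, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known.
    """
    n = len(data)
    z = (statistics.mean(data) - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Inferential statistics: use the sample to reason about the population.
z, p = one_sample_z_test(sample, mu0=18, sigma=6)
print(summary)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these numbers, the mean is 20, the median is 19.5, and the mode is 15. The test gives z of about 1.05 and a p-value of roughly 0.29. At the usual 5% level, then, this sample is not evidence that the population mean differs from 18.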
Beyond descriptive and inferential stats, data analysts need to be adept at statistical experimental design: the systematic process of setting up an experiment so that its results are both valid and significant. For example, you’ll need to determine how many samples to collect, how different factors should be combined, how to choose good control and test groups, and the like. To execute strong experimental design using tools like A/B testing and concepts like statistical power, a useful benchmark is the idea of “SMART (Specific, Measurable, Actionable, Realistic, Timely) experiments.”
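One concrete part of experimental design is deciding how many samples to collect. The sketch below applies the common normal-approximation formula for comparing two conversion rates in an A/B test. The 10% and 12% rates, the 5% significance level, and the 80% power are illustrative assumptions, not recommendations.

```python
import math
from statistics import NormalDist

def samples_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided test comparing
    two conversion rates (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control               # smallest lift worth detecting
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical test: can we detect a lift from 10% to 12% conversion?
n = samples_per_group(0.10, 0.12)
print(n)
```

For these inputs, the function returns a little over 3,800 users per group. Halving the lift you want to detect roughly quadruples the required sample, because the effect size enters the formula squared.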
The language of data analysts is numbers, so it follows that a strong foundation in math is an essential building block on the path to becoming a data analyst.
At a basic level, you should be comfortable with college algebra. You’ll have to translate what you once knew as “word problems” (real-world equivalent: business problems) into mathematical expressions; you’ll need to be able to manipulate algebraic expressions and solve equations; and you’ll need to be able to graph different types of functions, with a deep understanding of the relationship between a function’s graph and its equation.
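Here is a small illustration of turning a word problem into algebra; the scenario and numbers are made up. A software subscription costs a one-time $30 setup fee plus $12 per month. After how many months m does the total cost reach $150?

```latex
\begin{aligned}
30 + 12m &= 150 \\
12m &= 120 \\
m &= 10
\end{aligned}
```

The cost expression C(m) = 30 + 12m is a linear function. Its graph is a line with slope 12 (the monthly fee) and intercept 30 (the setup fee), which is the kind of link between a graph and its equation described above.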
Beyond that, a solid grasp of multivariable calculus and linear algebra will serve you well as a data analyst. Think: matrix manipulations, dot product, eigenvalues and eigenvectors, and multivariable derivatives.
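In practice these operations are usually done with a library such as numpy, but writing them out in plain Python shows what they actually compute. This is an illustrative sketch only. The 2 x 2 eigenvalue helper uses the characteristic polynomial and assumes the eigenvalues are real.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (both lists of rows)."""
    cols = list(zip(*B))  # columns of B
    return [[dot(row, col) for col in cols] for row in A]

def eigenvalues_2x2(M):
    """Eigenvalues of a 2 x 2 matrix, from lambda^2 - trace*lambda + det = 0.

    Raises ValueError (from math.sqrt) if the eigenvalues are complex.
    """
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    disc = math.sqrt(trace ** 2 - 4 * det)
    return ((trace + disc) / 2, (trace - disc) / 2)

print(dot([1, 2, 3], [4, 5, 6]))
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(eigenvalues_2x2([[2, 1], [1, 2]]))
```

For example, `dot([1, 2, 3], [4, 5, 6])` is 32, and the symmetric matrix `[[2, 1], [1, 2]]` has eigenvalues 3 and 1.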
Multivariable calculus and linear algebra, along with statistics, make up the basic foundation of machine learning, which enables data professionals to make predictions or calculated suggestions based on huge amounts of data. For a career as a data analyst, you won’t need to invent new machine-learning algorithms (advanced skills like that qualify you to become a data scientist), but you should know the most common of them. A few examples include principal component analysis, neural networks, support vector machines, and k-means clustering. Note that you may not need to know the theory and implementation details behind these algorithms, but you should understand the pros and cons, as well as when to (and when not to) apply them to a dataset.
In supervised learning, the “learner” (computer program) is provided with two sets of data, a training set and a test set. The computer “learns” from a set of labeled examples in the training set so that it can accurately label the examples in the test set, whose labels are withheld during learning. The goal is for the learner to develop a rule that generalizes to examples it has not seen before. It is supervised learning that makes it possible for your phone to recognize your voice, and your email to filter spam. Specific tools you’ll use include:
Naive Bayes classification
Ordinary Least Squares regression
support vector machines
and ensemble methods.
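Here is a minimal sketch of the supervised workflow using ordinary least squares, written in plain Python with made-up data. It fits a line on the labeled training examples and then checks the predictions on held-out test examples. Real projects would typically use a library such as scikit-learn and a random train/test split, rather than simply holding out the last rows.

```python
# Hypothetical labeled data: (hours studied, exam score). Made-up numbers.
data = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70), (6, 73), (7, 79), (8, 82)]

# Hold out the last two examples as the test set; learn from the rest.
train, test = data[:6], data[6:]

def fit_ols(points):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_ols(train)

# Evaluate on examples the model never saw during fitting.
predictions = [intercept + slope * x for x, _ in test]
mae = sum(abs(p - y) for p, (_, y) in zip(predictions, test)) / len(test)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MAE={mae:.2f}")
```

On this toy data, the fitted slope is about 4.37 points per hour, and the mean absolute error on the two test examples is under one point.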
Unsupervised learning is what you’ll use when faced with the challenge of discovering implicit relationships, and thus hidden structure, in a given “unlabeled” dataset. Unsupervised learning makes it possible for Netflix to recommend movies you’d enjoy, and Amazon to predict products you’ll like. Specific tools you’ll use include:
Principal Component Analysis (PCA)
Singular Value Decomposition (SVD)
and Independent Component Analysis (ICA).
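As an unsupervised example, the sketch below finds the first principal component of some unlabeled 2-D points. That component is the direction along which the data varies most. This plain-Python illustration works only in two dimensions, because it solves the 2 x 2 covariance eigenproblem in closed form. For real data you would use a library implementation, which typically computes PCA via SVD.

```python
import math

# Hypothetical unlabeled 2-D data (made-up numbers).
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

def first_principal_component(pts):
    """Return (unit direction, explained-variance ratio) of the first principal
    component of 2-D data. Assumes the points are not all identical."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    # A symmetric 2x2 matrix always has real eigenvalues.
    trace, det = sxx + syy, sxx * syy - sxy ** 2
    disc = math.sqrt(max(trace ** 2 / 4 - det, 0.0))
    lam1, lam2 = trace / 2 + disc, trace / 2 - disc
    # Eigenvector for the largest eigenvalue lam1.
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:  # axes are already uncorrelated
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)

direction, ratio = first_principal_component(points)
print(direction, ratio)
```

For these points, the first component points roughly along (0.68, 0.74) and captures about 96% of the total variance, so the data is close to one-dimensional. The sign of the direction is arbitrary.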
Lastly, reinforcement learning applies to situations that fall between the two extremes of supervised and unsupervised, i.e., when there is some form of feedback available for each predictive step or action, but no precise label or error measure. You can apply reinforcement learning when you want to figure out how to maximize rewards, for instance in arenas like robot control, chess, backgammon, checkers, and other activities that a software agent can learn. Specific tools you’ll use include:
and genetic algorithms.
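To illustrate the feedback-driven idea, here is an epsilon-greedy multi-armed bandit. This is not one of the tools listed above. It is a simplified, single-state relative of full reinforcement learning. The agent is never told which arm is best; it only sees rewards, and it balances exploring random arms against exploiting its best estimate so far. The payout probabilities, step count, and epsilon are made-up parameters.

```python
import random

def epsilon_greedy(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which arm pays off most often using only reward feedback.

    Each arm pays 1 with its hidden probability, otherwise 0.
    """
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)  # running average reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:   # explore: try a random arm
            arm = rng.randrange(len(arm_probs))
        else:                        # exploit: best estimate so far
            arm = max(range(len(arm_probs)), key=lambda a: values[a])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return values, counts, total_reward

values, counts, total = epsilon_greedy([0.2, 0.5, 0.8])
print(values, counts, total)
```

The fixed seed makes the run reproducible. The agent should end up pulling the 0.8 arm most of the time, and its estimate for that arm should land close to 0.8.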
Still with us? The last three abilities crucial to your development as a data analyst pertain to manipulating, displaying, and interpreting data. To transform raw material into useful, organized datasets, data wrangling (also known as “data munging”) comes into play. This is the process of collecting and cleaning data so it can be easily explored and analyzed.
You’ll need to equip yourself with knowledge of database systems (both SQL-based and NoSQL-based) that act as a central hub to store information. It’ll be useful to be familiar with relational databases such as PostgreSQL, MySQL, Netezza, and Oracle, as well as big data tools such as Hadoop and Spark and NoSQL databases such as MongoDB.
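Here is a minimal sketch of querying a relational database with SQL from Python. It uses the standard-library `sqlite3` module and an in-memory database as a stand-in for a server such as PostgreSQL or MySQL. The table and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a production relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",  # parameterized insert
    [("ana", 30.0), ("ben", 12.5), ("ana", 20.0), ("cy", 8.0)],  # made-up rows
)

# Total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()
print(rows)
```

The query returns `[('ana', 50.0), ('ben', 12.5), ('cy', 8.0)]`.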
Other concepts and tools essential to data wrangling include regular expressions, mathematical transformations, and Python’s built-in string methods for string manipulation. You’ll also need to know how to parse common file formats such as CSV and XML, and how to apply a log transformation (for example, log base 10) to make right-skewed data closer to normally distributed.
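The following sketch combines several of these wrangling tools on a small, made-up CSV export:

- the `csv` module for parsing;
- string methods for trimming and normalizing names;
- a regular expression for stripping phone-number punctuation;
- a log-10 transform of a skewed revenue column.

The column names and cleaning rules are illustrative assumptions.

```python
import csv
import io
import math
import re

# Hypothetical messy CSV export (made-up values).
raw = """name,phone,revenue
 Alice Smith ,(555) 123-4567,"1,200"
bob jones,555.987.6543,85
CAROL LEE,555-222-0000,"15,000"
"""

def clean_row(row):
    name = row["name"].strip().title()                # trim and normalize case
    digits = re.sub(r"\D", "", row["phone"])          # regex: keep digits only
    revenue = float(row["revenue"].replace(",", ""))  # drop thousands separators
    return {
        "name": name,
        "phone": f"{digits[:3]}-{digits[3:6]}-{digits[6:]}",
        "revenue": revenue,
        # log10 compresses a right-skewed variable; requires revenue > 0.
        "log_revenue": math.log10(revenue),
    }

rows = [clean_row(r) for r in csv.DictReader(io.StringIO(raw))]
for r in rows:
    print(r)
```

Revenue values of 85, 1,200, and 15,000 become roughly 1.93, 3.08, and 4.18 on the log-10 scale. The transform pulls the extreme value much closer to the others.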
It may all sound overwhelming right now, especially if you’re brand new not only to the skills involved, but to some of the terms themselves. Remember that all of these skills are stackable: each one you master will help you build the next, and the next after that, until you’re a fully equipped data analyst ready to kick butt and take some names.
Once you’ve cleaned, organized, arranged, plied, and interpreted the data, you want to be able to illustrate your findings visually so that stakeholders, including the data-illiterate, can fully understand. You won’t get any credit for your data analysis chops if you don’t communicate your insights clearly and effectively.
It’ll be helpful to be familiar with data visualization tools like ggplot, matplotlib, seaborn, and D3.js. Of course, it’s key to be familiar not just with the tools that actually display the data, but also with the principles underlying the visual encoding of that data. You’ll also need to understand the business context well enough to choose the visualization that is most relevant to your audience.
Data Intuition and Problem Solving
Bolstered by the technical knowledge of the combined skills above, you’ve got to know how to think, how to ask the right questions. You could spend the rest of your life analyzing a single dataset and visualizing your interpretation in a multitude of formats with a plethora of findings. The reality is, you’ll only ever have a limited amount of time and space to address your associates’ questions in analyzing the data at hand. Therefore, it’s important to nurture an intuition about what things are important, and what things aren’t.
Work toward developing a deep understanding of the field in which you’re working, whether it’s the stock market or consumer packaged goods. Invest the time to work through as many datasets as you can, for example by participating in Kaggle competitions, to learn how to avoid dead ends. Learn to sense the “question behind the question” in assignments, digging down, in other words, to discover the exact business issues driving the need to analyze the data.
How to Become a Data Analyst by Building on a Programming Background
Did some, or a lot, of that content overview sound familiar to you? Have you been trained as a software engineer, or perhaps studied programming in college, but lack the solid mathematical foundation required to become a data analyst?
No sweat. You’re in a great position to launch a learning journey, at the culmination of which you’ll be situated for maximum data analysis success.
Here’s what you’ll need to learn next, in order, on the road to clicking “apply” on a data analyst job opening.
Statistics: You’ll need to be able to rigorously interpret, make inferences, and compare different types of data by applying the right approach, technique, or statistical tests to different types of distributions. Check out the above breakdown for specific tools and skills.
Probability: In order to draw accurate conclusions, data analysts need to be able to reason about the likelihood that an event could have happened or that it will happen. Check out the above breakdown for specific tools and skills.
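Bayes’ theorem is a classic probability exercise. This sketch computes the chance that someone actually has a condition, given a positive test result. The prevalence, sensitivity, and false-positive rate are hypothetical numbers, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive result, from sick and healthy people.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
print(p, float(p))
```

The answer is 2/13, or about 15%. That is far lower than the test’s 90% sensitivity might suggest. The reason is that the condition is rare, so false positives from the large healthy group dominate.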
Multivariable calculus/linear algebra: These advanced math skills are less important to know than statistics and probability, but will definitely be useful if you want to understand how machine learning actually works. In addition, if you envision wanting to leverage your data analyst chops into a career as a data scientist at some point, multivariable calc and linear algebra will provide the foundational knowledge to build your own algorithms.
How to Become a Data Analyst by Building on a Mathematical Background
OK, so maybe you’re a math whiz, but have no knowledge of programming. Here’s a step-by-step guide to building that programming knowledge that’s so crucial to becoming a data analyst.
Variables, control flow, loops, functions: These are the basic building blocks of programming. Know them and love them.
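Here is a tiny Python sketch that shows all four building blocks in one place. The temperature readings are made up.

```python
def count_above(values, threshold):      # function
    """Return how many numbers in `values` exceed `threshold`."""
    count = 0                            # variable
    for v in values:                     # loop
        if v > threshold:                # control flow
            count += 1
    return count

temperatures = [18.5, 22.1, 25.3, 19.8, 27.0]  # made-up readings
hot_days = count_above(temperatures, 21.0)
print(hot_days)
```

Here `hot_days` is 3.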
Debugging: Your code will probably not work correctly the first time around, or could break when unexpected situations occur. When that happens, you’ll need to be able to figure out what the problem is and why it’s happening. This is where debugging skills will come in handy.
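A common debugging pattern is to reproduce the failure, find the cause, fix it, and then add a check so it stays fixed. This hypothetical example walks through that loop for a function that breaks on an unexpected input: an empty list. Returning `None` is one possible design choice; raising a clearer error would be another.

```python
def average_buggy(values):
    return sum(values) / len(values)  # crashes when values is empty

def average(values):
    """Mean of `values`; returns None for empty input instead of crashing."""
    if not values:
        return None
    return sum(values) / len(values)

# 1. Reproduce the failure.
try:
    average_buggy([])
except ZeroDivisionError as exc:
    print(f"Reproduced bug: {exc!r}")

# 2. Confirm the fix, including the edge case that caused the bug.
assert average([]) is None
assert average([2, 4, 6]) == 4
```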
Object-oriented programming: Learn how to structure your code into object-oriented design patterns, so it can be easily reused, tested, and shared with other people.
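Here is a minimal illustration of how a class bundles data with the methods that operate on it, so the same behavior can be reused and tested in one place. The `Dataset` class is hypothetical and does not come from any library.

```python
class Dataset:
    """Hypothetical minimal wrapper that keeps data and operations together."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)  # copy, so callers can't mutate our data

    def mean(self):
        return sum(self.values) / len(self.values)

    def filter(self, predicate):
        """Return a new Dataset containing only values where predicate(v) is True."""
        return Dataset(self.name, [v for v in self.values if predicate(v)])

sales = Dataset("sales", [120, 80, 200, 50])
big = sales.filter(lambda v: v >= 100)
print(big.values, big.mean())
```

Here `sales.filter(lambda v: v >= 100)` returns a new `Dataset` with values `[120, 200]`, whose mean is 160.0. Because `filter` returns a new object instead of modifying `sales`, the original data stays intact, which makes the class easier to test.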
Data structures: For extra credit, familiarize yourself with Stacks, Queues, Lists, Arrays, Hashmaps, Priority Queues, Tries, and Graphs. There are certain situations in which one data structure will be superior to others (in terms of memory usage and runtime efficiency), and if you understand these relationships, you can optimize your program to run faster and more efficiently. That’ll impress your team, and set you apart among other data professionals.
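Several of these structures come built into Python or its standard library. This short sketch shows a stack, a queue, a hashmap, and a priority queue, along with the operation costs that make each one the right choice in its niche.

```python
import heapq
from collections import deque

# Stack (LIFO) with a list: append/pop at the end are O(1).
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()  # "b"

# Queue (FIFO) with deque: popleft is O(1), unlike list.pop(0), which is O(n).
queue = deque(["job1", "job2"])
first = queue.popleft()  # "job1"

# Hashmap (dict): average O(1) insert and lookup by key.
word_counts = {}
for word in "to be or not to be".split():
    word_counts[word] = word_counts.get(word, 0) + 1

# Priority queue with heapq: always pops the smallest item first.
tasks = []
heapq.heappush(tasks, (2, "write report"))
heapq.heappush(tasks, (1, "fix bug"))
urgent = heapq.heappop(tasks)  # (1, "fix bug")

print(top, first, word_counts, urgent)
```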
Algorithms: Knowing which algorithm to apply in which situation can reduce the running time of your program from a few days to a few hours, or the memory requirement from a few gigabytes to a few hundred megabytes. Work towards understanding divide-and-conquer (D&C) algorithms, greedy algorithms, dynamic programming, linear programming, and graph algorithms (depth-first vs. breadth-first traversal, minimum spanning trees, and shortest paths between two nodes).
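Here are two of these ideas in a short plain-Python sketch, using a made-up graph and made-up coin values. The first function uses breadth-first search to find the shortest path in an unweighted graph. The second uses dynamic programming to find the fewest coins that sum to a given amount.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: fewest-edges path in an unweighted graph
    given as an adjacency dict. Returns None if goal is unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through predecessors
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:  # first visit is via a shortest path
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def min_coins(amount, coins):
    """Dynamic programming: fewest coins summing to `amount` (None if impossible)."""
    best = [0] + [float("inf")] * amount  # best[a] = fewest coins for amount a
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return None if best[amount] == float("inf") else best[amount]

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"]}
print(shortest_path(graph, "A", "F"))
print(min_coins(6, [1, 3, 4]))
```

With coins of 1, 3, and 4, a greedy strategy makes 6 as 4 + 1 + 1 (three coins), while the dynamic-programming version finds 3 + 3 (two coins). This is a small example of why the choice of algorithm matters. `shortest_path(graph, "A", "F")` returns `['A', 'B', 'D', 'F']`.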
Software design patterns: Want to make your code robust, reusable, and testable? Many pioneering software engineers and computer scientists have developed software design patterns to help you do so. Become comfortable with them so you can excel at your data analysis.
The Bottom Line
Data analysis is a fast-growing field, and there are a lot of voices out there sharing what you need to learn, in what order. The variety of information can be confusing, overwhelming, and discouraging.
Know that you can rely on this breakdown as the definitive guide to what you really do need to learn in order to land that first data analyst job, along with prescriptions for where to start, depending on your specific background.
The investment in a career as a data analyst is huge, no matter if you’re just starting out or if you’re expanding on existing abilities. But the payoff, we promise, is even bigger.
Want to learn more? Check out the Udacity Data Analyst Nanodegree program to start your career as a Data Analyst.
There are lots of resources out there to learn about, or to build upon what you already know about, data science. But where do you start? What are some of the best or most authoritative sources? Here are some websites, books, and other resources that we think are outstanding.
If you want to see the latest trends and read analyses of what’s happening in the data science field…
Flowing Data: On Flowing Data, Nathan Yau, PhD, explores how data professionals (statisticians, scientists, designers, and others) analyze and visualize data to better understand the world around us. He also offers book recommendations, tutorials, a job board, and a membership feature to help budding data scientists grow and hone their craft. In addition to the tutorials and resources, Yau takes a humorous look at the challenges of working as a data professional, covering topics such as the ethics of gathering data, common mistakes in data analysis, and how data is used to track changes and growth in society over time.
FiveThirtyEight: Launched by data-wiz Nate Silver, FiveThirtyEight offers data analysis and visualizations of political, cultural, and economic issues. Their work ranges from light-hearted and interactive to in-depth and pointed, and offers a great example of how data can be made accessible and applicable to everyday life.
R-Bloggers: Looking for a hub of content around open-source statistical software? R-Bloggers is the place! Currently over 500 blogs are featured on R-Bloggers, focusing on news and tutorials related to R, “a free software environment for statistical computing and graphics.” As an aggregator, R-Bloggers pulls in helpful content around the niche topic of the R-language, making it easy for you to follow major trends in this field and major contributors, all in one site.
Simply Statistics: Three biostatistics professors from Johns Hopkins University, Harvard University, and the Dana Farber Cancer Institute manage this site that is chock-full of articles about how data is being used (and mis-used) to solve complex problems. These professors also offer data analysis classes on Coursera and interview up-and-coming data scientists on their careers. The clear career-focused angle of the interviews allows you to forecast your own career trajectory.
Edwin Chen: A data scientist at Dropbox, Edwin Chen offers hands-on tips on how to create and improve algorithms and data analysis tools. What makes Chen’s blog helpful is his incorporation of large-scale algorithms and analysis (e.g. Facebook, Amazon). If you want more guidance on techniques and analysis rooted in current methods, Chen is a helpful resource.
Hunch: John Langford, Doctor of Learning at Microsoft Research, created this blog to explore machine learning, specifically, what it is and how we’re using it. If you’re new to machine learning or curious about what it means for your newly chosen career, Hunch offers an in-depth look into this topic by reviewing and analyzing new ideas (for example, Allreduce/MPI vs. parameter server approaches) and events like the Conference on Digital Experimentation.
If you want to learn more about data science…
Open Source Data Science Masters: This site offers a free list of online classes and resources. The resources are organized as a self-paced curriculum, with the assumption that you have a basic understanding of programming. The curriculum includes theoretical/foundational classes as well as tactical, hands-on classes in computer science, programming, and design so that you move through the curriculum with a strong understanding of data science.
Learn Data Science: Similar to Open Source Data Science Masters, Learn Data Science offers a self-paced curriculum that introduces you to four key topics in the machine learning field: linear regression, logistic regression, random forests, and k-means clustering.
If you want to join a community where you can ask questions and learn from fellow data scientists and analysts…
Reddit Machine Learning Subreddit: With over 30,000 members, this site offers a wide range of people to connect with and to share challenges and solutions with. Redditors share news, research papers, videos, and more on machine learning, data mining, information retrieval, learning theory, and related topics.
Cross Validated: A Q&A community for people interested in stats, data viz, and more, this site offers a straightforward way to ask questions about data science and to find the most helpful answers. You can also get a weekly digest of popular questions and unanswered questions so you never miss a conversation.
Datatau: VentureBeat refers to this site as a “Hacker News for data scientists,” and it lives up to its name. Interesting articles are shared and commented on, and users share career advice for people new to the data science field.
Metaoptimize: In this Q&A community for people interested in machine learning, data mining, natural language processing and more, questions are voted on and badges awarded, making it easy for visitors to find the most popular and helpful questions and answers.
Kaggle Competitions: If you’re interested in data science, you’ve likely come across Kaggle, a platform for data prediction competitions. While you can search through a list of upcoming competitions, the website also features a forum where visitors can look for partners for competitions, share resources, and ask for support in developing a career in data science.
If you want the latest data science news…
Data Science Weekly: Each Thursday, Data Science Weekly sends an email with the latest news and trends in data science. You can also search their site for interviews, job opportunities, and resources on how to build a career in data science.
KDNuggets: This website is full of great tutorials, articles, webinars, and more on data mining and building models. You can also have this helpful information sent to you twice a month by signing up for their popular newsletter.
If you want to stay on top of the latest research…
Attending conferences is a great way to stay on top of trends. However, if you can’t attend the following conferences in person, you can attend virtually by watching videos of the presentations and downloading the associated papers.
Neural Information Processing Systems Foundation (NIPS) Conference: NIPS is a premier academic conference on machine learning whose goal is to “foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects.” At their recent conference in Montreal, Canada, they explored pressing issues around privacy, the role of machine learning in climate change, optimization, and networks. The papers submitted make for great reading.
Knowledge Discovery and Data Mining (KDD): Managed by the Association for Computing Machinery, the KDD promotes “advancement and adoption of the ‘science’ of knowledge discovery and data mining” by encouraging standards in terminology and methodology, and fostering community. Their recent conference in New York focused on how data science can be used for social good. While submitted papers aren’t available, you can browse photos and videos from the conference online.
If you’re looking for brilliant minds to follow…
These expert data scientists tweet about the latest news and trends in the field.
Hilary Mason (@hmason): Data Scientist in Residence at Accel and Scientist Emeritus at bitly.
Jeff Hammerbacher (@hackingdata): Founder and Chief Scientist at Cloudera and Assistant Professor at the Icahn School of Medicine at Mount Sinai.
Peter Skomoroch (@peteskomoroch): Equity Partner at Data Collective, former Principal Data Scientist at LinkedIn.
Drew Conway (@drewconway): Head of Data at Project Florida.
Nathan Yau (@flowingdata): Statistician and Author of Flowing Data.
The Bottom Line
There are lots of websites, books, communities, and resources that you can use independently to improve your skill set and your knowledge of the data science field. With this list as a starting point, you should be able to find plenty of experts out there to help you grow and develop your own expertise. Who knows? One day you just might be a resource on someone else’s list.
So, you’ve learned the skills needed to become a data analyst. You can write queries to retrieve data from a database, scour through user behavior to discover rich insights, and interpret the complex results of A/B tests to make substantive product recommendations.
In short, you feel confident about embarking full steam ahead on a career as a data analyst. The next question is, how do you get noticed and actually hired by recruiters or hiring managers?
There are three main steps you should take in your master plan for data analysis domination: 1. Build data science projects; 2. Show your work and make it publicly available; 3. Network, network, and network some more.
Luckily, data analyst jobs are extremely abundant, lucrative, and intellectually fulfilling. There’s no shortage of work, and good work at that—it’s just a question of how to find it and earn it.
Build Data Science Projects
Building projects is a great way to apply and showcase the data analysis skills that you’ve added to your arsenal. It’s also a solid opportunity for you to demonstrate that you can work through a data problem end-to-end: from data acquisition and cleaning through analysis, to communicating your findings so clearly that even the tech-illiterate can follow along.
Where to start? To begin with, you can tap into projects from data science classes as inspiration. For example, Harvard’s CS109 Data Science Class makes its lectures available (for free!) via online video. You can also browse through student projects from the class on YouTube.
Got your brain juices flowing, but still not sure what project to tackle? Narrow in on a specific task or question you’re interested in solving. For example, there are lots of socially relevant data sets available online that you can analyze. Here are three specific examples of rich data pools you can pull from to examine data on a global, national, and hyperlocal level:
World Bank: The World Bank Open Data project provides free and open access to thousands of data sets about development in countries around the globe. You can browse by country, by topic (for instance, Energy & Mining, Climate Change, or External Debt), or by indicator (like labor force participation rate, military expenditure, or life expectancy at birth). Here’s a freebie: Compare post-earthquake metrics for the relief efforts in Haiti with those of Pakistan.
U.S. Census: The U.S. Census provides ample potential for fascinating data insights. Slice and dice data sets on everything from population estimates per square mile to mean travel time to work. Want more? Infochimps also has a great set of free APIs focused on census data. Census data is really great for conducting spatial analysis. For example, you could compare the average level of education to mean household income for all U.S. ZIP codes and display the results on a browseable map.
Local data: New York, San Francisco, Seattle, Philadelphia, and other cities have all made some subset of their city data publicly available, from public transportation and energy usage to school test scores and crime. Check to see if your city, or one nearby, maintains an open data repository and have at it.
Another tactic for building up your portfolio of data science projects is participating in a data competition. For example, try your hand at a Kaggle competition. It’s a great way to gauge your abilities against those of your peers; and if you do well (i.e., place in the top 10), you’ll have another arrow in your quiver in the search for plum data analyst jobs. You could even land an interview with the brand that sponsored the contest. Companies hiring data analysts are known to search the Kaggle leaderboards when hiring.
Lena Vayn, Head of Talent at startup Soldsie, confirmed that plugging away at data projects is a solid way to look good to recruiters and hiring managers: “It’s a great way to showcase your work and also learn MySQL and even some Python, depending on what kind of data role you want.”
Jake Perlman-Garr, Co-Founder of Datavore Labs, seconded the suggestion: “While you may not currently possess a job helping to build your data analysis skill set, intellectual curiosity is important. There’s plenty of public data available, so pick something that interests you and start to dig down into it.”
Show Your Work, Publicly
Speaking of establishing a data science portfolio, a crucial way to attract the notice of data analyst recruiters is to show and tell. Specifically, showcase your skills and projects on GitHub, or a personal site constructed through Jekyll, WordPress, Medium, Tumblr, Squarespace, or another personal blog platform.
For bonus points, if you want to present your findings through data visualization, you can create and share interesting visualizations with others on sites like Many Eyes, Plot.ly, or bl.ocks.org.
A strong data portfolio should illustrate your range, including hands-on experience with R, Pandas, Numpy, Scipy, Scikit-Learn, or related data analysis tools; experience working with, and wrangling, very large (too big to fit into one spreadsheet) or unstructured data sets; knowledge of machine-learning and data-mining techniques; and strong problem solving, math, statistics, and quantitative reasoning skills.
Tarush Aggarwal, who heads up the data engineering team at Offerpop, has seen from his experience that since data analysis differs from company to company, it’s crucial to nurture a broad skill set and then demonstrate those abilities in discoverable ways. “Each company requires their own customized solution for their use cases,” he said. “It’s better to gain a broader education in the data sphere rather than starting out focused only on one technology. Play around with as much as you can.”
The work you share should also demonstrate stellar communication skills. It’s all well and good if you can analyze exceedingly complex data and dig up interesting insights from it. But if you can’t relay those findings in a coherent way, in the correct business context, your skills will be of no use to an organization.
Network, Network, Network
Your Rolodex is your most powerful tool in your hunt for good work. Truly, sometimes it’s who you know, rather than what you know, that can land you the dream job. And having the right professional network at your fingertips can expose you to more job opportunities than if you were trying to land a gig alone.
Perlman-Garr confirmed that notion: “I can speak from experience, as all of my company’s early data-focused hires have come from our networks and the NYC technology community.”
A few good ways to build up your network of professional contacts:
Attend local data science meetups. They’re great opportunities to log face time with others in the industry who may hold positions you’d like to attain or else who know people who do. In addition, the people you meet may know of companies hiring for positions that you’re qualified for.
Reach out to other data analysts or data scientists on LinkedIn. Ask them relevant questions about their work, and ask what advice they’d give aspiring data analysts on finding or getting a job (just be courteous and appropriate, couching your “cold call” in the understanding that they are likely very busy and that you sincerely appreciate their time).
Answer questions in popular digital communities like Quora and Cross Validated in order to build your credibility and your online footprint. Many data professionals, as well as data recruiters and hiring managers, frequent those sites, and your posts and answers may impress them.
Aggarwal advised, “Speak to as many data analysts as possible from a diverse list of companies across industries, and identify what challenges they face and what solutions have worked or not worked for them in the past. Don’t be afraid to ask questions, it’s not a sign of weakness in any way.”
Also remember that professional relationships are a two-way street. Sometimes the best way to get the most out of one is to offer help yourself first: doing someone a favor, such as making an introduction or offering to review their e-book on Amazon, is money in the networking bank for the future.
Even when you’ve got a job, if you’re not in love with it, networking can enable you to figure out exactly what you want to do, and then shift gears to something that’s a better fit. “One option is to join a larger organization where you can work your way into a role that interests you,” said Vayn. “Or join a smaller company to run data across a multitude of projects until realizing what sort of data analysis you truly want to do: build the actual processes and systems, or run queries and build recommendations.”
The Bottom Line
With data analysis experience under your belt, it’s time to put your skills to the test by putting yourself on the market as a data analyst for hire. This three-pronged approach—building personal projects, showcasing your work, and networking—will demonstrate both that you can do the work required of a data analyst and that you’re available to do so.
Every time you send a text message, type a tweet, post a Facebook photo, click a link, or buy something online, you’re generating data. And considering there are more than 4.5 billion Internet users in the world in 2020 (a quantity that’s tripled in the last 12 years) and 4.8 billion cell phone users, that’s a heck of a lot of data.
Fortunately, as data has multiplied, so has the ability to collect, organize, and analyze it. Data storage is cheaper than ever, processing power is greater than ever, and tools are more accessible than ever for mining the zettabytes of available data for business intelligence. In recent years, data analysis has done everything from predicting stock prices to preventing house fires.
All that data crunching requires an army of data masters. Translation: there’s never been a better time to pursue a career in data. The 2020 LinkedIn Emerging Jobs Report found 37% annual growth in data science jobs. Enter: you.
The first step on your path to professional data whiz? Taking stock of your three main career options: data analyst, data scientist, and data engineer.
Data Analyst
A data analyst is essentially a junior data scientist. It’s the perfect place to start if you’re new to a career in data and eager to cut your teeth.
Data analysts don’t typically need the mathematical or research background to invent new algorithms, but they do need a strong understanding of how to use existing tools to solve problems.
Skills and tools
Data analysts need to have a baseline understanding of five core competencies: programming, statistics, machine learning, data munging, and data visualization.
Beyond technical skill, attention to detail and the ability to present results effectively are equally important for success as a data analyst.
How it translates
Data analysts take direction from more experienced data professionals in their organization. Based on that guidance, they acquire, process, and summarize data. Data analysts manage the quality assurance of data scraping, regularly query databases to answer stakeholder requests, and triage data issues so they can be resolved promptly. They then package the data into digestible insights in narrative or visual form.
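To make the acquire, process, and summarize loop more concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module. The `orders` table, its columns, the sample rows, and the helper names (`build_demo_db`, `find_missing_amounts`, `summarize_orders`) are all hypothetical and exist only for illustration. One function flags rows that fail a basic quality check. The other answers a typical stakeholder request with a single aggregate query.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory `orders` table that contains one bad row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    # Illustrative sample data only; order 4 is missing its amount.
    conn.executemany(
        "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
        [(1, "East", 120.0), (2, "West", 80.5), (3, "East", 30.0), (4, "West", None)],
    )
    conn.commit()
    return conn


def find_missing_amounts(conn: sqlite3.Connection) -> list[int]:
    """Quality check: return the IDs of orders with no recorded amount."""
    rows = conn.execute("SELECT id FROM orders WHERE amount IS NULL ORDER BY id")
    return [order_id for (order_id,) in rows]


def summarize_orders(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Stakeholder request: order count and revenue per region, excluding bad rows."""
    query = """
        SELECT region, COUNT(*) AS n_orders, ROUND(SUM(amount), 2) AS revenue
        FROM orders
        WHERE amount IS NOT NULL
        GROUP BY region
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print("Rows needing follow-up:", find_missing_amounts(conn))  # [4]
    for region, n_orders, revenue in summarize_orders(conn):
        print(f"{region}: {n_orders} orders, ${revenue:,.2f} revenue")
```

In practice, an analyst would flag the incomplete order for follow-up rather than silently dropping it. That is why the quality check and the summary are kept as separate steps.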
An enduring curiosity about data and close examination of evolving best practices and tools serve all data professionals well, no matter their level of seniority.
Data Scientist
Some companies treat the titles of “data scientist” and “data analyst” as synonymous. But there is a real distinction between the two in terms of skill set and experience.
Though data scientists and data analysts have the same mission in an organization—to glean insight from the massive pool of data available—a data scientist’s work requires more sophisticated skills to tackle a higher volume and velocity of data.
As such, a data scientist is someone who can do undirected research and tackle open-ended problems and questions. Data scientists typically have advanced degrees in a quantitative field, like computer science, physics, statistics, or applied mathematics, and they have the knowledge to invent new algorithms to solve data problems.
Data scientists are extremely valuable to their companies, as their work can uncover new business opportunities or save the organization money by identifying hidden patterns in data (for example, highlighting surprising customer behavior or finding potential storage cluster failures).
Skills and tools
Whereas a data analyst might look at data from only a single source, a data scientist explores data from many different sources. Data scientists use tools like Hadoop (a widely used framework for distributed storage and processing) and programming languages like Python and R, and they apply the practices of advanced math and statistics.
The exact set of skills differs by organization and project, but this example from Data Science London gives a sense of how complex the data scientist’s toolkit can be:
The most valuable nontechnical skill a data scientist brings to the table is an intense inquisitiveness. Data scientists have to be driven to pose questions and hunt down solutions, and in so doing to unearth information that could transform a business.
As data scientist Gaëlle Recourcé, CSO at Evercontact, said, “I love the power of metrics and tracking user behaviors, because it gives me the opportunity to test personal intuitions and then have real empirical results that allow our team to make data-driven decisions and continually improve our product.”
How it translates
Data scientists essentially leverage data to solve business problems. They interpret, extrapolate from, and prescribe from data to deliver actionable recommendations. A data analyst summarizes the past; a data scientist strategizes for the future.
Data scientists might identify precisely how to optimize websites for better customer retention, how to market products for stronger customer lifetime value, or how to fine-tune a delivery process for speed and minimal waste.
Data Engineer
A data engineer builds a robust, fault-tolerant data pipeline that cleans, transforms, and aggregates unorganized and messy data into databases or data sources. Data engineers are typically software engineers by trade. Rather than analyzing data, data engineers are responsible for compiling and installing database systems, writing complex queries, scaling to multiple machines, and putting disaster recovery systems into place.
Data engineers essentially lay the groundwork for a data analyst or data scientist to easily retrieve the needed data for their evaluations and experiments.
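Here is a small Python sketch of the clean, transform, aggregate, and load stages, built only on the standard library. The CSV data, the `city_temps` table, and the function names are hypothetical. A production pipeline would typically rely on dedicated tools, such as the Hadoop-based and SQL technologies named in this section, and would need far more extensive error handling. This sketch only shows the shape of the work. It also includes two simple fault-tolerance habits:

- It sets malformed records aside instead of crashing.
- It loads with `INSERT OR REPLACE`, so a rerun does not duplicate rows.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical messy input: stray spaces, inconsistent casing, a bad number, a blank city.
RAW_CSV = """city,temp_c
 Austin , 31.5
Boston,22
austin,29.5
Denver,not-a-number
,18
"""


def extract(text: str) -> list[dict[str, str]]:
    """Read raw CSV text into dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize each record; set aside malformed ones instead of crashing."""
    clean, rejected = [], []
    for row in rows:
        city = (row.get("city") or "").strip().title()
        try:
            temp_c = float(row.get("temp_c") or "")  # float() tolerates surrounding spaces
        except ValueError:
            rejected.append(row)
            continue
        if not city:
            rejected.append(row)
            continue
        clean.append((city, temp_c))
    return clean, rejected


def aggregate(records):
    """Compute the average temperature and reading count per city, sorted by city."""
    totals = defaultdict(lambda: [0.0, 0])
    for city, temp_c in records:
        totals[city][0] += temp_c
        totals[city][1] += 1
    return [(city, total / n, n) for city, (total, n) in sorted(totals.items())]


def load(conn, summary):
    """Write results; INSERT OR REPLACE keeps reruns from duplicating rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS city_temps "
            "(city TEXT PRIMARY KEY, avg_temp_c REAL, n_readings INTEGER)"
        )
        conn.executemany("INSERT OR REPLACE INTO city_temps VALUES (?, ?, ?)", summary)


def run_pipeline(text, conn):
    """Run extract -> transform -> aggregate -> load and return the number of rejected rows."""
    clean, rejected = transform(extract(text))
    load(conn, aggregate(clean))
    return len(rejected)
```

Running `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` should set aside the Denver row and the blank-city row. It should then store one averaged row each for Austin and Boston.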
Skills and tools
Whereas data scientists extract value from data, data engineers are responsible for making sure that data flows smoothly from source to destination so that it can be processed.
As such, data engineers have deep knowledge of and expertise in:
Hadoop-based technologies like MapReduce, Hive, and Pig
SQL-based technologies like PostgreSQL and MySQL
NoSQL technologies like Cassandra and MongoDB
Data warehousing solutions
How it translates
“My responsibilities are quite various,” said Social Searcher Data Engineer Dmitry Novikov. “They range from designing the system architecture and separate modules, to algorithm implementation and infrastructure requirements.”
Data engineers do the behind-the-scenes work that enables data analysts and data scientists to do their jobs more effectively. Here’s a visual look at the specific differences between data engineers and data scientists:
Chris Beland, who leads the data engineering team at Allclasses, describes what his team does, why it matters, and why he loves it:
“In my work right now, I do a lot of natural language processing, turning semi-structured, human-readable web content into highly structured machine-readable databases. My favorite thing to do is to teach the computer something concrete about the real world, like how humans write calendar dates and what they mean, or how the universe of class topics breaks down into categories and subcategories. Then I come up with some algorithms so my machine can exploit that new knowledge to parse and sort text and make sense of it just a little bit like a human would. I feel a bit like a proud parent when I can check the resulting database, give the program a virtual pat on the head for getting all the right answers, despite getting a lot of inputs I never anticipated, and with a satisfying click ship the data out to people who need it.”
The Bottom Line
You have many options when it comes to a career working with data. If you’re interested in exploring such a career, your three major options are data analyst, data scientist, and data engineer.
Sanjay Venkateswarulu, co-founder of big data analytics and visualization startup Datavore Labs, crystallizes why and how this subdividing has occurred: “Data analysts have morphed into these three or more specialized disciplines. I believe it is the same specialization that doctors went through at the birth of modern medicine. First there was your village leader or elder who played the main role, but as tools of the trade have become more and more specialized, we now have GPs, surgeons, and neurosurgeons.”
If you’re new to the field of data science, you’ll want to start by aiming for the GP in Venkateswarulu’s analogy, an analyst job. As you develop your skills and gain experience, you’ll be able to progress to data scientist or data engineer. Check out Udacity’s School of Data Science; we have programs for beginners and advanced learners.
I think platypuses are awesome. They look like a bizarre crossover of ducks, beavers, and otters. The males are venomous: their spurs can kill a small animal or incapacitate a grown human. To add to their distinction, they don’t use sight or smell to hunt like most animals. They’re too hip for that. Instead, they sense the electric fields generated by the muscular contractions of their prey. Platypuses are simply the most awesome creatures on this planet.
Why a platypus and this odd metaphor?
I am excited to announce that today, Udacity is launching its Data Analyst nanodegree to help you get a job as a data analyst and become the platypus of the tech world. We will help you learn the unique combination of skills to become someone who can code like a software engineer, derive insights from data like a statistician, and present information like Steve Jobs.
Specifically, we will teach you the:
Data wrangling skills to deal with the messiest of data
Statistical and machine learning knowledge to interpret and make predictions from data
Communication and data visualization skills to tell a data-based story to technical and non-technical audiences alike
Data analyst is simply one of the most exciting jobs of recent years, but don’t take my word for it. Harvard Business Review calls data scientist the sexiest job of the 21st century and predicts that it will become the most in-demand job function of the upcoming decade.
So venture forth, join us in our Data Analyst nanodegree, and become an awesome data science platypus yourself. We are honored to have you start your data analyst learning journey with us!