
Data Visualization in R

If you work with data, visualization is an essential skill. It lets you communicate clearly with colleagues and clients, and it can also help you understand your data better and find patterns that are useful in your work. With R, you can visualize data easily. In this blog post, we’ll introduce some techniques for data visualization in R.

Let’s start by looking at some data. Consider the following nine numbers, which we’ll put together in a vector called simple_data:

simple_data<-c(1,1,2,3,5,8,13,21,34)

These are the first nine Fibonacci numbers. They follow a special pattern: each number after the first two is the sum of the two numbers before it (for example, 1+1=2, 1+2=3, 2+3=5, and so on). We can visualize these numbers in R using one simple command:

plot(simple_data)

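If you run this in R, you should see a scatter plot of nine hollow circles, with each value’s position in the vector on the x-axis (labeled Index) and the value itself on the y-axis.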
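The addition rule described earlier can also be used to build the vector instead of typing it by hand. The following is a minimal sketch, assuming the sequence starts at 1, 1; the helper name fib_sequence is hypothetical and not part of the original example.

```r
# Hypothetical helper (an illustration, not from the original post):
# build the first n Fibonacci numbers, assuming the sequence starts 1, 1.
fib_sequence <- function(n) {
  fib <- numeric(n)
  fib[seq_len(min(n, 2))] <- 1              # the first two terms are both 1
  if (n > 2) {
    for (i in 3:n) {
      fib[i] <- fib[i - 1] + fib[i - 2]     # sum of the previous two terms
    }
  }
  fib
}

simple_data <- fib_sequence(9)
print(simple_data)
plot(simple_data)
```

Changing the argument, for example fib_sequence(15), gives a longer run of the same sequence to plot.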

Any time you have some data, you can plot it by using the plot() command like we did above. But we don’t always want all of our plots to look like this one. If we make just one change in our code, we can change the appearance of our plot:

plot(simple_data,pch=19)

Here, we set the pch argument of plot(), which selects the plotting symbol, to 19. If you run this new code, you’ll see the same plot drawn with a different symbol.

You can see that the appearance of the plotted points has changed: instead of hollow circles, we see a solid circle for each data point. You can try other values for the pch argument as well. If you set pch=11, you’ll see a six-pointed star made of two overlapping triangles, and if you set pch=4, you’ll see an “x” shape.
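To compare symbols side by side, one option is to draw the same data once per pch value. This is only a sketch: the 2 x 2 layout via par(mfrow=...) and the variable name pch_values are illustrative choices, not something the original example uses.

```r
simple_data <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)

# Symbol codes mentioned in the text; 1 is R's default hollow circle.
pch_values <- c(1, 19, 11, 4)

# Arrange the four plots in a 2 x 2 grid, then restore the previous layout.
old_par <- par(mfrow = c(2, 2))
for (p in pch_values) {
  plot(simple_data, pch = p, main = paste("pch =", p))
}
par(old_par)
```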

Adjusting pch isn’t the only change we can make to the appearance of a plot. There are many other arguments that we can specify in the plot() command. For example, we can run the following code:

plot(simple_data,pch=19,col='red')

Here, we’ve specified a value for the col argument, which determines the color of the plotted points. If you run this code, you’ll see the same solid points, now drawn in red.

Again, you can try other values, like col='blue' or col='green' to see how other colors look.
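If you are unsure whether a color name is valid, base R's colors() function lists the names it recognizes. The sketch below also passes one color per point to col; the threshold of 10 and the name point_colors are illustrative assumptions.

```r
simple_data <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)

# colors() returns the color names that base R recognizes.
print(c("red", "blue", "green") %in% colors())

# col also accepts one color per point. Illustrative choice:
# values above 10 are drawn in blue, the rest in red.
point_colors <- ifelse(simple_data > 10, "blue", "red")
plot(simple_data, pch = 19, col = point_colors)
```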

There are some other important arguments that we can specify when using the plot() function. Here’s an example of some code that uses several other arguments:

plot(simple_data,pch=19,col='red',main='Fibonacci Sequences',xlab='Index',ylab='Fibonacci Number')

Here, we’ve specified main (the argument that determines the title displayed on the plot), as well as xlab (the argument that determines the x-axis label) and ylab (the argument that determines the y-axis label). The plot generated by this code shows the title above the plotting area and the two labels along their axes.

One crucial argument in the plot() function is the type argument. For example, add type='o' to your plotting code:

plot(simple_data,pch=19,col='red',main='Fibonacci Sequences',xlab='Index',ylab='Fibonacci Number', type='o')

When you run this code, you’ll see that the plotted points are shown with connecting lines between them, creating a line graph instead of a scatter plot. Other commonly used plot types are type='l', which draws only the lines, and type='p', which draws only the points and is the default.
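The three type values mentioned above can be compared in a single figure. This is a minimal sketch; the side-by-side layout and the name plot_types are illustrative choices.

```r
simple_data <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)

# 'p' = points only (the default), 'l' = lines only, 'o' = points joined by lines.
plot_types <- c("p", "l", "o")

# Draw the three variants next to each other, then restore the previous layout.
old_par <- par(mfrow = c(1, 3))
for (plot_type in plot_types) {
  plot(simple_data, pch = 19, col = "red", type = plot_type,
       main = paste0("type = '", plot_type, "'"))
}
par(old_par)
```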

Another command that can be helpful when creating visualizations is the abline() command. You can use abline() to draw extra lines on your plots. For example, try out the following code:

plot(simple_data,pch=19,col='red',main='Fibonacci Sequences',xlab='Index',ylab='Fibonacci Number', type='o')

abline(h=4.5)

abline(v=6)

When using abline(), you can use h= to specify that you want to add a horizontal line, and v= to specify that you want to add a vertical line. In this case, we’ve added one horizontal line at y = 4.5 and one vertical line at x = 6.
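Reference lines are often easier to read when they are styled differently from the data. The sketch below is one way to do that, assuming you want a dashed gray line at the mean of the data; the positions and styles are illustrative choices, not part of the original example.

```r
simple_data <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)
plot(simple_data, pch = 19, col = "red", type = "o")

# Illustrative choice: a dashed gray horizontal line at the mean of the data.
data_mean <- mean(simple_data)
abline(h = data_mean, lty = 2, col = "gray")

# h and v also accept vectors, drawing one line per value.
abline(v = c(3, 6), lty = 3)
```

The lty argument sets the line type (2 is dashed, 3 is dotted), and col works the same way as in plot().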

By drawing these plots, we can notice a few things about the Fibonacci numbers in our data. The most obvious is the roughly exponential growth of the sequence. We can also easily see that the sequence is monotonically non-decreasing: each value is at least as high as the previous one. These kinds of growth patterns can be hard to notice, or to be sure about, without visualizing the data first, and that’s one reason why data visualization is so important.
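Both observations can be checked numerically as well as visually. The following is a rough sketch using base R; the variable names is_monotonic and ratios are illustrative.

```r
simple_data <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)

# Monotonic (non-decreasing): no difference between neighbors is negative.
is_monotonic <- all(diff(simple_data) >= 0)
print(is_monotonic)

# Roughly exponential: ratios of consecutive terms settle near a constant.
ratios <- simple_data[-1] / simple_data[-length(simple_data)]
print(round(ratios, 3))

# On a logarithmic y-axis, exponential growth looks close to a straight line.
plot(simple_data, pch = 19, col = "red", type = "o", log = "y")
```

The later ratios settle at about 1.6, which is why the points fall close to a straight line once the y-axis is logarithmic.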

Data visualization is a crucial skill in the world of data science. Improving your R data visualization skills can help you in your career by making you a clearer communicator and by helping you understand data more quickly.

Looking to expand your knowledge of R? Udacity’s Programming for Data Science with R Nanodegree will teach you the fundamentals of R to help prepare you for a career in data science.