No matter what kind of Python program you’re writing, it’s very likely that you’ll find that your program needs to handle some data. Sometimes data needs to be stored, analyzed, manipulated, ordered, transferred, or even used as input into machine learning models. Since data is so important in the Python world, it’s valuable to carefully study how to handle it correctly and efficiently.
In this article, we’ll discuss data structures and the standard formats for arranging and storing data in Python. We’ll also discuss algorithms, the ordered sets of instructions used to process data. We’ll talk about both of these concepts, and illustrate the concepts with working Python code. Let’s begin!
Getting started: a list and a tuple.
Let’s start by defining some simple data in Python:
my_list = [5,3,1,8,2]
Here, we created an object in Python called my_list. This object is a collection of numbers. But more precisely, this object is called a list. A list is simply a collection of items, stored together in a single variable. We create lists in Python by using the square brackets ([ and ]), and separating individual items with commas.
This is not the only way to create a collection of items in Python. Let’s try using parentheses instead of square brackets:
my_tuple = (5,3,1,8,2)
This new object, called my_tuple, contains the same numbers as my_list. But since we used parentheses instead of square brackets, we call this object a tuple instead of a list.
Comparing lists and tuples.
Lists and tuples are similar in many ways. For example, we can access individual items in lists and tuples in the same way:
print(my_list[3])
print(my_tuple[3])
Here, we’ve printed the item with index 3 in both my_list and my_tuple, and we see the same result for both commands (8). You can see that in this way, lists and tuples work the same.
But lists and tuples are not the same in every way. Let’s see what happens if we try to change a value in our list and our tuple:
my_list[3] = 7
my_tuple[3] = 7
The first line here redefines the 8 in my_list to be equal to 7. The second line looks like it should do the same thing for my_tuple, but if you try to run it, you’ll get an error. That’s because lists and tuples have different rules about assigning values: assigning new values to elements of lists is allowed, but assigning new values to elements of tuples is not allowed. The technical word for this property is mutability; since list items can be changed, we say lists are mutable, and since tuple values can’t be changed, we say they’re immutable. Tuples and lists may seem similar, but the difference in their mutability shows you that they’re not identical.
Lists and tuples are two slightly different formats for storing data in Python. In other words, they’re two different data structures. Becoming familiar with the types of data structures that can be created in Python will help you work with data more effectively.
Example 1: sorting a list of numbers.
Now that we’ve introduced how to store data in two types of data structures, let’s do more with the data. Let’s start by defining our list again:
my_list = [5,3,1,8,2]
You can see that this list appears to be in a random order: it’s neither sorted from smallest to largest nor from largest to smallest. Often, we’ll have a need to sort data, so it will be useful to understand the systematic sets of instructions, or algorithms, that are used for tasks like sorting.
Think about how you might write code to sort a list of numbers like this one from smallest to largest. One thing you might start with is a simple swap: take two numbers that are out of order, and make them trade places. Let’s start with 3 and 1, the items with index 1 and 2, respectively, in our list:
the_index = 1
if my_list[the_index] > my_list[the_index+1]:
my_list[the_index],my_list[the_index+1] = my_list[the_index+1],my_list[the_index]
This code is straightforward: it checks whether a particular list item is greater than the next list item. If any item is greater than its successor in the list, then those two items are out of order, and so the code swaps them. If you run this code, then run print(my_list), you’ll get the following output:
[5,1,3,8,2]
The list is not completely ordered yet, but we’ve made an improvement: we swapped the 3 with the 1 so that at least those two items are in order from smallest to largest.
If we do repeated swaps like this one, the whole list will eventually be sorted. That’s the goal of the following code:
number_of_passes = 0
while number_of_passes < (len(my_list)):
the_index = 0
while the_index < (len(my_list)-1):
if my_list[the_index] > my_list[the_index+1]:
my_list[the_index],my_list[the_index+1] = my_list[the_index+1],my_list[the_index]
the_index +=1
number_of_passes+=1
You can see the same code we used for swapping two items in the middle of this code snippet. But instead of swapping only two items, this code contains nested loops that do repeated swaps of all of the items in our list, enough times that the entire list is eventually sorted. After running this code, you can run print(my_list) to see that the list is sorted correctly from smallest to largest.
This process for sorting a list has a special name: it’s called bubble sort, and it’s a simple algorithm for sorting data. As you progress in your Python education, you can learn more algorithms like bubble sort, and how to use them well. You can also learn about more data structures like lists and tuples, and how each data structure can help you write effective Python code.
Example 2: sorting a list of tuples.
Now that you’re familiar with lists and tuples (two data structures), and bubble sort (an algorithm), let’s try an example that combines them all together. The following code defines a list whose individual elements are all tuples. Then, it uses bubble sort to sort the list from smallest to largest based on the first element of every tuple:
list_of_tuples = [(3,5),(1,9),(2,8),(0,5),(9,6)]
number_of_passes = 0
while number_of_passes < (len(list_of_tuples)):
the_index = 0
while the_index < (len(list_of_tuples)-1):
if list_of_tuples[the_index][0] > list_of_tuples[the_index+1][0]:
list_of_tuples[the_index],list_of_tuples[the_index+1] = list_of_tuples[the_index+1],list_of_tuples[the_index]
the_index +=1
number_of_passes+=1
print(list_of_tuples)
After running this code, you’ll see the following output:
[(0, 5), (1, 9), (2, 8), (3, 5), (9, 6)]
You can see that our list of tuples is sorted from smallest to largest based on the first element of each tuple. (The first tuple has the smallest first element, 0, the second tuple has the second smallest first element, 1, and so on, and then the last tuple has the largest first element, 9.) Congratulations! You can work with multiple data structures in Python, and you can implement a sorting algorithm.
Learn more about data structures and algorithms.
Data structures may seem simple, and the algorithms we use to work with them may seem mundane. But mastering some of these simple ideas can help you write Python code better, more efficiently, and more quickly. When you get to the most advanced ideas in the Python world, you’ll be glad that you mastered some of these basics.
If you find these topics interesting or useful, explore Udacity’s Data Structures and Algorithms Nanodegree. You can use the fundamental skills and knowledge gained in the course to build Python projects at any level of sophistication.