Finding Items in a Python List

Lists are among the most commonly used data types in Python. They can store any kind of object, are easily extendable, and many programs use them to process collections of items. 

At times, however, you may not need to access an entire list, but instead a specific element. Maybe you want to know if an item occurs in a list, or whether one list contains the same elements as another. How would you proceed? In this article, we’ll cover the most effective ways to find an item in a Python list.

What Is a Python List?

A list in Python is a collection of elements. The elements in a list can be of any data type:

>>> cool_stuff = [17.5, 'penguin', True, {'one': 1, 'two': 2}, []]

This list contains a floating point number, a string, a Boolean value, a dictionary, and another, empty list. In fact, a Python list can hold virtually any type of data structure.

A student of Python will also learn that lists are ordered, meaning that the order of their elements is fixed. Unless you alter the list in any way, a given item’s position will always be the same. This is different from sets, which are unordered. (Dictionaries have preserved insertion order since Python 3.7, but you access their values by key rather than by position.)

Sets cannot contain the same element twice, and dictionaries cannot contain the same key twice. For lists, duplicates are no problem. In fact, here is an entirely conceivable list in Python:

>>> penguins = ['penguin'] * 5
>>> penguins
['penguin', 'penguin', 'penguin', 'penguin', 'penguin']

Finally, Python lists are “mutable,” meaning they can be changed. This sets them apart from tuples, which are immutable. One of the most common list operations is appending items to the end of a list:

>>> cool_birds = ['robin', 'penguin', 'kiwi', 'kingfisher']
>>> cool_birds.append('kakapo')
>>> cool_birds
['robin', 'penguin', 'kiwi', 'kingfisher', 'kakapo']

Our list of cool birds has gained a new member!

Checking Whether a List Contains an Item

Why might you want to find items in a list in the first place? One use case may be simply checking if an item is part of a list or not. In this section, we’ll be looking at two different methods to do that.

The in Operator

If you don’t need to know the number of occurrences, you can simply use the in operator, which returns the Boolean value True if the list contains the item:

>>> 'robin' in cool_birds
True

And it returns False if not:

>>> 'kookaburra' in cool_birds
False
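
Because in evaluates to a Boolean, it slots directly into an if statement, and not in gives the negated check. The short sketch below illustrates this; the seen_birds list and the add_if_missing helper are hypothetical names used only for this example.

```python
seen_birds = ['robin', 'penguin', 'kiwi']

def add_if_missing(items, item):
    """Append item only when it is not already in items (hypothetical helper)."""
    if item not in items:
        items.append(item)
        return True
    return False

added_kookaburra = add_if_missing(seen_birds, 'kookaburra')  # new bird, gets appended
added_robin = add_if_missing(seen_birds, 'robin')            # already present, list unchanged
print(added_kookaburra, added_robin)  # True False
print(seen_birds)                     # ['robin', 'penguin', 'kiwi', 'kookaburra']
```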

The in Operator and Sets

The in operator is simple and easy to remember. But in combination with lists, it is not particularly fast. That’s because Python checks the elements one by one, starting at the front of the list, until it finds a match. If our list does not contain the item, Python has to go through every element before it can return False.

In practice, time complexity is usually only an issue with very long lists. If you’re working with a long list that you need to search repeatedly, it’s a good idea to convert the list to a set before using in. That’s because sets (like Python dictionaries) use a hash table to check whether an item exists. Hash tables considerably speed up the lookup process:

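>>> import timeit
>>> long_list_of_numbers = list(range(1000000))
>>> large_set_of_numbers = set(long_list_of_numbers)
>>> list_time = timeit.timeit('999999 in long_list_of_numbers', globals=globals(), number=100)

Using list(range(1000000)), we created a long list containing all numbers from 0 to 999,999. The call to list() matters: in Python 3, range() on its own returns a range object, which answers membership checks for integers by arithmetic and would not show the cost of searching a real list. We then timed 100 checks for 999,999, which is the list’s last number and therefore the worst case: every check has to walk through all one million elements.

>>> set_time = timeit.timeit('999999 in large_set_of_numbers', globals=globals(), number=100)
>>> set_time < list_time
True

The same operation is far faster on a set, because a hash lookup does not have to visit the other elements. Exact timings depend on your machine, so we only compare the two results here. Keep in mind that building a set from a list takes time as well, so the gain really only pays off if you’re performing multiple lookups.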
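
When the same collection is searched many times, one common pattern is to build the set once and reuse it for every lookup. The following is a minimal sketch of that idea; known_ids and incoming_ids are made-up names and values for illustration.

```python
known_ids = list(range(100_000))          # illustrative data
incoming_ids = [5, 99_999, 100_000, -1]   # values we want to check

# Build the set once; each later membership test is then a hash lookup
known_id_set = set(known_ids)

matches = [i for i in incoming_ids if i in known_id_set]
print(matches)  # [5, 99999]
```

Note that this only works when the items are hashable. A list of lists or of dictionaries, for example, cannot be converted to a set.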

Sets also provide a handy way of checking whether two lists contain the same elements—regardless of their individual order, or how many times an item occurs in a list. By converting the lists to sets, you can compare them with the equality operator (==):

>>> list1 = ['kiwi', 'kiwi', 'kiwi', 'kiwi', 'kakapo']
>>> list2 = ['kakapo', 'kiwi']
>>> list1 == list2
False

>>> set(list1) == set(list2)
True

While the two lists are certainly not identical, they do contain the same items, which we discovered by using sets.
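
If the number of occurrences matters but the order does not, collections.Counter from the standard library offers a middle ground between comparing lists and comparing sets. A brief sketch, reusing list1 and list2; list3 is an extra list added here for illustration:

```python
from collections import Counter

list1 = ['kiwi', 'kiwi', 'kiwi', 'kiwi', 'kakapo']
list2 = ['kakapo', 'kiwi']
list3 = ['kakapo', 'kiwi', 'kiwi', 'kiwi', 'kiwi']  # same items as list1, reordered

same_items = set(list1) == set(list2)              # ignores order and counts
same_counts_12 = Counter(list1) == Counter(list2)  # ignores order, respects counts
same_counts_13 = Counter(list1) == Counter(list3)
print(same_items, same_counts_12, same_counts_13)  # True False True
```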

The count() Method

Sometimes you’ll need to go beyond knowing whether a list contains an item. You might also want to determine the number of that item’s occurrences. If that’s the case, you can use count():

>>> penguins.count('penguin')
5

Just like the in operator, you can use count() even if the item is not in the list:

>>> penguins.count('kookaburra')
0
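
Each call to count() scans the whole list, so counting many different items that way repeats the work. One possible alternative, sketched below, is collections.Counter, which tallies every distinct item in a single pass. The sightings list is invented for this example.

```python
from collections import Counter

sightings = ['kiwi', 'robin', 'kiwi', 'kakapo', 'kiwi', 'robin']

# list.count() walks the entire list on every call
kiwi_count = sightings.count('kiwi')

# Counter tallies all distinct items in one pass; missing items count as 0
tally = Counter(sightings)
print(kiwi_count, tally['robin'], tally['kookaburra'])  # 3 2 0
```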

Finding Items and Their Positions

Other times, it’s not enough to check whether an item is part of a list, or even how many times. In this section, we’ll be looking at how to get the indices of one or more items in a list.

The index() Method

Since lists are ordered, there is value in knowing an item’s exact position within a list. Maybe you want to use the index for slicing, or splitting a list into several smaller lists. The simplest way to do so would be to use index():

>>> cool_birds.index('kiwi')
2    # Remember that Python starts counting at 0

Note that index() only ever returns the position of the first occurrence of an item. When an item is not in the list at all, index() raises a ValueError, which stops the program unless you handle it:

>>> cool_birds.index('kookaburra')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'kookaburra' is not in list
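
To keep a missing item from stopping the program, you can catch the ValueError. The sketch below wraps index() in a small helper; find_index is a hypothetical name, and returning None for a missing item is just one possible convention.

```python
cool_birds = ['robin', 'penguin', 'kiwi', 'kingfisher', 'kakapo']

def find_index(items, item, default=None):
    """Return the index of the first occurrence of item, or default if absent."""
    try:
        return items.index(item)
    except ValueError:
        # index() raises ValueError when the item is not in the list
        return default

kiwi_pos = find_index(cool_birds, 'kiwi')
missing_pos = find_index(cool_birds, 'kookaburra')
print(kiwi_pos, missing_pos)  # 2 None
```

None is used as the default here rather than -1, because -1 is itself a valid list index in Python (it refers to the last element).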

The enumerate() Function

If you want to get the indices of all occurrences of an item, you can use enumerate(). This is a very handy function that “zips” together a number range and a list. Coupled with an if-condition, enumerate() helps us find our indices. Consider the following example, where we use a second list to store the indices:

>>> indices = []
>>> for i, bird in enumerate(cool_birds):
...     if bird == 'penguin':
...         indices.append(i)
...
>>> indices
[1]
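
To see what enumerate() actually produces, it can help to turn its result into a list. The sketch below also shows that, for a list, the pairs match what zip() would give for a number range and the list, and that the optional start argument changes the first number.

```python
cool_birds = ['robin', 'penguin', 'kiwi', 'kingfisher', 'kakapo']

# enumerate() yields (index, item) pairs
pairs = list(enumerate(cool_birds))

# For a list, this matches zipping a number range with the list
zipped = list(zip(range(len(cool_birds)), cool_birds))

# The optional start argument shifts the numbering
numbered = list(enumerate(cool_birds, start=1))

print(pairs[:2])     # [(0, 'robin'), (1, 'penguin')]
print(numbered[:2])  # [(1, 'robin'), (2, 'penguin')]
```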

We might even use a broader matching condition. For example, we could change the if-condition so that it matches every cool bird starting with a “k”:

>>> indices = []
>>> for i, bird in enumerate(cool_birds):
...     if bird[0] == 'k':
...         indices.append(i)
...
>>> indices
[2, 3, 4]

Unfortunately, this approach takes up a lot of space: three lines for the loop, as compared to our index() method, which only needs one! If you have some Python experience, you may have already guessed our solution to this dilemma: the list comprehension.

List Comprehensions

A list comprehension lets you write an entire for-loop—including the if-condition—on the same line, and returns a list of the results:

>>> indices = [i for i, bird in enumerate(cool_birds) if bird[0] == 'k']
>>> indices
[2, 3, 4]

How about finding the positions of all numbers divisible by 123456?

>>> indices = [i for i, x in enumerate(long_list_of_numbers) if x % 123456 == 0]
>>> indices
[0, 123456, 246912, 370368, 493824, 617280, 740736, 864192, 987648]

List comprehension syntax is great if you want to save some space in your code. Many developers also find it more readable than the equivalent for-loop with its nested if-condition.
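
If you need this pattern in several places, the comprehension can be wrapped in a small function that takes the matching condition as an argument. This is only a sketch; find_indices and its predicate parameter are hypothetical names, not part of Python’s standard library.

```python
def find_indices(items, predicate):
    """Return the indices of all items for which predicate(item) is true."""
    return [i for i, item in enumerate(items) if predicate(item)]

cool_birds = ['robin', 'penguin', 'kiwi', 'kingfisher', 'kakapo']

k_birds = find_indices(cool_birds, lambda bird: bird.startswith('k'))
long_names = find_indices(cool_birds, lambda bird: len(bird) > 5)
print(k_birds)     # [2, 3, 4]
print(long_names)  # [1, 3, 4]
```

The example uses startswith('k') instead of bird[0] == 'k' because the former also works on empty strings, whereas indexing an empty string raises an IndexError.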

Using NumPy for Numerical Data

You’re now aware of the pros and cons of lists and sets when it comes to locating items in a Python data collection. But if you’re working with numerical data, there’s another data type that you should know about: NumPy arrays. To understand what sets arrays apart from lists, let’s take a closer look at how Python implements the latter.

Python lists are not linked lists, as is sometimes assumed. In CPython, a list is a dynamic array of references: the references sit next to each other in memory, but the objects they point to are stored separately, each carrying its own type information. To look at every item in the list, your program has to follow a reference for each element, which is part of why list traversal in pure Python is comparatively expensive.

NumPy arrays, on the other hand, store the values themselves in one contiguous block of memory, and all values share a single data type. That’s why every NumPy array has a fixed dtype, which you either specify or let NumPy infer when you create the array, so that it knows how much space each element needs.
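
A quick way to see the fixed element type and the memory it implies is to inspect an array’s dtype, itemsize, and nbytes attributes. The arrays in this sketch are made up for illustration.

```python
import numpy as np

numbers = np.arange(5, dtype=np.int64)   # dtype given explicitly
inferred = np.array([1.5, 2.5, 3.5])     # dtype inferred from the values

# Every element occupies the same number of bytes
print(numbers.dtype, numbers.itemsize, numbers.nbytes)  # int64 8 40
print(inferred.dtype)                                   # float64
```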

To look up items in a NumPy array, use where():

>>> import numpy as np
>>> long_array_of_numbers = np.arange(1000000)
>>> indices = np.where(long_array_of_numbers % 123456 == 0)
>>> indices
(array([0, 123456, 246912, 370368, 493824, 617280, 740736, 864192, 987648]),)

On large arrays, this approach is typically faster than our list comprehension, because NumPy runs the comparison in compiled code instead of a Python loop, and its syntax is arguably even more readable. NumPy is undoubtedly one of the most important Python libraries out there. Whether you’re planning to get into machine learning, data science, or geospatial modeling, NumPy will be your best friend.
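
When called with only a condition, where() returns a tuple with one index array per dimension, which is why the result above is wrapped in parentheses. The sketch below shows a few ways to work with that result for a one-dimensional array; the variable names are chosen for illustration.

```python
import numpy as np

long_array_of_numbers = np.arange(1000000)
mask = long_array_of_numbers % 123456 == 0   # Boolean array, one entry per element

indices = np.where(mask)[0]            # unpack the one-element tuple
same_indices = np.flatnonzero(mask)    # returns the index array directly
values = long_array_of_numbers[mask]   # Boolean indexing yields the matching values

print(indices.tolist())
```

Because this array holds the numbers 0 to 999,999 in order, the matching values and their indices happen to be identical here.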

Continue Your Python Journey

In this tutorial, we looked into methods for finding items in a Python list. From the in operator to list comprehensions, we used strategies of varying complexity. Along the way, we discussed the pros and cons of working with different data structures such as lists, sets, and NumPy arrays.

Looking to take your Python skills to the next level?

Enroll in our Introduction to Programming nanodegree, where you’ll master fundamental Python concepts like logic checks, data structures, and functions.