
Are you looking to learn more about parallelization in Python? If so, this article is for you. We cover the basics of parallelizing tasks and explain how parallel programs can speed up your Python code. We also provide our top three tips for creating error-free parallel programs in Python.

What is Parallelization in Python?

Parallelization in Python (and other programming languages) allows the developer to run multiple parts of a program simultaneously.

Most modern PCs, workstations, and even mobile devices have multiple central processing unit (CPU) cores. These cores are independent and can run different instructions at the same time. Programs that take advantage of multiple cores by means of parallelization run faster and make better use of CPU resources.

All common operating systems perform the tasks of assigning workloads to specific CPU cores and handling any interruptions, such as input/output operations and signals. To write programs that use multiple CPU cores, developers must use the constructs for parallel tasks that the operating system provides, namely processes and threads.

A process is an essential abstraction that represents a compute task. The operating system assigns a range of the machine’s memory to each process and makes sure that processes access only the memory assigned to them. This memory separation keeps processes independent and isolated from each other. The operating system’s kernel then handles access to shared resources (like network or terminal input/output) for all processes running on a machine. Process-based parallelism — running multiple processes on the same computer at the same time — is the most common type of parallelization out there.
Every process contains one or more threads. Threads are computing tasks in themselves, but they share the memory space of their parent process and, being linked to a single parent process, they can access their parent’s data structures. The operating system keeps track of all threads within the processes and handles the shared resources the same way it does with processes.

Here is a simple example of parallel code in Python 3 that creates multiple isolated processes to run a task:

from multiprocessing import Process

def numbers(start_num):
    for i in range(5):
        print(start_num+i, end=' ')

if __name__ == '__main__':
    p1 = Process(target=numbers, args=(1,))
    p2 = Process(target=numbers, args=(10,))
    p1.start()
    p2.start()
    # wait for the processes to finish
    p1.join()
    p2.join()

# possible output (ordering may vary between runs):
# 1 2 3 4 5 10 11 12 13 14

In this program, two separate processes run the numbers function at the same time with different parameters. First, we create two Process objects and assign each the function it will execute when it starts running, also known as the target function. Second, we call start() on each process to tell it to go ahead and run its task. And third, we call join() to wait for the processes to finish running, then continue with our program.
While the output looks the same as if we ran the numbers function twice sequentially, we know that the numbers in the output were printed by independent processes.

Why is Parallelization Useful in Python?

When running a program on a computer or mobile device with multiple CPU cores, splitting the workload between the cores can make the program finish faster overall. For example, if a function would take 5 minutes to complete on a single CPU core, you can run the program on two cores and, in an ideal scenario, halve the total run time to just 2 minutes, 30 seconds.
It generally makes sense to take advantage of parallelization in Python for programs that use the CPU heavily for computational work. Such workloads in the real world include arithmetical transformations on large arrays of numbers, vector math, image, sound, and video processing, and predicting outcomes using machine-learning models.
To get the full speed benefit from parallelization for your Python program, you must divide the program into discrete chunks of work, each of which can then be delegated to different threads or processes. You’ll also need to structure your application so that the parallel tasks don’t step on each other’s work or create contention for shared resources such as memory and input/output channels.
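As a minimal sketch of this divide-and-delegate approach, the hypothetical helpers below (chunk_sum and parallel_sum_of_squares are names we’ve made up for illustration) split a list into discrete chunks and hand each chunk to a separate worker process via the standard-library concurrent.futures module:

```python
from concurrent.futures import ProcessPoolExecutor

def chunk_sum(chunk):
    # each worker computes the sum of squares for its own chunk,
    # with no shared state between workers
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # divide the program's work into discrete chunks, one per worker
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # delegate each chunk to a separate process and combine the results
        return sum(pool.map(chunk_sum, chunks))

if __name__ == '__main__':
    print(parallel_sum_of_squares(list(range(1000))))
```

Because each chunk is independent and the results are only combined at the end, the tasks never contend for the same data.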

Threads and Parallel Processes in Python

When implementing parallelization in Python, you can take advantage of both thread-based and process-based parallelism using Python standard library modules: threading for threads and multiprocessing for processes. These modules provide a set of Python functions that interface with the operating system to create, run, and manage Python processes and threads.

The functions you can use in the threading and multiprocessing modules are similar. Below is an example of thread-based parallelism in Python; it looks much like the process snippet we walked through previously:

import threading

def numbers(start_num):
    for i in range(5):
        print(start_num+i, end=' ')

if __name__ == '__main__':
    t1 = threading.Thread(target=numbers, args=(1,))
    t2 = threading.Thread(target=numbers, args=(10,))
    t1.start()
    t2.start()
    # wait for the threads to finish
    t1.join()
    t2.join()
    # print a newline
    print()

# possible output (ordering may vary between runs):
# 1 2 3 4 5 10 11 12 13 14

Here we create two threads, point them at the numbers function as their target with a different parameter for each thread, then start the threads and wait for them to finish.
While threads are faster and more convenient to create on the operating system level, there is a catch: CPython, the standard Python interpreter, uses a global interpreter lock (GIL) to make sure that threads don’t step on each other’s work. Every time a thread accesses a Python data structure, the GIL blocks all other threads within the process, making them wait to access any data until the first thread has finished reading from or writing to memory. This locking and unlocking can happen hundreds of times per second and can slow down your program considerably.
No such limitation exists when running multiple processes through the multiprocessing module: each process runs separately from the others and has its own GIL. Because of the GIL issue with threads, we recommend using separate processes for CPU-heavy work to avoid problems with the global interpreter lock.
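To illustrate the process-based route, here is a minimal sketch using multiprocessing.Pool from the standard library (the cube function is a made-up stand-in for real CPU-heavy work):

```python
from multiprocessing import Pool

def cube(n):
    # a stand-in for CPU-heavy work; each call runs in a worker process
    return n ** 3

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # Pool distributes the inputs across its worker processes,
        # each with its own interpreter and therefore its own GIL
        results = pool.map(cube, range(8))
    print(results)
```

Because the workers are separate processes, the GIL of one worker never blocks the others, so CPU-bound work can genuinely run in parallel.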

Three Tips for Writing Error-Free Parallel Code in Python

Writing high-performance parallel code in Python can be challenging since you’ll have many new factors to keep track of, from the type of your CPU to its L2 cache size, not to mention Python’s system for handling communication between parallel programs. In this section, we share our top three tips on how to get parallelization right in a Python project.

Be aware of your machine architecture

When writing a parallel program, knowing what’s under the hood on your development and production machines will determine whether your parallel program is only a little faster or significantly faster than a single-core program. Learn how many CPU cores there are on your machine and whether it can run more hardware threads than there are physical cores (as with Intel processors that support Hyper-Threading). Know the sizes of your Python data structures in memory and how those sizes compare with the sizes of your CPU’s L1 and L2 caches. Since memory access is significantly slower than most CPU instructions, taking advantage of the CPU’s caches can dramatically improve your program’s performance.
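A quick way to check the core count from Python itself is the standard library’s os.cpu_count(); the sketch below also tries the newer os.process_cpu_count() (added in Python 3.13) where available:

```python
import os

# logical cores visible to the operating system
# (includes Hyper-Threading / SMT siblings, not just physical cores)
logical = os.cpu_count()

# on Python 3.13+, process_cpu_count() reports the cores this process
# is actually allowed to use; fall back to cpu_count() on older versions
usable = getattr(os, 'process_cpu_count', os.cpu_count)()

print(f"logical cores: {logical}, usable by this process: {usable}")
```

Knowing these numbers lets you pick a sensible number of worker processes instead of hard-coding one.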

Use messages instead of shared state

When processing a large data structure with parallel processes or threads, you will eventually need a way for the individual tasks to coordinate with each other. The easiest method of coordination is to have threads or processes write to a shared data structure, for example, multiprocessing.Array. However, using shared memory creates a range of coordination problems where tasks can overwrite each other’s data and cause runtime problems.
Instead, use primitives like locks and semaphores to coordinate processes and threads, and use queues or pipes to send messages between tasks.
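As a minimal sketch of message passing, the hypothetical worker below (a name we’ve chosen for illustration) receives its work over a multiprocessing.Queue and sends results back over another, so no task ever touches another task’s data:

```python
from multiprocessing import Process, Queue

def worker(task_queue, result_queue):
    # receive work as messages instead of reading shared state
    while True:
        item = task_queue.get()
        if item is None:  # sentinel message: no more work
            break
        result_queue.put(item * item)

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    p = Process(target=worker, args=(tasks, results))
    p.start()
    for n in [1, 2, 3]:
        tasks.put(n)
    tasks.put(None)  # tell the worker to stop
    p.join()
    print(sorted(results.get() for _ in range(3)))
```

Because all coordination happens through the queues, there is no shared data structure for tasks to overwrite, and the sentinel message gives the worker a clean shutdown signal.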

Know what’s worth parallelizing and what’s not

Writing parallel programs always creates more complexity than writing regular applications. As software gets increasingly complex, it becomes less maintainable in the long term, and other engineers can’t dive as quickly into the program to make changes. In teams where it’s essential to have multiple developers working on the same applications, this additional complexity can slow down progress for the entire group or organization.
When writing a new program in Python, think about whether the speed gain from parallelization will be worth the additional complexity for your team. Consider the long-term consequences of writing complex parallel code as opposed to single-threaded programs, which are slower but more straightforward, more understandable, and easier to debug.


In this article, we’ve covered how parallelization in Python can help speed up your programs. Check out the references for the multiprocessing and threading Python modules for the details of available functions and parallel primitives. The Anatomy of Linux Process Management tutorial from IBM is a great way to learn more about processes and threads in Linux.
