What Is Python concurrent.futures? (with examples)

In: Python, NetDevOps

As a Python learner, I've faced several challenges, and so far one of the most difficult topics to understand has been concurrency. It can be incredibly confusing at first. The aim of this blog post is to simplify concurrency with a few examples and an analogy. So, let's get started.

Why Does Concurrency Matter?

When writing Python programs, you might need to work on several tasks at once rather than one after another. This is where concurrency comes in. Concurrency lets your program make progress on multiple tasks during overlapping time periods, which can significantly reduce total run time, particularly when tasks spend much of their time waiting.

The Magic of Python concurrent.futures

Python's concurrent.futures module simplifies concurrent programming by providing a high-level interface for asynchronously executing callables (functions and methods). ThreadPoolExecutor and ProcessPoolExecutor are the two executor classes in this module, and they let you run tasks concurrently using threads or processes, respectively.

When deciding between ThreadPoolExecutor and ProcessPoolExecutor, consider the following analogy: ThreadPoolExecutor is like having multiple chefs in a shared kitchen, while ProcessPoolExecutor is like having multiple chefs, each with their own kitchen.

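ThreadPoolExecutor is ideal for I/O-bound tasks, which spend most of their time waiting on external resources, such as reading files or downloading data. While one thread waits, another can run, so sharing a single process is acceptable and efficient. ProcessPoolExecutor, on the other hand, is better suited to CPU-bound tasks that perform heavy computations. In the default CPython build, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads gain little for such work, whereas separate processes can run on separate CPU cores.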
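
Because both classes implement the same Executor interface, switching between them is usually a one-line change. The sketch below is a minimal illustration and not part of the original examples: run_all() and fake_download() are hypothetical names, and time.sleep() stands in for real I/O.

```python
import concurrent.futures
import time


def fake_download(name):
    """Hypothetical I/O-bound task: the sleep stands in for network wait time."""
    time.sleep(0.1)
    return f"{name} downloaded"


def run_all(executor_class, func, items):
    """Run func over items using whichever executor class is passed in."""
    with executor_class() as executor:
        # map() is defined on the shared Executor base class, so this line
        # works unchanged for ThreadPoolExecutor and ProcessPoolExecutor.
        return list(executor.map(func, items))


if __name__ == "__main__":
    files = ["a.txt", "b.txt", "c.txt"]
    # Threads are a reasonable default for I/O-bound work like this.
    print(run_all(concurrent.futures.ThreadPoolExecutor, fake_download, files))
```

For CPU-bound work, passing concurrent.futures.ProcessPoolExecutor to run_all() would be the only change, provided the function is defined at the top level of a module, its arguments and results can be pickled, and the call stays under the if __name__ == "__main__" guard.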

Examples

In our examples, we will be using ThreadPoolExecutor, as our tasks primarily involve waiting for external resources or events rather than heavy computations.

The first example doesn't use concurrency, demonstrating how tasks are executed sequentially. The second example employs executor.map(), which executes tasks concurrently and returns results in the order they were submitted.

The third example makes use of executor.submit() along with concurrent.futures.as_completed(), which also executes tasks concurrently but allows you to process results as they become available, regardless of the order of submission. By analyzing these examples, you will gain a better understanding of how concurrency works in Python.

Let's use an analogy of a post office with multiple mailing clerks to better understand how concurrent.futures works. Imagine you have a stack of letters you want to mail. Each letter needs to be processed by a mailing clerk, who stamps and sends it. However, the time it takes a clerk to process a letter may vary.

1. Without Concurrency

This script does not use any concurrency. Instead, it iterates through the letters list using a for loop and calls the mail_letter() function sequentially for each letter.

By comparing this script to the upcoming examples with concurrent.futures, you can appreciate the efficiency and time-saving benefits of using concurrency in your Python projects.

import time
import random

def mail_letter(letter):
    duration = random.randint(1, 5)
    print(f"Started mailing letter {letter} (duration: {duration}s)")
    time.sleep(duration)
    print(f"Finished mailing letter {letter}")
    return f"Letter {letter} mailed"

if __name__ == '__main__':
    letters = ['A', 'B', 'C', 'D', 'E']
    results = []

    for letter in letters:
        result = mail_letter(letter)
        results.append(result)

    print("Mailing Results:")
    for result in results:
        print(result)

Here's a line-by-line explanation of the code.

  1. Import required modules
  2. def mail_letter(letter):: Defines a function called mail_letter that takes a single argument, letter.
  3. duration = random.randint(1, 5): Inside the function, generates a random integer between 1 and 5 (inclusive), and assigns it to the variable duration.
  4. time.sleep(duration): Pauses the execution of the function for the number of seconds specified by duration. This simulates the time it takes to mail the letter.
  5. return f"Letter {letter} mailed": Returns a string indicating that the letter has been mailed.
  6. if __name__ == '__main__':: Checks if the script is being run as the main program (not being imported as a module).
  7. letters = ['A', 'B', 'C', 'D', 'E']: Creates a list of letters that need to be mailed.
  8. results = []: Initializes an empty list called results to store the results of mailing each letter.
  9. result = mail_letter(letter): Calls the mail_letter() function with the current letter and assigns the returned result to the variable result.
  10. for result in results:: Iterates through each result in the results list.
  11. print(result): Prints the current result.

As you can see below, without concurrency, the mailing process takes longer as each letter is mailed one at a time, and the program must wait for each letter to finish mailing before starting the next one. It took around 18 seconds to complete.

#output

Started mailing letter A (duration: 2s)
Finished mailing letter A
Started mailing letter B (duration: 3s)
Finished mailing letter B
Started mailing letter C (duration: 4s)
Finished mailing letter C
Started mailing letter D (duration: 4s)
Finished mailing letter D
Started mailing letter E (duration: 5s)
Finished mailing letter E

Mailing Results:
Letter A mailed
Letter B mailed
Letter C mailed
Letter D mailed
Letter E mailed
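
If you want to check the timing yourself rather than count seconds by eye, one option is to wrap the loop in time.perf_counter() calls. This is a small sketch under assumptions: mail_sequentially() is a hypothetical helper, and the durations are fixed (the values from the output above, scaled down 100x) so the run is quick and repeatable.

```python
import time


def mail_letter(letter, duration):
    """Simulate mailing one letter; duration is in seconds."""
    time.sleep(duration)
    return f"Letter {letter} mailed"


def mail_sequentially(durations):
    """Mail each letter in turn and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [mail_letter(letter, d) for letter, d in durations.items()]
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # Assumption: same durations as the output above, divided by 100.
    durations = {"A": 0.02, "B": 0.03, "C": 0.04, "D": 0.04, "E": 0.05}
    results, elapsed = mail_sequentially(durations)
    print(results)
    total = sum(durations.values())
    print(f"Elapsed: {elapsed:.2f}s (sum of durations: {total:.2f}s)")
```

The elapsed time should come out close to the sum of the individual durations, which is the behaviour the concurrent examples improve on.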

2. With Concurrency (executor.map)

In this example, we use executor.map() to apply the mail_letter() function to each letter in the letters list concurrently. The results are returned in the order that the tasks were submitted, and we print the mailing results in the same order.

import concurrent.futures
import time
import random

def mail_letter(letter):
    duration = random.randint(1, 5)
    print(f"Started mailing letter {letter} (duration: {duration}s)")
    time.sleep(duration)
    print(f"Finished mailing letter {letter}")
    return f"Letter {letter} mailed"

if __name__ == '__main__':
    letters = ['A', 'B', 'C', 'D', 'E']

    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = list(executor.map(mail_letter, letters))

    print("Mailing Results:")
    for result in results:
        print(result)

Here's a line-by-line explanation of the code, excluding parts already explained in the previous example, and focusing on the differences related to concurrent.futures:

  1. import concurrent.futures: Imports the concurrent.futures module, which provides a high-level interface for asynchronously executing callables.
  2. if __name__ == '__main__':: (Same as before) Checks if the script is being run as the main program (not being imported as a module).
  3. letters = ['A', 'B', 'C', 'D', 'E']: (Same as before) Creates a list of letters that need to be mailed.
  4. with concurrent.futures.ThreadPoolExecutor() as executor:: Creates a ThreadPoolExecutor instance as a context manager, which manages the life cycle of a pool of worker threads that will be used to execute tasks concurrently.
  5. results = list(executor.map(mail_letter, letters)): Uses the executor.map() method to apply the mail_letter() function to each item in the letters list concurrently. It returns an iterable with the results in the same order as the input. The iterable is then converted to a list and assigned to the variable results.
  6. print("Mailing Results:"): (Same as before) Prints a message to indicate that the mailing results will be displayed.
  7. for result in results:: (Same as before) Iterates through each result in the results list.
  8. print(result): (Same as before) Prints the current result.

This script demonstrates how to mail a list of letters concurrently using ThreadPoolExecutor and the executor.map() method, which lets the tasks finish sooner than in the previous example.

Started mailing letter A (duration: 3s)
Started mailing letter B (duration: 2s)
Started mailing letter C (duration: 1s)
Started mailing letter D (duration: 4s)
Started mailing letter E (duration: 1s)
Finished mailing letter C
Finished mailing letter E
Finished mailing letter B
Finished mailing letter A
Finished mailing letter D

Mailing Results:
Letter A mailed
Letter B mailed
Letter C mailed
Letter D mailed
Letter E mailed

The output above demonstrates how the concurrent execution took place.

  1. Each letter's mailing process started almost simultaneously because multiple threads were running the mail_letter() function concurrently. You can see that the "Started mailing letter" messages were printed in order (A, B, C, D, E), but with different random durations.
  2. The mailing process for each letter is finished at different times, depending on the randomly assigned duration. This is evident from the "Finished mailing letter" messages, which were not printed in the original order (A, B, C, D, E). Instead, the letters with shorter durations finished earlier.
  3. The Mailing Results section displays the results of the mailing process, which are returned in the same order as the input letters (A, B, C, D, E). The executor.map() method ensures that the results are ordered according to the input sequence, even though the tasks are finished at different times. This is why you see the results as "Letter A mailed", "Letter B mailed", and so on, in the original order.

In summary, the output demonstrates how the ThreadPoolExecutor and the executor.map() method enabled the mail_letter() function to run concurrently for each letter, starting the tasks almost simultaneously and finishing them depending on their randomly assigned durations. The final results are displayed in the same order as the input letters, thanks to the executor.map() method.
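
The two properties described above (results in input order, and a total time close to the longest task rather than the sum) can be checked with fixed durations. This is a sketch under assumptions: mail_with_map() is a hypothetical helper, each job is a (letter, duration) tuple, and the durations are the ones from the output above scaled down 10x.

```python
import concurrent.futures
import time


def mail_letter(job):
    """Simulate mailing one (letter, duration) pair."""
    letter, duration = job
    time.sleep(duration)
    return f"Letter {letter} mailed"


def mail_with_map(jobs, max_workers=None):
    """Mail all jobs with executor.map(); return (results, elapsed_seconds)."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(mail_letter, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: durations from the output above, divided by 10.
    jobs = [("A", 0.3), ("B", 0.2), ("C", 0.1), ("D", 0.4), ("E", 0.1)]

    results, elapsed = mail_with_map(jobs, max_workers=5)
    print(results)  # input order, although C and E finish first
    print(f"5 workers: {elapsed:.2f}s")  # close to the longest single task

    results, elapsed = mail_with_map(jobs, max_workers=1)
    print(f"1 worker:  {elapsed:.2f}s")  # close to the sum of all durations
```

The max_workers argument caps how many clerks work at once. With a single worker the tasks queue up and the run behaves much like the sequential example.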

3. With Concurrency (as_completed)

In this final example, we use executor.submit() to submit the mail_letter() function to the executor once for each letter in the letters list. We store the returned Future objects in a dictionary called futures, which maps each Future to its letter, and use concurrent.futures.as_completed() to process the results as they become available, regardless of the order in which they were submitted.

import concurrent.futures
import time
import random

def mail_letter(letter):
    duration = random.randint(1, 5)
    print(f"Started mailing letter {letter} (duration: {duration}s)")
    time.sleep(duration)
    print(f"Finished mailing letter {letter}")
    return f"Letter {letter} mailed"

if __name__ == '__main__':
    letters = ['A', 'B', 'C', 'D', 'E']

    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = {executor.submit(mail_letter, letter): letter for letter in letters}

        for future in concurrent.futures.as_completed(futures):
            letter = futures[future]
            result = future.result()
            print(f"Result: {result}")

Here's an explanation of the code, focusing on the parts that differ from the previous examples:

  1. Import required modules (same as before)
  2. def mail_letter(letter):: (Same as before) Defines a function called mail_letter that takes a single argument, letter. The function generates a random duration, prints a message indicating the mailing process has started, waits for the duration, prints a message indicating the mailing process has finished, and returns a string indicating that the letter has been mailed.
  3. with concurrent.futures.ThreadPoolExecutor() as executor:: (Same as before) Creates a ThreadPoolExecutor instance as a context manager, which manages the life cycle of a pool of worker threads that will be used to execute tasks concurrently.
  4. futures = {executor.submit(mail_letter, letter): letter for letter in letters}: Uses a dictionary comprehension to submit each mail_letter task for every letter in the letters list to the ThreadPoolExecutor. The executor.submit() method returns a concurrent.futures.Future object representing the result of a computation that may not have completed yet. The dictionary maps these Future objects to their corresponding letters.
  5. for future in concurrent.futures.as_completed(futures):: Iterates over the Future objects in the futures dictionary as they complete (regardless of the order they were submitted). This allows processing the results as soon as they become available.
  6. letter = futures[future]: Retrieves the letter associated with the current Future object from the futures dictionary.
  7. result = future.result(): Retrieves the result of the current Future object. Because as_completed() only yields Future objects that have already finished, this call returns immediately instead of waiting.
  8. print(f"Result: {result}"): Prints the result for the current letter.

#output

Started mailing letter A (duration: 2s)
Started mailing letter B (duration: 5s)
Started mailing letter C (duration: 2s)
Started mailing letter D (duration: 1s)
Started mailing letter E (duration: 2s)
Finished mailing letter D
Result: Letter D mailed
Finished mailing letter A
Result: Letter A mailed
Finished mailing letter C
Result: Letter C mailed
Finished mailing letter E
Result: Letter E mailed
Finished mailing letter B
Result: Letter B mailed

The output above demonstrates how the results were processed.

  1. Similar to the second example, the mailing process for each letter started almost simultaneously because multiple threads were running the mail_letter() function concurrently. The "Started mailing letter" messages were printed in order (A, B, C, D, E), with different random durations assigned.
  2. As before, the mailing process for each letter finished at different times, depending on their randomly assigned duration. The "Finished mailing letter" messages indicate the completion order, which is not necessarily the same as the original order of the letters.
  3. Unlike the second example, in this case, we display the results immediately after each letter's mailing process is finished.
  4. The concurrent.futures.as_completed() function allows us to iterate through the Future objects as they complete, regardless of their submission order. This is why you see the "Result:" lines interspersed between the "Finished mailing letter" messages, reflecting the order in which the tasks are finished.
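
To see the completion order without relying on random durations, the same pattern can be run with fixed sleeps. This is a sketch under assumptions: mail_as_completed() is a hypothetical helper, and the durations are chosen so that B should finish first and A last.

```python
import concurrent.futures
import time


def mail_letter(letter, duration):
    """Simulate mailing one letter; duration is in seconds."""
    time.sleep(duration)
    return f"Letter {letter} mailed"


def mail_as_completed(durations):
    """Submit one task per letter and return letters in completion order."""
    finished = []
    # One worker per letter so that every task starts right away.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(durations)) as executor:
        # submit() passes any extra positional arguments on to the callable.
        futures = {
            executor.submit(mail_letter, letter, d): letter
            for letter, d in durations.items()
        }
        for future in concurrent.futures.as_completed(futures):
            letter = futures[future]  # look up which letter this Future belongs to
            print(f"{letter}: {future.result()}")
            finished.append(letter)
    return finished


if __name__ == "__main__":
    # Assumed durations: B is the shortest task, A the longest.
    order = mail_as_completed({"A": 0.3, "B": 0.1, "C": 0.2})
    print(order)  # completion order, shortest task first
```

Here the futures dictionary does real work: as_completed() hands back Future objects in completion order, and the dictionary is how each one is matched to the letter it was created for.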

Conclusion

In conclusion, we've explored the concurrent.futures module in Python and how it can help you execute tasks concurrently. We covered three examples, highlighting the differences between sequential execution and two concurrent approaches. We hope these simple examples and explanations have made concurrency easier to understand. Happy coding!

Written by
Suresh Vina
Tech enthusiast sharing Networking, Cloud & Automation insights. Join me in a welcoming space to learn & grow with simplicity and practicality.