Parallel Processing in Python

The multiprocessing library makes parallel processing in Python easy. Previously, getting thread-pool behaviour with multiple workers meant writing our own implementation on top of Thread and Queue.
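
For comparison, here is a minimal sketch of what such a hand-rolled Thread and Queue pool might look like. It assumes Python 3 (where the module is named queue), and the function name run_with_thread_and_queue is made up for illustration. It is not taken from any library.

```python
import queue
import threading

def run_with_thread_and_queue(func, items, number_of_workers=5):
    """Apply func to every item using hand-managed worker threads (illustrative only)."""
    items = list(items)
    tasks = queue.Queue()
    results = [None] * len(items)

    def worker():
        while True:
            task = tasks.get()
            if task is None:
                # Sentinel value: no more work for this worker.
                break
            index, item = task
            # Each index is written by exactly one worker, so no lock is needed here.
            # Note: this sketch does not report exceptions raised by func to the caller.
            results[index] = func(item)

    threads = [threading.Thread(target=worker) for _ in range(number_of_workers)]
    for thread in threads:
        thread.start()

    # Queue the real work first, then one sentinel per worker.
    for index, item in enumerate(items):
        tasks.put((index, item))
    for _ in threads:
        tasks.put(None)

    for thread in threads:
        thread.join()
    return results

if __name__ == '__main__':
    print(run_with_thread_and_queue(str.upper, ["first", "second", "third"]))
```

All of this bookkeeping (the queue, the sentinels, the joins) is what a pool object takes care of for us.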

This has been simplified in the multiprocessing library, although it is not very well documented.

Using the multiprocessing library in ThreadPool mode:

from multiprocessing.dummy import Pool as ThreadPool

def dummy_print(my_string):
    print("I am a harmless print: " + my_string)

def create_and_run_threadpool(input_map):  
    number_of_workers = 5
    pool = ThreadPool(number_of_workers)

    # Call dummy_print with each element of the input_map. These calls are executed by the worker threads.
    results = pool.map(dummy_print, input_map)
    pool.close()
    pool.join()
    return results

if __name__ == '__main__':  
    create_and_run_threadpool(["first", "second", "third", "fourth", "fifth"])
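
Since dummy_print returns nothing, the results list above holds only None values. As a small sketch of how return values come back, the variant below uses a worker function that returns a string. The names shout and shout_all are made up for this example. pool.map returns the results in the same order as the inputs, regardless of which worker thread finishes first.

```python
from multiprocessing.dummy import Pool as ThreadPool

def shout(my_string):
    # Return a value instead of printing it, so pool.map has something to collect.
    return my_string.upper() + "!"

def shout_all(input_map, number_of_workers=5):
    pool = ThreadPool(number_of_workers)
    try:
        # The result list follows the order of input_map.
        return pool.map(shout, input_map)
    finally:
        # Release the worker threads even if shout raised an exception.
        pool.close()
        pool.join()

if __name__ == '__main__':
    print(shout_all(["first", "second", "third"]))
```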

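The ThreadPool code above uses Python threads. In standard CPython the global interpreter lock allows only one thread to execute Python bytecode at a time, so the workers do not run Python code on several CPU cores in parallel. We can instead run the workers as separate processes, which can run on different CPU cores. This is illustrated below: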

from multiprocessing import Pool, cpu_count  
from contextlib import closing

def dummy_print(my_string):  
    print(my_string)

def create_and_run_workers_on_cores(input_map):
    # Call dummy_print with each element of the input_map. These calls are executed by worker processes, one per CPU core.
    # closing() calls pool.close() when the block exits, so no explicit close() is needed.
    with closing(Pool(cpu_count())) as pool:
        results = pool.map(dummy_print, input_map)
    # Wait for the worker processes to exit.
    pool.join()
    return results

if __name__ == '__main__':  
    create_and_run_workers_on_cores(["first", "second", "third", "fourth", "fifth"])