Parallel Processing in Python
Parallel processing has been made easy with the multiprocessing library. Earlier, we had to combine Thread and Queue by hand to get ThreadPool functionality with multiple workers. The multiprocessing library simplifies this, although this part of it is not very well documented.
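For comparison, the older hand-rolled approach might have looked roughly like the sketch below. It is only an illustration of the Thread and Queue pattern, not code from any particular project; the function name run_with_thread_and_queue and the use of None as a stop signal are assumptions made for this example.

```python
import queue
import threading


def run_with_thread_and_queue(func, items, number_of_workers=5):
    """Apply func to every item using hand-managed worker threads (hypothetical helper)."""
    tasks = queue.Queue()
    results = [None] * len(items)

    def worker():
        while True:
            task = tasks.get()
            if task is None:  # sentinel: no more work for this worker
                break
            index, item = task
            # Store by index so the results keep the input order.
            results[index] = func(item)

    threads = [threading.Thread(target=worker) for _ in range(number_of_workers)]
    for thread in threads:
        thread.start()
    for index, item in enumerate(items):
        tasks.put((index, item))
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results


if __name__ == '__main__':
    print(run_with_thread_and_queue(str.upper, ["first", "second", "third"]))
```

All of the bookkeeping here (the queue, the sentinels, the ordering of results) is what a pool's map call takes care of for us.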
Using the multiprocessing library in ThreadPool mode:
from multiprocessing.dummy import Pool as ThreadPool


def dummy_print(my_string):
    print("I am a harmless print:" + my_string)


def create_and_run_threadpool(input_map):
    number_of_workers = 5
    pool = ThreadPool(number_of_workers)
    # Call dummy_print with each element of the input_map. These calls are executed by the worker threads.
    results = pool.map(dummy_print, input_map)
    pool.close()
    pool.join()
    return results


if __name__ == '__main__':
    create_and_run_threadpool(["first", "second", "third", "fourth", "fifth"])
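Since dummy_print only prints, the results list in that example contains nothing but None values. The following sketch shows what pool.map gives back when the worker function returns something. The names shout and map_with_threadpool are made up for this illustration.

```python
from multiprocessing.dummy import Pool as ThreadPool


def shout(my_string):
    # Hypothetical helper: returns a value instead of printing it.
    return my_string.upper()


def map_with_threadpool(input_map, number_of_workers=5):
    pool = ThreadPool(number_of_workers)
    try:
        # map blocks until every call has finished and keeps the input order.
        return pool.map(shout, input_map)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    print(map_with_threadpool(["first", "second", "third"]))
```

The returned list follows the order of the input, regardless of which worker thread finishes first.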
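The code above uses Python threads. In CPython, the global interpreter lock lets only one thread execute Python bytecode at a time, so this code effectively runs on a single CPU core. We can also make the workers run on different CPU cores by using processes instead of threads, as illustrated below: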
from multiprocessing import Pool, cpu_count
from contextlib import closing


def dummy_print(my_string):
    print(my_string)


def create_and_run_workers_on_cores(input_map):
    # Call dummy_print with each element of the input_map. These calls are executed by the workers across CPU cores.
    with closing(Pool(cpu_count())) as pool:
        results = pool.map(dummy_print, input_map)
        pool.close()
        pool.join()
    return results


if __name__ == '__main__':
    create_and_run_workers_on_cores(["first", "second", "third", "fourth", "fifth"])
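One way to check that the work really happens outside the parent process is to have each call report the ID of the process it ran in. The sketch below assumes Python 3.3 or later, where Pool can be used directly in a with statement. The helpers square_with_pid and squares_on_cores are hypothetical names chosen for this example.

```python
import os
from multiprocessing import Pool, cpu_count


def square_with_pid(number):
    # Hypothetical helper: returns the result together with the ID of the process that computed it.
    return number * number, os.getpid()


def squares_on_cores(numbers):
    # One worker process per CPU core; map blocks until all results are in.
    with Pool(cpu_count()) as pool:
        return pool.map(square_with_pid, numbers)


if __name__ == '__main__':
    for square, pid in squares_on_cores([1, 2, 3, 4, 5]):
        print(square, "computed in process", pid)
    print("parent process is", os.getpid())
```

The process IDs printed for the squares should differ from the parent's, while the squares themselves still come back in input order. Note that the worker function and its arguments must be picklable, because they are sent to other processes.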