Parallel processing - Python multiprocessing slow


I have code that parallelizes calls to a function. Inside the function, I check whether a file exists; if it doesn't, I create it, else I do nothing.

I find that when the files already exist, calling multiprocessing.Process carries a huge time penalty compared to a simple loop. Is this expected, or is there something I can do to reduce the penalty?

import os
import multiprocessing

def fn():
    # check if the file exists; if yes, return, else make the file
    # (fl is a path defined elsewhere in the real code)
    if not os.path.isfile(fl):
        # processing that takes enough time to make parallelization worthwhile
        pass
    else:
        print 'file exists'

pkg_num = 0
total_runs = 2500
threads = []

while pkg_num < total_runs or len(threads):
    if len(threads) < 3 and pkg_num < total_runs:
        t = multiprocessing.Process(target=fn, args=[])
        pkg_num = pkg_num + 1
        t.start()
        threads.append(t)
    else:
        for thread in threads:
            if not thread.is_alive():
                threads.remove(thread)

There's a fair bit of overhead in bringing up processes -- you've got to weigh the cost of creating them against the performance benefit you'll gain from making the tasks concurrent. I'm not sure there's enough of a benefit here to make it worthwhile for a simple os call.
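One way to cut the penalty is to pay the process startup cost once instead of once per task: do the cheap os.path.isfile check in the parent, and hand only the paths that actually need work to a small, reusable pool of workers. A minimal sketch, where make_file and the paths list are illustrative stand-ins for the question's fn and real file names:

import os
import multiprocessing

def make_file(path):
    # Illustrative worker: only do the expensive work when the file is missing.
    if not os.path.isfile(path):
        pass  # expensive file-creation work would go here

if __name__ == '__main__':
    paths = ['file_%d.dat' % i for i in range(2500)]
    # Cheap check in the parent: skip files that already exist, so no
    # process overhead is paid for them at all.
    todo = [p for p in paths if not os.path.isfile(p)]
    pool = multiprocessing.Pool(processes=3)  # three workers, created once
    pool.map(make_file, todo)                 # tasks are farmed out to them
    pool.close()
    pool.join()

The three worker processes are created a single time and reused for every remaining task, rather than forking a fresh process per file as the loop in the question does.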

Also, for the sake of future generations, you should check out concurrent.futures.ProcessPoolExecutor; it's way, way cleaner. If you're on 2.7, there's a backport of it you can install.
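For comparison, here's the same idea with concurrent.futures; on 2.7 you'd install the futures backport from PyPI first. make_file and paths are the same illustrative names as above:

import os
from concurrent.futures import ProcessPoolExecutor  # pip install futures on 2.7

def make_file(path):
    # Illustrative worker, as above.
    if not os.path.isfile(path):
        pass  # expensive file-creation work would go here

if __name__ == '__main__':
    paths = ['file_%d.dat' % i for i in range(2500)]
    with ProcessPoolExecutor(max_workers=3) as executor:
        # map submits all tasks up front; leaving the with-block
        # waits for every worker to finish before continuing.
        executor.map(make_file, paths)

The executor handles worker creation, task dispatch, and shutdown, replacing the manual while/is_alive bookkeeping in the question entirely.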

