python - Why is it important to protect the main loop when using joblib.Parallel?


The joblib docs contain the following warning:

Under Windows, it is important to protect the main loop of code to avoid recursive spawning of subprocesses when using joblib.Parallel. In other words, you should be writing code like this:

    import ....

    def function1(...):
        ...

    def function2(...):
        ...

    ...
    if __name__ == '__main__':
        # do stuff with the imports and functions defined above
        ...

No code should run outside of the "if __name__ == '__main__'" blocks, only imports and definitions.

Initially, I assumed this was just to guard against the occasional odd case where a function passed to joblib.Parallel called the module recursively, which would mean it is good practice but often unnecessary. However, it doesn't make sense to me why this would only be a risk on Windows. Additionally, this answer seems to indicate that failure to protect the main loop resulted in code running several times slower than it otherwise would have for a very simple non-recursive problem.

Out of curiosity, I ran a super-simple example of an embarrassingly parallel loop from the joblib docs without protecting the main loop on a Windows box. My terminal was spammed with the following error until I closed it:

ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information

My question is, what is it about the Windows implementation of joblib that requires the main loop to be protected in every case?

Apologies if this is a super basic question. I am new to the world of parallelization, so I might just be missing some basic concepts, but I couldn't find this issue discussed explicitly anywhere.

Finally, I want to note that this is purely academic; I understand why it is generally good practice to write one's code in this way, and I will continue to do so regardless of joblib.

This is necessary because Windows doesn't have fork(). Because of this limitation, Windows needs to re-import your __main__ module in every child process it spawns, in order to re-create the parent's state in the child. This means that if the code that spawns the new process sits at module level, it is going to be executed again in each child process, which then spawns children of its own, and so on recursively. The if __name__ == "__main__" guard is used to prevent code at module scope from being re-executed in the child processes.
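To see the re-import effect without actually spawning anything, here is a minimal sketch using only the standard library. It assumes the behaviour of CPython's multiprocessing "spawn" start method, which re-runs the parent's main script in the child under the module name "__mp_main__"; joblib's own worker start-up may differ in detail. The SCRIPT contents and the run_as helper are made up for illustration.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for a user's script: one statement at module
# level and one statement behind the __main__ guard.
SCRIPT = '''
executed = ["module level"]

if __name__ == "__main__":
    executed.append("guarded block")
'''


def run_as(run_name):
    """Run SCRIPT as a module called run_name and report which parts ran."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        # run_path returns the globals of the executed module
        namespace = runpy.run_path(path, run_name=run_name)
    return namespace["executed"]


# What the original process sees when the script is launched directly.
parent_view = run_as("__main__")
# What a spawned child sees when it re-imports the parent's main script
# (assumption: the child uses the name "__mp_main__", as CPython's spawn does).
child_view = run_as("__mp_main__")

print(parent_view)
print(child_view)
```

The module-level statement runs in both cases, while the guarded statement runs only when the module name is "__main__". If the module-level statement were a Parallel call, every child would issue it again on start-up, which is the recursive spawning the warning describes.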

This isn't necessary on Linux because it does have fork(), which allows it to fork a child process that maintains the same state as the parent, without re-importing the __main__ module.
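If you want to check which situation a given machine is in, the standard multiprocessing module can report the process start methods it supports. This is only a rough sketch: it inspects multiprocessing rather than joblib itself, and the default start method can vary with the operating system and the Python version.

```python
import multiprocessing as mp
import sys

# Start methods this interpreter supports on this platform.
# "fork" is listed on Linux but not on Windows, where only "spawn" exists.
available = mp.get_all_start_methods()
has_fork = "fork" in available

print("platform:", sys.platform)
print("available start methods:", available)
print("fork supported:", has_fork)
```

On a platform where "fork" is missing, children have to rebuild their state by re-importing the main module, so the guard matters there.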

