Paralellization usually is pretty tricky in python, however there’s a super easy way to implement pretty straightforward parallelization using the built-in os.fork() functionality.

Let’s talk about what os.fork() actually does. In short, it simply creates an additional copy (referred to as the “child process”) of the running program at the same(ish) spot as the “parent process”. These are separate processes, so they have completely different memory allocated for them (and cannot share variables).

This lack of internal communication means that this is best suited for having a single piece of code concurrently spin up a separate task that has some external communication method (files, network, etc.) or one that doesn’t have any communication requirements whatsoever.

When os.fork() is called, it returns back a different number depending on whether or not it’s the child or parent process. For the child, the result of os.fork() is the number 0. For the parent the result is the pid of the child process.

I use this for spinning up a webserver and then running integration tests against it in the same codebase without having to shell out. A good example is the Boulder Python Meetup website. This code is available on github but here’s the relevant code snippet:

Leave a Reply

Your email address will not be published. Required fields are marked *