You can write the threaded example as:

    import concurrent.futures
    import random

    def generate_random(count):
        return [random.random() for _ in range(count)]

    if __name__ == "__main__":
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
            executor.submit(generate_random, 10000000)
            executor.submit(generate_random, 10000000)
            # I guess we don't care about the results...
Changing this to use multiple processes instead of multiple threads is just a matter of s/ThreadPoolExecutor/ProcessPoolExecutor.
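A minimal sketch of that process-based version (note the submitted function must be defined at module level so it can be pickled for the worker processes):

    import concurrent.futures
    import random

    def generate_random(count):
        # Module-level function: ProcessPoolExecutor pickles the callable
        # to send it to the worker processes.
        return [random.random() for _ in range(count)]

    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
            executor.submit(generate_random, 10000000)
            executor.submit(generate_random, 10000000)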
You can also write this more idiomatically (and collect the combined results) as:
    if __name__ == "__main__":
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
            out_list = list(
                executor.map(lambda _: random.random(), range(20000000)))
In this example case, this will be quite a bit slower, because the work item (here, generating a single random number) is trivial compared to the overhead of maintaining a work queue of 20,000,000 items - but in the more typical case, where each work item takes more than a millisecond, it is better to let the executor manage the division of labour.
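One way to keep the idiomatic style without drowning in queue overhead (a sketch, not from the original comment) is to batch the work into a handful of large chunks:

    import concurrent.futures
    import random

    def generate_random(count):
        return [random.random() for _ in range(count)]

    if __name__ == "__main__":
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
            # Two work items of 10,000,000 numbers each, instead of
            # 20,000,000 work items of one number each.
            chunks = executor.map(generate_random, [10000000, 10000000])
            out_list = [x for chunk in chunks for x in chunk]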
Good point. Just a couple of points on futures: 1) they're backported to Python 2 [1], and 2) to make the example work with processes, you need a picklable function, as you say. For example, if you have IPython running in a virtualenv:
    import pip
    pip.main(["install", "futures"])  # installs the Python 2 backport

    import concurrent.futures as f
    import random

    def l(_):
        return random.random()

    with f.ProcessPoolExecutor(max_workers=4) as ex:
        out_list = list(ex.map(l, range(1000)))

    len(out_list)
    #> 1000
concurrent.futures is nice, but it's a real shame that ProcessPoolExecutor doesn't take an initializer argument like multiprocessing.Pool does; e.g., if you want a bunch of processes to work on a big data file, it's convenient to have all workers load that file at initialization. See https://code.google.com/p/pythonfutures/issues/detail?id=11
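For reference, the multiprocessing.Pool initializer pattern being described looks something like this (the file name and worker count are illustrative):

    import multiprocessing

    _data = None  # per-worker global, populated by the initializer

    def init_worker(path):
        # Runs once in each worker process when the pool starts, so the
        # file is loaded once per worker rather than once per task.
        global _data
        with open(path) as fh:
            _data = fh.read()

    def count_occurrences(word):
        return _data.count(word)

    if __name__ == "__main__":
        pool = multiprocessing.Pool(
            processes=4, initializer=init_worker, initargs=("big_data.txt",))
        counts = pool.map(count_occurrences, ["foo", "bar", "baz"])
        pool.close()
        pool.join()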
This example is not very realistic: it narrows things down to the case where a job can be divided into isolated tasks with no shared data/state.
Often, threads need to update a shared dict/list etc. With multiprocessing this cannot be done directly. You can use a Queue for this, but it's horribly inefficient.
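To make the contrast concrete, here is a minimal sketch (not from the original comments) of threads updating one shared dict under a lock - the thing that has no cheap equivalent under multiprocessing:

    import concurrent.futures
    import threading

    counts = {}
    lock = threading.Lock()

    def tally(word):
        # All threads share the same dict; the lock keeps the
        # read-modify-write step atomic.
        with lock:
            counts[word] = counts.get(word, 0) + 1

    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        list(executor.map(tally, ["a", "b", "a", "c", "a"]))

    print(counts)  # {'a': 3, 'b': 1, 'c': 1}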
Generally speaking, if you need performance and Python is not meeting the requirements, then you are better off using another language.
It's py3, but there is a backport of it for py2 that works wonderfully. I've recently begun using it and will never look back to multiprocessing (on which it is built).
For the everyday case when I want to make embarrassingly parallel operations in Python go fast, I find joblib to be a pretty good solution. It doesn't work for everything, but it's quick and simple where it does work.
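For reference, the joblib idiom looks roughly like this (a minimal sketch using its Parallel/delayed API):

    from math import sqrt
    from joblib import Parallel, delayed

    # Fan the sqrt calls out over 4 worker processes and collect
    # the results in order.
    results = Parallel(n_jobs=4)(delayed(sqrt)(i) for i in range(10000))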
I haven't used that, but it looks interesting. After a brief look it seems like they both submit jobs to Python interpreters started up in other processes.
Parallel Python (PP) seems to have a clunkier API, but also more functionality. I think the biggest advantage is that it can distribute jobs over a cluster instead of just different cores on the same machine. I might look into PP if I need to do things on a cluster, but I think I'll still stick with joblib when I'm on one machine.
That's just my first impression. I'd be interested to read your blog post.
For python developers who dislike the continued existence of the GIL in a multicore world, and who feel that multiprocessing is a poor response given the existence proofs of IronPython and Jython as non-GIL interpreter implementations, please consider moving to Julia.
Julia addresses nearly all the problems I've found with Python over the years, including poor performance, poor threading support on multicore machines, integration with C libraries, etc. I was a big adherent of Python, but as machines got more capable and the resistance to solving the GIL problem continued (IronPython demonstrated it can be done with reasonable impact on serial performance), I could not continue using the language except for legacy applications.
This comment overstates the current power of Julia's parallel programming model — as of now Julia has no real tools for shared-memory parallelism and probably will not for another few versions or so. For distributed memory Julia is great, but please do not use Julia if you are being hindered by the GIL.
(NB I say this as a big Julia evangelist. It has a lot of potential but is not really there yet on a number of things, this being one of them.)
This is not strictly accurate. Julia does not support multi-threaded parallelism, but there is decent (if, yes, still immature) support for multi-process shared memory parallelism - similar to Python's multiprocessing library. Not an alternative to the GIL as such, but definitely more than nothing.
One nice example using this is a shared memory, parallel sparse matrix multiplication implementation:
I don't know what you are talking about. The GIL has never bothered me. I have been using Python together with multiprocessing and threads via concurrent.futures. For integration with C libraries I use Cython; generally, interfacing with C is one of Python's strong points, so I don't know where you got that from. Have you actually looked into why Python has a GIL? It's a pretty clear trade-off, I think. It seems intuitive to me that requiring lots of small locks to avoid a global lock might not be beneficial, and attempts to get rid of it, such as PyPy is doing with software transactional memory, involve big changes, so it's not like you can decide overnight "let's get rid of the GIL".
Julia looks nice, but it comes with its own set of problems: no inheritance, 1-based indexing, fewer libraries, less maturity.
Yes, I have looked into why Python has a GIL. I've even written C interface code that released the GIL and then reacquired it when necessary (I know a ton about this, having spent too many of the last 20 years integrating C and Python). Yeah, I actually know what the tradeoffs are and can evaluate them (I used to work with the author of IronPython).
You have several choices for C integration in Python: SWIG, which is now generally considered a huge mess; hand-wrapping, which is a tedious pain; and dlopen/dlsym methods that talk to the C API directly (which require something like GCCXML to handle type recognition for complicated APIs).
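For what it's worth, the dlopen/dlsym style is essentially what ctypes in the standard library gives you; a minimal sketch (library lookup is platform-specific):

    import ctypes
    import ctypes.util

    # dlopen the C standard library (name resolution varies by platform).
    libc = ctypes.CDLL(ctypes.util.find_library("c"))

    # Declare the signature of strlen, then call straight into C.
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    print(libc.strlen(b"hello"))  # 5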
I don't think PyPy's approach to transactional memory is the right direction either.
In short: multithreading on multicore machines is how you write performant software in industry. The hardware is designed for it, the compilers are designed for it, and if you don't take advantage of it, you're just wasting machines.
Now people could argue that multiprocessing addresses this, but it's just message passing between separate process address spaces, which, while a wonderful and powerful tool, is ultimately just more cumbersome (hey, I used to write big MPI/OpenMP apps that used both models at the same time).
Anyway, the ultimate existence proof is that IronPython was faster than CPython both serially and in parallel, without the GIL. So basically we know it's possible. The Python developers have no will, inclination, or ability to make it so.
It would be interesting if you could give some arguments for your positions. Why is STM not the way? Why, if IronPython is as good as you say it is, doesn't it see greater adoption or why don't other implementations use its strategies for removing the GIL? Wikipedia says that IronPython scores worse on PyStone benchmarks compared to CPython, and it's likely that this is a consequence of IronPython's fine-grained locking which is required in the absence of a GIL.
As for interfacing with C, like I said, Cython really makes this a lot easier than the approaches you mention.
You mention IronPython as not having a GIL, but then IronPython doesn't allow easy interfacing with C code; e.g., it's not compatible with numpy...
See other replies to the post you've replied to: threads in Python can be "parallel", if one of the threads releases the GIL. This can happen during calls to I/O, or more generally, any C call that decides to release the GIL. Most of the time, you're doing I/O anyways, so it suffices. If you're not (you're truly doing computation), then there is multiprocessing.
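To illustrate the I/O-bound case, a minimal sketch (the URL list is a placeholder) where threads overlap because the GIL is released while blocked on the socket:

    import concurrent.futures
    import urllib.request

    URLS = ["https://example.com/"] * 10  # placeholder URLs

    def fetch(url):
        # The GIL is released while the thread blocks on the socket,
        # so these downloads overlap despite being "just" threads.
        with urllib.request.urlopen(url) as resp:
            return len(resp.read())

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        sizes = list(executor.map(fetch, URLS))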
I was considering adding this but I wasn't fully sure that it would be good content for a "first intro to parallel programming" article. Perhaps a good candidate for the next one?