PyPy looks to me like a great effort and a technological tour de force. But I wish they would talk _more_ on their site about why and when PyPy will be _slower_ than CPython. Here is why.
I tried to run some of my scripts on PyPy and performance was invariably worse (about 50% worse). And my first reaction was: PyPy is not delivering on its promises. Only later, on some forum, I read that PyPy does not perform well on large dictionaries (and this is essentially what I do in my scripts). Had I known it in advance, my first impression of PyPy would have been much better.
It's the creation of large dicts, to be precise. The thing is, we try to attack those problems as they arise and as people report bugs. Generally, PyPy being slower than CPython is a bug and we consider it as such. That also means that those are usually moving targets: as we discover one, we fix it, and sometimes something else pops up. It would be a bit of a mess to keep a list of such things on the website.
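For anyone who wants to check the large-dict case on their own machine, here is a minimal sketch of a reproducible micro-benchmark. The function names (`build_large_dict`, `best_time`) and the size of one million entries are assumptions chosen for illustration, not taken from the scripts mentioned above. Run the same file under both interpreters and compare the numbers.

```python
import sys
import time


def build_large_dict(n):
    """Create a dict with n integer keys and string values."""
    return {i: str(i) for i in range(n)}


def best_time(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs.

    Taking the minimum reduces noise from other processes; on PyPy the
    later runs may also benefit from JIT warm-up.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 1000000  # illustrative size, adjust to match your workload
    print(sys.version)
    print("build_large_dict(%d): %.3f s" % (n, best_time(build_large_dict, n)))
```

A small self-contained script like this is also the kind of thing that can be attached to a bug report so the slowdown can be reproduced.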
In yesterday's "Fast VM" thread I asked about memory consumption of PyPy vs. CPython, because most of the benchmarks focus on speed and there usually are no memory consumption figures (iirc, they were huge for Unladen Swallow [~800 MB on one of their benchmarks, probably the django benchmark]). While I agree with fijal that there are no benchmarks stressing the memory sub-system, I think it would also be interesting for many potential early adopters to know about expected requirements.
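In the absence of published figures, one rough way to get a memory number yourself is to read the peak resident set size from the operating system. This is a sketch under assumptions: it relies on the Unix-only `resource` module, the helper names `peak_rss` and `measure_growth` are made up for this example, and the unit of `ru_maxrss` differs by platform (kilobytes on Linux, bytes on macOS), so only compare values taken on the same machine.

```python
import resource
import sys


def peak_rss():
    """Peak resident set size of this process, as reported by the OS.

    Units are platform-dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure_growth(workload):
    """Run workload() and return how much the peak RSS grew while it ran."""
    before = peak_rss()
    result = workload()  # keep the result alive until after the measurement
    after = peak_rss()
    del result
    return after - before


if __name__ == "__main__":
    # Illustrative workload: a list of one million short strings.
    growth = measure_growth(lambda: [str(i) for i in range(1000000)])
    print(sys.version.split()[0], "peak RSS grew by", growth)
```

Peak RSS includes the interpreter itself, so the baseline will differ between CPython and PyPy before any workload runs; the growth figure is the more comparable one.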
If I remember correctly, working with big integers (Python has "unlimited" integers) was really slow with PyPy. The first example I ran in PyPy used those, and I wondered what I had done wrong when PyPy was so slow compared to CPython.
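A minimal sketch of how the big-integer case could be checked: the loop below quickly leaves the machine-word range, so nearly all the time is spent in arbitrary-precision multiplication. The function name `big_factorial` and the input size are assumptions for illustration; no result is claimed here for either interpreter.

```python
import sys
import time


def big_factorial(n):
    """Compute n! with a plain loop, forcing arbitrary-precision arithmetic."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


if __name__ == "__main__":
    n = 20000  # illustrative size; the result has tens of thousands of digits
    start = time.perf_counter()
    value = big_factorial(n)
    elapsed = time.perf_counter() - start
    print(sys.version.split()[0])
    print("%d! has %d bits, computed in %.3f s" % (n, value.bit_length(), elapsed))
```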
I'm very amused that someone else has observed this about PyPy, I began to think I was the only one. It does not seem to be the orthodox view.
I like CPython's fast startups and low memory overhead, bad scores in Pfannkuch benchmarks notwithstanding. If PyPy runs my everyday stuff better to casual inspection I will get a different impression, but as things stand I see it as a kind of Java-ization whose superiority is often unclear or qualified.
Is a JIT always better? Is reference counting always worse? Is it a total no-brainer to want Software Transactional Memory? I don't think so. But I think there are some arguments, and perhaps as importantly I think there are many newer Python folks who want to make a name for themselves.
What would be the point of filing bugs against PyPy stating that it is slower or more resource-hungry on programs XYZ, when this is ultimately due to PyPy's core design decisions which are a point of pride and never going to be abandoned?
Expecting some positive outcome from that seems incredibly unrealistic to me. I believe it is a waste of time. It isn't my responsibility to provide reasons to PyPy why I'm not using it. Let PyPy - a project receiving no small amount of promotion - show that it is better for my purposes, before I invest big in switching over projects.
What you COULD say is that it doesn't make sense that in practice, people are treated like idiots and flamed if they publicly mention that they don't find in practice that PyPy is always or even generally better than CPython.
Or you could say that they should both work for most purposes, and that the choice is nuanced (measure the difference yourself), and in just a few respects PyPy is not as mature (not surprising given the lengths of the projects' histories). And PyPy is a work in progress and you expect it to get better if it isn't better than CPython now, for some specific purpose.
I don't expect you to say either of those things, because it seems important to the PyPy project to promote it over CPython and if that means selectively mentioning only the cases which are in PyPy's favor, or softly suppressing dissent, then so be it. That is how it seems to me, and I don't understand why it has to be that way.
I think your point about "should work for most purposes" expresses my view as well. The choice is tricky, and programs heavily optimized for CPython in particular might find PyPy's characteristics strangely different. I don't see any particular design decisions that would prevent PyPy from outperforming CPython on everything in the long run, but this is certainly not the case right now.
Maybe our PR got too strong or something, but I think "measure yourself and report if it's too slow" was always our motto. In fact, we're definitely more interested in hearing when people find their programs slow rather than fast, because it gives us more optimization opportunities. However, a report is pointless without a way for me to reproduce the problem, since otherwise I have no clue where to look.
In short - I think we violently agree and if PyPy's PR is not up to the standard and fairness you would expect, I apologize.
I've seen the promotional materials on PyPy's performance (I find them particularly persuasive on odd synthetic benchmarks) but thank you for the link.
I think you are a nice guy doing very useful work. You deserve to be proud of your work. But I do think you should be aware that an aggressive social orthodoxy has formed around the performance of PyPy. (It isn't unusual that I was downvoted here for suggesting PyPy isn't always faster, for example; it's the same thing in other fora and offline).
I think PyPy's approach for generating Python interpreters is a great, clever idea. I think the project is more exciting than Shedskin was, etc. I'm impressed at the progress that has been made in expanding functionality and library support. I am looking forward (although with a little skepticism) to the day when it's really better at everything and is on my phone and everywhere. Sounds good.
But I am concerned about a cultural shift in Python, and unnecessary increases in complexity which come along with it. Many of the things which drew me to Python years ago come from its roots in the Unix/C world. I only recently began to hear dogmatic arguments that JIT is always faster than ASM, and memory is cheap so it makes no difference to use 2-4x as much, and being able to hook up to C isn't so important, and everyone should be writing incomprehensible, heavily-threaded programs using the world's biggest gc and synchronization mechanisms which require a lot of close supervision by people with pompous job titles. And you end up managing concerns of this type more than you spend writing domain code. And you do it all not because it is the cleanest and most direct way to get a good result, but because it's what is understood to be the right thing.
So I think a lot depends on whether PyPy drinks too much of its own kool-aid. It could second-system Python to death. It could succumb entirely to Enterprisitis. I don't need CPython, but I hope PyPy will smell like it. If the culture and the working environment continue to Java-ize, I will probably jump ship to Go or Ruby or whatever has a good library and most of the virtues I currently get from Python. I don't think that going more complicated is the best way to make software faster or better and it is important to my quality of life that I feel good about the code I am writing.
I'd love to see a list of diffs: things that work in CPython and don't work in PyPy. That would be much more helpful for me to decide if it's time to switch.
Would the work on numpypy also make e.g. scipy available? Or would a separate effort be needed to move that to PyPy? Their website (scipy.org) mentions it builds on numpy, but it's unclear to me if it also depends on other libraries.
Edit:
I see the scipy website also mentions this:
"Various SciPy modules use Fortran 77 libraries and some use C++, so you'll also need Fortran 77 and C++ compilers installed. The SciPy module Weave uses a C++ compiler at run time."
So I guess it wouldn't work out of the box. Do the PyPy devs have any plan for this as well?