Maciej Fijalkowski's view on PyPy's future (lostinjit.blogspot.com)
86 points by pieceofpeace on Dec 31, 2011 | 21 comments



PyPy will likely be complete enough to use by the end of 2012, if not sooner. By that I mean most of the major libraries will work with it and people will ask themselves "why wouldn't I want a massive speedup?". The one reason why they might not is memory usage: it can be 10 times as much. That's significant in quite a few applications.

As it's beyond my ken, I'd like to hear from someone capable of answering: is this going to be a constant constraint, stemming from the design of the project, or is memory usage likely to go down at some point?


PyPy has higher memory usage than CPython for two main reasons: A) JIT-generated code, B) pure garbage collection lets dead objects live longer than CPython's refcount/GC hybrid. Neither of these is going to go away, although it's reasonable to expect the gap to narrow over time. More importantly, much of this extra memory is a fixed cost, so while simple programs will see a 10x increase, real-world programs should see a less dramatic impact.
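If you want to sanity-check this on your own workload, here's a quick sketch; it assumes Linux, where ru_maxrss is reported in kilobytes, and that the resource module is available under both interpreters (PyPy ships one):

    import resource

    def peak_rss_kb():
        # Peak resident set size so far (kilobytes on Linux,
        # bytes on OS X).
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    base = peak_rss_kb()            # the interpreter's fixed overhead
    data = [str(i) for i in range(10 ** 6)]
    print base, peak_rss_kb() - base  # fixed cost vs. cost of the data

Run it under both python and pypy: on small scripts the first number (fixed overhead) dominates, which is exactly why tiny benchmarks exaggerate the gap.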

FWIW, my own tests of pypy 1.7 have shown the memory overhead to be about 3x: http://groups.google.com/group/python-tornado/browse_thread/...


I believe the memory usage has been addressed lately (either in the last release, or shortly after). There was an article about it on HN, which I can't find at the moment. Do you mean some specific use cases?


http://news.ycombinator.com/item?id=3349429

If you search for my username in that thread, I asked for someone to run the code from the story on a similar dataset with PyPy. Twice as fast, but 10x the memory usage. But as I said elsewhere, a bit of searching suggests it won't be a showstopper.


Be careful extrapolating from such small examples, where the memory usage is dominated by fixed overhead. Running Django or something similar is more likely to give a real-world result, and even then, a broader set of tests is far better.

3 MB vs 30 MB might be par for the course for smaller programs, but either is negligible.


What about start-up time? Isn't it significantly longer with PyPy?

[disclaimer: this is not FUD, but an honest curiosity]


Try it. It's very slightly longer than CPython. It's nothing like, say, the JVM.
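If you want a number for your own machine, a rough sketch (it just assumes "python" and "pypy" are both on your PATH):

    import subprocess, time

    def startup(interpreter, runs=20):
        # Average wall-clock time to start the interpreter,
        # run an empty program, and exit.
        start = time.time()
        for _ in range(runs):
            subprocess.call([interpreter, "-c", "pass"])
        return (time.time() - start) / runs

    for interp in ("python", "pypy"):
        print interp, startup(interp)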


In my opinion, they could get more contributors if they simplified the rather byzantine build process for pypy. It's especially annoying for developers attempting to port to other platforms.

(No incremental build, it takes roughly three hours on a high-end Nehalem workstation, and if it fails for any reason, you get to start all over again!)


What should we make of the failure of PyPy's users to fund Maciej's work? The value of PyPy seems much, much greater than one engineer's salary.


I find it puzzling as well. Google heavily uses Python, as do a number of other web companies. Further, Ubuntu and Red Hat use it as a system administration language. It's almost the default language for O'Reilly books that aren't language-specific. Given all that, you'd think there'd be a few more corporate contributions now that a 5x speedup has already been proven and it seems to be just a matter of polishing it up.

A bit of searching and it seems the memory problems I raised in my other comment aren't so drastic after all. Theoretically it can use less memory in many operations, and the current blowouts are not as high as I thought (I was going from one benchmark an HNer, brianh, was kind enough to run for me [1]).

[1] http://news.ycombinator.com/item?id=3357160


Just because PyPy is faster than CPython on a handful of test cases doesn't make it worthy of corporate sponsorship. I would sponsor development if it made a noticeable improvement.

Not to belittle the project, but I tried a vanilla order entry implementation on my test box at Nasdaq. Essentially it runs an epoll loop, using a C extension to take advantage of the Myricom DBL calls. RT latency was roughly 5 usec faster using CPython. As for other parts of the system (e.g. the feed handler), the performance was comparable.
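For context, the shape of that loop is roughly the following. This is a plain-socket echo sketch: the real system swaps the sockets for Myricom DBL calls through a C extension, and the port number here is arbitrary.

    import select, socket

    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("", 9090))   # arbitrary port for the sketch
    server.listen(16)
    server.setblocking(0)

    ep = select.epoll()
    ep.register(server.fileno(), select.EPOLLIN)
    conns = {}

    while True:
        for fd, event in ep.poll():
            if fd == server.fileno():
                conn, _ = server.accept()
                conn.setblocking(0)
                ep.register(conn.fileno(), select.EPOLLIN)
                conns[conn.fileno()] = conn
            elif event & select.EPOLLIN:
                data = conns[fd].recv(4096)
                if data:
                    conns[fd].send(data)  # echo; real code parses orders
                else:
                    ep.unregister(fd)
                    conns[fd].close()
                    del conns[fd]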

For other applications (e.g. compliance reporting) PyPy is 2x CPython's speed, but I couldn't care less about that timing (even if it ran 1000x slower than CPython, it wouldn't matter).

TL;DR: it needs to be useful. And I just don't see the usefulness for my Python applications.


> I find it puzzling as well. Google heavily uses Python, as do a number of other web companies.

Especially after Google invested engineers in Unladen Swallow, which fizzled out. I've read that Google likes to max out their servers to the point where OOM is not unlikely, so PyPy's memory usage might not be worth the runtime performance gains.


More likely, Google requires compatibility with its set of C extensions. At least when Unladen Swallow was announced, this was one of the major requirements. I would expect the problem to be similar in most large corporate environments.


I agree, it is quite bizarre. PyPy is among the most remarkable and successful projects of its kind. That it isn't funded in some way is very surprising.

Perhaps there is a business opportunity here? Python as a service, sort of like how Python runs on Google App Engine, but using PyPy. PyPy's sandboxing makes this easier actually, and the main advantage of course would be performance.


This might be a contrary view, but Cython has been ready for production for quite a while, while PyPy has still been getting its boots on:

http://cython.org/

It actually works right out of the box. You can download it and get real world examples to work in five minutes. It works with C++. It works with numpy.

http://wiki.cython.org/WrappingCPlusPlus

http://wiki.cython.org/tutorials/numpy

It's the real deal: ridiculously easy to use as a drop-in replacement for slow functions. It works on problems that are not toys. It lets you use high-performance C and C++ libraries, it's used in a large and well-maintained project (Sage), and development was at least partially funded by Google.
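To give a flavor, here's a minimal sketch; the module and function names are mine, and it uses the distutils-based build that's current at the time of writing:

    # fastloop.pyx: add C types to the hot loop, leave the rest alone
    def square_sum(int n):
        cdef int i
        cdef double total = 0.0
        for i in range(n):
            total += i * i
        return total

    # setup.py
    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Distutils import build_ext

    setup(cmdclass={'build_ext': build_ext},
          ext_modules=[Extension("fastloop", ["fastloop.pyx"])])

After "python setup.py build_ext --inplace", "from fastloop import square_sum" works like any other import, except the loop now runs at C speed.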

With all due respect to Maciej for his work, PyPy has been in the works for many years now, and projects like that don't tend to ship -- or to meet expectations if they actually do ship.


I think you are being unfair. PyPy is something you can use on your code without modifying it. That's a big thing. And by big I mean huge. Paying someone $70/hour to modify code is expensive; swapping the "python" executable for "pypy" is a lot cheaper than profiling code and then rewriting it in Cython. Yeah, you might still end up having to do that, but who knows: you might not have to, or you might even end up running faster than C (yes, in theory a JIT compiler can do that).


Seriously, PyPy outperforms Cython in pretty much all cases unless you provide type information, and even then, PyPy is often faster. Complex projects like Twisted or Django tend to have a flat performance profile: in order to run them reasonably fast, you would need to put types everywhere and very likely convert classes to C structs. This is not going to fly and you know it; Cython is only useful for people inside the numeric community.

Also, it's true that PyPy took many years to build, but it actually does work. If you claim that PyPy never shipped, it very likely means none of the "new Python interpreter" projects would ever ship for you, but hey, there are people out there for whom it works.

Cheers, fijal


I've tried PyPy to speed up a background worker that creates collages of 300-500 images; this is my experience:

- The JIT optimizations don't survive between different runs of the script. Since I run a different script for every collage, I haven't seen a noticeable speedup. (See the sketch after this list.)

- The script also downloads the 300-500 images from Facebook with curl; this module is not yet supported by PyPy.

- Since most of the time is spent resizing images (PIL's resize), which runs in C, the JIT can't do many optimizations.
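On the first point, the usual fix is to keep one process alive across collages so the JIT only has to warm up once. A minimal sketch, with the queue and job shape entirely made up:

    # Hypothetical in-process job list; a real worker would poll
    # a queue (Redis, a database, ...) and sleep when it's empty.
    jobs = [{"id": i} for i in range(100)]

    def build_collage(job):
        # Stand-in for the real download/resize/composite work.
        return sum(x * x for x in range(10000))

    while jobs:
        build_collage(jobs.pop(0))  # compiled traces survive from
                                    # one job to the next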


It's similar to how clang tries to be a gcc replacement. I think clang got started around 2008, and now you can build most open source projects with it. But some packages still use GNU extensions or depend on gcc-only features.

I think the reason it works for clang is that they're really trying hard to be drop-in compatible with gcc. I remember reporting some minor thing: a command line flag that generated a warning in gcc but didn't return an error code on exit, while clang did, so they changed it, because it broke some projects' builds.

Not sure how PyPy handles compatibility now, because I haven't been able to compile it yet; I've tried about four times already, and it always crashes randomly or runs out of memory or something.


I don't think speed is a big enough pain point for the Python community at large to justify the pain of moving off CPython.

PyPy needs to use its great implementation stack to develop its own killer feature: for instance, a small, fast Python for embedded scripting, or compiling a normal Python program to a small, optimized native executable.


So far, just a faster Python has been a killer feature for some people :) True, it doesn't fit everyone, but hey, nothing does.



