
Unfortunately, many projects never benefit from PGO, because there is quite a lot of complexity involved in setting up a profiling workload, storing the profile somewhere, and using it for future builds.
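
For concreteness, the usual flow with rustc/LLVM looks roughly like this (the flags are from rustc's PGO documentation; the program, workload, and paths here are placeholders):

$ rustc -O -Cprofile-generate=/tmp/pgo-data main.rs

$ ./main representative-workload.dat

$ llvm-profdata merge -o merged.profdata /tmp/pgo-data

$ rustc -O -Cprofile-use=merged.profdata main.rs

Every step has to be wired into the build, and merged.profdata has to be stored somewhere future builds can find it; that's the setup cost most projects never pay.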

I'd like compiler writers to embed a 'default profile' into the compiler, which uses data from as much opensource code as they can find all over github etc.

This default profile will improve the performance of lots of libraries that everyone uses, and will probably still help closed source code (since it will probably be written in a similar style to opensource code).




The "default" profile "for PGO" is the compiler on its own -- folk put a lot of effort into making sure it will generally compile arbitrary code well. And a big part of that is lots of people running lots of open source code and measuring how well it performs.

The difficulty with "as much open source code as they can find" is that we need to execute the code to make a profile. And unless we're running the code under real-world conditions, there's no guarantee that we'll generate a useful profile. So we need to be a little careful about which code we look at from a performance perspective. Even when we have a profile, it's a count of branches taken for the specific code that was compiled, and it's not normally applicable to either a different version of the compiler or any input that's not identical to the input used for profiling. With link-time optimisations, even a "common" profile for library code isn't necessarily going to be useful: which bits of a library we'll try to inline will vary according to the code that's calling it.


To that point, LLVM now has an "ML inliner" that's been trained on a lot of open source code to help it make better inlining decisions.


> I'd like compiler writers to embed a 'default profile' into the compiler, which uses data from as much opensource code as they can find all over github etc.

What would be the point? The whole thing about PGO is that it measures which paths of _your_ code are "hot".


Lots of your code is library code that everybody uses...

And lots of your code has similar hot paths to everyone else's code. It turns out that `for x in pixels { }` is probably going to be a hot loop... But `for x in serial_ports { }` probably isn't a hot loop...


Consider error handling paths.


Rust will already end up optimising out the error handling that can't happen, because Infallible is an empty type: no values of that type can exist, so there's no point emitting code for it, and during monomorphization that code simply evaporates.

(e.g. trying to convert a 16-bit unsigned integer into a 32-bit signed integer can't fail, that always works, so its error type is Infallible; whereas trying to convert a 32-bit signed integer into an unsigned one clearly fails for some values, and that's a core::num::TryFromIntError you need to handle)
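
A minimal sketch of that, using only standard library types (the blanket TryFrom impl is what gives the infallible direction an error type of Infallible):

    use std::convert::Infallible;
    use std::num::TryFromIntError;

    fn main() {
        // u16 -> i32 always fits, so the error type is Infallible:
        // the Err arm below can never be taken and evaporates during codegen.
        let small: Result<i32, Infallible> = i32::try_from(1234u16);
        match small {
            Ok(v) => println!("converted: {v}"),
            Err(never) => match never {}, // Infallible has no values; unreachable
        }

        // i32 -> u32 fails for negative values, so the error path is real
        // and stays in the generated code.
        let negative: Result<u32, TryFromIntError> = u32::try_from(-5i32);
        println!("{negative:?}"); // prints Err(TryFromIntError(()))
    }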

So we're left only with errors which don't happen. But who says? In my workload, maybe the profile image file fails to exist 0% of the time, because I'm the one making the image files, so of course they exist. In your workload the user gets to specify the filename, so they type it wrong about 0.1% of the time. And in somebody else's workload a hostile adversary spews nonsense filenames like "../../../../../etc/passwd" to try to exploit bugs in some PHP code from 15 years ago, so they see almost 10% errors. What would we learn from a "general profile"? Nothing useful.


Or a perennial favorite of mine:

$ process Some Image Name.png

Could not find file “Some”

$ process "Some Image Name.png"

Done.


> Some Image Name.png

... Urg, that.

If I ever implement a bespoke file system format, it is going to be encoding-level impossible to represent file names with spaces. Not FAT-style[0] "the spec says to replace that with an underscore" or something, but rather "the on-disk character encoding does not contain any sequence of bits that represents a space".

0: (non-ex-)FAT stores filenames in all caps, but the data on disk is ASCII, so you can just write lowercase letters in the physical directory entries. (I've seen at least one FAT implementation that actually uses that to 'support' lowercase filenames.)
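
To sketch the "no bit pattern for a space" idea (the alphabet and names here are made up, not any real format):

    // Hypothetical on-disk filename alphabet: there is simply no code for ' '.
    const ALPHABET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789._-";

    // Encode a name into on-disk codes; returns None if any character has no
    // code in the alphabet (spaces, uppercase, and so on are unrepresentable).
    fn encode_name(name: &str) -> Option<Vec<u8>> {
        name.bytes()
            .map(|b| ALPHABET.iter().position(|&c| c == b).map(|i| i as u8))
            .collect()
    }

    fn main() {
        assert!(encode_name("some_image.png").is_some());
        assert!(encode_name("some image.png").is_none()); // cannot be encoded at all
    }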


Meh, I remember the move from DOS 3.3 to ProDOS back in the Apple //e days, and the loss of spaces in filenames seemed like a regression to me.


I’d rather see a ban on non-Unicode strings as file paths. ^&*# Windows.


Each time a new Unicode version is introduced, you get new backward compatibility issues.


In fact, no. Unicode is rigorously backwards compatible. When Unicode 15.0.0 was released this year, the only thing I needed to do with my Unicode library was update the data tables that indicate the categories and combining classes for the newly added characters. Once a character is added, it's there forever.

This is part of why, for example, languages written in different descendants of the Brahmic script treat vowels differently: Unicode meant to preserve round-trip compatibility with pre-Unicode character conventions, so in Thai most vowels are treated as separate graphemes from the consonant to which they're attached, while in Devanagari the corresponding vowels are treated as combining (and spacing) diacritics.

The one place where Unicode chose to break backwards compatibility with pre-Unicode encodings was its most controversial choice, Han unification, where the various incompatible 16-bit character encodings of Han characters (Japanese, Korean, and THREE Chinese encodings) were replaced with a single unified set that eliminated the duplications between them. Within Unicode's own history, I think there was one breaking change in the 90s that fixed an error (I don't care to dig up the details for an HN comment), but otherwise any text encoded under a Unicode version prior to 15.0.0 will be interpreted identically in the current Unicode.

(I had someone ask for the possibility of being able to choose older versions of Unicode in my library to handle his use case with clusters in a terminal app, but on further investigation into what he was trying to do, I discovered that it was a misunderstanding about how grapheme clusters work, and it would in fact not have done what he wanted.)


And all this time I’ve been blaming Windows for bringing us white spaces in file names.


I think white space in file names was possible in Unix long before Windows. IIRC, VMS (and probably TOPS-10/20) did not allow white space. It may also have been possible to include spaces in DOS file names, but it's long enough ago that I wouldn't be completely sure of it.


I think a cool project to work on would be model-based, ML-generated profiles that take a set of parameters like:

* application type (e.g. client, server, batch process, parser, etc.)

* target architecture, vendor, model, etc.

* target resources like RAM, HD types, network interfaces, etc.

I would think you could get very close to actual PGO-level performance with just a handful of parameters and a lot of data.
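
Concretely, I'm imagining the model conditioning on something like this (every name here is invented, just to illustrate the parameter set):

    // Hypothetical inputs to a learned "synthetic profile" generator.
    #[allow(dead_code)]
    struct ProfileHints {
        app_kind: AppKind,   // rough shape of the program
        target_cpu: String,  // e.g. "x86-64-v3" or "cortex-a76"
        ram_gib: u32,
        storage: Storage,
        has_network: bool,
    }

    #[allow(dead_code)]
    enum AppKind { Client, Server, BatchJob, Parser }

    #[allow(dead_code)]
    enum Storage { Nvme, Ssd, SpinningDisk }

    fn main() {
        let _hints = ProfileHints {
            app_kind: AppKind::Server,
            target_cpu: "x86-64-v3".to_string(),
            ram_gib: 32,
            storage: Storage::Nvme,
            has_network: true,
        };
        // A model trained on (hints, measured profile) pairs would emit a synthetic
        // profile in whatever format the compiler already consumes.
    }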


Agree this difficulty is the biggest obstacle to PGO's success. A language/ecosystem that works out how to integrate this as smoothly as testing would have a sizeable performance boost in practice.

The default profile is a nice hack. We do this by default for C++ builds at [company], and it works great. Teams that care can build a custom profile that performs better, but most don't.

> I'd like compiler writers to embed a 'default profile' into the compiler, which uses data from as much opensource code as they can find all over github etc.

Working out how to build, let alone profile, all that code is no joke. And the result will be large, and may not overlap that much with the average program. As a sibling points out, maybe using ML to recognize patterns instead of concrete code would help?

I'd settle for profiling of the standard library. In an ecosystem like Rust, per-crate default profiles that you could stitch together would be amazing.


I think you can already build shared libraries with PGO, although this doesn't really work with header-only libraries for C++...


That is the beauty of modern JITs with PGO feedback data: it can be saved across execution sessions, and with time the data grows towards an optimal point.


JITs do PGO all the time. It’s their bread and butter.


Nowadays they also save PGO across executions, so that they don't always start from zero.

The most modern ones that is (Java, .NET, Android).


I don’t know if V8 does between runs but it seems to share something between subprocesses. There is commentary in the Node Worker docs about turning it off/on.


Perhaps a better approach would be some sort of per-library profile?


If you can make it zero effort for developers, that's a good plan... But if it involves even minor effort from the developer, then most developers probably won't bother.

I'm imagining for example a 'profile server', which anyone can upload profiler data to, and that the compiler queries to get profile data for any given file it wants to compile.



