
> I'd like compiler writers to embed a 'default profile' into the compiler, which uses data from as much opensource code as they can find all over github etc.

What would be the point? The whole thing about PGO is that it measures which paths of _your_ code are "hot".




Lots of your code is library code that everybody uses...

And lots of your code has similar hot paths to everyone else's code. It turns out that `for x in pixels { }` is probably going to be a hot loop... But `for x in serial_ports { }` probably isn't a hot loop...


Consider error handling paths.


Rust will already end up optimising out the error handling that can't happen because Infallible is an Empty Type (it makes no sense to emit code for an Empty Type because no values of this type can exist, so during monomorphization this code evaporates)

(e.g. trying to convert a 16-bit unsigned integer into a 32-bit signed integer can't fail, that always works so its error type is Infallible, whereas trying to convert a 32-bit signed integer into an unsigned one clearly fails for some values, that's a core::num::TryFromIntError you need to handle)
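A minimal sketch of those two conversions, using only the standard library. The function names `widen` and `to_unsigned` are made up for illustration. The empty `match` compiles because `Infallible` has no values, so there is no error arm to write.

```rust
use std::convert::Infallible;
use std::num::TryFromIntError;

/// u16 -> i32 always fits, so the error type of this TryFrom is Infallible.
/// (`widen` is a hypothetical name, not from the thread.)
fn widen(x: u16) -> i32 {
    let r: Result<i32, Infallible> = i32::try_from(x);
    match r {
        Ok(v) => v,
        // Infallible is an empty enum: no value can exist here,
        // so an empty match is exhaustive.
        Err(never) => match never {},
    }
}

/// i32 -> u32 fails for negative inputs, so the caller has to handle
/// a real error type, core::num::TryFromIntError.
/// (`to_unsigned` is likewise a hypothetical name.)
fn to_unsigned(x: i32) -> Result<u32, TryFromIntError> {
    u32::try_from(x)
}
```

Calling `widen(65535)` gives `65535`, while `to_unsigned(-1)` gives an `Err` that has to be dealt with somewhere.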

So we're left only with errors which don't happen. But who says? On my workload maybe the profile image file doesn't exist 0% of the time since I'm actually making the image files, so of course they exist, but in your workload the user gets to specify the filename and so they type it wrong about 0.1% of the time, and in somebody else's workload the hostile adversary spews nonsense filename values like "../../../../../etc/passwd" to try to exploit bugs in some PHP code from 15 years ago, so they see almost 10% errors. What would we learn from a "general profile"? Nothing useful.


Or a perennial favorite of mine:

$ process Some Image Name.png

Could not find file “Some”

$ process "Some Image Name.png"

Done.
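A rough sketch of why the unquoted form fails: the shell splits on spaces before the program starts, so the program receives three separate arguments instead of one. `first_missing` and the `exists` callback are hypothetical stand-ins for whatever `process` really does, and each argument is assumed to be treated as a path.

```rust
/// Returns the first argument that does not name an existing file,
/// treating every argument as a separate path.
/// `exists` is a hypothetical stand-in for a real filesystem check.
fn first_missing<'a>(args: &[&'a str], exists: impl Fn(&str) -> bool) -> Option<&'a str> {
    args.iter().copied().find(|a| !exists(*a))
}
```

With a filesystem that only contains `Some Image Name.png`, the unquoted arguments `["Some", "Image", "Name.png"]` stop at `Some`, while the quoted single argument is found.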


> Some Image Name.png

... Urg, that.

If I ever implement a bespoke file system format, it is going to be encoding-level impossible to represent file names with spaces. Not FAT-style[0] "the spec says to replace that with an underscore" or something, but more "the on-disk character encoding does not contain any sequence of bits that represents space".

0: (non-ex-)FAT stores filenames in all caps, but the data on disk is ASCII, so you can just write lowercase letters in the physical directory entries. (I've seen at least one FAT implementation that actually uses that to 'support' lowercase filenames.)


Meh, I remember the move from DOS 3.3 to ProDOS back in the Apple //e days and the loss of spaces in filenames was something that seemed a regression to me.


I’d rather see a ban on non-Unicode strings as file paths. ^&*# Windows.


each time a new unicode version is introduced, you get new backward compatibility issues


In fact, no. Unicode is rigorously backwards compatible. When Unicode 15.0.0 was released this year, the only thing I needed to do with my Unicode library was update the data tables that indicated the categories and combining classes for the newly added characters. Once a character is added, it’s there forever. This is part of why, for example, languages written in different descendants of the Brahmic script treat vowels differently: the encodings were meant to preserve round-trip compatibility with pre-Unicode character conventions, so in Thai, most vowels are treated as separate graphemes from the consonant to which they’re attached, while in Devanagari, the corresponding vowels are treated as combining (and spacing) diacritics. The one place where Unicode chose to break backwards compatibility with pre-Unicode encodings was its most controversial choice, Han unification, where the various incompatible 16-bit character encodings of Han characters (Japanese, Korean, and THREE Chinese encodings) were replaced with a single unified set that eliminated the duplications between them. Within Unicode’s own history, I think there was one breaking change in the 90s that fixed an error (I don’t care to dig up the history for a HN comment), but otherwise, any text encoded with a Unicode version prior to 15.0.0 will be interpreted identically in the current Unicode.

(I had someone ask for the possibility of being able to choose older versions of Unicode in my library to handle his use case with clusters in a terminal app, but on further investigation into what he was trying to do, I discovered that it was a misunderstanding about how grapheme clusters work and that it in fact would not do what he wanted it to do.)


And all this time I’ve been blaming Windows for bringing us white spaces in file names.


I think white space in file names was possible in Unix long before Windows. IIRC, VMS (and probably TOPS-10/20) did not allow white space. It may have also been possible to include spaces in DOS file names, but it was long enough ago that I wouldn’t be completely sure of it.



