
Every time I see someone argue for case insensitivity I remember this excellent Hacker News comment on the issue: https://news.ycombinator.com/item?id=29755865

It's definitely more complex than you'd naively think it is.



People are making things more complex than they are. No one said you need to convert between scripts or muck about with fullwidth variants or stuff like that; "semantically identical" is not the same as "case insensitive". That post is going off on a "if you want case-insensitivity then you must also treat color and colour as identical!" tangent, which is just silly and not what anyone has ever argued for.

The scripts with case translations that are more complex than a simple 1-to-1 mapping are the exception, not the rule (German, Greek, Turkish, Lithuanian). These can be dealt with.
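For anyone wondering what those exceptions look like in practice, here is a minimal Python sketch. Python is my choice for illustration only, and its `str` methods apply the default, locale-independent Unicode rules, so this shows the mappings themselves rather than how any particular filesystem handles them.

```python
# Case mappings that are not a simple 1-to-1, per-character swap.
# Python's str methods follow the default (locale-independent) Unicode rules.

sharp_s = "\u00df"           # German sharp s
final_sigma = "\u03c2"       # Greek final sigma
dotted_capital_i = "\u0130"  # Turkish capital I with dot above

# German: uppercasing one character yields two.
print(sharp_s.upper())                     # SS
print(sharp_s.casefold() == "ss")          # True

# Greek: the final and non-final lowercase sigma fold to the same character.
print(final_sigma.casefold() == "\u03c3")  # True

# Turkish: the default lowercase of the dotted capital I is 'i' plus a
# combining dot (two code points), and "i" does not uppercase back to it.
print(len(dotted_capital_i.lower()))       # 2
print("i".upper() == dotted_capital_i)     # False
```

Whether these count as "can be dealt with" depends on picking one folding rule and applying it everywhere, which is the part people disagree about.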

It's certainly not the case that it "will only work for English and a few European languages"; it will work for much of the world.

The fact of the matter is, two out of the three most used systems today are case-insensitive, and Linux/POSIX being the exception is rather painful.


> People are making things more complex than they are.

You are the one who wants to introduce extra complexity: A few narrow corner cases where the filesystem treats certain code-points as equivalent. Imagine how confusing that would be to someone who doesn't regularly use those code-points.


> It's certainly not the case that it "will only work for English and a few European languages"; it will work for much of the world.

I respectfully disagree.

Consistently broken in the same way for everyone is better than unreliably working for some people, reliably working for other people, unreliably broken for a third group and reliably broken for a fourth group.

Filenames should be created as the user typed it - don't change the input before storing it.

Filenames should be displayed as the user typed it - don't render output different to what was input.

Tools should match filenames as the user expects it - in this case (hehe) it should perform a case-insensitive match. Ambiguity resolution has to be performed in the case where there is more than one match.

    cat > MyFile.txt    # Create 'MyFile.txt'
    ls                  # Display 'MyFile.txt'
    echo myfile*        # Display 'MyFile.txt'
    vim myfile.txt      # Opens 'MyFile.txt'

The first problem with this is in automated shell scripts: there is no interactivity so the script can't prompt the user "Which of the following files did you mean to open: [MyFile.txt, myfile.txt]?". This means that a script which relies on 'foo.txt' will fail if someone creates a 'Foo.txt' in the same directory.

Another issue with this is that userspace calls to `open()`, `stat()`, etc. aren't able to fail and return a list of case-insensitive matches in the case of ambiguity. This makes handling the ambiguity the application's (very complex) problem - before any `open(fname)` call, the application will first have to `scandir()` to get all the filenames, then perform a case-insensitive match against the list to get all the CI matches, then prompt the user to select the correct one.
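To show how much work that pushes onto every application, here is a rough Python sketch of the scan-then-match step. The names `ci_matches`, `resolve` and `AmbiguousName` are made up for this example, and Unicode case folding is only one possible definition of "case-insensitive"; a real tool would also have to decide what to do when an exact match exists alongside a folded one.

```python
import os


class AmbiguousName(Exception):
    """Raised when more than one directory entry matches case-insensitively."""


def ci_matches(names, wanted):
    """Return every name equal to `wanted` after Unicode case folding."""
    key = wanted.casefold()
    return [name for name in names if name.casefold() == key]


def resolve(directory, wanted):
    """Map `wanted` to the single real entry in `directory`, or raise."""
    # Step 1: list the directory, since open() cannot report near-misses.
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries]
    # Step 2: collect all case-insensitive matches.
    matches = ci_matches(names, wanted)
    if not matches:
        raise FileNotFoundError(wanted)
    if len(matches) > 1:
        # A script cannot prompt here, so the ambiguity becomes an error.
        raise AmbiguousName(sorted(matches))
    return os.path.join(directory, matches[0])
```

An interactive program could catch `AmbiguousName` and ask the user to choose; a non-interactive script can only fail, which is the 'foo.txt' versus 'Foo.txt' problem described above.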

> The fact of the matter is, two out of the three most used systems today are case-insensitive

I think it's more that they are case-aware than case-insensitive; after all the filesystem certainly stores the case, and they both retrieve the case correctly.

It's the user-facing tooling that "fixes" case to prevent two files with the same name in different cases being created.

You can certainly create file 'FOO' and file 'foo' on Windows in the same directory, and then the tools tend to randomly open only one of them no matter which one the user clicked on.[1]

Which is why I say that the actual filesystem is case-sensitive. The user programs normalise the case for the user.

[1] EDIT: I accidentally did this once, and had to write another small tool to remove files with the filename as typed because the normal tools (del, explorer.exe) just randomly choose one file to remove.


> Filenames should be created as the user typed it - don't change the input before storing it.

I wouldn’t go that far. Certainly, I would use some normalization algorithm on whatever the user typed, so that there’s no difference between typing a precomposed character (https://en.wikipedia.org/wiki/Precomposed_character) and typing a character and a combining character (https://en.wikipedia.org/wiki/Combining_character).

If you don’t do that, the user’s keyboard layout may affect whether they can type a given file name.
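A small Python sketch of what I mean, using the standard `unicodedata` module. The helper name `normalize_name` is hypothetical, and NFC is just the form I picked for the example; the point is only that some single form is applied before names are stored or compared.

```python
import unicodedata

precomposed = "\u00e9"  # 'é' as one code point
combining = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The two spellings render identically but compare unequal as typed.
print(precomposed == combining)  # False


def normalize_name(name):
    """Bring a typed file name into one canonical form (NFC here)."""
    return unicodedata.normalize("NFC", name)


# After normalization, both keyboard layouts produce the same name.
print(normalize_name(precomposed) == normalize_name(combining))  # True
```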

Also (nitpicking), if a user types a backspace or control-H, I wouldn’t include a 0x08 in the file name.


> It's definitely more complex than you'd naively think it is.

It has been working just fine on macOS for more than 20 years now.


…for English, and a handful of other European languages where the problem is trivial. Not for the 70% or so of humanity who isn't European or European-descendant.


It uses a specific normalisation. It’s not perfect (even for supposedly easy European languages), but that’s still far fewer footguns than the status quo. Again, macOS works just fine with these languages.


Oh, wait, THAT'S what keeps breaking umlauts every time Mac users try to share files with any other operating system? Bloody hell, that garbage just keeps causing trouble constantly.
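If that is indeed the cause, the mismatch is easy to reproduce. The sketch below assumes the Mac side hands over names in a decomposed form and the receiving system compares file names byte for byte; both are assumptions on my part, and the file name is made up.

```python
import unicodedata

name = "\u00fcber.txt"  # 'über.txt' typed with a precomposed u-umlaut

composed = unicodedata.normalize("NFC", name).encode("utf-8")
decomposed = unicodedata.normalize("NFD", name).encode("utf-8")

print(composed)                # b'\xc3\xbcber.txt'
print(decomposed)              # b'u\xcc\x88ber.txt'

# Same name on screen, different bytes on disk: a byte-wise lookup fails.
print(composed == decomposed)  # False
```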


[1] https://github.com/tesseract-ocr/tesseract/issues/3447

Less than 20 years ago. Not a small project.


That's interesting but he does dive head first into a stupidly common fallacy - "if we can't do it perfectly we shouldn't do it at all".

I still think filesystems should be case sensitive, but not because there are some weird languages out there that it wouldn't work for.


> That's interesting but he does dive head first into a stupidly common fallacy - "if we can't do it perfectly we shouldn't do it at all".

I rather think you need to solve at least a somewhat significant subset of a problem in order to justify the extra complexity of the solution (and any confusion caused by it not solving the whole problem). Believe me, I'm a great believer in "perfect is the enemy of good", but not to the extent that I think that "any shitty solution at any cost is better than nothing". I'm not convinced that case insensitivity doesn't fall in the latter category.


Over 2bn people speak English, Spanish or French. That's somewhat significant. You're going to tell those 2bn people they can't have nice things because there are other people that wouldn't get them?

As I said before, I still don't think it is a good idea, but that's because of other reasons (basically it introduces more confusing behaviour than it removes); NOT because it can only work for a subset of people.


> Over 2bn people speak English, Spanish or French. That's somewhat significant. You're going to tell those 2bn people they can't have nice things because there are other people that wouldn't get them?

If the cost of the nice thing (which I, as one of the 2bn, don't even consider nice) has to be paid by the other 6bn people as well, sure.

> As I said before, I still don't think it is a good idea, but that's because of other reasons (basically it introduces more confusing behaviour than it removes); NOT because it can only work for a subset of people.

This is my main reason for being hesitant about the whole thing as well. The post linked above just happened to broaden my view on the problem a bit.


100% agree that the argument should be around file system behaviour rather than some imagined social justice in file system names.

There are some weird takes on here. Most users aren’t even going to see the file system in the future.

If there were a time to make everything case insensitive it was 35 years ago.

We might even rarely type soon.


"weird languages"? sigh.



