Every time I see someone argue for case insensitivity I remember this excellent ...

arp242 · on June 20, 2023

People are making things more complex than they are. No one said you need to convert between scripts or muck about with fullwidth variants or stuff like that; "semantically identical" is not the same as "case insensitive". That post is going off on a "if you want case-insensitivity then you must also treat color and colour as identical!" tangent, which is just silly and not what anyone has ever argued for.

The scripts with case translations that are more complex than a simple 1-to-1 mapping are the exception, not the rule (German, Greek, Turkish, Lithuanian). These can be dealt with.
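To make the "more complex than 1-to-1" cases concrete, here is a small Python sketch (Python is my choice, nothing in this thread prescribes it) using only built-in `str` methods. `same_name_ci` is a hypothetical helper name, and it uses locale-independent case folding, so Turkish dotless-i rules are not applied.

```python
# Case mappings that are not simple 1-to-1 swaps (built-in str methods only).

# German: uppercasing sharp s expands one character into two.
print("straße".upper())        # STRASSE
print("straße".casefold())     # strasse

# Greek: lowercasing capital sigma depends on its position in the word.
print("ΟΔΟΣ".lower())          # ends in final sigma (U+03C2), not U+03C3

# Turkish: the default mapping pairs I with i, while Turkish itself pairs
# I with dotless ı and İ with i. Python applies the default mapping.
print("I".lower())             # i
print(len("İ".lower()))        # 2: 'i' plus U+0307 COMBINING DOT ABOVE


def same_name_ci(a: str, b: str) -> bool:
    """Hypothetical helper: compare names by Unicode case folding."""
    return a.casefold() == b.casefold()


print(same_name_ci("STRASSE", "straße"))   # True
```

Case folding handles the German and Greek cases in one call; the Turkish one would need locale information that a filename does not carry.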

It's certainly not the case that it "will only work for English and a few European languages"; it will work for much of the world.

The fact of the matter is, two out of the three most used systems today are case-insensitive, and Linux/POSIX being the exception is rather painful.

alextingle · on June 20, 2023

> People are making things more complex than they are.

You are the one who wants to introduce extra complexity: A few narrow corner cases where the filesystem treats certain code-points as equivalent. Imagine how confusing that would be to someone who doesn't regularly use those code-points.

lelanthran · on June 20, 2023

> It's certainly not the case that it "will only work for English and a few European languages"; it will work for much of the world.

I respectfully disagree.

Consistently broken in the same way for everyone is better than unreliably working for some people, reliably working for other people, unreliably broken for a third group and reliably broken for a fourth group.

Filenames should be created as the user typed it - don't change the input before storing it.

Filenames should be displayed as the user typed it - don't render output different to what was input.

Tools should match filenames as the user expects it - in this case (hehe) it should perform a case-insensitive match. Ambiguity resolution has to be performed in the case where there is more than one match.

    cat > MyFile.txt    # Create 'MyFile.txt'
    ls                  # Display 'MyFile.txt'
    echo myfile*        # Display 'MyFile.txt'
    vim myfile.txt      # Opens 'MyFile.txt'

The first problem with this is in automated shell scripts: there is no interactivity so the script can't prompt the user "Which of the following files did you mean to open: [MyFile.txt, myfile.txt]?". This means that a script which relies on 'foo.txt' will fail if someone creates a 'Foo.txt' in the same directory.

Another issue with this is that userspace calls to `open()`, `stat()`, etc. aren't able to fail and return a list of case-insensitive matches in the case of ambiguity. This makes handling the ambiguity the application's (very complex) problem: before any `open(fname)` call, the application will first have to `scandir()` to get all the filenames, then perform a case-insensitive match against the list to get all the CI matches, then prompt the user to select the correct one.
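A rough Python sketch of that scan-then-match step, to show how much the application ends up doing before it can open anything. `ci_candidates` and `resolve_ci` are hypothetical names, and raising on ambiguity is my assumption for the non-interactive case where there is nobody to prompt.

```python
import os


def ci_candidates(names, wanted):
    """Return every name in `names` that equals `wanted` after case folding."""
    key = wanted.casefold()
    return [n for n in names if n.casefold() == key]


def resolve_ci(directory, wanted):
    """Hypothetical helper: find `wanted` in `directory`, ignoring case.

    Raises FileNotFoundError when nothing matches and ValueError when
    more than one name matches, since a script cannot ask the user.
    """
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries]
    matches = ci_candidates(names, wanted)
    if not matches:
        raise FileNotFoundError(wanted)
    if len(matches) > 1:
        raise ValueError(f"ambiguous name {wanted!r}: {sorted(matches)}")
    return os.path.join(directory, matches[0])
```

With only 'MyFile.txt' present, `resolve_ci(".", "myfile.txt")` returns the path to 'MyFile.txt'; once someone adds 'myfile.txt' next to it, the same call raises, which is the script breakage described above. The scan and the later `open()` are also not atomic, so the directory can change in between.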

> The fact of the matter is, two out of the three most used systems today are case-insensitive

I think it's more that they are case-aware than case-insensitive; after all the filesystem certainly stores the case, and they both retrieve the case correctly.

It's the user-facing tooling that "fixes" the case to prevent two files with the same name in different cases from being created.

You can certainly create file 'FOO' and file 'foo' on Windows in the same directory, and then the tools tend to randomly open only one of them no matter which one the user clicked on.[1]

Which is why I say that the actual filesystem is case-sensitive. The user programs normalise the case for the user.

[1] EDIT: I accidentally did this once, and had to write another small tool to remove files with the filename as typed because the normal tools (del, explorer.exe) just randomly choose one file to remove.

Someone · on June 20, 2023

> Filenames should be created as the user typed it - don't change the input before storing it.

I wouldn’t go that far. Certainly, I would use some normalization algorithm on whatever the user typed, so that there’s no difference between typing a precomposed character (https://en.wikipedia.org/wiki/Precomposed_character) and typing a character and a combining character (https://en.wikipedia.org/wiki/Combining_character)

If you don’t do that, the user’s keyboard layout may affect whether they can type a given file name.
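A minimal Python illustration of the precomposed vs. combining difference, using the standard `unicodedata` module. `normalize_name` is a hypothetical helper, and picking NFC rather than NFD is just an assumption for the sketch.

```python
import unicodedata

precomposed = "\u00e9"     # é as a single code point
combining = "e\u0301"      # e followed by U+0301 COMBINING ACUTE ACCENT

# They render the same but are different code point sequences.
print(precomposed == combining)    # False


def normalize_name(name: str) -> str:
    """Hypothetical helper: put a typed name into NFC before storing it."""
    return unicodedata.normalize("NFC", name)


print(normalize_name(precomposed) == normalize_name(combining))   # True
print(len(unicodedata.normalize("NFD", precomposed)))             # 2
```

Without a step like this, two visually identical names can coexist in one directory, and which one you can type depends on what your keyboard layout emits.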

Also (nitpicking), if a user types a backspace or control-H, I wouldn’t include a 0x08 in the file name.

kergonath · on June 20, 2023

> It's definitely more complex than you'd naively think it is.

It has been working just fine on macOS for more than 20 years now.

creshal · on June 20, 2023

…for English, and a handful of other European languages where the problem is trivial. Not for the 70% or so of humanity who isn't European or European-descendant.

kergonath · on June 20, 2023

It uses a specific normalisation. It’s not perfect (even for supposedly easy European languages), but that’s still far fewer footguns than the status quo. Again, macOS works just fine with these languages.

creshal · on June 21, 2023

Oh, wait, THAT'S what keeps breaking umlauts every time Mac users try to share files with any other operating system? Bloody hell, that garbage just keeps causing trouble constantly.

ElectricalUnion · on June 20, 2023

[1] https://github.com/tesseract-ocr/tesseract/issues/3447

Less than 20 years ago. Not a small project.

IshKebab · on June 20, 2023

That's interesting but he does dive head first into a stupidly common fallacy - "if we can't do it perfectly we shouldn't do it at all".

I still think filesystems should be case sensitive, but not because there are some weird languages out there that it wouldn't work for.

scbrg · on June 20, 2023

> That's interesting but he does dive head first into a stupidly common fallacy - "if we can't do it perfectly we shouldn't do it at all".

I rather think you need to solve at least a somewhat significant subset of a problem in order to justify the extra complexity of the solution (and any confusion caused by it not solving the whole problem). Believe me, I'm a great believer in "perfect is the enemy of good", but not to the extent that I think that "any shitty solution at any cost is better than nothing". I'm not convinced that case insensitivity doesn't fall in the latter category.

IshKebab · on June 20, 2023

Over 2bn people speak English, Spanish or French. That's somewhat significant. You're going to tell those 2bn people they can't have nice things because there are other people that wouldn't get them?

As I said before, I still don't think it is a good idea, but that's because of other reasons (basically it introduces more confusing behaviour than it removes); NOT because it can only work for a subset of people.

scbrg · on June 20, 2023

> Over 2bn people speak English, Spanish or French. That's somewhat significant. You're going to tell those 2bn people they can't have nice things because there are other people that wouldn't get them?

If the cost of the nice thing (which I, as one of the 2bn don't even consider nice) has to be paid by the other 6bn people as well, sure.

> As I said before, I still don't think it is a good idea, but that's because of other reasons (basically it introduces more confusing behaviour than it removes); NOT because it can only work for a subset of people.

This is my main reason for being hesitant about the whole thing as well. The post linked above just happened to broaden my view on the problem a bit.

Affric · on June 20, 2023

100% agree that the argument should be around file system behaviour rather than some imagined social justice in file system names.

There are some weird takes on here. Most users aren’t even going to see the file system in the future.

If there were a time to make everything case insensitive it was 35 years ago.

We might even rarely type soon.

ykonstant · on June 20, 2023

"weird languages"? sigh.