Well, caseless text is a special scenario, not the default one. Case is a very strong signal for NER disambiguation, so if you want to support caseless input you should use a dedicated model for it - if the default model included support for caseless text, it would hurt accuracy in the majority of scenarios where the text actually is cased properly.
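To make that concrete, here's a rough sketch of the effect (assuming spaCy and its small English pipeline en_core_web_sm are installed; the exact entities found will vary by model version, but the cased and lowercased outputs usually diverge):

```python
# Compare NER output on properly cased vs. lowercased input.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = "Bill Gates founded Microsoft in Albuquerque."

for variant in (text, text.lower()):
    doc = nlp(variant)
    ents = [(ent.text, ent.label_) for ent in doc.ents]
    print(variant, "->", ents)
```

On a model trained on properly cased text, the lowercased variant typically loses or mislabels some of the entities, which is exactly the signal loss being described.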
In essence, current approaches are targeted at one domain of text over another. You can have a model that works reasonably well in one scenario, or a model that works reasonably well in another, or a universal model that works poorly in all of them and is thus useless unless you really don't know what you're going to be analyzing.
You can support non-literary slang, but that comes at a cost in accuracy on literary language. You can support multiple variants of a language (e.g. for English: British, Indian, AAVE and non-AAVE American), but that comes at a cost in accuracy on any particular variant. You can support text riddled with typos, grammatical mistakes and chat abbreviations, but that comes at a cost on correct text. The same applies to word casing. So you support each of these things if and only if you think you need it, since you don't have much of an "accuracy reserve" to sacrifice; these systems are generally barely sufficient for your target domain as it is, and they become insufficient if you try to make them more general than you need.
It would be nice if the default models explicitly listed their assumptions, though. Like, a model trained only on correct literary text in one language variant, in proper case, and nothing else, should clearly state that.
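Even something as simple as a declared assumptions block would help; a hypothetical sketch (the field names here are made up for illustration, not any real library's metadata format):

```python
# Hypothetical, illustrative only: the kind of assumptions a default
# model could declare up front so users know what it was trained on.
MODEL_ASSUMPTIONS = {
    "language_variant": "en-US",       # single variant, no British/Indian/AAVE coverage
    "casing": "properly cased text",   # no caseless or all-caps support
    "register": "edited, literary",    # no slang or chat abbreviations
    "robustness_to_typos": "low",      # trained on clean text only
}
```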