Yeah, if anything, python worsens the situation.
I had a friend DoS our server because he accidentally inserted a tab, creating the illusion that a statement was inside a block when it was actually outside it.
He swore off python at that point.
I personally avoid the language, but I understand that, due to issues like that, mixing tabs and spaces is now an error (or is it just a warning?) by default. Regardless, it still seems pretty silly to me to have whitespace play such a significant role, besides the fact that I find it visually harder to read, like text without punctuation or capitalisation.
Mixing tabs and spaces inconsistently raises a TabError in py3, before the code even runs. I'm not gonna make a value judgement about that, but your story doesn't make sense based on how I understand py3.
Edit: sorry, shoulda read your whole comment before replying.
Yep. It was a few years ago while that was still allowed (as I'd noted ;) ) but regardless. Significant whitespace is just annoying. There's a lot of things that render as whitespace, and source code one might be reviewing could be printed, wrapped, or copied and pasted in odd ways. Other languages are more robust to this.
I feel that the main point still stands, though. Saying that Python doesn't have a whitespace problem because you can send the code through a tool that detects whitespace-related problems still acknowledges the existence of said problem.
Well, in C the visual indentation and the meaning of the code (given by {}) can diverge. That's even worse, and happens in practice. Many style guidelines for C have evolved specifically to address that problem with 'the language itself'.
The compiler doesn't check your indentation at all. (OK, not true, these days you get warnings for misleading indentation.) But here's an example of misleading indentation in C:
    if(some condition)
        do something; {
        do something else;
    }
    do another thing;
You can stretch 'some condition' out over multiple lines and have some more parens inside of it, to make it more confusing.
Not necessarily. In practice, in C-as-actually-used people (should) set up linters and formatters, so that you can rely on indentation.
When programming, this means that you can behave as if both curly braces and indentation are significant, and you get an error when they are out-of-sync.
I think it's more of a tooling issue. If your editor / diff viewer can't differentiate between different kinds of whitespace, get a better one. Also, if you want to ensure some function isn't in local scope etc., just test it.
> There's a lot of things that render as whitespace
Like what? As you note, mixing tabs and spaces is now an error.
I've never understood the objection to semantic whitespace. Complaining about having to indent code correctly strikes me as being akin to complaining about having to pee in the toilet instead of the bathtub.
there is a huge difference between having a coding standard (handled easily by a linter on commit in most languages) and making a particular whitespace indentation a critical language feature.
Bonus, if there's a good reason to change the linter format, you can do so.
Rust handles this rather well I think.
Which non-printing characters are you talking about? Whitespace characters are very much printable.
Yes, I agree that Python should just forbid tabs. As a second best, you can tell your linter (and/or formatter) to forbid tabs (and any weird unicode whitespace). That's basically the equivalent of compiling your C with -Wall.
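For what it's worth, a check like that is only a few lines. This is a rough sketch, not how any real linter does it: the function name `find_suspicious_whitespace` is made up for illustration, and it simply flags every character that Python's `str.isspace()` considers whitespace other than a plain space (so tabs, no-break spaces, the Ogham space mark, and also a stray carriage return).

```python
def find_suspicious_whitespace(source):
    """Return (line, column, codepoint) for each whitespace character
    that is not a plain ASCII space. Hypothetical helper, for illustration."""
    hits = []
    # Split on "\n" only, so that exotic line separators are reported too.
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch.isspace() and ch != " ":
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


# Line 2 is indented with a tab; line 3 hides a no-break space after "x".
SAMPLE = "def f():\n\treturn 1\nx\u00a0= 2\n"

if __name__ == "__main__":
    for lineno, col, codepoint in find_suspicious_whitespace(SAMPLE):
        print(f"line {lineno}, column {col}: {codepoint}")
```

Wiring something like this into a pre-commit hook would be the "forbid tabs" policy in practice; a real project would more likely just turn on the corresponding rule in its existing linter.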
This was an issue in Python 2, where for some dumb reason it allowed mixing tabs and spaces and equated tab to 8 spaces (I think). Python 3 doesn't have that issue.
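A quick way to see the Python 3 behaviour for yourself, as a minimal sketch (the helper name `classify` is just for illustration): compile a snippet whose two body lines look aligned at a tab width of 8 but use a tab on one line and spaces on the other. Python 3 rejects it at compile time, before anything runs.

```python
def classify(source):
    """Compile source without executing it; return the error class name or "ok"."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:  # TabError and IndentationError are subclasses
        return type(exc).__name__
    return "ok"


# Body line 1 is indented with one tab, body line 2 with eight spaces.
MIXED = "if True:\n\tx = 1\n        y = 2\n"
SPACES_ONLY = "if True:\n    x = 1\n    y = 2\n"
TABS_ONLY = "if True:\n\tx = 1\n\ty = 2\n"

if __name__ == "__main__":
    for name, src in [("mixed", MIXED), ("spaces", SPACES_ONLY), ("tabs", TABS_ONLY)]:
        print(name, "->", classify(src))
```

Note that consistent tabs are still accepted; it is only the case where the meaning would depend on the tab width that gets rejected.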
Of course, if one was coding an exploit, one could still use python2. It is still commonly available due to a long tail of legacy scripts and, in some cases (like a script I use routinely but didn't write), the difficulty of porting it to python3 (I've asked over a dozen pythonistas over the years, and they kept running into the same problems).
Regarding your almost query. There was a debate over Ogham space mark in unicode.
It is considered whitespace though, with the rationale that it is sometimes visible, but sometimes invisible. Depending upon whether the text has a stem-line.
That doesn't make the set of non-whitespace delimited languages empty.
Perhaps there is one with an always-visible delimiter that didn't get the whitespace justification, but does at least give one human language delimited by a printable character (which happens to be whitespace).
That's reasonably sane of SQL. In Slang, you don't need to quote. (The syntax is still unambiguous. In principle, eg Python could do something similar, because they don't have any existing syntax where you just put two identifiers next to each other with only a space in between. But C could not, because variable declaration is just two identifiers, one for the type and one for the variable name, next to each other with a space in between.)
In Slang, are “x y” with different number of spaces in the middle different identifiers or different spellings of the same identifier? SQL standard says different identifiers
> eg Python could do something similar, because they don't have any existing syntax where you just put two identifiers next to each other with only a space in between
The interaction with keywords would cause some issues. For example, right now, “if” is not a valid identifier (keyword), but “if_” and “_if” are. However, with this proposal “x y” could be a valid identifier, but “x if” would introduce ambiguity
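To make the keyword point concrete, here is a small sketch using the standard `keyword` module. The "spaced identifier" rule is purely hypothetical (Python has no such feature), and `spaced_name_is_ambiguous` is an invented helper that only illustrates where the ambiguity would come from.

```python
import keyword


def is_usable_name(name):
    """True if name is a valid identifier today and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


def spaced_name_is_ambiguous(name):
    """Hypothetical rule: a space-separated identifier is ambiguous
    if any of its words is a reserved keyword (e.g. "x if")."""
    return any(keyword.iskeyword(word) for word in name.split(" "))


if __name__ == "__main__":
    for candidate in ["if", "if_", "_if", "x y"]:
        print(repr(candidate), "usable today:", is_usable_name(candidate))
    for candidate in ["x y", "x if"]:
        print(repr(candidate), "ambiguous:", spaced_name_is_ambiguous(candidate))
```

Soft keywords like `match` would presumably need their own treatment, which this sketch ignores.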
> But C could not, because variable declaration is just two identifiers, one for the type and one for the variable name, next to each other with a space in between
This is one aspect of C syntax I have never liked. I always wish it had been Pascal-style `x:int;` instead of `int x;`
> In Slang, are “x y” with different number of spaces in the middle different identifiers or different spellings of the same identifier? SQL standard says different identifiers
Sorry, I don't remember, and I can't seem to find it out online. I could ask my friends who still work there, if it's important. (For what it's worth, I never remember anyone complaining about mixing up a different number of spaces in their variable names. So either all number of spaces were treated the same, or perhaps multiple spaces in a row in an identifier were just banned (via linter or via language?).)
> The interaction with keywords would cause some issues. For example, right now, “if” is not a valid identifier (keyword), but “if_” and “_if” are. However, with this proposal “x y” could be a valid identifier, but “x if” would introduce ambiguity
Yes, you would need to sort out these details, if you wanted to add this 'feature' to Python.
> This is one aspect of C syntax I have never liked. I always wish it had been Pascal-style `x:int;` instead of `int x;`
I'm glad Rust made the same decision.
I do like using the 'space operator' to denote functions calls, at least for a language like Haskell or OCaml.
humans are weird creatures sometimes. there was this bad thing that happened that won't happen again now, but now I can't use the thing forever because Reasons.
In Python, you only need to indent the `def test_foo` by an additional whitespace, to make it a locally scoped function instead of a proper test function.
No, not if you have any sane linter or formatter involved. They wouldn't let you get away with indenting by a single space, but only by multiples of whatever you normally use in the rest of your program.
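Here is a rough sketch of both cases, using `ast` to list which `test_` functions end up at module level. The helper `top_level_tests` and the sample sources are made up for illustration. With a 4-space body in the preceding function, indenting the next `def` by a full level silently nests it, while indenting it by a single space does not even parse.

```python
import ast


def top_level_tests(source):
    """Names of test_* functions defined directly at module level."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_")
    ]


FLAT = "def test_bar():\n    assert True\n\ndef test_foo():\n    assert True\n"
# test_foo indented by one full level: it becomes local to test_bar.
NESTED = "def test_bar():\n    assert True\n\n    def test_foo():\n        assert True\n"
# test_foo indented by a single space: matches no enclosing indentation level.
ONE_SPACE = "def test_bar():\n    assert True\n\n def test_foo():\n     assert True\n"

if __name__ == "__main__":
    print(top_level_tests(FLAT))
    print(top_level_tests(NESTED))
    try:
        top_level_tests(ONE_SPACE)
    except IndentationError as exc:
        print("rejected:", exc.msg)
```

A test that counts collected test functions, or a CI check built on something like `top_level_tests`, would catch the nested case, which is the one the interpreter accepts without complaint.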
I mean, some CI system should be checking that "if_code_compiles()" blocks compile somewhere. It should be an error until the CI system has that header and can test both variants.
People are really quick to add optionality like this without understanding the maintenance cost. (Every boolean feature flag multiplies the number of variants you need to test by 2!) Either make a decision, or check that both sides work. Don't let people check in dead code.
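The arithmetic is easy to spell out: n independent boolean flags give 2^n build variants. A tiny sketch, where the flag names are invented for illustration:

```python
from itertools import product


def flag_variants(flags):
    """Every on/off combination of the given boolean flags."""
    return [
        dict(zip(flags, values))
        for values in product([False, True], repeat=len(flags))
    ]


if __name__ == "__main__":
    # Hypothetical flag names, purely for illustration.
    flags = ["have_landlock", "have_threads", "have_nls"]
    for count in range(len(flags) + 1):
        print(count, "flags ->", len(flag_variants(flags[:count])), "variants")
```

A CI matrix that covers both sides of every flag has to grow at that rate, which is the maintenance cost being pointed at.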
AST diffs instead of textual diffs might have helped here (to spot the `.` making the code un-compilable).
Edit: oof, though the stray character in question is inside a perfectly legitimate C string, so to catch this, any such diffs would need to Matryoshka down and that seems unsolvable / intractable.
> Edit: oof, though the stray character in question is inside a perfectly legitimate C string, so to catch this, any such diffs would need to Matryoshka down and that seems unsolvable / intractable.
Not sure. Instead of trying to go down the rabbit hole, you could also just forbid code that's too complex to analyse without doing so.
In general, it's hard to analyse arbitrary code. But we don't have to allow arbitrarily complex code when doing code review.
I think main issue was that it was embedded in the file itself like that. Would have preferred to have it in a separate valid C file with syntax highlighting etc and being parsed from that file.
That’s what makes this so clever: these systems were born in the era where you couldn’t trust anything - compilers sometimes emitted buggy code, operating systems would claim to be Unix but had weird inconsistencies on everything from system calls to command line tool arguments, etc. - so they just try to compile a test script and if it fails assume that the feature isn’t supported. This is extremely easy to miss since the tool is working as it’s designed and since it’s running tons of stuff there’s a ton of noise to sort through to realize that something which failed was supposed to have worked.
You could just run tests for the feature detection on a known system or a dozen (VMs are cheap). The big problem is that most code is not tested at all, or test errors are flat out ignored.
Nothing “just” about it. VMs weren’t cheap when the C world started using autoconf - that was a feature mostly known on IBM mainframes – and in any case that couldn’t do what you need. The goal is not to figure out what’s there on systems _you_ control, it’s to figure out what is actually available on the systems other users are building on. Think about just this library, for example, your dozen VMs wouldn’t even be enough to cover modern Linux much less other platforms but even if it was (drop most architectures and only support the latest stable versions for the sake of argument), you’d still need to build some reliable prober for a particular bug or odd configuration.
The problem here wasn’t the concept of feature detection but that it was sabotaged by someone trusted and easily missed in the noise of other work. What was needed was someone very carefully reviewing that commit or build output to notice that the lockdown feature was being disabled on systems where it was fully enabled, and we already know that maintainer time was in short supply. Any other approach would likely have failed because the attacker would have used the same plausible sounding language to explain why it needed to be a complicated dynamic check.
That's kind of the point. It's feature detection code. If the code cleanly compiles, the feature is assumed supported; otherwise, it's assumed not present/functional. This is pretty common with autotools. The gotcha here is that this innocuous period is not supposed to be syntactically valid. It's meant to be subtle and always disable the feature.
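The pattern is easy to mimic in a few lines. This is only an analogy, not the actual xz or autoconf code: Python's built-in `compile()` stands in for the C compiler, and the probe snippets and function names are invented. The point it shows is that a probe with a stray `.` fails exactly like a probe on a system that really lacks the feature, so the feature is quietly switched off.

```python
def probe(snippet):
    """Autoconf-style check (analogy): the feature counts as supported
    only if the probe snippet compiles."""
    try:
        compile(snippet, "<probe>", "exec")
    except SyntaxError:
        return False  # indistinguishable from "feature not available"
    return True


# Hypothetical probe snippets; the second has a stray period on its own line.
GOOD_PROBE = "import os\nos.getpid()\n"
SABOTAGED_PROBE = "import os\n.\nos.getpid()\n"


def check_on_known_good_system(snippet):
    """On a machine known to support the feature, a failing probe is a bug."""
    if not probe(snippet):
        raise AssertionError("probe failed where the feature should be supported")


if __name__ == "__main__":
    print("good probe:", probe(GOOD_PROBE))
    print("sabotaged probe:", probe(SABOTAGED_PROBE))
    check_on_known_good_system(GOOD_PROBE)
```

Something like `check_on_known_good_system` is the kind of test that could flag a probe that can never succeed, though it only helps if someone actually runs it on a system where the feature is expected.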
Shouldn't whatever is depending on xz supporting landlock be verifying that it's the case through blackbox tests or something? Otherwise a check like this even without the bug could end up disabling landlock if e.g. the compiler environment was such that a given header wasn't available...
EDIT: come to think of it, even that might not have done much here, where well-formedness is the issue :(