
I just got a little more respect for pythonic whitespace-sensitivity

EDIT: come to think of it, even that might not have done much here, where well-formedness is the issue :(



Yeah, if anything, Python worsens the situation. I had a friend DoS our server because he accidentally inserted a tab, creating the illusion that one statement was inside a block when it was actually outside it. He swore off Python at that point. I personally avoid the language, but I understand that, because of issues like that, mixing tabs and spaces is now an error (or is it just a warning?) by default. Regardless, it still seems pretty silly to me to have whitespace play such a significant role, besides the fact that I find it visually harder to read, like text without punctuation or capitalisation.


Mixing tabs and spaces usually throws a runtime exception. I'm not gonna make a value judgement about that, but your story doesn't make sense based on how I understand py3

Edit, sorry, shoulda read your whole comment before replying


Yep. It was a few years ago, while that was still allowed (as I'd noted ;) ), but regardless, significant whitespace is just annoying. There's a lot of things that render as whitespace, and source code one might be reviewing could be printed, wrapped, or copied and pasted in odd ways. Other languages are more robust to this.


You can enable showing whitespace characters in your editor. In PyCharm they are visualised perfectly well, so as not to distract from the non-whitespace.


Never had this happen. It's largely eliminated by using tools like black or other autoformatters.


I feel that the main point still stands, though. Saying that Python doesn't have a whitespace problem because you can send the code through a tool that detects whitespace-related problems still acknowledges the existence of said problem.


The point still stands that this is an inherent possibility in the language itself.


Well, in C the visual indentation and the meaning of the code (given by {}) can diverge. That's even worse, and happens in practice. Many style guidelines for C have evolved specifically to address that problem with 'the language itself'.


> Well, in C the visual indentation and the meaning of the code (given by {}) can diverge.

How?


The compiler doesn't check your indentation at all. (OK, not true, these days you get warnings for misleading indentation.) But here's an example of misleading indentation in C:

    if(some condition)
        do something; {
        do something else;
    }
    do another thing;

You can stretch 'some condition' out over multiple lines and have some more parens inside of it, to make it more confusing.

See also https://softwareengineering.stackexchange.com/a/16530


I see.

Yes

I class that as a bug in the original C specification


But you either make visual indentation significant, or you risk having a discrepancy between visual indentation and what the language sees.


> But you either make visual indentation significant...

That would be Python. I do not like it, which does not mean it is bad, it is a matter of taste

The alternative is you say "blocks are delimited by {}"

`if(foo) bar` should be invalid

`if(foo){bar}` is OK by me. Worik's tick of approval

(Neither is valid C I think)


> That would be Python.

Not necessarily. In practice, in C-as-actually-used people (should) set up linters and formatters, so that you can rely on indentation.

When programming, this means that you can behave as if both curly braces and indentation are significant, and you get an error when they are out-of-sync.
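To make "out-of-sync" concrete, here is a toy sketch in Python of the kind of check such a linter could perform. It is an illustration only, not how any real formatter works: `misaligned_lines`, the 4-space indent width, and both sample snippets are assumptions, and the check ignores braces inside strings and comments.

```python
def misaligned_lines(source, indent_width=4):
    """Return 1-based line numbers whose indentation disagrees with brace depth."""
    depth = 0
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        # A line that starts with a closing brace belongs to the outer level.
        expected = depth - 1 if stripped.startswith("}") else depth
        actual = len(line) - len(line.lstrip(" "))
        if actual != expected * indent_width:
            flagged.append(number)
        # Update the depth only after the line itself has been checked.
        depth += stripped.count("{") - stripped.count("}")
    return flagged


# Hypothetical C-like snippets: indentation suggests one structure, braces another.
MISLEADING = """\
if (condition)
    do_something(); {
    do_something_else();
}
do_another_thing();
"""

BRACED = """\
if (condition) {
    do_something();
    do_something_else();
}
do_another_thing();
"""

print(misaligned_lines(MISLEADING))  # [2]
print(misaligned_lines(BRACED))      # []
```

Under this deliberately strict rule, a brace-less `if` body is flagged too, because its indentation is not backed by a brace.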


I think it's more of a tooling issue. If your editor / diff viewer can't differentiate between different kinds of whitespace, get a better one. Also, if you want to ensure some function isn't in local scope etc., just test it.


And/or linters.

This issue is no worse than having multiple different variables that look the same and differ only in obscure, same-looking Unicode symbols.

A linter can handle this. As can a decent font, or editor settings.


> There's a lot of things that render as whitespace

Like what? As you note, mixing tabs and spaces is now an error.

I've never understood the objection to semantic whitespace. Complaining about having to indent code correctly strikes me as being akin to complaining about having to pee in the toilet instead of the bathtub.


there is a huge difference between having a coding standard (handled easily by a linter on commit in most languages) and making a particular whitespace indentation a critical language feature.

Bonus, if there's a good reason to change the linter format, you can do so. Rust handles this rather well I think.


white space as a delimiter is why i never use python.


> white space as a delimiter is why i never use python.

Whitespace is a delimiter in (almost?) all languages humans use.

Whitespace determining which scope you’re in is one of the many problems of making whitespace significant, which might be what you meant.


“The thing that controls scope is the count of certain nonprinting characters, which happen to come in multiple widths” is reasonably insane, yes


Which non-printing characters are you talking about? Whitespace characters are very much printable.

Yes, I agree that Python should just forbid tabs. As a second best, you can tell your linter (and/or formatter) to forbid tabs (and any weird unicode whitespace). That's basically the equivalent of compiling your C with -Wall.


This was an issue in Python 2, where for some dumb reason it allowed mixing tabs and spaces and equated tab to 8 spaces (I think). Python 3 doesn't have that issue.
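A minimal sketch of the Python 3 behaviour, assuming a CPython 3 interpreter. The `check` helper and the sample sources are made up for illustration. One line is indented with eight spaces and the next with a single tab, which looks identical in an editor with 8-column tab stops.

```python
# Eight spaces on one line, one tab on the next.
MIXED = "if True:\n        x = 1\n\ty = 2\n"
SPACES_ONLY = "if True:\n        x = 1\n        y = 2\n"


def check(source):
    """Return the name of the exception raised while compiling, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # TabError is a subclass of SyntaxError
        return type(exc).__name__
    return None


print(check(MIXED))        # TabError
print(check(SPACES_ONLY))  # None
```

The error is raised when the source is compiled, so the ambiguous block never gets a chance to run.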


Of course, if one was coding an exploit, one could still use python2. It is still commonly available due to a long tail of legacy scripts and, in some cases (like a script I use routinely but didn't write), the difficulty of porting to python3 (I've asked over a dozen pythonistas over the years; they kept running into the same problems).


Regarding your "almost" query: there was a debate over the Ogham space mark in Unicode. It is considered whitespace though, with the rationale that it is sometimes visible and sometimes invisible, depending upon whether the text has a stem-line. That doesn't make the set of non-whitespace delimited languages empty. Perhaps there is one with an always-visible delimiter that didn't get the whitespace justification, but it does at least give one human language delimited by a printable character (which happens to be whitespace).


> That doesn't make the set of non-whitespace delimited languages empty.

Well, there's also the opposite: Goldman Sachs's Slang allows space as part of identifiers.


> Well, there's also the opposite: Goldman Sachs's Slang allows space as part of identifiers.

Many SQL implementations permit whitespace in identifiers, but then you need to use quoted identifiers.


That's reasonably sane of SQL. In Slang, you don't need to quote. (The syntax is still unambiguous. In principle, eg Python could do something similar, because they don't have any existing syntax where you just put two identifiers next to each other with only a space in between. But C could not, because variable declaration is just two identifiers, one for the type and one for the variable name, next to each other with a space in between.)


In Slang, are “x y” with different number of spaces in the middle different identifiers or different spellings of the same identifier? SQL standard says different identifiers

> eg Python could do something similar, because they don't have any existing syntax where you just put two identifiers next to each other with only a space in between

The interaction with keywords would cause some issues. For example, right now, “if” is not a valid identifier (keyword), but “if_” and “_if” are. However, with this proposal “x y” could be a valid identifier, but “x if” would introduce ambiguity

> But C could not, because variable declaration is just two identifiers, one for the type and one for the variable name, next to each other with a space in between

This is one aspect of C syntax I have never liked. I always wish it had been Pascal-style `x:int;` instead of `int x;`


> In Slang, are “x y” with different number of spaces in the middle different identifiers or different spellings of the same identifier? SQL standard says different identifiers

Sorry, I don't remember, and I can't seem to find it out online. I could ask my friends who still work there, if it's important. (For what it's worth, I never remember anyone complaining about mixing up a different number of spaces in their variable names. So either all number of spaces were treated the same, or perhaps multiple spaces in a row in an identifier were just banned (via linter or via language?).)

> The interaction with keywords would cause some issues. For example, right now, “if” is not a valid identifier (keyword), but “if_” and “_if” are. However, with this proposal “x y” could be a valid identifier, but “x if” would introduce ambiguity

Yes, you would need to sort out these details, if you wanted to add this 'feature' to Python.

> This is one aspect of C syntax I have never liked. I always wish it had been Pascal-style `x:int;` instead of `int x;`

I'm glad Rust made the same decision.

I do like using the 'space operator' to denote function calls, at least for a language like Haskell or OCaml.


Human language is also way more ambiguous. One of the reasons I love coding is a massively reduced vocabulary and a way more strict grammar.


Yes mixing tabs and spaces is a big no no and rightfully throws an error now


Not an improvement

Tabs still have syntactical meaning and are still invisible

Python will never be a success, I predict, because of this

Trust me


humans are weird creatures sometimes. there was this bad thing that happened that won't happen again now, but now I can't use the thing forever because Reasons.


In Python, you only need to indent the `def test_foo` by an additional whitespace, to make it a locally scoped function instead of a proper test function.
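A small sketch of that effect, with hypothetical names (`helper`, `test_foo`, `module_level_names`). It uses a full four-space level rather than a single extra space, since one space only lines up if the enclosing body is itself indented by one space.

```python
FLAT = '''
def helper():
    return 1

def test_foo():
    assert helper() == 1
'''

# Same text, but test_foo is indented into helper's body.
NESTED = '''
def helper():
    return 1

    def test_foo():
        assert helper() == 1
'''


def module_level_names(source):
    """Execute source in a fresh namespace and list the names it defines."""
    namespace = {}
    exec(source, namespace)
    return sorted(name for name in namespace if not name.startswith("__"))


print(module_level_names(FLAT))    # ['helper', 'test_foo']
print(module_level_names(NESTED))  # ['helper']
```

Both versions are valid Python. In the nested one, a test runner that collects module-level `test_*` functions would find nothing to run.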


No, not if you have any sane linter or formatter involved. They wouldn't let you get away with indenting by a single space, but only by multiples of whatever you normally use in the rest of your program.


I mean, some CI system should be checking that "if_code_compiles()" blocks compile somewhere. It should be an error until the CI system has that header and can test both variants.

People are really quick to add optionality like this without understanding the maintenance cost. (Every boolean feature flag increases the number of variants you need to test by 2!) Either make a decision, or check that both sides work. Don't let people check in dead code.


multiplies it by 2, not increases


yup, meant to say "a factor of 2".
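A quick sketch of the doubling, using made-up flag names. Each extra boolean flag doubles the number of build variants, so n flags give 2**n combinations.

```python
from itertools import product


def flag_variants(flag_names):
    """Yield one dict per on/off combination of the given boolean flags."""
    for values in product([False, True], repeat=len(flag_names)):
        yield dict(zip(flag_names, values))


for count in range(1, 5):
    names = [f"flag_{i}" for i in range(count)]  # hypothetical flag names
    total = sum(1 for _ in flag_variants(names))
    print(count, "flags ->", total, "variants")
```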


AST diffs instead of textual diffs might have helped here (to spot the `.` making the code un-compilable).

Edit: oof, though the stray character in question is inside a perfectly legitimate C string, so to catch this, any such diffs would need to Matryoshka down and that seems unsolvable / intractable.


> Edit: oof, though the stray character in question is inside a perfectly legitimate C string, so to catch this, any such diffs would need to Matryoshka down and that seems unsolvable / intractable.

Not sure. You could also just forbid code that's too complex to analyse without going down a rabbit hole, instead of trying to go down the rabbit hole.

In general, it's hard to analyse arbitrary code. But we don't have to allow arbitrarily complex code when doing code review.


I think the main issue was that it was embedded in the file itself like that. I would have preferred to have it in a separate, valid C file, with syntax highlighting etc., and have it read from that file.


Perhaps, but given how most build systems work, that would complicate things in other ways (since build systems often try to compile all .c files).


That's really not how most build systems work.


I am a CMake novice. Is that true for CMake in this example?


Definitely not, cmake's try_compile function even directly supports being passed a source file.


Put it in a special folder that is ignored from the build.


Ah yes but thanks to C being cursed due to includes and macros this is harder to do


Huh, the code with a dot is not legal C. It is a CMake issue that the test breaks here.


That’s what makes this so clever: these systems were born in the era where you couldn’t trust anything - compilers sometimes emitted buggy code, operating systems would claim to be Unix but had weird inconsistencies on everything from system calls to command line tool arguments, etc. - so they just try to compile a test script and if it fails assume that the feature isn’t supported. This is extremely easy to miss since the tool is working as it’s designed and since it’s running tons of stuff there’s a ton of noise to sort through to realize that something which failed was supposed to have worked.


You could just run tests for the feature detection on a known system or a dozen (VMs are cheap). The big problem is that most code is not tested at all or test errors are flat out ignored.


Nothing “just” about it. VMs weren’t cheap when the C world started using autoconf - that was a feature mostly known on IBM mainframes – and in any case that couldn’t do what you need. The goal is not to figure out what’s there on systems _you_ control, it’s to figure out what is actually available on the systems other users are building on. Think about just this library, for example, your dozen VMs wouldn’t even be enough to cover modern Linux much less other platforms but even if it was (drop most architectures and only support the latest stable versions for the sake of argument), you’d still need to build some reliable prober for a particular bug or odd configuration.

The problem here wasn’t the concept of feature detection but that it was sabotaged by someone trusted and easily missed in the noise of other work. What was needed was someone very carefully reviewing that commit or build output to notice that the lockdown feature was being disabled on systems where it was fully enabled, and we already know that maintainer time was in short supply. Any other approach would likely have failed because the attacker would have used the same plausible sounding language to explain why it needed to be a complicated dynamic check.


The goal is to figure out if the optional features work at all. For that a system you control is required.

That the sabotage worked at all relied on the fact that nobody was testing those features.

How secure is a sandbox that may not even exist? Apparently it's good enough for most.


That's kind of the point. It's feature detection code. If the code cleanly compiles, the feature is assumed supported; otherwise, it's assumed not present/functional. This is pretty common with autotools. The gotcha here is that this innocuous period is not supposed to be syntactically valid. It's meant to be subtle and always disable the feature.
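A rough analogy in Python, not the actual CMake or autotools mechanism. `feature_supported` and both probes are hypothetical, and Python's built-in `compile` stands in for invoking a C compiler on a test program. It only checks syntax, which is enough to show how a stray `.` turns "feature present" into "feature absent" without any visible error.

```python
def feature_supported(probe_source):
    """Report a feature as present only if the probe compiles."""
    try:
        compile(probe_source, "<probe>", "exec")
    except SyntaxError:
        # "Feature missing" and "probe is malformed" look identical here.
        return False
    return True


GOOD_PROBE = "import os\npid = os.getpid()\n"
SABOTAGED_PROBE = "import os\n.\npid = os.getpid()\n"  # stray period on its own line

print(feature_supported(GOOD_PROBE))       # True
print(feature_supported(SABOTAGED_PROBE))  # False
```

A check on a system where the feature is known to exist, asserting that the probe reports it as supported, is the kind of test that would surface the sabotaged probe.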


Shouldn't whatever is depending on xz supporting landlock be verifying that it's the case through blackbox tests or something? Otherwise a check like this even without the bug could end up disabling landlock if e.g. the compiler environment was such that a given header wasn't available...


Yes, this is intentionally how autoconf is supposed to work.



