C is the ultimate WYSIWYG language (provided you understand the semantics of you...

hgs3 · on May 23, 2024

> Please don’t ruin C.

Exactly this. C++ folks should not approach C like a "C++ lite". I appreciate the authors candid take on the subject.

As for defer, there is some existing precedent like GCC and Clang's __attribute__((cleanup)), but - at least for me - a simple "goto cleanup;" is usually sufficient. If I understand N3199 [1] correctly, which is the authors proposal for introducing defer in C, then "defer" would be entirely a compile-time construct. Essentially just a code transformation to inject the necessary cleanup at the right spots. If you're going to introduce defer to C then that does seem like the "best" approach IMO.

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3199.htm

smallstepforman · on May 23, 2024

We C++ devs haved moved away from C decades ago, and frankly dont even think of it any more, and will never go back. Its a relic of its time, like DOS, Amiga etc. RAII is a big feature we can no longer live without. The type system and overloading is fantastic. And std::vector is a magnificant feature. A language without these features is a relic for us C++ devs.

And yes, I also agree that C++ has WTF insanity, like 17 or so initialisation quirks, exceptions in general (primarily to address failures in constructers, surely there must be a better way, also OOM / bad_alloc is a relic from the past), and unspecified sizes for default built in types (thats C heritage).

uecker · on May 23, 2024

I moved from C++ back to C and found that I am much more productive not worrying about a lot of things. But it takes a while to figure out how to do things in C because almost nothing comes out-of-the-box.

atn34 · on May 23, 2024

> provided you understand the semantics of your target architecture

Unless you're writing inline assembly or intrinsics or something like that, the semantics of your target architecture are quite irrelevant. If you're reasoning about the target architecture semantics that's a pretty good indication that what you're writing is undefined behavior. Reasoning about performance characteristics of your target architecture is definitely ok though.

chowells · on May 23, 2024

And presuming you avoid 100% of undefined behavior, which I've never seen a non-trivial C program succeed at. C is way too complicated in the real world. You don't want C, you want a language that actually gives defined semantics to all combinations of language constructs.

fooker · on May 23, 2024

>you want a language that actually gives defined semantics to all combinations of language constructs

No, this is wrong. It's a common misconception though. You would only want that in a hypothetical world where all computers are exactly the same.

Undefined and implementation defined behavior is what allows us to have performance at all. Here are some simple examples.

Suppose we want to make division by zero and null pointer dereference defined. Now every time you write a/b or *x, the compiler will be forced to emit an extra branching check before this operation.

Something much more common---addition. What about signed overflow? Do you want the compiler to emit an overflow check in advance? Similar reasoning for shift instructions.

UB in the language specification allows compilers to optimize based on the assumption that the programs you write won't have undefined behavior. If compilers are not able to do this, it becomes impossible to implement most optimizations we rely on. It's a very core feature of modern language specifications, not an oversight you can fix by thinking about it for 10 minutes.

dooglius · on May 23, 2024

> Now every time you write a/b or *x, the compiler will be forced to emit an extra branching check before this operation.

This is wrong, because you would define them to have the behavior that the architecture in question does, so no changes would be needed. For integer division this would mean entering an implementation-defined exceptional state that does not by default continue execution (on Linux, SIGFPE with the optional ability to handle that signal). For dereferencing a pointer, it should have the same semantics as a load/store to any other address--if something is there it works normally, if the memory is unmapped e.g. for typical Linux x86 programs you get SIGSEGV (just as you would for accessing any other unmapped address).

fooker · on May 23, 2024

Okay, you get half of the story.

Suppose now, there are two architectures with slightly differing behavior.

Can the compiler still optimize signed x + 1 > x to true?

jay-barronville · on May 23, 2024

> Suppose we want to make division by zero and null pointer dereference defined.

A good example is WebAssembly*—address 0x00000000 is a perfectly fine and well-defined address in linear memory. In practice though, most code you’ll come across targeting WebAssembly treats it as if dereferencing it is undefined behavior.

* Of course WebAssembly is a compiler target rather than a language, but it serves as a good example of the point you’re making.

bigstrat2003 · on May 23, 2024

> UB in the language specification allows compilers to optimize based on the assumption that the programs you write won't have undefined behavior.

Given that has proven to be a completely false assumption, I don't think there's a justification for compilers continuing to make it. Whatever performance gains they are making are simply not worth the unreliability they are courting.

fooker · on May 23, 2024

> Given that has proven to be a completely false assumption

This part is correct. The problem is in how to deal with this. If you want the compiler to correctly deal with code having undefined behavior, often the only possibility is to assume that all code has undefined behavior. That means, almost every operation gets a runtime branch. That is completely incompatible with how modern hardware works.

The rest is wrong, but again, this is a common misconception. Language designers and compiler writers are not idiots, contrary to popular belief. UB as a concept exists for a reason. It's not for marginal performance boosts, it is to enable any compiler based transformation, and a notion of portability.

grumpyprole · on May 23, 2024

I'm sorry I still don't buy it. Can you please show me a use case where ignoring null pointer or overflow checks makes your product non-viable or uncompetitive?

Some of these checks could be removed by languages with better compilers and likely more restrictions. That is the better approach. As a user, I don't want to run code that is potentially unsafe and/or insecure.

daemin · on May 23, 2024

So the simplest case for not providing a language specification for dereferencing a null pointer is that it requires putting in checks everywhere to detect the condition and then do something in the case where the pointer is null. So what should the null pointer case do then? Something like emit an exception, or send a signal, or call std::terminate to exit the process?

I know that languages like Java have a NullPointerException which they can throw and handle for situations like this, but they're also built on a highly specified virtual machine architecture that is consistent across hardware platforms. This also does not guarantee that your program is safe from crashing when this exception gets thrown, as you have to handle it somewhere. For something as general as this it will probably be in the Main function, so you might as well let it go unhandled as there's not that much you can do at that point.

For a language like C++ it is simpler, easier, and I would argue more correct, to just let the hardware handle the situation, which in this case would trigger a memory error of trying to access invalid memory. As the real issue is probably somewhere else in the code which isn't being handled correctly and the bad data is flowing through to the place where it accesses the null pointer and the program crashes.

To add to that in a lot of cases the program isn't crashing while trying to access address 0, it's crashing trying to access address 200, or 1000, or something like that, and putting in simplistic checks isn't going to catch those. You could argue that the check should guard against accessing the lowest 1k of memory, but then when do you stop, at 64k? Then you have an issue with programs that must fit within 1k of memory.

Leaving it unspecified is the better choice.

fooker · on May 23, 2024

It's not ignoring about ignoring null pointer or overflow checks, it's having to necessarily insert these checks everywhere.

grumpyprole · on May 23, 2024

We should build compilers that insert these checks for us (if they cannot statically determine them unnecessary). The ability to omit these checks doesn't IMHO justify undefined behaviour.

fooker · on May 23, 2024

Well, good news is that you have optional modes in most compilers that do this.

You would not want to force these by default, nobody wants it. You can not statically determine them unnecessary in for the vast majority of code, even stuff as simple as `print(read(a) + read(b))`.

dralley · on May 23, 2024

And yet somehow languages such as Rust, which have no UB (in the safe subset) manages to be within 5% of C and often faster in both real-world codebases and microbenchmarks.

fooker · on May 23, 2024

It’s just a change in jargon for ‘marketing’ reasons.

For example: Rust will silently wrap signed integers in release mode even when it’s considered a bug and crashes in debug mode.

rcxdude · on May 23, 2024

That is pretty much the only example where there's a compromise between performance and correctness as a difference between release and debug mode, and note that it's a) not undefined behaviour and b) does not violate any of rust's safety guarantees.

Every other example you mention is done by rust in release mode and the performance impact is minimal, so I would say it's a good counterexample to your claims that defining these things would hamstring performance (signed integer overflow especially is an obvious no-brainer for defining. Note that doesn't necessarily mean overflow checks! Even just defining the result precisely would remove a lot of footguns).

Slyfox33 · on May 23, 2024

Signed overflow is not UB in rust. That's not the same thing at all.

fooker · on May 23, 2024

It’s not.

You have missed my point.

samatman · on May 23, 2024

Zig, a language which is explicitly aimed at the same domain as C, has an improved semantics for all of these things.

If a pointer can be null, it must be an optional pointer, and you must in fact check before you dereference it. This is what you want. Is it ok to write a program which segfaults at random because you didn't check for a pointer which can be null? Of course not. If you don't null-check the return value of e.g. malloc, your program is invalid.

But the benefit is in the other direction. Careful C checks for null before using a pointer, and keeping track of whether null has been checked is a manual process. This results in redundant null checks if you can't statically prove (by staring at the code and thinking very hard) that it isn't null. So in practice you're likely to have a combination of not checking and getting burned, and checking a pointer which was already checked. To do otherwise you have to understand the complete call graph, this is infeasible.

Zig doesn't do any of this. If it's a pointer, you can safely dereference it. If it's an optional pointer, you must check, and then: it's a pointer. Safe to pass down the call stack and freely use. If you want C behavior you can always YOLO and just say `yoloptr.?.*`.

Overflow addition and divide by zero are safety checked undefined behavior, a critical concept in the specification. They will panic with a stack trace in debug and ReleaseSafe mode, and blow demons out of your nose in ReleaseFast and ReleaseSmall modes. There's also +% for guaranteed wraparound twos-complement overflow, and +| for saturating addition. Also `@addWithOverflow` if your jam is checking the overflow bit. Unwrapping an optional without checking it is also safety-checked UB: if you were wrong about the assumption that the payload carries a value, you'll get a panic and stack trace on the line where you did `yolo.?`.

Shift operations require that the right hand side of the shift be a type log2(Type.bitwidth) of the left hand side. Zig allows integers of any width, so for a: u64, calling a << b requires that b be a u6 or smaller. Which is fine: if you know values will be within 0..63, you declare them u6, and if you want to shift on a byte, you truncate it: you were going to mask it anyway, right? Zig simply refuses to let you forget this. Addition of two u6 is just as fast as addition of the underlying bytes because of, you got it, safety-checked undefined behavior. In release mode it will just do what the chip does.

There's a common theme here: some things require undefined behavior for performance. Zig does what it can to crash your program if that behavior is exhibited while you're developing it. Other things require that you take some well-defined actions or you'll get UB: Zig tracks those in the type system.

You'll note that undefined behavior is very much a part of the Zig specification, for the same reasons as in C. But that's not a great excuse to make staying within the boundaries of defined behavior as pointlessly difficult as it is in C.

fooker · on May 23, 2024

Yes, you can surely improve things from C. C is not a benchmark for anything other than footguns per line of code.

The debug modes you mention are also available in various forms in C and C++ compilers. For example ASan and UBSan in clang will do exactly what you have described. The question is, then whether these belong in the language specification or left to individual tools.

pjmlp · on May 23, 2024

As proven multiple times throughout the computing history, individual tools are optional, and as such used less often than they actually should be.

Language specification is unavoidable when using said language.

fooker · on May 23, 2024

Have you wondered why Rust or Python do not have a specification?

For a bunch of languages outside the C-centric world, specifications don't exist.

pjmlp · on May 24, 2024

The certainly have, even if it isn't a ISO one.

https://docs.python.org/3/reference/index.html

https://docs.python.org/3/library/index.html

https://doc.rust-lang.org/reference/index.html

https://doc.rust-lang.org/std/index.html

https://ferrous-systems.com/blog/ferrocene-language-specific...

fooker · on May 25, 2024

Documentation and specification are not the same things.

The intuitive distinction is that the second one is for compiler/library developers, and the former is for users.

A specification can not leave any room for ambiguity or anything up to interpretation. If it does (and this happens), it is treated as a bug to be fixed.

lstodd · on May 23, 2024

mwahahaha. as if there is some divine "language specification" which all compilers adhere to on pain of eternal damnation.

no such thing ever existed.

pjmlp · on May 23, 2024

Given that one can write Fortran in any language, maybe you're right.

rcxdude · on May 23, 2024

it's not just in debug modes. It should be the standard in release mode as well (IMO the distinction shouldn't exist for most projects anyway). ASan and UBSan are explicitly not designed for that.

samatman · on May 23, 2024

Worth noting that Zig has ReleaseSafe, which safety-checks undefined behavior while applying any optimizations it can given that restriction.

The more interesting part is that the mode can be individually modified on a per-block basis with the @setRuntimeSafety builtin, so it's practical to identify the performance-critical parts of the program and turn off safety checks only for them. Or the opposite: identify tricky code which is doing something complex, and turn on runtime safety there, regardless of the build status.

That's why this sort of thing should be part of the specification. @setRuntimeSafety would be meaningless without the concept of safety-checked undefined behavior.

I would say that making optionals and fat pointers (slices) a part of the type system is possibly more important, but it all combines to give a fighting chance of getting user-controlled resource management correct.

Given the topic of the Fine Article, it's worth briefly noting that `defer` and `errdefer` are keywords in Zig. Both the test allocator, and the GeneralPurposeAllocator in safe mode, will panic if you leak memory by forgetting to use these, or rather, forget to free allocations generally. My impression is that the only major category of memory bugs these tools won't catch in development is double-free, and that's being worked on.

fooker · on May 23, 2024

Well, give it a try.

If you can make it work in a way that has acceptable performance characteristics, every systems language will adopt your technique overnight.

rcxdude · on May 23, 2024

I use rust, which already does this.

fooker · on May 23, 2024

Signed overflow is officially a 'bug' in rust, it traps in debug mode but silently follows LLVM/platform behavior in release mode.

Huh, doesn't that sound familiar?

steveklabnik · on May 23, 2024

> silently follows LLVM/platform behavior

This is not the case. It's two's compliment overflow.

Also, since we're being pedantic here: it's not actually about "debug mode" or "release mode", it is tied to a flag, and compilers must have that flag on in debug mode. This gives the ability to move release mode to also produce the flag in the future, if it's decided that the overhead is worth it. We'll see if it ever is.

> Huh, doesn't that sound familiar?

Nope, it is completely different from undefined behavior, which gives the compiler license to do anything it wants. These are well defined semantics, the polar opposite of UB.

fooker · on May 25, 2024

>This is not the case. It's two's compliment overflow.

Okay, here is an example showing that rust follows LLVM behavior when the optimizer is turned on. LLVM addition produces poison when signed wrap happens. I'm a little bit puzzled about the vehement responses in the comments wow. I have worked on several compilers (including a few patches to Rust), and this is all common knowledge.

https://godbolt.org/z/r6WTxGjrb

steveklabnik · on May 26, 2024

The Rust output:

  define noundef i32 @square(i32 noundef %x, i32 noundef %y) unnamed_addr #0 !dbg !7 {
    %_0 = add i32 %y, %x, !dbg !12
    ret i32 %_0, !dbg !13
  }

Let's compare like to like, here's one with equivalent C++ code: https://godbolt.org/z/Y4MnGeof4

The C++ output:

  define dso_local noundef i32 @square(int, int)(i32 noundef %0, i32 noundef %1) local_unnamed_addr #0 !dbg !99 {
    tail call void @llvm.dbg.value(metadata i32 %0, metadata !104, metadata !DIExpression()), !dbg !106
    tail call void @llvm.dbg.value(metadata i32 %1, metadata !105, metadata !DIExpression()), !dbg !106
    %3 = add nsw i32 %1, %0, !dbg !107
    ret i32 %3, !dbg !108
  }

> LLVM addition produces poison when signed wrap happens.

https://llvm.org/docs/LangRef.html#add-instruction

> nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.

Note that Rust produces `add`. The C++ produces `add nsw`. No poison in Rust, poison in C++.

Here is an example of these differences producing different results, due to the differences in behavior: https://godbolt.org/z/Gaonnc985

Rust:

  define noundef zeroext i1 @test() unnamed_addr #0 !dbg !14 {
    ret i1 true, !dbg !15
  }

C++:

  define dso_local noundef zeroext i1 @test()() local_unnamed_addr #0 !dbg !123 {
    tail call void @llvm.dbg.value(metadata i32 undef, metadata !128, metadata !DIExpression()), !dbg !129
    ret i1 false, !dbg !130
  }

This is because in Rust, the wrapping behavior means that this will always be true, but in C++, because it is UB, the compiler assumes it will always be false.

> I'm a little bit puzzled about the vehement responses in the comments wow.

You are claiming that Rust has semantics that it was very, very deliberately designed to not have.

samatman · on May 24, 2024

Rust includes a great deal of undefined behavior, unlocked with the trustme keyword. Ahem, sorry, unsafe. If only...

So if we're going to be pedantic, it's safe Rust which has defined semantics for basically everything. A considerable accomplishment, to be sure.

steveklabnik · on May 24, 2024

While this is true, we’re talking about integer overflow. That’s part of safe Rust. So it’s not really germane to this conversation.

pjmlp · on May 23, 2024

Even languages like Modula-2 and Ada, among others, had better semantics than C, but they didn't come for free alongside UNIX.

rperez333 · on May 23, 2024

I know nothing about Zig, but this is pretty interesting and looks well designed. Linus was recently very mad when someone suggested a new semantics for overflow:

—— I'm still entirely unconvinced.

The thing is, wrap-around is not only well-defined, it's common, and EXPECTED.

Example:

   static inline u32 __hash_32_generic(u32 val)
   {
        return val * GOLDEN_RATIO_32;
   }

and dammit, I absolutely DO NOT THINK we should annotate this as some kind of "special multiply". —-

Full thread: https://lore.kernel.org/lkml/CAHk-=wi5YPwWA8f5RAf_Hi8iL0NhGJ...

jcranmer · on May 23, 2024

> The thing is, wrap-around is not only well-defined, it's common, and EXPECTED.

No, it's really not. Do this experiment: for the next ten thousand lines of code you right, every time you do an integer arithmetic operation, ask yourself if the code would be correct if it wrapped around. I would be shocked if the answer was "yes" in as much as 1% of the time.

(The most recent arithmetic expression I wrote was summing up statistics counters. Wraparound is most definitely not correct in that scenario! Actually, I suspect saturation behavior would be more often correct than wraparound behavior.)

This is a case where I think Linus is 100% wrong. Integer overflow is frequently a problem, and demanding the compiler only check for it in cases where it's wrong amounts to demanding the compiler read the programmer's mind (which goes about as well as you'd expect). Taint tracking is also not a viable solution, as anyone who has implemented taint tracking for overflow checks is well aware.

cozzyd · on May 23, 2024

It depends heavily on context.

For the kernel, which deals with a lot of device drivers, ring buffers, and hashes, wraparound is often what you want. The same is likely to be true for things like microcontroller firmware and such.

In data analysis or monte carlo simulations, it's very rarely what you want, indeed.

jcranmer · on May 23, 2024

Is it really?

For example, I opened up https://elixir.bootlin.com/linux/latest/source/drivers/firew... as a random source file in the Linux kernel, and I didn't see a single line where wraparound would be correct behavior.

There are definitely cases where wraparound behavior is correct. There are also cases hard errors on overflow isn't desirable (say, statistics counters), but it's still hard to call wraparound the correct behavior (e.g., saturation would probably work better for statistics than wraparound). There are also cases where you could probably prove that overflow can't happen. But if you made the default behavior a squawk that wraparound occurred, and instead made developers annotate all the cases where that was desirable to silence the squawk, even in the entire Linux kernel, I'd suspect you'd end up with fewer than 1000 places.

This is sort of the point of the exercise--wraparound behavior is often what you want when you think about overflow, but you actually spend so much of your time not thinking about it that you miss how frequently wraparound behavior isn't what you wanted.

cozzyd · on May 23, 2024

I think wraparound generally is better for statistics counters like the ones in the linked code, since often you want to check the number of packets/errors per some time interval, which you can do with overflow (as long as the time interval isn't so long that you overflow within a period) but not with saturation.

samatman · on May 23, 2024

I think it's critical that we do annotate it as a special multiply.

If wraparound is ok for that particular multiplication, tell the compiler that. As a sibling comment says, this is seldom the case, but it does happen, in particular, expecting byte addition or multiplication to wrap around can be useful.

The actual expectation of the vast majority of arithmetic in a computer program is that the result will be correct in the ordinary schoolyard sense. While developing that program, it should absolutely panic if that isn't the case. "Well defined" doesn't mean correct.

I don't understand your objection to spelling that `val *% GOLDEN_RATIO_32` is. When someone sees that (especially you, later, coming back to your own code) it clearly indicates that wrapping is expected, or at least allowed. That's good.

bregma · on May 23, 2024

Unsigned integer overflow is not undefined in C or C++. You can rely on how it works.

Signed integer overflow, on the other hand, is undefined. The compiler is allowd to assume it never happens and can re-arrange or eliminate code as it sees fit under that assumption.

How many lines will this code print?

    for (int i = INT_MAX-1; i < 0; ++i) printf("I'm in danger!\n");

kimixa · on May 23, 2024

I feel the meme of "Undefined Behavior" has been massively exaggerated on the internet - the vast majority of examples appear to be extreme toy examples using the weirdest contrived constructs, or things that are expected to fault and you're already using platform-specific information to know what that would look like (e.g. expecting a segmentation fault). It's a Scary Boogyman That Will Kill You, not something that can be understood, managed, and avoided if necessary.

And even then there are tools to help define much of that - if you want well defined wrapped signed integers, great. If you want to trap on overflow, there's an option for that. Lots of compiler warnings and other static analysis tools (that would just be default-rejected by the compiler today if it didn't have historical baggage, but they exist and can be enabled to do that rejection).

Yes, there's many issues with the ecosystem (and tooling - those options above should be default IMHO), but massively overstating them won't actually help anyone make better software.

And other languages often have similar amounts of "undefined behavior" - but just don't document it as such, relying on a single implementation being "Defined Correct", and hope they're not actually being relied on if anything changes. Just like C, only undocumentated.

adrianN · on May 23, 2024

I don't feel like the cause of most (all?) memory safety bugs has been "massively exaggerated".

kimixa · on May 23, 2024

If you removed every case of "Undefined Behavior" from the C spec, you'd still have memory safety bugs. Because they're orthogonal (though may be coupled if they come from the same core logic error).

This is what I mean by it becoming "meme" - things like "Undefined Behavior" or "Memory Safety" have become a discussion-ending "Objective Badness", hiding the real intent - being "Languages I Do No Like" (or, most often, are a poor fit for the actual job I'm trying to do. Which is fine, but not rejecting that those jobs actually exist).

But they mean real things that we can improve in terms of software quality, and safety - but that's rarely the intended result when those terms are now brought up. And many things we can do right now with existing systems to improve things, to not throw away huge amounts of already well-tested code. To do a staged improvement, and not let "perfect" be the enemy of better.

adrianN · on May 23, 2024

I suppose there are ways to make the undefined behavior defined that preserve memory unsafety, so you’re technically correct. In practice one would probably require safe crashes for OOB access etc.

actionfromafar · on May 23, 2024

I can give an example on how to remove all undefined behaviour and preserve memory unsafety. First, we decide that all compilers compile to a fixed instruction set running on a CPU with a fixed memory model. Just pick one of the existing ones, like a 68000 or a 80486DX. Then, we decide that all unitialized memory is actually 0, always, from the operating system and the allocator. That should go pretty far or am I missing something?

throwaway2037 · on May 23, 2024

    > You don't want C, you want a language that actually gives defined semantics to all combinations of language constructs.

So, Zig?

sixfiveotwo · on May 23, 2024

Well, perhaps a subset of it, since it also introduces concepts that do not exist in C (eg. exceptions).

samatman · on May 23, 2024

Zig does not have exceptions, what it has is error sets. It uses the words try and catch, which does cause confusion, but the semantics and implementation are completely different.

If a function has an error type (indicated by a ! in the return type), you have a few options. You can use `result = try foo();`, which will propagate the error out of the function (which now must have ! in its signature). Or you can use `result = foo() catch default;` or `result = foo() catch unreachable;`. The former substitutes a default value, the latter is undefined behavior if there's an error (panic, in debug and ReleaseSafe modes).

Or, just `result = foo();` gives `result` an error-union type, of the intended result or the error. To do anything useful with that you have to unwrap it with an if statement.

It's a different, simpler mechanism, with much less impact on performance, and (my opinion) more likely to end up with correct code. If you want to propagate errors the way exceptions do, every function call needs a `try` and every return value needs a ! in the return type. Sometimes that's what you need, but normally error propagation is shallow, and ends at the first call which can plausibly do anything about the error.

sixfiveotwo · on May 24, 2024

Thank you for your input, I stand corrected. So as I understand it, it works somewhat like the result type of rust (or ocaml), or the haskell either type, but instead of being parameterized, it is extensible, isn't it?

samatman · on May 24, 2024

More like that, yes. Rust has two general-purpose mechanisms, generics and enums, which are combined to handle Optional and Result types. Zig special-cases optional types with `?type` (that is literally the type which can be a type or null), and special-cases errors with `!`. Particularly with errors, I find this more ergonomic, and easier to use. Exceptions were right about one thing: it does often make sense to handle errors a couple call frames up the stack, and Zig make that easy, but without the two awful things about exceptions: low-performance try blocks, and never quite knowing if something you call will throw one.

It also has tagged unions as a general mechanism for returning one of several enumerated values, while requiring the caller to exhaustively switch on all the possibilities to use the value. And it has comptime generics ^_^. But it doesn't use them to implement optionals or errors.

einpoklum · on May 23, 2024

You don't necessarily want that. Forcing language-defined semantics on everything costs performance. Sorry, it just does, we can't have it all. So, you can sacrifice performance for well-defined'ness, or you can choose not to - and the choice depends on the language _design goals_. As the design goals differ, so do the combinations of choices made for syntax and semantics.

bigstrat2003 · on May 23, 2024

I think pretty much any amount of performance is worth sacrificing in order to get rid of the gnarly things UB can cause. Correctness is the first and most important thing in programming, because if you can't be certain it works then it's not very useful.

einpoklum · on May 24, 2024

It may be worth it _for you_. It is not worth it _for others_.

Correctness can be established well enough - even if guaranteed automatically - in a language with UB.

planede · on May 23, 2024

How do you define a buffer overflow?

gpderetta · on May 23, 2024

> a simple statement like `a = b++;` can mean multiple constructors being called, hidden allocations, unexpected exceptions [...]

Yes, nothing like that is possible in C

https://godbolt.org/z/Ge4EqzznT

quietbritishjim · on May 23, 2024

The difference is that in C++ it's expected that you'll overload operators, provide implicit conversions and throw exceptions. Of course you can write terrible code in C but it is not common accepted practice to hide a longjmp in a macro disguised as an identifier.

pjmlp · on May 23, 2024

Indeed, you hide longjump in a #define macro instead, with a do while block trick.

jay-barronville · on May 23, 2024

The funny thing is, examples of macro craziness only strengthen my point, because C++ inherits all of that in addition to its hidden behaviors and magical semantics. It’s rare to find serious C code doing a lot of crazy things behind macros. In my experience, the few exceptions I can think of include the GMP library and data structure-related code trying to emulate generics (mostly hash tables).

gpderetta · on May 23, 2024

Yes, C++ is a larger language for sure. But because it has better abstraction facilities, macro hackery is less common.

gpderetta · on May 23, 2024

pthread_cleanup_{push,pop}

jay-barronville · on May 23, 2024

Haha. You can’t be serious—what’s the likelihood of running into C code like this in anything remotely serious (compared to the millions upon millions of lines of innocent-looking C++ code that does like a dozen different things under the hood)?

gpderetta · on May 23, 2024

No true Scotsman.

I assume you haven't looked at the expansion of errno lately?

edit: also

https://github.com/KxSystems/kdb/blob/master/c/c/k.h

fargle · on May 23, 2024

that's a deliberately unfair comparison. operator overloading, constructors, assignments, etc. happen "under-the-hood" in c++ and are standard language features.

whereas you can see the user-defined macro definition of "b" at the top of the file. you can't blame the c language for someone choosing to write something like that. sure it's possible, but its your choice and responsibility if you do stupid things like this example.

gpderetta · on May 23, 2024

Macros are also standard C features, and good luck figuring out that an identifier is a macro without IDE help when the definition is buried in some header.

fargle · on May 23, 2024

what you say is partially true (you can also of course use -E to check macros) but:

- macros are also standard C++ features too, so this point doesn't differentiate between those languages

- i'm failing to adequately communicate my point. there's a fundamental difference practically and philosophically between macro stupidity and C++ doing things under-the-hood. of course a user (you, a co-developer, a library author you trusted) can do all sorts of stupid things. but it's visible and it's written in the target language - not hard-coded in the compiler.

yes - sure, good luck finding the land-mine "b" macro if it was well buried. but you can find it and when you do find it, you can see what it was doing. you can #undef it. you can write your own version that isn't screwed up, etc.

you can do none of those things for operations in c++ that occur automatically - you can't even see them except in assembly.

gpderetta · on May 23, 2024

> there's a fundamental difference practically and philosophically between macro stupidity and C++ doing things under-the-hood. of course a user (you, a co-developer, a library author you trusted) can do all sorts of stupid things. but it's visible and it's written in the target language - not hard-coded in the compiler

I specifically reject this. Constructors, exceptions, and so on are as similarly visible at the source level as macro definitions.

And thanks to macros, signal handling, setjmp, instrumentation, hardening, dynamic .so resolution, compilers replacing what look like primitive accesses with library functions, any naïve read of C code, is, well, naïve.

I'm not claiming C++ superiority here [1], I'm trying to dispel the notion that C is qualitatively different from C++ form a WYSIWYG point of view, both theoretically and in practice.

[1]although as I mentioned else, other C++ features means that macros see less use.

fargle · on May 24, 2024

to be clear, i'm neither defending nor bashing either language. i use and like both as appropriate. and it's fine to disagree, btw. please do not read "good" or "bad" into my attempt to describe either.

but i will also emphatically reject your position: "Constructors, exceptions, and so on are as similarly visible at the source level as macro definitions"

no they are not. you can certainly see what the macro is doing - you see it's definition, not just it's existence. whereas in c++ you have to trust that language/compiler to:

- build a vtable (what exactly does this look like?)

- make copy ctors

- do exception handling.

- etc.

none of these are explicit. all of them are closed and opaque. you can't change their definition, nor add on to it.

at issue at hand is both "magic" and openness. c gives relatively few building blocks. they are simple (at least in concept). user libraries construct (or attempt to construct) more complex idioms using these building blocks. conversely c++ bakes complex features right into the language.

as you note, there are definitely forces that work against the naïve original nature of c. macros, setjmp, signal handling, instrumentation, hardening, .so resolution, compilers replacing primitive accesses, etc. but all of those apply equally to c and c++. they are also more an affect of the ABI and the platform/OS than either language. in short, those are complaints and complexities due to UNIX, POSIX, and other similar derived systems, not c or c++ the language itself.

c has relatively few abstractions: macros, functions, structured control flow, expressions, type definitions. all of these could be transformed into machine code by hand, for example in a toy implementation. sure a "good" compiler and optimizer will then mangle that into something potentially unrecognizable, but it will still nearly always work the way that the naïve understanding would. that's why when compilers do "weird" things with UB, it gets people riled up. it's NOT what we expect from c.

c++ on the other hand has, in the language itself, many more abstractions and they are all more complex. you aren't anywhere near the machine anymore and you must trust the language definition to understand what the end effect will be. how it accomplishes that? not your problem. this makes it squarely a high-level language, no different than java or python in that facet.

i explicitly reject your position that "that C is qualitatively [not] different from C++ from a WYSIWYG point of view, [either] theoretically [or] in practice."

to me, it absolutely is. it represents at lower level interface with the system and machine. c is somewhere between a high-level assembler and a mid-level language. c++ is truly high-level language. yes, compilers and os's come around and make things a little more interesting than the naïve view of c in rare cases . but c++? everything is complex - there is not even workable illusion of simplicity. to me this is unfortunate because, c++ is still burdened by visible verbosity, complexities, land-mines, and limitations due to the fact that it is probably not quite high-level enough.

this is all very long winded. you and many other readers might think i'm wrong. the reason i'm responding is not to be argumentative, but because it is that it's by no means a "settled" question and there are certainly also plenty of people that see it a very different way. which i think is fine.

theeandthy · on May 23, 2024

Agreed 100%. C is what it is and that’s a good thing.

However, if I were to request a feature to the core language it would be: NAMESPACES. This would clean up the code significantly without introducing confusing code paradigms.

hgs3 · on May 23, 2024

Namespaces are nice, but to my knowledge require name mangling which isn't a thing in C. I'm curious what you mean by "clean up the code significantly" and "confusing code paradigms" because in C you typically prefix your functions to prevent name collisions which isn't confusing or too noisy in my subjective opinion.

pjmlp · on May 23, 2024

Name mangling is an implementation detail to fit into UNIX linker design space, not the same approach as other compiled languages with modules, with their own linker.

gpderetta · on May 23, 2024

Also name mangling (which in this case would simply be appending the namespace name to the identifier) would be trivially implementable in C.

In fact on some targets the assembler name of identifiers doesn't always match the C name already.

Although as someone almost always explicitly qualifies names, typing foo_bar is not very different from foo::bar; the only minor advantages are that you do not have to use foo:: inside the implementation of foo itself and the ability to use aliases.

planede · on May 23, 2024

> which in this case would simply be appending the namespace name to the identifier

surely not. How do you differentiate these two functions?

  void fooN(void);
  
  namespace N { void foo(void); }

gpderetta · on May 23, 2024

[I meant to write prepend, but that doesn't change the argument]

You would mangle it as something like foo$N depending on the platform.

theeandthy · on May 23, 2024

Yeah you’re right. I guess folks who want C++ stuff should just use C++…

I guess I should have reworded. I don’t expect that feature in C, but if I were to reinvent C today I would keep it the same but add namespace and mangling.

Adding an explicit prefix to every function call is a lot boilerplate when it’s all added up.

riku_iki · on May 23, 2024

> a simple statement like `a = b++;` can mean multiple constructors being called, hidden allocations, unexpected exceptions, unclear object hierarchies, an overloaded `++`, an overloaded `=`, etc.

its just mean if you need that logic, in C you would write lots of verbose less safe code.

tsegratis · on May 23, 2024

wishlist

1) labels as values in standard 2) control over memory position offsets, without linker script

other than that a few more compiler implementations offering things like checked array bounds, and a focus on correctness rather than accepting the occasional compiler bug

the rough edges like switch fallthrough are rough, but easy to work around. They don't need fixing (-pedantic fixes it already, etc)

maybe more control over assembly generation, such as exposing compilation at runtime; but that is into the wishful end of wishlists

pjmlp · on May 23, 2024

Only if you mean C as defined by K&R C, and its original use when porting UNIX.