In fact I find double-free safer because it usually crashes (and in my code I do checks so it almost certainly crashes), while in C# I can happily keep using such an object without knowing it. But as I said, it depends on the specific use case.
> In fact I find double-free safer because it usually crashes (and in my code I do checks so it almost certainly crashes)
You don't know what undefined behavior is, do you? You cannot be sure it crashes, since the compiler is allowed to do anything under the assumption that it doesn't happen. It's absolutely legitimate for the compiler to remove all the code you added to check that a double-free didn't happen, because it assumes that's dead code.
See this post[1] from the LLVM blog, which explains why you can't expect anything when you're triggering UB.
I know very well what UB is, and I bet there is not a single big program which does not have undefined behaviour. I even rely on UB sometimes, because with a well-defined set of compilers and systems it is, in reality, well defined.
I was talking in general about "unsafe" languages. I use C++ in my projects and use custom allocators everywhere, so there is no problem with UB there. The custom allocators also check against double-free.
What do you mean by checking against double-free? Either you pay a high runtime cost, or you use unconventional (and somewhat impractical in C++) means (e.g. fancy pointers everywhere with a serial number in them), or you can't check reliably. Standard allocators just don't check reliably, and thus do not provide sufficient safety.
Anyway, double-free was only an example. The point is that a language can, or cannot, provide safety by itself, not just allow you to create your own enriched subset that is safer than the base language (because you are often interested in the safety of 3rd-party components not written in your dialect and local idioms of the language).
In the case of C and C++, they are full of UB, and in the general case UB means you are dead. I find that extremely unfortunate, but this is the reality I have to deal with, so I don't pretend it does not exist...
> What do you mean by checking against double-free?
I pay a small runtime cost for the check by having guard values around every allocation. At first I wanted to enable it only in debug builds, but I am too lazy to disable it in release builds, so it's there too. Anyway, the overhead is small and I do not allocate often during runtime.
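A minimal sketch of that kind of check, just for illustration (not my actual allocator; names and values are made up, and it ignores over-alignment):

    #include <cstdio>
    #include <cstdint>
    #include <cstdlib>

    static const std::uint64_t kAlive = 0xA110CA7EDULL;  // block is live
    static const std::uint64_t kFreed = 0xDEADBEEFULL;   // block was freed

    struct Header { std::uint64_t guard; };

    void* checked_alloc(std::size_t n) {
        Header* h = static_cast<Header*>(std::malloc(sizeof(Header) + n));
        if (!h) return nullptr;
        h->guard = kAlive;
        return h + 1;                       // hand out the memory after the header
    }

    void checked_free(void* p) {
        if (!p) return;
        Header* h = static_cast<Header*>(p) - 1;
        if (h->guard != kAlive) {           // second free, or a wild pointer
            std::fprintf(stderr, "double free / bad free detected\n");
            std::abort();                   // crash loudly instead of corrupting
        }
        h->guard = kFreed;
        std::free(h);                       // a real custom allocator would keep the
    }                                       // block so the guard survives reliably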
> Anyway, double-free was only an example. The point is that a language can, or cannot, provide safety by itself.
I can write safe code in modern C++ (and probably in C), and I can write unsafe code in e.g. Rust; the only difference is which mode is the default for the language. On the other hand, I have to be prepared to pay the performance (or other) price for safe code.
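Just to illustrate what I mean by "which mode is the default" (a trivial sketch, nothing more):

    #include <memory>
    #include <vector>

    // Staying in the opt-in "safe subset": ownership and bounds are checked.
    int safe_style() {
        auto p = std::make_unique<int>(41);   // freed exactly once, automatically
        std::vector<int> v{1, 2, 3};
        return *p + v.at(2);                  // at() throws instead of corrupting memory
    }

    // The default mode happily accepts the unsafe equivalent.
    int unsafe_style() {
        int* p = new int(41);
        int v[3] = {1, 2, 3};
        int r = *p + v[2];                    // v[5] here would be silent UB
        delete p;
        // delete p;                          // a second delete would be UB too
        return r;
    }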
> In the case of C and C++, they are full of UB, and in the general case UB means you are dead.
I doubt there is a big C or C++ program without UB; does that mean they are all dead? I do not think so.
> I find that extremely unfortunate, but this is the reality I have to deal with, so I don't pretend it does not exist...
I do not like UB in C++ either, but mostly because it does not make sense on the platforms I use. On the other hand, I can understand that the language cannot make such platform-specific assumptions. I can pretend UB does not exist, with some restrictions. UB in reality does not mean that the compiler randomly does whatever it wants; it does whatever it wants, but consistently. But as I said twice already, it depends on the use case. Am I writing for SpaceX or some medical instruments? Probably not a good idea to ignore UB. Am I writing a new Unreal Engine? Probably not a good idea to worry much about UB, since I would never finish.
> UB in reality does not mean that the compiler randomly does whatever it wants; it does whatever it wants, but consistently.
There is nothing consistently consistent about UB. The exact same compiler version can one day transform one particular instance of UB into something, the next day into something else because you changed an unrelated line of code 10 lines below or above, and the day after tomorrow, if you change your compiler version or even just any compile option, you get yet another result even though your source code did not change at all.
EDIT: and I certainly do find it extremely unfortunate that compiler authors are choosing to do that to us poor programmers, and that they have mostly dropped the other, saner interpretation expressly allowed by the standard and practiced by "everybody" 10 years ago: that UB can also stand for non-portable but well-defined constructs. But, well, compiler authors did that, so let's live with it now.
> Yet, for years I am memmove-ing objects which should not be memmoved. Or using unions the way they should not be used.
There can be two cases:
A. You rely on additional guarantees of one (or several) of the language implementations you are using (e.g. gcc, clang). Each compiler usually has some. They are explicitly documented; otherwise they do not exist (see the union sketch below).
B. You rely on undocumented internal details of your compiler implementation, which are subject to change at any time and just happen not to have changed for several years.
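To make case A concrete with the union example: GCC explicitly documents (under -fstrict-aliasing in its manual) that reading a different union member than the one last written is allowed, even though ISO C++ calls it UB. A minimal sketch of the two options:

    #include <cstdint>
    #include <cstring>

    // Case A: documented GCC guarantee, fine on GCC specifically,
    // but undefined behavior as far as ISO C++ is concerned.
    float pun_via_union(std::uint32_t bits) {
        union { std::uint32_t u; float f; } x;
        x.u = bits;
        return x.f;
    }

    // The portable alternative: memcpy is well defined everywhere and
    // mainstream compilers turn it into the same machine code anyway.
    float pun_via_memcpy(std::uint32_t bits) {
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f;
    }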
> Do you have any example?
I'm not sure compilers did "far" (not just intra-basic-block instruction scheduling) time-traveling constraint propagation on UB 10 or 15 years ago. Some of them certainly do now. This means you had better use -fno-delete-null-pointer-checks and all its friends, because that might very well save you in practice from constructs that are technically UB but not well known to your ordinary programmer colleague, and thus likely to appear in lots of non-trivial code bases.
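To make the null-pointer case concrete, a sketch of the well-known pattern (not taken from any particular code base):

    #include <cstdio>

    int read_value(int* p) {
        int v = *p;          // UB if p is null, so the compiler may assume p != nullptr
        if (p == nullptr) {  // ...and is then allowed to delete this "dead" check
            std::puts("null pointer, bailing out");
            return 0;
        }
        return v;
    }
    // With optimizations on, GCC/Clang typically drop the whole if-block;
    // -fno-delete-null-pointer-checks tells them not to exploit this.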
Simpler example: the behavior of signed integer overflow. (Very?) old compilers simply translated it to the most natural thing the target ISA did, so in practice you got two's-complement behavior in tons of cases, and tons of programs started to rely on that. You just can't rely on that so widely today without special care.
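A concrete sketch of how that bites today:

    #include <limits>

    // Looks like a reasonable overflow check, but signed overflow is UB, so the
    // compiler may assume x + 1 never wraps and fold this to "return true".
    bool will_not_overflow(int x) {
        return x + 1 > x;
    }

    // A well-defined alternative: compare against the limit before adding
    // (or do the arithmetic on unsigned types, which wrap by definition).
    bool will_not_overflow_safe(int x) {
        return x < std::numeric_limits<int>::max();
    }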
More concerning is the specification of the << and >> operators. On virtually all platforms they should map to shifting instructions that interpret unsigned int a << 32 as either 0 or a (and the same for a >> 32), and so regardless of which behavior you get, (a<<b) | (a>>(32-b)) should do a ROL op. Unfortunately, mainly because some processors do one behavior and others do the other (for a single shift), the standard specified it as UB. Now, in the spirit of the standard, UB can be the sign of something that is non-portable but perfectly well defined. Unfortunately, now that compiler authors have collectively "lost" (or voluntarily burned) that memo and are actively trying to trap other programmers and kill all their users, either it is already handled like all other UB in their logic (=> nasal daemons) or it is only an accident waiting to happen...
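Concretely, the problem is the b == 0 case (a sketch; the function names are just for illustration):

    #include <cstdint>

    // Naive rotate-left: UB when b == 0, because a >> 32 shifts by the full width.
    std::uint32_t rotl32_naive(std::uint32_t a, unsigned b) {
        return (a << b) | (a >> (32 - b));
    }

    // Well-defined version: mask both counts so they stay in [0, 31].
    // (-b & 31 is (32 - b) mod 32, so b == 0 becomes a shift by 0, not 32.)
    // Mainstream compilers still recognize this pattern and emit a single ROL.
    std::uint32_t rotl32_safe(std::uint32_t a, unsigned b) {
        return (a << (b & 31)) | (a >> (-b & 31));
    }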
Maybe a last example: in the classical C age, an out-of-bound object access was expected to reach whatever piece of memory sits at the intuitively computed address. This is not the case anymore. Out-of-bound object access now carries the risk of nasal-daemon invocation, regardless of what you know about your hardware.
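A sketch of the kind of surprise this causes with modern compilers (the off-by-one is intentional; this is the well-known textbook pattern, not code from anywhere in particular):

    int table[4];

    // The loop reads table[4] on its last iteration. "Classically" that read
    // whatever lies after the array. Today the compiler may reason that the
    // out-of-bounds read is UB and therefore cannot happen, so the value must
    // have been found among table[0..3], and it may compile the whole function
    // down to "return true".
    bool exists_in_table(int v) {
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v)
                return true;
        }
        return false;
    }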
Other modern features of compilers also have an impact. People used to assume all kinds of safe properties at TU boundaries. Those were never specified in the standard, and they have been thrown out the window with WPO. It is likely that some code bases have "become" incorrect (become so even in practice, given they always have been in theory under the riskiest interpretations of the standard, which compiler authors are now unfortunately using).
> Do you mean instead of signed integer overflow being UB it should be defined as 2 complement or something like that?
Maybe (or at least implementation-defined). I could be mistaken, but I do not expect even 50% of C/C++ programmers to know that signed overflow is UB, and what it means precisely on modern implementations. I would even be positively surprised if 20% of them knew about that.
And before anybody throws these at me:
* I'm not buying the performance argument, at least for C, because the original intent of UB certainly was not to be wielded this way, but merely to specify the lowest common denominator of various processors -- it's insanely idiotic not to be able to express a ROL today because of that turn of events and the modern brain-fucked interpretation of compiler authors -- and more importantly because I happen to know how modern processors work, and I do not expect stronger and safer guarantees to notably slow down anything.
* I'm not buying the "specify the language you want yourself or shut up" argument either, for at least two reasons:
- I also have opinions about safety features in other aspects of my life, yet I'm not an expert in those areas (e.g. seat belts). I am an expert in CS/C/C++ programming/system programming/etc., and I'm a huge user of compilers, in some cases in areas where it can have an impact on people's health. Given that perspective, I think any argument that I should just specify my own language or write my own compiler would be plain stupid. I expect people actually doing that for a living (or as a main voluntary contributor, etc.) to use their brains and think of the risks they impose on everybody with their idiotic interpretations, because regardless of whether they or I want it, C and C++ will continue to be used in critical systems.
- The C spec is actually kind of fine, although now that compiler authors have proven they can't be trusted with it, I admit it should be fixed at the source. But had they been more reasonable, the C spec would have continued to be interpreted as in the classical days, and most UB would merely have been implementation-defined or "quasi-implementation-defined" (in some cases by defining all kinds of details like a typical linear memory map, crashing the program in case of access to unmapped areas, etc.) in the sense you are thinking of (mostly deterministic -- at least way more than it unfortunately is today). The current C spec does allow that, and my argument is that doing otherwise is wrong (except if the performance price were utterly unbearable, but the classical implementations have proven it is not!). So I don't even need to write another, less dangerous spec; they should just stop writing dangerous compilers...
It can crash right away, a few seconds, minutes, or hours later, or never, and just keep generating corrupt data.
Having a reference that the GC doesn't collect doesn't lead to memory corruption, just to more memory being used than there should be.