I have no problem with Curl being written in C (I'll take battle-tested C over experimental Rust) but this point seemed odd to me:
>C is not the primary reason for our past vulnerabilities
>There. The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.
#61 -> uninitialized random : libcurl's (new) internal function that returns a good 32-bit random value was implemented poorly and overwrote the pointer instead of writing the value into the buffer the pointer pointed to.
#60 -> printf floating point buffer overflow
#57 -> cookie injection for other servers : The issue pertains to the function that loads cookies into memory, which reads the specified file into a fixed-size buffer in a line-by-line manner using the fgets() function. If an invocation of fgets() cannot read the whole line into the destination buffer due to it being too small, it truncates the output
This one is arguably not really a failure of C itself, but I'd argue that Rust encourages more robust error handling through its Options and Results, whereas C tends to abuse "-1" and NULL return values that need careful checking and usually can't be enforced by the compiler.
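To make that concrete, here's a minimal sketch (the lookup function is made up, not curl's API): in C a failure is just another integer or a NULL the caller is free to ignore, while in Rust the failure is part of the return type and has to be handled (or explicitly discarded) before the value can be used.

    fn find_port(service: &str) -> Result<u16, String> {
        // Hypothetical lookup; the point is the signature, not the body.
        match service {
            "http" => Ok(80),
            "https" => Ok(443),
            _ => Err(format!("unknown service: {service}")),
        }
    }

    fn main() {
        // Result is #[must_use]: silently dropping it is a compiler warning,
        // and the Ok value can't be reached without going through the Err arm.
        match find_port("gopher") {
            Ok(port) => println!("port {port}"),
            Err(e) => eprintln!("lookup failed: {e}"),
        }
    }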
#55 -> OOB write via unchecked multiplication
Rust has checked multiplication enabled by default in debug builds, and regardless of that the OOB wouldn't be possible.
#54 -> Double free in curl_maprintf
#53 -> Double free in krb5 code
#52 -> glob parser write/read out of bound
And I'll stop here; so far, 7 out of 11 vulnerabilities would probably have been avoided with a safer language. Looks like the vast majority of these issues wouldn't have been possible in safe Rust.
> Rust has checked multiplication enabled by default in debug builds, and regardless of that the OOB wouldn't be possible.
That bug is only triggered by an unrealistic corner case (username longer than 512MB). A run-time check in a debug build won't help unless you realize the possibility beforehand and put a unit test for it. I am more interested in the second part of your comment: "the OOB wouldn't be possible". How does Rust protect against such integer overflow caused by multiplication? Thanks in advance.
"That bug is only triggered by an unrealistic corner case (username longer than 512MB)."
What makes you call that an "unrealistic" case? You're probably imagining some sort of world where a randomly distributed set of usernames are sent to this function drawn from the distribution of "real usernames that people use". Since 0 of them are longer than 512MB, you intuitively assume a 0 probability of exploit.
But in the security world, that's the wrong distribution. You have to assume a hostile, intelligent adversary. It isn't hard at all to construct a case of some web service allowing you to specify remote resources accessible with a username & password of your choosing (not corresponding to the username you're using to log in), an attacker specifying one of their own resources, reading the incoming headers to notice that you're using a vulnerable version of libcurl, and stuffing 512MB+exploit into the username field of your web app. If you don't add any other size restrictions between the attacker and the libcurl invocation, they may well pass it right in. (And your same intuition will lead you to not put any size restrictions there; "why would anybody have a multi-megabyte username?" You won't even have thought the question explicitly, you just won't put a length restriction on the field because it'll never even cross your mind.) By penetration standards, that doesn't even rise to the level of a challenge.
I think you missed the point of the post you are responding to. That person understands the value of the bug, and is just pointing out that Rust runs with unchecked math in production and would therefore be just as vulnerable in production. The benefit would come from it running with checked math in debug mode, but you, as the parent poster noted, would have to have a unit test or an integration test that realized this vulnerability to begin with and tried to exploit it.
The use of "unrealistic" was meant in the sense that you wouldn't think of it, not that you shouldn't guard against it once known.
I think there are two components here: yes, this might lead to a _logic bug_, but it should never lead to a _memory safety_ bug. That is, to get that CVE, you need both, and Rust will still protect you from one.
In Rust overflows generally will cause a panic. So "2 + 2" will return 4 or panic if you're on a 2-bit system (they provide .saturating_add() and .wrapping_add() to get code that will never panic)
Thus, you could have caused a denial of service by crashing a rust-based Curl, but the crash would have been modeled and just uncaught.
1. Rust specifies that if overflow happens, it is a "program error", but it is well-defined as two's complement overflow.
2. Rust specifies that in debug builds, overflow must be checked, and panic if it happens.
In the future, if overflow checking ever has acceptable overhead, this allows us to say that it must always be checked. But for now, you will get a well-formed result.
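To make that concrete, a small sketch (names made up): debug builds panic, release builds wrap to a well-defined (if wrong) value, and the checked_* methods surface the overflow as an Option in either build.

    fn main() {
        let count: usize = usize::MAX / 2;

        // Debug build: panics with "attempt to multiply with overflow".
        // Release build (default): wraps to a well-defined but too-small
        // value -- a logic error, never undefined behaviour.
        // let size = count * 4 + 1;

        // Explicitly checked in every build:
        match count.checked_mul(4).and_then(|n| n.checked_add(1)) {
            Some(size) => println!("allocating {size} bytes"),
            None => eprintln!("refusing: size calculation overflowed"),
        }
    }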
I'm not a Rust user (yet), but I'm a little surprised that with its emphasis on safety Rust doesn't maintain all debug safety checks in production by default. You could then do some profiling and turn off only those few (or one or none) that actually turned out to provide enough real-world, provable benefit to be worth turning off in this specific piece of code.
Since you would turn them off one at a time explicitly, rather than having a whole set of them disappear implicitly, you would probably also tend to have a policy of requiring a special test suite to really push the limits of any specific safety issue before you would allow yourself to turn that one off.
Obviously, if this occurred to me at first glance, it occurred to the designers, who decided to do it the other way after careful consideration, so I'm just asking why.
Basically, overflow doesn't lead to memory unsafety in isolation. That's the key of it. The worst that can happen is a logic error, and we are not trying to stop all of those with the compiler :) Justifying a 20%-100% speed hit (that was cited in the thread) for every integer operation to guard against something that can't introduce memory unsafety is a cost we can't afford to pay.
EDIT: oh, one more thing of significance that you may or may not have picked up on: one reason why under/overflow in C and C++ is dangerous is that certain kinds are undefined behavior. It's well-defined in Rust. Just doing that alone helps, since the optimizer isn't gonna run off and do something unexpected.
I think you can opt in to checked arithmetic in release mode if you can stomach the performance cost.
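If I remember right, the opt-in is just a profile setting (a sketch of a Cargo.toml fragment; the same knob also exists as rustc's -C overflow-checks flag):

    # Cargo.toml: keep overflow checks on even in optimized builds
    [profile.release]
    overflow-checks = true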
Anyway, the buffer overflow itself is:
>If this happens, an undersized output buffer will be allocated, but the full result will be written, thus causing the memory behind the output buffer to be overwritten.
In Rust's case the output would go into a dynamically sized container, probably a Vec<>. Attempts to put more data in the Vec than it can hold would either cause a runtime error or cause the Vec<> to reallocate (which could cause a performance issue, or maybe even a DoS if you could make the system allocate GBs of memory, but it wouldn't allow access to invalid memory).
So maybe something like:
    /* XXX potential multiplication overflow */
    let mut buf = Vec::with_capacity(insize * 4 / 3 + 4);
    while let Some(input) = get_input() {
        // Reallocs when capacity is exceeded
        buf.extend_from_slice(input);
    }
So even if the multiplication overflow is not caught it's by design impossible to have a buffer overrun in Rust.
By default rust doesn't allow out of bounds indexing (indexing and iteration are checked, although bounds checks are often optimized out), you have to explicitly write unsafe code to read off the end of an array or vector.
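For what it's worth, a small sketch of the three flavours (the commented-out lines are the ones that can't happen silently in safe Rust):

    fn main() {
        let v = vec![10, 20, 30];
        let i: usize = 7;

        // Indexing is always bounds-checked in safe Rust; this would panic:
        // println!("{}", v[i]);

        // The non-panicking form surfaces the check in the type system:
        println!("{:?}", v.get(i)); // prints "None"

        // Skipping the check requires an explicit unsafe block:
        // let x = unsafe { *v.get_unchecked(i) };

        // Iteration needs no per-element check at all, which is one reason
        // the checks so often vanish from optimized loops:
        let sum: i32 = v.iter().sum();
        println!("{sum}");
    }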
Can you elaborate on bounds checks being optimized out? Is it only in situations where the compiler can prove that they are completely unnecessary? Otherwise, they can't be relied on, so what's the point of having them at all?
I think you are being too generous. All of those 11 vulnerabilities were caused by the language, its lack of memory safety, limited expressiveness, poor abstractions it encourages, etc.
That's not completely fair. It's true that there are some issues I've not counted that could be caused by the "limited expressiveness" of C. Notably the latest vulnerability, #62, which is caused by bogus error checking and would probably not have occurred in idiomatic Rust and could easily be blamed on C's terrible error handling ergonomics (or lack thereof).
However for others it's not immediately obvious how a safer language would've helped, for instance #59: Win CE schannel cert wildcard matches too much
This is clearly a logical error due to a badly implemented standard. There's no silver bullet here.
The patch that they put out for that issue combines multiple topics into one.
* A preprocessor #if statement is altered with defined(USE_SCHANNEL) && defined(_WIN32_WCE), which is OR-ed with the other conditions, so that code that was previously not compiled on WinCE is now potentially defined.
* A local buffer in the code is increased from 128 to 256 characters. A comment in the patch refers to a "buffer overread". So there is a C issue in here!
* In a call to what appears to be a Microsoft API function, CertGetNameString, a flag argument that was zero is now specified as CERT_NAME_DISABLE_IE4_UTF8_FLAG. Unless I'm misunderstanding something, the patch comment doesn't appear to remark on this at all.
* Code that was taking on the responsibility for doing some matching logic is replaced by something that appears to be using proper APIs within curl (Curl_cert_hostcheck).
Why didn't the programmer know about the existing function? One possibility is that it didn't exist yet at the time that code was written. Other such ad hoc matching code may have been refactored to use the matching function; this wasn't found. Or maybe the function did exist, but wasn't well documented. A review process isn't in place that would allow someone to raise a red flag "this should be calling Curl_cert_hostcheck and not itself using string matching at all, let alone be checking for a * wildcard and incrementing over it."
How a safer language helps with the non-obvious errors is:
* makes code smaller with less distracting repeated boilerplate, so things like that stand out more.
* programmers waste less time on fighting language-related ergonomic pains, and so more of their attention is available to spot these errors.
If your language is such that you shout "hooray" and pat yourself on the shoulder when it compiles and the code passes the address sanitizer and Valgrind and whatnot, then actual functional problems will slip through the cracks.
Permit me, also, to indulge in some argumentum ad lingua obscura: could we spot in a Brain##### program that some certificate wildcard matches too much? :)
It's easy to see the advantages of memory safety, since it eliminates an entire class of bugs that C programs tend to suffer from, but I think talking about language "ergonomics" is much more tenuous.
One thing that C programmers tend to love about C is that it's simple. C doesn't have that many language constructs and that makes C programs easier to reason about, read and audit, which also makes it easier to find these non-obvious errors. A C programmer approaching an unfamiliar codebase can feel assured that it is made out of the same concise set of constructs as any other C program. On the other hand, modern languages like Rust, C# and Go are a lot more complicated. They have most of the language constructs of C, plus some extra, so not only do you have to understand the concepts in C, you also have to understand things like ownership, variance or pattern matching. Every new feature increases cognitive burden on the programmer. Adding something like exception handling, for example, means that control flow suddenly becomes more complex. Now, instead of only leaving at return statements, a function has a potential exit point anywhere it calls another function. "Smart" features that encourage terse code and reduce boilerplate can also result in code that is totally obtuse to anyone other than the person who wrote it.
I'm not saying that C is at a sweet spot for language complexity (Brainfuck is probably too simple, but OTOH it only has eight symbols!). I'm just saying that it's important to understand why a lot of developers really like C, and C is often praised for its simplicity. Any language that intends to replace C needs to understand why C programmers use C.
I'd argue that talking about C's simplicity requires one big caveat... it gains that simplicity by delegating a lot of essential complexity to the underlying platform and the spec leaves a lot of details up to the compiler writers.
While I'm sure experienced C developers will be used to that, one cannot simply read The C Programming Language, set a language reference on their desk, and safely learn how C will behave through experience because of all the unspecified or counter-intuitive things which often don't even trigger compiler warnings with -Weverything.
While Rust may have more concepts to grasp and more grammar to remember, I find it far less of a mental burden to code in for the same reason that OOP proponents trumpet encapsulation... C forces me to constantly double-check that I haven't forgotten some detail of the language or GCC's implementation which is sensible if you understand the low-level implementation, but completely counter to my intuition. Rust allows me to audit the heck out of modules containing unsafe code to make sure the unsafety can't escape, then set it aside to think about another piece of the logic.
Rust also places a stronger emphasis on making it possible to reason locally, bolstered by things like hygienic macros.
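A tiny sketch of what hygiene buys you (macro and names invented for illustration): identifiers introduced inside the macro can't collide with or capture the caller's, so you can read the call site without reading the macro body.

    macro_rules! double {
        ($e:expr) => {{
            let tmp = $e; // the macro's own `tmp`, hygienically separate
            tmp * 2
        }};
    }

    fn main() {
        let tmp = 10;
        println!("{}", double!(tmp + 1)); // 22: the caller's `tmp` is only used in $e
        println!("{}", tmp);              // still 10, untouched
    }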
Finally, simplicity does not automatically make something intuitive. (That's something I hear quite commonly among HCI people bemoaning how Apple has brilliantly used simplicity to convince iOS users that, if they get stumped by the UI, it's their own fault, not a failing on Apple's part.)
> A C programmer approaching an unfamiliar codebase can feel assured that it is made out of the same concise set of constructs as any other C program.
Plus a shit load of user defined macros to make it look like a modern language.
Every major C repo will have its own macros for foreach, cleanup on exit, logerrorandreturn, etc. Another example is extensive use of attributes like _cleanup_ from GCC to simulate RAII; not plain C89 at all.
In the future, there will be no bugs because the languages will be so expressive that bugs cannot be expressed. The only mistake a programmer can make will be choosing the wrong language.
> Of course that leaves a share of problems that could’ve been avoided if we used another language. Buffer overflows, double frees and out of boundary reads etc, but the bulk of our security problems has not happened due to curl being written in C.
He addressed all of those points in the second short paragraph. None of those are C vulnerabilities; they were mistakes made on the part of the developers, not the language. The fact that a safer language would have avoided a problem doesn't mean that, when things happen, it's the language's fault.
> None of those are C vulnerabilities, they were mistakes made on the part of the developers, not the language.
The point of type system features, e.g. Option types instead of nulls, or linear/affine types avoiding use-after-free, is to make programmer mistakes turn into compiler errors. Nothing more. There is no point in talking about whether or not something is the "language's fault". We know C is like juggling knives. It's a tool. It has drawbacks and benefits. Being widespread and fast are the benefits. Not turning many forms of programmer error into compiler errors is the drawback. That means the programmer can't afford to make mistakes, because they will be shipped. But programmers invariably make mistakes.
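For example (a minimal sketch): the moral equivalent of a use-after-free is rejected before the program ever runs.

    fn main() {
        let buf = vec![1u8, 2, 3];
        drop(buf); // ownership given up; the allocation is freed here

        // Uncommenting the next line is a compile error, not a CVE:
        // error[E0382]: borrow of moved value: `buf`
        // println!("{:?}", buf);
    }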
The grandparent argued that for the sample of issues he looked at, a lot would in fact be avoided by the type system of e.g. Rust - contradicting the argument in the blog post (could be because of the small sample though).
I think it's perhaps less important to focus on the number of issues of each kind, and instead look at the severity of them. If the kinds of issues avoided by better type systems are typically trivial issues, but the kind of issues coming from logic errors are severe security issues - then perhaps the case for stronger type systems isn't so strong after all. But I doubt that's the case.
Discipline is precisely the thing which shouldn't be required to avoid disaster, if a language is to be well-designed (this is a feature of good design in general).
> He addressed all of those points in the second short paragraph. None of those are C vulnerabilities, they were mistakes made on the part of the developers, not the language.
They are absolutely the fault of the language, given other languages would have made these bugs impossible.
Errare humanum est is hardly a new concept; blaming humans for not being computers is inane, and in fact qualifies as perseverare diabolicum, as out of misplaced pride you persevere in the original error of using C.
I agree with the gist of your comment but I think it's a bit harsh to say that using C for curl was an error. If I can trust Wikipedia, the original release of the library was in 1997; at the time it was a perfectly reasonable choice IMO.
I doubt the library would have reached its current adoption levels if it had been written in any other language (and I presume another C library would've taken its place).
You're speaking past each other. You're talking about "blame" in a moralistic sense, but the other posters just mean simple causation -- "would this fault have occurred if a safer language was used?"
The former sense would be useful... if you want to sue someone or something, I guess? But the latter is more useful in real life, so that's what we're discussing.
Maybe, maybe not: from Wikipedia the first official validation of GNAT (the Free Ada compiler) was done in 1995.
So there were already safer languages available.
> and contradict the statement regarding zero policy static analysis errors.
Not necessarily - I've found that static analysis tools (Coverity etc.) have limitations, and I'd expect them to find fewer than half of these kinds of bugs - serious fuzzing tends to find more.
The static analysis tool has to work with the type system of the language, and C's type system isn't particularly helpful, so the tool has to find a balance between flagging almost every pointer dereference as "potentially a problem", most of them being false positives, and flagging only the small percentage of actual problems that can be unambiguously proven to be problems with the context available inside a single function definition.
(Just stating generalities, haven't looked at the fixes for these bugs.)
Double free is not necessarily a C issue; it can also be a program logic issue - I expect an object to exist, but it's already been deleted. So it's one of:
1. object should not exist and the second free is incorrect
2. object should exist and the first free is incorrect
3. object existence is uncertain and the second free must somehow check that
Although I can double-free only in unsafe languages, the wrong logic behind it can be the same in safe languages. It just has different consequences.
Of course it is a C issue, in the same sense that ALL logic errors are in the case you describe -- so either C issues do not exist or you did not manage to find the correct definition. For example, OOB access is caused by faulty program logic, and the consequences are dramatic in unsafe languages. That is an issue of the language, despite you being able to compute OOB indexes in any language. Same thing for double free: the language issue is that the result is catastrophic, not that you can write, for example, faulty logic attempting to free too early, or an extra time. (Because in safe languages, the result is not as catastrophic as in unsafe languages.)
> 1. object should not exist and the second free is incorrect
In C# I can have two variables pointing to the same object; I null only one of them. The second should be nulled too, but it's not. That's a logic error. So in C# I end up with some object that should not be there, but it is. Which is better - double-free or an undestroyed object - depends on the use case.
In fact I find double-free safer because it usually crashes (and in my code I do checks so it almost certainly crashes), while in C# I can happily use such an object without knowing it. But as I said, it depends on the specific use case.
> In fact I find double-free safer because it usually crashes (and in my code I do checks so it almost certainly crashes)
You don't know what undefined behavior is, do you? You cannot be sure it crashes, since the compiler is allowed to do anything under the assumption that it doesn't happen. It's absolutely legit for the compiler to remove all the code you added to check that a double-free didn't happen, because it is assuming that's dead code.
See this post[1] from the LLVM blog, which explains why you can't expect anything when you're triggering UB.
I know very well what UB is and I bet there is not a single big program which does not have undefined behaviour. I even rely on UB sometimes, because with well defined set of compilers and systems, it's in reality well defined.
I was talking in general about "unsafe" languages. I use c++ in my projects and use custom allocators everywhere, so there is no problem with UB there. The custom allocators also do the checking against double-free.
What do you mean by checking against double-free? Either you pay a high runtime cost, or use unconventional (and somehow impractical in C++) means (e.g. fancy pointers everywhere with a serial number in them), or you can't check reliably. Standard allocators just don't check reliably, and thus do not provide sufficient safety.
Anyway, double-free was only an example. The point is that a language can, or cannot, provide safety by itself - not just allow you to create your own enriched subset that is safer than the base language (because you are often interested in the safety of 3rd party components not written in your dialect and local idioms of the language).
In the case of C and C++, they are full of UB, and in the general case UB means you are dead. I find that extremely unfortunate, but this is the reality I have to deal with, so I don't pretend it does not exist...
> What do you mean by checking against double-free?
I pay small runtime cost for the check by having guard values around every allocation. At first I wanted to enable it only in debug builds, but I am too lazy to disable it in release builds, so it's there too. Anyway the overhead is small and I do not allocate often during runtime.
> Anyway, double-free was only an example. The point is that a language can, or not, provide safety by itself.
I can write safe code in modern C++ (and probably in C) and I can write unsafe code in e.g. Rust; the only difference is which mode is the default for the language. On the other hand, I have to be prepared to pay the performance (or other) price for safe code.
> In the case of C and C++, they are full of UB, and in the general case UB means you are dead.
I doubt there is a big C or C++ program without UB, does that mean they are all dead? I do not think so.
> I find that extremely unfortunate, but this is the reality I have to deal with, so I don't pretend it does not exist...
I do not like UB in C++ either, but mostly because it does not make sense on the platforms I use. On the other hand, I can understand that the language cannot make such platform-specific assumptions. I can pretend UB does not exist, with some restrictions. UB in reality does not mean that the compiler randomly does whatever it wants; it does whatever it wants, but consistently. But as I said twice, it depends on the use case. Am I writing for SpaceX or some medical instruments? Probably not a good idea to ignore UB. Am I writing a new Unreal Engine? Probably not a good idea to worry much about UB, since I would never finish.
> UB in reality does not mean that the compiler randomly does whatever it wants; it does whatever it wants, but consistently.
There is nothing consistent about UB. The exact same compiler version can one day transform one particular UB into something, the next day into something else because you changed an unrelated line of code 10 lines above or below, and the day after tomorrow, if you change your compiler version or even just any compile option, you get yet another result even when your source code did not change at all.
EDIT: and I certainly do find it extremely unfortunate that compiler authors are choosing to do that to us poor programmers, and that they mostly dropped the other, saner interpretation expressly allowed by the standard and practiced by "everybody" 10 years ago: that UB can also stand for non-portable but well-defined constructs. But, well, compiler authors did that, so let's live with it now.
> Yet, for years I am memmove-ing objects which should not be memmoved. Or using unions the way they should not be used.
There can be two cases:
A. you rely on additional guarantees of one (or several) of the language implementations you are using (ex: gcc, clang). Each compiler usually has some. They are explicitly documented, otherwise they do not exist.
B. you rely on undocumented internal details of your compiler implementation, that are subject to change at any time, and just have happened to not have changed for several years.
> Do you have any example?
I'm not sure compilers did "far" (not just intra-basic-block instruction scheduling) time-traveling constraint propagation on UB 10 or 15 years ago. For sure, some of them do now. This means you had better use -fno-delete-null-pointer-checks and all its friends, because that might very well save you completely in practice from constructs that are technically UB but not well known to your ordinary programmer colleague - and so likely to appear in lots of non-trivial code bases.
Simpler example: behavior of signed integer overflow. (Very?) old compilers simply translated to the most natural thing the target ISA did, so in practice you got 2s complement behavior in tons of cases and tons of programs started to rely on that. You just can't rely on that so widely today without special care.
More concerning is the specification of the << and >> operators. On virtually all platforms they should map to shifting instructions that interpret unsigned int a << 32 as either 0 or a (and the same thing for a >> 32), and so, regardless of the behavior, (a<<b) | (a>>(32-b)) should do a ROL op. Unfortunately, mainly because some processors do one behavior and others do the other (for a single shift), the standard specified it as UB. Now in the standard's spirit, UB can be the sign of something that is non-portable but perfectly well-defined. Unfortunately, now that compiler authors have collectively "lost" (or voluntarily burned) that memo, and are actively trying to trap other programmers and kill all their users, either it is already handled like all other UB in their logic (=> nasal daemons) or it is only an event waiting to happen...
Maybe a last example: out-of-bound object access was expected to reach whatever piece of memory is at the position of the intuitively computed address, in the classical C age. This is not the case anymore. Out-of-bound object accesses now carry the risk of nasal-daemon invocation, regardless of what you know about your hardware.
Other modern features of compilers also have an impact. People used to assume all kinds of safe properties at TU boundaries. Those were never specified in the standard, and they have been thrown out the window with WPO. It is likely that some code bases have "become" incorrect (become even in practice, given they always have been in theory under the riskiest interpretations of the standard, which compiler authors are now unfortunately using).
> Do you mean instead of signed integer overflow being UB it should be defined as 2's complement or something like that?
Maybe (or at least implementation-specified). I could be mistaken, but I do not expect even 50% of C/C++ programmers to know that signed overflow is UB, and what it means precisely on modern implementations. I would even be positively surprised if 20% of them know about that.
And before anybody throws them at me:
* I'm not buying the performance argument, at least for C, because the original intent of UB certainly was not for it to be wielded this way, but merely to specify the lowest common denominator of various processors -- it's insanely idiotic to not be able to express a ROL today because of that turn of events and the modern brain-fucked interpretation of compiler authors -- and more importantly because I happen to know how modern processors work, and I do not expect stronger and safer guarantees to notably slow down anything.
* I'm not buying the "specify the language you want yourself or shut up" argument either, for two at least reasons:
- I also have opinions about safety features in other aspects of my life, yet I'm not an expert in those areas (e.g. seat belts). I am an expert in CS/C/C++ programming/system programming/etc., and I'm a huge user of compilers, in some cases in areas where it can have an impact on people's health. Given that perspective, I think any argument to just specify my own language or write my own compiler would be plain stupid. I expect people actually doing that for a living (or as a main voluntary contributor, etc.) to use their brains and think of the risks they impose on everybody with their idiotic interpretations, because regardless of whether they or I want it, C and C++ will continue to be used in critical systems.
- The C spec is actually kind of fine, although now that compiler authors have proven they can't be trusted with it, I admit it should be fixed at the source. Had they been more reasonable, the C spec would have continued to be interpreted as in the classical days, and most UB would merely have been implementation-defined or "quasi-implementation-defined" (in some cases by defining all kinds of details, like a typical linear memory map, crashing the program on access to unmapped areas, etc.) in the sense you are thinking of (mostly deterministic -- at least far more than it unfortunately is today). The current C spec does allow that, and my argument would be that there is no excuse for doing otherwise (except if the performance price is unbearably high, but the classical implementations have proven it is not!). So I don't even need to write another, less dangerous spec; they should just stop writing dangerous compilers...
I don't want to jump on the "Rust fixes everything" train, but lifetimes, scope-based destructors and reference counted pointers seem like they help with this sort of thing beyond just preventing literal accesses of freed memory; they can make sure objects are around when you need them and that they're destroyed automatically when you don't anymore.
Of course, you don't get it all for free; you have to wrestle with mutability and lifetimes, which can get hairy.
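A small sketch of that (the type is made up): the destructor runs exactly once, when the last owner goes away, and there is no free() call left to get wrong.

    use std::rc::Rc;

    struct Conn;

    impl Drop for Conn {
        fn drop(&mut self) {
            println!("connection closed (exactly once)");
        }
    }

    fn main() {
        let a = Rc::new(Conn);
        {
            let b = Rc::clone(&a); // a second owner; no manual free anywhere
            println!("owners: {}", Rc::strong_count(&b)); // 2
        } // b goes out of scope here, but the Conn stays alive
        println!("owners: {}", Rc::strong_count(&a)); // 1
    } // last owner dropped: Drop runs once; a double free isn't expressible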
Catching these kinds of programmer mistakes is exactly what distinguishes a language as "safer". Be it because they are ruled out by the type system, the memory system, or at least run-time checks.
Yeah, even I only put that in there as a way to handle cases (like Rust) where there is a single implementation of the language, which makes "implementation bugs" and "language bugs" one and the same.
But you can't have your cake and eat it too. The language does not implement those checks because perhaps they're not so trivially implementable in the language.
Sure, Rust is "safer" than C, but is it a better language? That's arguable.
> Sure, Rust is "safer" than C, but is it a better language? That's arguable.
Obviously there are more aspects to a language than safety, so I'll give you that, but yes Rust is a better language than C in this aspect.
Don't forget that programming languages are meant for humans, not for computers. One of the primary goals of a programming language is to prevent humans from making dumb mistakes.
" ... a programming language designer should be responsible for the mistakes that are made by the programmers
using the language.
[...]
It's very easy to persuade the customers of your language that everything that goes wrong is their fault and not yours. I rejected that..."
- Tony Hoare
I think this is a particularly solid point by Tony Hoare (made during his talk on 'the billion dollar mistake', null).
We find it very easy to blame developers for mistakes that really shouldn't have been possible to make at all.
I think there needs to be a grain of salt with the billion dollar mistake thing. The really dangerous part isn't dereferencing the null so much as what happens afterwards. If you dereference a null and the program terminates, that sucks. But, you debug it and life moves on. If "something" happens that's a whole different much bigger issue. Rust doesn't have null but you can ".unwrap()" an option type. Which may terminate the program and force you to debug. If memory serves, Haskell doesn't enforce exhaustive patterns. So once again, the program terminates and you're stuck debugging. That's really just life.
Now, I do in general prefer languages "without" null as they generally have more warning about when you can actually encounter null. Though my personal experience says that in sensibly written programs dealing with null isn't that big a deal. Is it a billion dollar mistake? I would have little trouble believing that. However, that would likely make dynamic typing an order of magnitude or two larger in the mistake dept.
The quote wasn't really about null - that's just the talk he was giving. It was about checked array indexing.
I would also say there really is no comparison between null, which escapes typechecking, and unwrap, which does not. But that's not my point, nor is it Hoare's. It's that we blame people for problems that are better solved by languages.
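To make the contrast concrete (a sketch with a made-up lookup): with null, the "no value" case silently inhabits every pointer type; with Option it is a separate type the compiler forces you to peel off, and .unwrap() is the visible, greppable place where you chose not to.

    fn lookup(id: u32) -> Option<&'static str> {
        if id == 1 { Some("alice") } else { None }
    }

    fn main() {
        let user = lookup(42);

        // user.len() would not compile: Option<&str> has no `len`,
        // so the "forgot the null check" bug is unrepresentable.
        match user {
            Some(name) => println!("hello {name}"),
            None => println!("no such user"),
        }

        // Or explicitly accept a panic -- visible right at the call site:
        // let name = lookup(42).unwrap();
    }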
On the one hand, Curl is a great piece of software with a better security record than most, the engineering choices it's made thus far have served it just fine, and its developers quite reasonably view rewriting it as risky and unnecessary.
On the other hand, the state of internet security is really terrible, and the only way it'll ever get fixed is if we somehow get to the point where writing networking code in a non-memory-safe language is considered professional malpractice. Because it should be; reliably not introducing memory corruption bugs without a compiler checking your work is a higher standard than programmers can realistically be held to, and in networking code such bugs often have immediate and dramatic security consequences. We need to somehow create a culture where serious programmers don't try to do this, the same way serious programmers don't write in BASIC or use tarball backups as version control. That so much existing high-profile networking software is written in C makes this a lot harder, because everyone thinks "well all those projects do it so it must be okay".
In extreme cases, where you want to prove your network stack, you'd have to write it in C (or something equivalently dangerous), because you can't prove big runtimes.
But in mid-way applications, dangerous languages are dangerous, and should be avoided.
I'd think the most "correct" way to handle this is to create small bits of network code, with proofs of correctness, that hand down parsed and tagged (typed) data to code in high-level languages. In that case, curl-type code would still be in C.
They're making an oblique reference to Rust; it has the same level of runtime that C does.
Many people say "no runtime" to mean "small runtime like C or C++ have" rather than literally no runtime. This is due to the confusion around interpreters sometimes being called "runtimes."
I suspect it was something about compiled vs. interpreted.
The Rust runtime is larger than C's (I imagine it's not much larger than C++'s, but this is already way too large). AFAIK, nobody ever proved it works as designed. The fact that its design is not very stable is also a problem.
Now, with some more thought about it, C itself is not a great target for correctness proofs. Too much is left unspecified, and there is little agreement over some of the lower level stuff. A simplified version of Rust might be a better target than C, if somebody ever creates it, but a lot of the safety would have to be left out of it.
IIRC Rust by default brings in a few things (panic handling is the main one, and also jemalloc) that make its runtime larger than C (but not larger than C++), but they can be disabled with compile flags. Many folks do, for embedded stuff.
Rust also lets you write a custom runtime by specifying a start function if you want.
> The fact that its design is not very stable is also a problem.
> The Rust runtime is larger than C's (I imagine it's not much larger than C++'s, but this is already way too large).
I believe that it is smaller than C++'s but larger than C's, but I'm not a mega expert here. I used to know where those files were located in-tree, but don't anymore.
I think there's also several distinctions you can draw here; for example, strictly speaking, almost no runtime is required, but if you're using say, the standard library, you still might end up with a bunch of code. The actual requirements are very small though.
> AFAIK, nobody ever proved it works as designed. The fact that its design is not very stable is also a problem.
Sorry, I don't understand what you mean here. The runtime?
> A simplified version of Rust might be a better target than C, if somebody ever creates it, but a lot of the safety would have to be left out of it.
I _think_ with the above comments you mean the Rust language overall, not the runtime, right? So, Rust _is_ very stable; post 1.0 we have made extremely small breaking changes, corresponding to the way new C++ standards sometimes introduce minor technical backwards incompatibilities. As for correctness proofs, there are multiple academic institutions working on them; and they don't have to "leave the safety out" - that's what they're trying to prove!
https://github.com/rust-lang/rust/blob/1ca100d0428985f916eea... is the "runtime". sys::thread contains all the TLS stuff (C has TLS too, so it's unclear if this is any more of a runtime). catch_unwind is for toplevel panic handling (which can be nop'd with panic=abort).
Finally, we link to jemalloc, which is an additional bit of runtime of its own (which can be turned off with alloc_system)
With Rust this tradeoff of safe-with-runtime vs. unsafe-without-runtime is no longer your only choice. Now you can choose safe-with-strong-compiler-no-runtime...
As far as I am aware, Ada is not memory safe without a runtime in the presence of non-static memory. So you either have a runtime, no deallocation, or unsafety.
C also needs a runtime; it is called crt0 on UNIX, and it is what calls main(), sets up global initialization functions (constructor functions in GCC C), emulates floating point, ...
As for Ada, you can selectively disable which parts of the runtime actually land on the generated executable via compiler pragmas.
Deallocation can be done via pools or controlled types.
Yeah, I realized after posting that this would be a critique. I should have stipulated that it's a language with C-like syntax, which seems to be many people's preference based on the most successful languages out there.
I never did much with Ada or Modula, so I won't claim to know anything more than their syntax, etc.
Rust still has no support for true formal analysis. It is therefore pointless to use it for high safety. Use a real theorem checker instead.
Now Rust as a target for compiling proofs into a programming language is pretty nice. Safe code can have no aliasing due to borrow checker (which should also be formally verified). Threading issues are vastly simplified, though fairness still has to be verified, as well as a few other properties.
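The aliasing guarantee in one small sketch: safe Rust rejects holding a shared reference across a mutation, which is exactly the kind of invariant a prover would otherwise have to establish by hand.

    fn main() {
        let mut v = vec![1, 2, 3];
        let first = &v[0]; // shared borrow of v

        // v.push(4); // error[E0502]: cannot borrow `v` as mutable
        //            // because it is also borrowed as immutable

        println!("{first}");
    }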
> the only way it'll ever get fixed is if we somehow get to the point where writing networking code in a non-memory-safe language is considered professional malpractice.
So using Linux, Windows, BSD or macOS servers is malpractice? I think you might have overstated your case. So are you waiting for a memory-safe Hurd rewrite? A memory-safe OS of any kind will be decades away even if someone wanted to start tackling it now.
There are lawsuits open about windows 10 destroying user data. Windows of all flavors has given me nothing but pain. Microsoft is not the benchmark of quality software. Most of their real innovations are in the business model space.
In what system can you sue when something came at no cost, with no warranty (express or implied), and in a good-faith effort to help everyone at the personal expense of the developer?
If that system exists, it sounds disgusting and unethical. Paying for something changes the nature of the transaction.
And the last windows 7 update black screened my pc and left me in limbo till I was able to restore. What exactly is your point? Windows is not a standard to be held higher than..well anything.
The parent commenter used Windows 10 as an example and you used Windows...7. His point, which is that Microsoft actively researches and implements OS-level security improvements, hasn't been rebutted by your statement about Windows 7.
Windows 7 was released in 2009. There are substantial proactive improvements that Microsoft cannot feasibly backport to older OS versions even if they're still within the support period, and the corporate culture was simply different when Windows 7 was released.
Substantial proactive improvements not relevant to the same build|OS which comes full circle to the linux comment which is so hated. That is: anyone with source and knowledge can make it work and those without...blackscreen.
I think the vast majority of people would say Windows has been getting much better in security since the days of XP. People can choose Linux, BSD, MacOS or Windows and have a good choice that is reasonably secure.
Of course they've overstated their case. Needs differ. When I wrote networking code a couple of jobs ago it was in assembler, because a new frame will be along in 67ns whether you're ready or not.
> So using Linux, Windows, BSD or MacOS servers are malpractice?
If you sell it as "secure", yes, absolutely. And there are many applications where you would not even be considered as a contractor for doing so. There is an increasing number of situations where you want to provide hard guarantees (written into the contract) about the quality of the software and service you are providing. "We are using Linux" sometimes is not good enough.
I get the feeling that very few old school authors would come forward and say "Hey, my project code is not good and needs to be re-written." Most of the time the message is more like "Sure we're using C but we're very careful and have built in a lot of safeguards."
I like your idea of a default mindset of internet tools should be written in a safe language with proper engineering techniques. I'm not sure I would go as far as malpractice but it might be a good stick to use to force makers into better practices.
The first major security breach through a buffer overflow was in the late '80s.
So when Java came out, they played it "safe" - the language, despite having pointers, would be absolutely safe. NullPointerExceptions and ArrayOutOfBoundsExceptions would cause the program to crash rather than corrupting the stack.
Perfect.
Except it wasn't. It ended up being so "holey" that it's now banned in browsers.
So, everyone said to move to JS. Another "perfectly safe" language.
But it's too slow.
So JIT it.
Now it's no longer "perfectly safe".
Rinse and repeat.
And Rust won't help here, because while the "compiler" can be guaranteed safe, the code it outputs can't (think of a C compiler written in Rust).
Maybe the solution isn't to rely on the language (except for the kernel) but to make it easy to spawn OS processes that simply have no rights to call any syscalls and a limited amount of memory (or a whitelisted set of syscalls).
Take it like this:
Firefox (the browser) has full rights. It starts a process (which can only connect to the network to IP RemoteHost).
If the process dies (for whatever reason) or takes too long, tell the user "sorry, site's broken".
Now, malicious code causes the attacker to run arbitrary code? Who cares? You can't overwrite the browser's code and can't break out.
The browser just has to ensure that its subprocess gives you good output.
You're conflating the issues of running untrusted code with running trusted code on untrusted input. JS is untrusted code that you run on your computer. Your other examples generally do not fall into this camp.
Java being an allegedly safe server-side language has nothing to do with it being a very difficult language to sandbox when running untrusted code. It doesn't matter what language the malware I just downloaded was written in, because it's malware and it's running on my computer. What makes Java different from C in this case is that when I run a Java application on my server, I can be more confident in the fact that I will not have a buffer overflow, even when I expose the service to the entire internet.
>You're conflating the issues of running untrusted code with running trusted code on untrusted input. JS is untrusted code that you run on your computer. Your other examples generally do not fall into this camp.
No. The issue is that in C (or "release (not debug) Rust" (someone here left a comment that Rust programs in release mode don't check for array overflows)) untrusted input leads to untrusted code (through buffer overflow).
So you can compromise a computer through a bug in imagemagick.
>Java being an allegedly safe server-side language has nothing to do with it being a very difficult language to sandbox when running untrusted code
Java was (in the 90s) advertised as safe client-side, hence applets.
No, release Rust does not do this. That is, the integer may overflow, but the buffer will not, unless you specifically go out of your way to use the unsafe unchecked access functions.
That's actually how many apps on macOS have worked for years. Anything that ships in the Mac App Store runs in a sandbox with limited privileges. In fact, every instance of an Open/Save panel on macOS runs in a separate process from the main application and relays its visual appearance using XPC and NSRemoteView. It's pretty cool stuff!
That's also the direction Linux is heading with Flatpak and Flatpak portals.
I'm excited because there will finally be a reliable way for me to apply my preferred Open/Save dialogs to KDE and GTK+ apps alike. (It's being implemented at the toolkit level and you can run an app in Flatpak with sandboxing set to permit-all to get the portal hooking.)
> Maybe the solution isn't to rely on language (except for the Kernel) but to make of easy to spawn OS processes that simply have no rights to call any syscalls and limited amount of memory (or a white-listed amount of syscalls).
We have that [1] [2]. It's cumbersome to use, but it's based on the bpf JIT used for tcpdump etc.
We also have a variety of other mechanisms you can combine with that to tie down a process. But even then it's hard to get this right in a way that's both performant and secure, because you have to protect every user every time, while for an attacker a very low success rate can still be worth it.
This is ignoring the fact that sensitive data may actually be inside the (vulnerable, locked-down) process. Or checks inside the process may in some way be influenced (remember register_globals in PHP? This happens even when you execute PHP in a locked-down container).
A safe language greatly helps to prevent the sort of bug that compromises within-process security.
While this doesn't so much apply to libcurl (but see below), there is a third alternative to "write everything in C" or "write everything in <some other safer language>". That is: use a safer language to generate C code.
End users, even those compiling from source, will still only need a C compiler. Only developers need to install the safer language (even Curl developers must install valgrind to run the full tests).
Where can you use generated code?
- For non-C language bindings (this could apply to the Curl project, but libcurl is a bit unusual in that it doesn't include other bindings, they are supplied by third parties).
- To describe the API and generate header files, function prototypes, and wrappers.
- To enforce type checking on API parameters (e.g. all the CURL_EASY_... options could be described in the generator and then that can be turned into some kind of type checking code; see the sketch after this list).
- Any other time you want a single source of truth in your codebase.
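A toy sketch of the idea (invented names, not curl's actual generator; the option values are only illustrative): describe the options once, in typed data, and emit both the constants and type-checked setter prototypes from that single description.

    #[derive(Clone, Copy)]
    enum OptType { Long, CString }

    struct Opt { name: &'static str, id: u32, ty: OptType }

    const OPTS: &[Opt] = &[
        Opt { name: "CURLOPT_TIMEOUT", id: 13,    ty: OptType::Long },
        Opt { name: "CURLOPT_URL",     id: 10002, ty: OptType::CString },
    ];

    // Emit a C header from the single source of truth above.
    fn emit_c(opts: &[Opt]) -> String {
        let mut out = String::from("/* generated -- do not edit */\n");
        for o in opts {
            let cty = match o.ty {
                OptType::Long => "long",
                OptType::CString => "const char *",
            };
            out.push_str(&format!("#define {} {}\n", o.name, o.id));
            out.push_str(&format!("int set_{}(handle *h, {} value);\n",
                                  o.name.to_lowercase(), cty));
        }
        out
    }

    fn main() { print!("{}", emit_c(OPTS)); }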
Programmatically generating C code is not without problems. How can you prove that the C you're generating is free from the problems solved by the safer language? Cloudbleed came from computer-generated C code: https://blog.cloudflare.com/incident-report-on-memory-leak-c....
See quote from the author of Ragel in the comments:
> There is no mistake in ragel generated code. What happened was that you turned on EOF actions without appropriate testing. The original author most certainly never intended for that. He/She would have known it would require extensive testing. Legacy code needs to be tested heavily after changes. It should have been left alone.
> PLEASE PLEASE PLEASE take some time to ensure the media doesn't print things like this. It's going to destroy me. You guys have most certainly benefitted from my hard work over the years. Please don't kill my reputation!
And I'd like to add that what made this a catastrophic error was that different requests were served in the same address space, rather than using address space isolation as in process-per-request/fork() architectures of old. For years now many network daemon programs have been written in an event-based, single-address space style, but I have never seen the alleged process creation overhead quantified (except for maybe multi-threaded programs). Even OpenBSD's httpd disses eg. CGIs as "slowcgi" (when you'd expect the OpenBSD developers take pride in the fact that their httpd uses ASLR etc. features of the O/S rather than inventing their own ad-hoc mechanisms to defeat deterministic memory allocation in user space, and would take the opportunity to tune O/S process creation). I don't have facts to share either, I'm just puzzled that we're re-inventing O/S mechanisms in user space with performance arguments without backing this up by numbers (or are there any?).
Well, the general point still applies. The bug occurred using code that was written in a safe language and compiled to C. It's just that there are multiple ways for that to go wrong. The generator could have had a bug -- it's software, so it almost certainly does. Or, as in this case, the user didn't use it correctly. Either way, the idea that you can write code in a safe language and compile to C to eliminate the type of bugs that C allows isn't true.
Are such errors less likely? Possibly so, but they're not categorically eliminated. It becomes a risk assessment exercise rather than a simple thing that everyone should do. Note that it also opens the door to Java-style problems, where once the generator becomes ubiquitous, it becomes the most valuable target for exploit-hunting because a vulnerability in the generator gets the keys to all the houses.
You are arguing that no language X is safer than writing the program manually in Y when the program in X is compiled to Y. Because the compiler from X to Y may have bugs.
Therefore no code written in Rust (X) and executed on an x86 CPU (Y) is safer than manually written x86 assembly, because the Rust compiler (and LLVM) may have errors.
And well, we can actually go deeper. There is the CPU frontend that generates microcode, which may have bugs. There is also the CPU backend that executes microcode, which also may have bugs. All in all there is no hope in programming. There might be bugs everywhere, so you can never be sure what your program does.
That's not what I'm saying. I'm saying "rewrite it in Rust (or whatever)" isn't some silver bullet that fixes security problems. It's always about assessing risk -- both risk of security issues as well as risk of upsetting your users, etc. Basically exactly what the article says.
> Either way, the idea that you can write code in a safe language and compile to C to eliminate the type of bugs that C allows isn't true.
Is a bit different statement than:
> I'm saying "rewrite it in Rust (or whatever)" isn't some silver bullet that fixes security problems.
The first one is wrong, the second one is true.
Using a higher level language rules out some classes of programming errors which are possible in lower level languages. The fact that compilers have bugs does little to diminish those gains.
The semantics of Haskell do not allow you to express a program that generates a double free [0]. Perhaps one of the compilers will compile some Haskell code to a binary that frees memory twice. However, this bug in the compiler is far less likely than a programmer making this mistake in C. What's more, when this bug in the compiler is detected and fixed, the problem is fixed in all affected code bases without any need to change the original source code. Thus the chances of bugs are lower.
Nobody really argues that Rust (or OCaml, or Haskell, or whatever) is a silver bullet, i.e. a solution to all problems that will miraculously make programmers produce no bugs at all. Obviously we will have software bugs even with the most restrictive languages. No amount of formal proofs will save us from misunderstanding specifications or making typos. And then again we will also have bugs in the implementation of those high-level abstractions.
And for the record I am really annoyed with movement to rewrite everything in Rust.
[0] Yes, you can call free through FFI with whatever arguments you like, as many times as you like. But for sake of brevity let's assume this is not how you write your everyday Haskell.
The hope is writing a formal description of the required architecture functionality (a formal proof) and then validating the proof. Not 100% safe against non-deterministic or very complex issues, but good against most others.
So no code is safe? All code before execution has to be lowered to some evil, unsafe language, most commonly the assembly language of targeted CPU.
The mystical process of "programmatically generating code" is also known as compilation. The case you are describing is a compiler bug. The compiler wasn't able to generate target code (in this case C code) with the semantics and/or guarantees of the source language.
More generally, I don't understand this argument. Assuming you can trust the C compiler (a big if, but at least some validated (large subsets of) C compilers exist; see CompCert), I don't get why this would be worse than generating machine code in a safe language.
This is simply not true. C in this case is just an intermediate representation of the source program. Going through multiple intermediate representation of the source code is fairly standard practice when compiling anything. If anything it is easier to target C than directly generate target CPU assembly, because of the high level nature of C (you finish the compilation earlier, without last couple of lowering steps).
You're forgetting the elephant in the room: undefined behaviour.
Sure, you can target one compiler and be sure you'll be generating the desired machine instructions, but it can be much more difficult to ensure that your code will produce safe machine code when compiled with all possible C compilers, and the techniques used may result in a slower end result.
If you go straight from a high-level language to a compiler IR, you have a much lower risk of having to choose between either underspecifying your invariants or overspecifying them at the expense of performance.
TL;DR: C wasn't designed as a compiler IR and that complicates things.
I agree with that. It is tempting to top it by saying C wasn't designed to do anything well, and that has complicated things over the last 45 years. On the other hand it's not like there has been a traditional wealth of wonderful ready-made IRs with cross-architecture backends for your high-level language to choose from either, so I'm still not convinced that compiling to C is harder than compiling to machine code yourself, especially in the common case where you don't have to get the last ounce of possible performance out.
Well, we can agree to disagree about this, but in my experience third party tools (like helpful debugging symbols) in particular suffer when there are extra intermediate languages. Extra metadata needs to be passed through more layers of abstraction.
And as a human I have had the same issues acting as a meat-implemented debugger. I had to drill through more layers to figure out why low level things happened.
Of course metadata is lost if not encoded anywhere on the way. The argument was made regarding code generation being more complex when code is saved on the intermediate level.
You're understating the problem a bit. There's no standard way to mark up C code as mapping back to the original source code so that metadata (source lines, memory mapping back to data structures) can be passed on to the compiled binaries. If the original language generated DWARF-encoded objects, then debuggers would just work, etc.
Compiling X to C and then C to assembly is not more complex than compiling X straight to assembly. In your original comment you wrote that the complexity of such a setup is bigger, to which I responded: no, not really.
Yes, C was not designed to be an intermediate compilation step and this yields losses of some information (e.g. debugging metadata, but also some semantics of the source language may get lost). I never argued with that. I never said that this is a perfect setup that doesn't introduce any new problems. I just said that compiling to C is very close to what actually happens inside a compiler targeting assembly from a higher-level language.
You just have a narrower scope of what counts as complexity. Mine includes things that complicate humans and debuggers understanding and analyzing the ultimate bytecode.
The techniques and difficulty in implementing the compiler itself are related but not really the same subject.
No we cannot prove that. However it is still better than the "write it in C" option because once you fix a bug in the generator, it's fixed in all current and future generated code. In other words, we no longer make the same mistakes over and over again.
How is that different from just writing it in another language? End users who need to compile will be able to regardless of the generated C code, but the end users who need to do a _little_ modification will be given ugly generated C code! Seems strictly worse to me...
In the libguestfs generator (first link above) the generated C code is required to be completely readable. It must look like it was written by hand (albeit by a programmer who is impossibly consistent and perfect). So reading the generated C code is fine. Modifying the generated code is of course not fine except for tiny test hacks, but we also include in the generated code comments reflecting where in the generator the code comes from.
I've created a number of code generators in my projects. Invariably, developers say exactly what you just wrote, "how do I modify the generated code"?
The answer is not to modify the generated code. Modify the input to the code generator to make changes.
Even when I output a warning to this effect - that all modifications to the target code will get overwritten, and not to check the target code into version control since the source code is already checked in - invariably developers modify the target code right under the comment that says not to, then they check it into version control. They then wonder why there are bugs, and why their modified target code no longer works after it gets regenerated on the next build.
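For illustration, this is roughly the kind of banner and provenance comment such a generator can emit at the top of every output file (the file names and function here are made up, a sketch rather than any real project's output):

    /*
     * GENERATED FILE - DO NOT EDIT.
     * Generated from generator/api.def (hypothetical path); edit that file
     * and re-run the generator instead. Manual changes here will be lost.
     */
    #include "api.h"

    /* from generator/api.def, entry "version" */
    const char *api_version_string(void)
    {
        return "1.2.3";
    }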
It's almost as though you can't solve the problem of programmers making errors by having a different set of programmers whom you tell to not make errors.
The impetus wasn't that programmers make errors but to solve the problem of repeatability. Many instances of issues can be solved once. There is no need to recreate the solution a number of times if it is already solved.
A code generator allows one to focus on the actual meta-problem, which is often smaller and easier to solve.
For something like curl, where the library is as popular as the command line tool, preserving the C ABI compatibility is probably the strongest reason.
Rust could expose a C ABI while keeping safe internals. The interface itself would be unsafe of course. There are a few things that rust doesn't handle natively (like varargs functions IIRC) but other than that you could probably write a Rurl that would be completely backward compatible with Curl.
Well, we can call into vararg functions, but not define them.
Since a vararg function has the same ABI as the corresponding function taking only one of the variadic arguments, one idea I've always had is to write a macro that lets you write a one-arg function and have it desugar via asm hax.
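To make the varargs point concrete: the best-known variadic entry point a drop-in replacement would have to keep is curl_easy_setopt. A rough sketch of the C-facing surface it would need to preserve (simplified; the real declarations use the CURLcode and CURLoption enums):

    typedef void CURL;   /* opaque handle, simplified from the real typedef */

    CURL *curl_easy_init(void);
    int   curl_easy_setopt(CURL *handle, int option, ...);  /* the variadic one */
    int   curl_easy_perform(CURL *handle);
    void  curl_easy_cleanup(CURL *handle);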
Not only is curl based on C, but so are operating systems, IP stacks and network software, drivers, databases, Unix userland tools, web servers, mail servers, parts of web browsers and other network clients, language runtimes and libs of higher-level languages, compilers and almost all other infrastructure software we use daily.
I know there's a sentiment here on HN against C (as evidenced by bitter comments whenever a new project dares to choose C) but I wish there'd be a more constructive approach, acknowledging the issue isn't so much new software but the large collection of existing (mostly F/OSS) software not going to be rewritten in eg. Rust or some (lets face it) esoteric/niche FP language. Even for new projects, the choice of programming language isn't clear at all if you value integration and maintainability aspects.
I think there's two major against-C groups: those of us who have worked with C for decades and those who never worked with it. I'll try and speak for those of us who've used it for decades. The popular high-level languages that have arrived since ~1995 (Java, Python, JS, C# and friends) are excellent productivity increases. In general, they sacrifice memory and performance in favor of robustness and security. For enormous software problem domains, we just don't need C's complexity or error-proneness.
Until Rust, there's been very close to zero serious competitors for C if I wanted to write a bootloader, OS, or ISR. Not even C++ could do those (without being extremely creative on how it's built/used). The ~post-2000 languages (golang, swift, D etc) can't do that (perhaps D's an exception but it wasn't an initial goal AFAICT). This is huge, IMO.
We've groaned and grumbled about how hard it is to parse C/C++ code for decades. This is a big deal for tooling. Because of the language's design, even if you use something "simple" like libclang to parse your code, you still have to reproduce the entire build context just to sanely make an AST. All of those other new languages above probably address this problem but also add all kinds of other stuff which we can't have for specialized problem domains (realtime/low-latency requirements, OSs, etc).
> collection of ... software not going to be rewritten in eg. Rust or some (lets face it) esoteric/niche FP language
IMO it's not appropriate to lump Rust in with "niche FP language"s. And don't look now but lots of stuff is being rewritten in Rust. Fundamental this-is-the-OS-at-its-root stuff: coreutils [1], "libc" [2], kernels [3], browser engines [4].
> IMO it's not appropriate to lump Rust in with "niche FP language"s.
Maybe I should have expressed it better, but I didn't intend to lump these together.
>And don't look now but lots of stuff is being rewritten in Rust.
I'm myself cautiously optimistic re Rust, but having been burnt by C++ in the past I'm not enthusiastic about fighting language idiosyncrasies (though modern C++ certainly deserves a second look). Then there's the issue (some might argue it's a plus) that Rust is at the same time a language, a lib, and the only compiler implementation (unlike C or C++ which give you choice).
The Rust coreutils is an excellent example of the issues of having such a mess of abstractions; the resulting binaries are literally an order of magnitude larger than the busybox equivalents.
> issues of having such a mess of abstractions, the resulting binaries are literally an order of magnitude larger
They're significantly larger, yes -- it's a fair complaint of rust. But it's mostly because of static linkage AFAIK [1] and not "a mess of abstractions".
Actually, the culprit is Rust's decision to statically link its standard library and all its dependencies by default.
Things like libunwind, libbacktrace, embedded debugging symbols for backtraces, and the jemalloc allocator aren't free.
If you ask for dynamic linkage (with the caveat that Rust doesn't have a stable ABI yet), you get a ~8K Hello World binary.
It's also possible to prune down the statically-linked size by opting out of various conveniences like jemalloc. (They're working toward making the system allocator default but don't want to regress Servo in the interim.)
...and if you opt into static linking with GCC and G++ (and ask Rust to make its link to libc static), Rust can actually outdo them on a Hello World.
> Actually, the culprit is Rust's decision to statically link its standard library and all its dependencies by default.
No it really isn't; static linking does not imply bloat, contrary to the commonly perpetuated belief.
> It's also possible to prune down the statically-linked size by opting out of various conveniences like jemalloc
Try this: opt out of everything except the standard library, create something somewhat trivial and idiomatic in both rust and c, compile and see what you get.
> Rust can actually outdo them on a Hello World.
Hello world is hardly a use of the standard library.
> No it really isn't, static linking does not imply bloat as commonly perpetuated.
I never said it implied bloat. I said that, if you ask Rust to link dynamically despite the lack of a stable ABI, you'll get binaries of a size similar to C and C++.
> Try this: opt out of everything except the standard library, create something somewhat trivial and idiomatic in both rust and c, compile and see what you get.
I'll need you to be a bit more specific than "somewhat trivial", given that "Hello world" uses println! or printf() but you consider it ineligible.
> Hello world is hardly a use of the standard library.
println! aside, it's a data point and that's all I meant by it.
Not wyldfire, and I think that claim is a mischaracterization, but the main obstacle to using C++ in the kernel is that some of its language features require runtime support (new/delete, globals/statics with constructors, exceptions).
You can of course just ignore those when writing kernel code- they get ignored in application code much of the time! But I suppose at that point it could be argued that you're just writing C with a C++ compiler?
I mean, if you're writing a kernel in Rust you have the same issue. In that case you'd use no_std, which takes away the part of the stdlib that depends on allocation and such (also threads and other niceties).
You can lose new/delete and .bss statics and still write reasonable, even "safe", C++. Rust doesn't have .bss statics by design (lazy_static emulates this for you though). new isn't necessary for the "modern" C++ safety stuff and you can write pretty good modern C++ without new. All new gets you is a nice wrapper around allocation, and when writing a kernel you can't and shouldn't allocate anyway. In Rust, too, you would not be allocating, either via memmap/malloc or via Box::new().
So it wouldn't be "C with a C++ compiler", it would be "C++ without allocations", which is a restriction from the problem statement anyway.
I don't get it. AFAIK you can implement all of those things in your abstraction, and then use it like canonical C++. I think you are wrong. Please correct me.
I might have to walk that back. It seemed to me that no_std was "more straightforward" and/or "more formalized" than "#pragma interrupt" (etc). But I could be wrong there -- if so, mea culpa (the post is no longer editable).
Rustc did recently get a "x86-interrupt" calling convention, but that's unrelated to #[no_std], and only works on x86. Either way, "#pragma interrupt" should work just as well in C++ as in C, since C++ doesn't really change any aspect of the language that matters there.
Further, even in C I rarely see use of "#pragma interrupt"-like tools- rather, everyone still seems just to use per-platform assembly glue code. (To be fair, my experience is mostly in kernel code for things like Linux, rather than standalone embedded applications where "#pragma interrupt" would be more valuable.)
no_std is more formalized, though C++ enforces the same thing by failing to link if you try using malloc (or whatever) when writing a kernel. no_std also means that it's very easy to tell if a crate works without the stdlib, so you can use code from the ecosystem instead of rolling your own.
Ultimately the Rust OSes resort to some handwritten assembly as well. I think that's going to be a constant of writing a kernel. Rust is working to minimize it (e.g. with things like `extern "x86-interrupt" fn`), but at a kernel level there are just some kernel specific asm instructions (like all of the TLB stuff) that either compiler will probably never support generating without inline asm.
So while Rust may be better than C++ at writing OSes (I'm not sure! I haven't looked at all the stuff you need to write an OS in C++), I do think they're in the same ballpark, close enough that if Rust is a "serious" competitor C++ probably is too :)
> Until Rust, there's been very close to zero serious competitors for C if I wanted to write a bootloader, OS, or ISR … We've groaned and grumbled about how hard it is to parse C/C++ code for decades.
I honestly think that Common Lisp can do this quite well. It was designed to be a high-level language, but it's completely capable of working at the machine level, pleasantly and easily. Unlike C, most of the time one has safety, but one can disable safety when necessary with a simple (declare (optimize (safety 0))).
Performance is extremely good with modern compilers, although I don't know how good they would have been back in the old days.
From what I can tell of Rust, it doesn't look easier to parse than C (but I've not looked deeply); certainly, it's orders of magnitude more difficult to parse than Lisp.
I believe that Standard ML or OCaml could do similar things as well, albeit at the cost of being more difficult to parse. Smalltalk is maybe a little less capable, but somewhat easier to parse.
Yes, it's harder than lisp, but it's still much easier than C. C and C++ have issues due to ambiguities that make them context-sensitive. C++ has it worse because parsing is dependent on typechecking because of templates.
Rust is not 100% context-free, but the feature that is non-context-free (raw strings, a rarely used feature) is still pretty easy to parse, and even if you capped it at 6-level raw strings you'd probably be able to parse all the Rust code out there.
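For anyone wondering what the C ambiguity looks like in practice, here's a tiny sketch: the same token sequence parses as either a declaration or an expression depending on whether a typedef is in scope, which is why a C parser needs full type context (the names are arbitrary).

    #include <stdio.h>

    typedef int T;    /* remove this line and "T * x;" is no longer a declaration */

    int main(void)
    {
        T * x;             /* with the typedef: declares x as a pointer to T  */
        int value = 42;    /* without it: would parse as "T times x" instead  */
        x = &value;
        printf("%d\n", *x);
        return 0;
    }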
> I honestly think that Common Lisp can do this quite well
I haven't used any lisp dialects for decades, so I have naive questions: is there really sufficient support from compilers+linkers to write a bootloader in lisp? Do I have to do a lot of bootstrapping in assembly to bring up lisp interpreter before I can execute the lisp code or does the ahead-of-time-build result in executable machine code? Can I do inline assembly (not required but a really key benefit IMO)? Are there numerous examples where someone's already written one in lisp?
https://github.com/dym/movitz is a Common Lisp system that runs on bare metal x86. The source code is quite readable.
The rest of this post is an excerpt from an email I sent 6 years ago.
The following comments on runtime systems are partially based on a long c.l.l thread with posts by Lucid, Symbolics, and Franz alumni.
Franz uses a 3-layer approach: CL, a low-level Lisp, and C.
Lucid started with Lisp that generated assembler but reluctantly added some C.
Symbolics Lisp Machines used bootstrap code in a Pascal-level language with prefix syntax. A Symbolics alum said that in retrospect they should have used C.
Most Lisp implementations have subprimitives - low-level functions that can circumvent the type system, often with a prefix such as % or :.
Assembly language integration dates to Lisp 1.5 and there are several common approaches.
1. turn the optimizer off - this is easy to use and implement.
2. optimize the assembler block - Naughty Dog GOAL did this.
3. annotate the assembly with its input and output operands, with constraints (do they have to be certain kinds of registers) and whether anything has surprising side effects. This sounds like where GCC got its inline asm concept from.
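Approach 3 is essentially what GCC's extended asm looks like today. A minimal sketch (x86-64 only, and the instruction choice here is arbitrary):

    #include <stdio.h>

    /* Operands are annotated with constraints ("r" = any general-purpose
       register); the compiler slots the block into the surrounding code. */
    static unsigned long add_one(unsigned long x)
    {
        unsigned long result;
        __asm__ ("leaq 1(%[in]), %[out]"
                 : [out] "=r" (result)    /* output operand */
                 : [in]  "r"  (x));       /* input operand  */
        return result;
    }

    int main(void)
    {
        printf("%lu\n", add_one(41));     /* prints 42 */
        return 0;
    }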
Of all the complaints you could make about C/++, parsing is IMO rather bikesheddy. Parsing is a solved problem. Modern compilers can parse millions of LoC per second. And most of the specific parsing-related complaints (pointer dereference or multiplication?) about C/++ are also true of Rust. (Edit: Nope, brain fart on my part, see below.) And, AFAIK, all C/++ parsing is well-defined, if counterintuitive in certain edge cases.
> most of the specific parsing-related complaints (pointer dereference or multiplication?) about C/++ are also true of Rust.
This should not be true, and we fought hard to keep it that way. There's one spot of Rust's grammar that's context-sensitive, for something very rarely used, and other than that, it's all much simpler.
You're right. AFAIK types and identifiers are always unambiguous in Rust. I was thinking visually (same operator) instead of in terms of specification and implementation. Shows me to make flippant comments from the toilet!
My larger point is that there are plenty of very good reasons to criticize C/++, and parsing is a minor one since parsing is fast, and even if the creation of the AST isn't context-sensitive, verifying its correctness (is this identifier in scope?) still is.
What is the context-sensitive spot in Rust's grammar?
> plenty of very good reasons to criticize C/++, and parsing is a minor one
Ok, fair bit, it's a frustration for me but admittedly not as important as the other differences.
I mention it because it's a wart in C's language design and I figured Rust's safety features are already well-known and heavily discussed. If I want to write a simple tool "ask this tree of .c files how often they use an identifier with name 'X' or type 'Y'", I have to find out the include paths, defines, all kinds of other "noise" just to find out what could be a relatively simple query of the source base.
This also means that autocomplete tools usually need to be taught how to build a project. YCM has this whole conf file where you specify the header locations and stuff and it's like rewriting half the makefile.
Please, connect the dots for me. An initial skim of the commit messages did not yield any egregious "Utter Disregard for Git Commit History" [1]. Even if it did, it may just mean that the maintainer is focused more on results and robustness than preserving a pristine history of the project.
Don't think that was where I was going. It was more that none of this stuff is really done, it is hard in this language, and help would be appreciated... as understood through the project splash page and then examined through the commits?
I hope Rust doesn't face the same fate as other ambitious projects by Mozilla. Rust has quite an unusual syntax compared to any other systems programming language. Also, there is a big learning curve. Keeping all the benefits aside, I really hoped Rust had a simpler syntax. I really think one day a language will borrow the good parts of Rust with a simpler syntax and get ahead of it. Rust in its current form will never be as successful as C/C++.
I am not against Rust. Rust has some great ideas and intent. I just feel they should have created simpler syntax. A more complex and unusual syntax doesn't have any real benefits IMO.
You still haven't pointed out what syntax is problematic exactly. I've never programmed in Rust, but I don't have any trouble reading it coming from a C and C# background.
What exactly is the problem with the example you linked to?
Not having programmed in Rust, the fact that Rust requires all type parameters to be used, thus ruling out proper phantom types, was semantically surprising to me. But I don't understand what syntactic issue the other poster had with that example.
In my understanding, it has to do with variance. This happened a very long time ago, before 1.0, and so I don't know where the discussion happened, off the top of my head.
It still seems bizarre to me that a purely type-level expression is forced by an effectively non-existent term. That RFC specifically states that the main problem is that the results of variance inference are largely erased by assuming invariance. That seems like a sensible default for unused type parameters too.
It seems from the conclusion of that post that PhantomData only survived because this was the smallest change they had to make to get this all to work better, and because some of this PhantomData could be used for other analyses in the compiler (although it's not clear if better type information could have replaced these uses anyway).
That's quote, a library that generates code for you at compile time. It takes code as input and has its own syntax. It's like compile-time reflection.
It's not code Rust programmers would normally write; I'm one of those programmers. I'm glad some libraries like serde, rocket and diesel are using it to generate code instead of doing run-time analysis.
> the large collection of existing (mostly F/OSS) software not going to be rewritten in eg. Rust or some
It is happening and will keep happening, and it is really necessary at some point. Sure, I don't expect a large project to be rewritten overnight, but every large project gets redone eventually. Especially when C becomes the main source of problems. And you can introduce better languages gradually.
Many of those are often written in userspace, where you are free to use any language.
Several of them will provide sufficient performance.
> databases, Unix userland tools, web servers, mail servers, parts of web browsers and other network clients, language runtimes and libs of higher-level languages, compilers and almost all other infrastructure software we use daily.
For all of those you'll find Rust implementations. Some are work in progress, some are already widely used.
C was originally a watered down version of BCPL because the computers they were targeting had very limited capabilities compared to other computers of the time. The first version of C didn't even have floating point numbers.
I am a student and I like C. I've tried Rust and Go, but I like the feeling of C. Maybe that sentiment will change later, but for now, I like C. It's simple and sharp and there are lots of docs/books.
C seems relatively simple, but it's also incredibly easy to do the wrong thing usually without realising it. That's the problem. I suggest you read up about 'undefined behaviour'.
Not OP, but I actually think string manipulation in C is really elegant. Many people who complain about it have too many allocations in their code and are trying to port the allocation-heavy non-C way of thinking to C. The C way I know focuses mainly on character at a time iteration with emphasis on not copying the source string.
I'm reminded of a time a colleague needed something like string.split, and working in c++ he filled a std::vector<std::string> with the result. Using a more C way he'd really only have needed a couple of pointers on the stack.
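As a sketch of what that looks like (a hypothetical helper, not from any real codebase): walk the string with a couple of pointers and never copy anything.

    #include <stdio.h>
    #include <string.h>

    /* Print each field of s, separated by sep, without allocating or copying. */
    static void print_fields(const char *s, char sep)
    {
        const char *start = s;
        for (;;) {
            const char *end = strchr(start, sep);
            if (!end) {
                printf("[%s]\n", start);    /* last field runs to the end */
                return;
            }
            printf("[%.*s]\n", (int)(end - start), start);
            start = end + 1;
        }
    }

    int main(void)
    {
        print_fields("GET /index.html HTTP/1.1", ' ');
        return 0;
    }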
It's a little naïve to think it works on English without the coöperation of the users. (Also, less glibly, things like emoji seem to be becoming more and more popular.)
You can, and then you get a 4-byte long character 1-byte before the end of your data, you skip over the null-terminator and into the stack, and bang.
Yes, you can avoid this if you're careful and you understand the intricacies of utf-8 (or some other multi-byte encoding), but it very quickly stops being elegant.
What do you mean by "character"? If you mean code point or "unicode scalar value", sure, but if you mean user-visible character (grapheme), it's much more complicated: even something "simple" like ö could be one or two code points.
This is not true. A zero-byte in a utf-8 string is the null-terminator and utf-8 strings can be treated exactly like C strings in terms of where the string ends.
What you do need to look out for is malformed utf-8, for example, 1 byte before the null terminator you get a lead byte saying the next character is 4-bytes long.
If you're not checking each byte for null and just skipping based on the length indicated by the lead byte then you're in for a crash.
Where utf-8 strings differ from C strings is slicing. You can't just slice the string at some random point without doing extra validation to make sure you only slice on codepoint boundaries.
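A small sketch of that boundary check (the helper name is made up): continuation bytes always look like 10xxxxxx, so you back up until you leave one.

    #include <stdio.h>

    /* Move pos backwards until it points at a lead byte or an ASCII byte. */
    static size_t utf8_floor_boundary(const unsigned char *s, size_t pos)
    {
        while (pos > 0 && (s[pos] & 0xC0) == 0x80)
            pos--;
        return pos;
    }

    int main(void)
    {
        const unsigned char s[] = "caf\xC3\xA9";     /* "café", 5 bytes */
        printf("%zu\n", utf8_floor_boundary(s, 4));  /* 4 is inside 'é'; prints 3 */
        return 0;
    }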
> A zero-byte in a utf-8 string is the null-terminator and utf-8 strings can be treated exactly like C strings in terms of where the string ends.
No, the parent was correct: UTF-8 encodes NUL (i.e. \0) as a single zero byte (e.g. in contrast, Modified UTF-8[1] uses an overlong for NUL, so there's never any possibility of an internal zero). Of course, an application/library can choose to restrict itself to only handling UTF-8 that doesn't contain internal NULs, but the spec itself allows for zero bytes in a string.
The point is, if you handle strings the C way, you're not in conformance with UTF-8.
If someone passes you a text file that is verified to be valid UTF-8 and contains, say, access permissions, then you better not stop parsing it at the first '\0' character.
None of this is a huge problem, but it's something to be aware of. C string handling is incompatible with UTF-8.
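A quick sketch of the pitfall: the buffer below is perfectly valid UTF-8, yet the C string functions silently ignore everything after the embedded NUL.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char data[] = "allow\0deny";   /* valid UTF-8 containing U+0000 */
        printf("%zu\n", strlen(data));       /* prints 5: stops at the NUL    */
        printf("%zu\n", sizeof data - 1);    /* prints 10: the actual length  */
        return 0;
    }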
File processing and string processing are not the same. If you have a file that has a specific data format outside of the encoding, and that format includes NUL bytes as part of the data, then obviously process the file based on that format.
That's separate from string handling.
UTF-8 was originally designed to be compatible with NUL terminated strings and keep NULs out of well formed text.
In fact it was the first point in the 'Criteria for the Transformation Format', mentioned in the initial proposal for utf8.
>File processing and string processing are not the same
The UTF-8 spec doesn't make that distinction as far as I know. There's a simple fact: A valid UTF-8 byte sequence can contain nul characters. So you can't naively use C string handling functions on it. And as someone else has correctly pointed out, the same is true for ASCII.
I'm just pointing out a potential pitfall and a source of security issues. Some might assume that after validating UTF-8 text input, you could just dump it in a C string and process it using C's string functions. But that's not the case.
Unless you have U+0000, there isn't any other sequence of code points that produces a 0x00 byte in UTF-8. I don't see this as a huge problem.
If you really do need it there are some C language libraries that use "pascal-ish" structs to do strings. UNICODE_STRING in Windows comes to mind. Doing strings in C doesn't force you to use C strings, it's just the most common thing to do.
So it's somehow C's fault that Unicode uses variable-length encoding, which is automatically going to be harder to process correctly at a byte-by-byte level than a fixed-length method, and also included known-C-incompatible null bytes?
> So it's somehow C's fault that Unicode uses variable-length encoding
Parent said string handling in C was elegant. My point is that it becomes fraught with (even more) issues once you throw non-English language at it.
It is C's decision to handle strings in this way, and the decision of many C programmers to treat all strings as if they are just iterable character pointers.
I am the parent you are talking about. I've made this argument many times with people: Unicode is crazy complicated in any programming language. People think that widening the char width will help - well you seem to be somebody who knows Unicode so you probably know the horrors of surrogates, combining characters vs. pre-composed diacritics, zero-width joiners, Han unification, variation selectors, BiDi... This is in no way just a C thing to deal with all that nonsense. I've not seen any language or library that I'd say does it "well" and saves individual programmers from considering the above. They all punt the issue to the programmer.
I've heard (mostly here) that Swift does something different and treats glyphs as the basic unit. I haven't had a chance to look at precisely what that does. Given all the issues I've seen elsewhere I'm skeptical that someone, anyone can pull that off correctly.
UTF-8 at least has one elegance (there's that word again) in the design in that you can do some "dumb" ASCII things and if your code does not know what to do with fancy unicode, you can check the high bit of any given octet and "safely" skip over it and any adjacent nonascii sequence if you don't know what it means. This may or may not be applicable to a task at hand.
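A sketch of that trick: any byte with the high bit clear is plain ASCII, and anything else belongs to a multi-byte sequence you can step over without understanding it (assuming the input is valid UTF-8).

    #include <stdio.h>

    /* Print only the ASCII bytes of a valid UTF-8, NUL-terminated string. */
    static void print_ascii_only(const char *s)
    {
        for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
            if (*p < 0x80)
                putchar(*p);    /* high bit clear: ASCII, keep it            */
            /* else: part of a multi-byte sequence, skip without decoding    */
        }
        putchar('\n');
    }

    int main(void)
    {
        print_ascii_only("na\xC3\xAFve caf\xC3\xA9");   /* prints "nave caf" */
        return 0;
    }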
> This is in no way just a C thing to deal with all that nonsense. I've not seen any language or library that I'd say does it "well" and saves individual programmers from considering the above.
This is true, however even something as simple as storing the (byte) length as part of the string reduces the complexity and the likelihood for bugs.
Other languages also prevent accidental buffer overruns so while they still need to deal with all the same Unicode problems you mentioned, the program likely won't crash if the programmer gets things wrong. The same is not necessarily true of C.
FWIW in Rust you also tend to avoid allocations, since all string manipulation is done via slices -- safe (ptr, len) pairs. It's pretty neat.
IIRC C++ is getting slices too, so it might be able to get better APIs around string manip. But I've seen decent string manip code that avoided allocations.
> Using a more C way he'd really only have needed a couple of pointers on the stack.
This is pretty much how it's done in Rust too via slices. For example, the standard way to split a string is to create an iterator and it won't do any allocations.
A lot of the mentioned software was started many years ago, when other languages were hardly viable options. And those programs are good enough now, so there's not much movement to replace them. It's not a good argument for C, IMO. It's like saying that Windows is awesome because so many users use it. But when people started from scratch (the mobile world), it turned out that Windows is not the best OS.
Actually when I'm reading about new software, it's very rare to encounter C. Usually it's something else.
Didn't know that curl was stuck back on C89, that's really optimizing for portability.
If anyone is confused by the "curl sits in the boat" section header, that's basically a Swedish idiom being translated straight to English. That rarely works, of course, and I'm sure Daniel knows this. :)
The closest English analog would be "curl doesn't rock the boat", I think the two expressions are equivalent (if you sit, you don't rock the boat).
> In the curl project we’re deliberately conservative and we stick to old standards, to remain a viable and reliable library for everyone. Right now and for the foreseeable future. Things that worked in curl 15 years ago still work like that today. The same way. Users can rely on curl. We stick around. We don’t knee-jerk react to modern trends. We sit still in the boat. We don’t rock it.
I see a lot of inertia in there. While it's a great record to maintain 15-year consistency, in an era of ever-changing InfoSec outlook it could become legacy and baggage if the authors resist change. One thing we know for sure is that humans will make mistakes, no matter how skillful you are. In the context of writing a fundamental piece of software in an unsafe programming language, that means we are guaranteed to have memory-safety induced CVE bugs in curl in the future.
Some of the other points that the author raised are valid too. If there is a trade-off where we can have a safer piece of fundamental software by almost eliminating a whole category of memory safety related bugs, with the downside of less compatibility with legacy systems, more dependencies etc., perhaps we should consider it? I believe the trade-off is well worth it in the long run and the option is ripe to explore.
How is the author resistant to change? He specifically said new code should be written in a language that meets the priorities for that code. He specifically said someone has or would write a competitor to curl in Rust or some other safer language and that a good one will take off. He welcomed that.
What he doesn't welcome is rewriting something that's had those bugs and the types of logic bugs not related to the language already worked out. There's a saying about a baby and bathwater.
Not everything is a dichotomy, and you shouldn't be reading the article as if the author is against newer languages. He specifically says that given a fresh start with the availability of these languages he might use something besides C. Carefully weighing options is wise. Throwing away years of actual progress for the appearance of quick progress is foolish.
I specifically quoted the section head "curl sits in the boat", and the entire section ends with "We sit still in the boat. We don’t rock it". Now read it again, and then tell me if that's welcoming change or resisting change.
> He specifically said someone has or would write a competitor to curl in Rust or some other safer language and that a good one will take off. He welcomed that.
Sure there might already be some alternatives out there. But those are not curl, they are at most forks.
> He specifically says that given a fresh start with the availability of these languages he might use something besides C.
Nope, he used the words "Maybe. Maybe not." "Might" is a stronger word.
I see "we don't rock the boat" as completely synonymous with "I am resistant to change".
(Note that I suspect you think that I or the original poster are suggesting that he's resistant to all change, not just change within curl. I don't believe he's resistant to all change, but I do believe he's resistant to change within curl, which is what we're talking about here.)
He's not resistant to change in the problem space. He's resistant to very particular types of change in one very particular codebase, and for very sound reasons.
He doesn't want to break the ecosystem around curl, which is huge, while getting back to feature parity and compatibility during a full rewrite. Something that comes along and replaces curl externally needn't be completely compatible and therefore is freer to leverage their new, fresh start much more fully. He welcomes a competitor, which means even a possible complete replacement. That's not a resistance to change. That's being very judicious about what one changes and why.
> He doesn't want to break the ecosystem around curl, which is huge, while getting back to feature parity and compatibility during a full rewrite
Very good point! However, a rewrite doesn't have to break the existing ecosystem, and it can happen on a parallel track, right? (curl has had 1.5k contributors so far, so the community should have enough support to maintain the existing codebase while developing a new version in a new programming language, hypothetically speaking.)
In fact, I would argue that the "it works now and has been working for 15 years so we don't rock the boat" attitude is negatively impacting the curl ecosystem in the long run. I'm a Python developer; a good example I can give to support my view is the Python 2.x to py3k transition (if "better unicode support" is analogous to "memory safety bug avoidance").
Python 2 to Python 3 was basically world-breaking for many projects. Most every non-trivial piece of code needed to be modified to work with 3. Many people maintain older code on 2 to this day even with newer code being written on 3 by the same people.
> it could be a legacy and baggage if the authors resist to change.
Every few years there is a new batch of programming languages that come out and they all gain a small passionate community that tries to convince the internet how much better that language is.
They inevitably use the argument that code not written in the new language is 'legacy' and 'resistant to change'.
Neither of those assertions are accurate or enlightening unless you can provide a proposed replacement and prove the superiority of the new code.
Simply telling other programmers to rewrite their code in xyz language with such arguments is primarily a case of armchair development.
If you really think it could be done better then do it and prove it.
> If you really think it could be done better then do it and prove it.
What I said is my belief and opinion, obviously. I don't have the skill set nor the time to do/prove it, unfortunately. But that shouldn't prevent me from speaking my opinion, should it? Just like a lot of people really think space travel could be done better, but not all of them have the capability and resources to do it and prove it.
> I don't have the skill set nor the time to do/prove it, unfortunately. But that shouldn't prevent me from speaking my opinion, should it?
Given the fact that programming language fanboy noise is a constant problem in language threads, it would be nice to see fewer posts, of higher quality than the same arguments that are always made. It's not really going to change the mind of anyone who has experience, and the people who do argue are usually inexperienced, with vapid counter-arguments.
So I guess the answer is no. If you have an uninformed opinion then it's better not to add noise and to let the people who really know their topic enlighten us with a well-argued, not very noisy discussion.
I think the value of HN is very much quality over quantity.
It's extremely simple. If you think Curl would be better in another language then port it, release your alternative, and maintain it for a long time.
Even if your language (Rust, Erlang, LISP, Go) is "better", it's still a minimal part of the equation. A maintainer is what makes the tool. It's hard work to decide which PRs to accept (and worse yet, reject), to backport fixes to platforms for which you can't get a reliable contributor, coordinating fundraising/donations, keeping up with evolving standards...
Anyway. Thank you, thank you, thank you Daniel Stenberg. Use whatever damn language you want.
I wouldn't presume to speak for Daniel, but I got the feeling that he just wanted to publish this to point people to rather than send the same canned response to inquiries about porting to Rust et al.
Drakma sounds great. Tone is tough to get right online. I don't like people doing drive-by suggestions like "you should rewrite X in Y". But if people are really willing to roll up their sleeves, write the tool (in any language) and keep it going for the long haul, I applaud them. I just have great respect and empathy for project maintainers, I think some don't appreciate what a huge PITA it is to BDFL a successful project.
> A library in another language will add that language (and compiler, and debugger and whatever dependencies a libcurl written in that language would need) as a new dependency to a large amount of projects that are themselves written in C or C++ today. Those projects would in many cases downright ignore and reject projects written in “an alternative language”.
Why would I be vendoring my own copy of libcurl in my project? Who does? This is how I (or rather, the FFI bindings my language's runtime uses) consume libcurl:
dlopen("libcurl.so")
I rely on a binary libcurl package. The binary shared-object file in that package needed a toolchain to build it, but I don't need said toolchain to consume it. That would still be true even if the toolchain required for compiling was C++ or Rust or Go or whatever instead of C, because either the languages themselves, or the projects, ensure that the shared-object files they ship export a C-compatible ABI.
An example of a project that works the way I'm talking about: LLVM. LLVM is written in C++, but exports C symbols, and therefore "looks like" C to any FFI logic that cares about such things. LLVM is a rather heavyweight thing to compile, but I can use it just fine in my own code without even having a C++ compiler on my machine.
(And an example of a project that doesn't work this way: Qt. Qt has no C-compatible ABI, so even though it's nominally extremely portable, many projects can't or won't link Qt. Qt fits the author's argument a lot better than an alternate-language libcurl would.)
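To spell out the dlopen() point above: the consumer only ever touches exported C symbols, so it neither knows nor cares what toolchain produced the .so. A minimal sketch (curl_version() is a real libcurl entry point; everything else here is just the standard dlopen/dlsym dance):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        void *lib = dlopen("libcurl.so.4", RTLD_NOW);
        if (!lib) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }

        /* look up a C symbol; the implementation language is invisible here */
        char *(*version)(void) = (char *(*)(void))dlsym(lib, "curl_version");
        if (version)
            puts(version());

        dlclose(lib);
        return 0;
    }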
Agreed 100%. Definitely going to be trotting this article out next time I see someone blindly arguing for rewriting xyz in Rust.
I particularly like the mention of portability. No other language comes even remotely close to the portability of C. What other language runs on Linux, NT, BSD, Minix, Mach, VAX, Solaris, plan9, Hurd, eight dozen other platforms, freestanding kernels, and nearly every architecture ever made?
I mean, sure, and if you have users running VAX or the Hurd, that matters. But it turns out that most of us use one of Linux, NT or OS X. And even if you add BSD and Solaris (and a few other Unixes) you can still find languages without C's known problems that cover 100% of users. "But embedded." Embedded can maintain their own software, they do all the time. How long are we going to insist that end users run software that cannot be secure because of the lowest common denominator of programming languages?
I think this is a flawed mindset for a number of reasons.
First, I'd rather appeal to every user than most users. That one user I didn't have to appeal to is going to be a much more faithful and grateful user than the "normal" ones. Most of my software work is open source (remember this context is a discussion about curl), and this encourages active collaboration with users with niche situations. If I choose technologies that make using my software attainable for these people, odds are they aren't going to stop at just porting it to their platform.
Limiting your platforms to Linux, OSX, and NT also stifles innovation. These platforms are all deeply flawed. Their popularity isn't due to having the best design, but rather to having a good enough design and being entrenched. They're old platforms, we've learned a lot since they were started. New or niche platforms bring a lot of value to the table. The BSDs are a great example, as it's the best suited platform for a wide variety of applications.
All a new platform has to do to be able to run nearly all general purpose software is port a C compiler. Not even that - they just need a cross compiler. This is a great thing, IMO.
>Embedded can maintain their own software, they do all the time
This is a pretty silly argument. Most embedded developers don't ship their own implementation of HTTP, they ship curl!
> Their popularity isn't due to having the best design, but rather to having a good enough design and being entrenched. They're old platforms, we've learned a lot since they were started.
I think one could say the same thing about C's popularity as a language.
C was well-designed for its time, but "extremely" well-designed is a stretch given the much better designs that came immediately before (ALGOL 60 and 68, Pascal, Scheme) or after (Ada, Modula, ML) it. C was optimized to be fast to implement (and won out for that reason — "worse is better," and because UNIX was the first usable OS written in a high-level language), not for the best practices in safety or even performance, even as understood at the time.
I'd really disagree. All those languages are both safer and more expressive (if more verbose in the case of those with Pascal-like syntax) than any version of C, and, except in the case of ML, Scheme and ALGOL 68 with the optional garbage collection, there's no reason they couldn't be as fast or faster than C. Their main fault was simply in being too ahead of their time: too difficult or impossible to implement well on a PDP-11.
(I deleted the part about FORTRAN 77; seems I was confusing it with F90, which is the version that first allowed identifiers longer than 6 characters, dynamic memory allocation and user-defined types).
There are cases where you need to be close to the hardware -- the kernel, graphics drivers, low-latency graphics and audio. Why does using a URL to retrieve a file over the network require being close to the hardware?
- Var parameters instead of pointers for out parameters
- Real modules with type encapsulation
- Type safe function pointers
- Language support for concurrency
- Open arrays for variable length parameters
- Exceptions
All of this available in 1978.
By no means anything exceptional; Niklaus Wirth took his inspiration from the programming language Mesa, used by Xerox PARC to create the Pilot OS and the Star workstation, as Xerox wanted to move away from BCPL in 1977.
Also many of these features were already available in Algol.
I find it rather implausible that we've managed to learn a lot of new things about operating system design and nothing about language design when the last major new operating system design to see significant adoption was probably NT in 1993, and we've had boatloads of new languages see adoption since then. Because the talent and effort is going to go where the rewards are, and if designing new languages is more productive than developing new operating systems, I would think that's where most of the energy is going. The inverse of Sturgeon's Law is that 20% of everything isn't crap, and the more of something you have, the larger that 20% is.
The main difference is that operating systems are complicated and programming languages (are supposed to be) simple. The biggest strength of C is its simplicity - there's not much that can go wrong with such a small feature set. I find Go to be pretty strong for similar reasons.
C is not simple. It is incredibly complex, due to the way the standard specifies all operations in terms of an abstract VM and offers absolutely no guidance on what to do when code goes out of a small set of behaviors. Because undefined behavior is so easy to trigger, essentially all large production C code relies on undefined behavior.
It depends on whether you actually want to know precisely what your code does. For me, that's essential to writing reliable software.
In my view, one of the reasons why we consistently fail to produce reliable software is that we continue to use a language from the 1970s that makes it very hard to determine what the meaning of a program is.
"Classic" C was actually simpler and safer than modern C.
Before optimizers, C was a WYSIWYG language. Yes, you can shoot your foot (gets) but you know what's happening where, and can manually check everything.
Modern C with language lawyering can "optimize out" your safety checks, leading to exploits.
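The canonical example of a check that this language lawyering can remove: signed overflow is undefined, so the compiler is allowed to assume it never happens and fold the guard away (whether it actually does depends on the compiler and flags).

    #include <stdio.h>

    int guarded_add(int len)
    {
        if (len + 16 < len)    /* intended overflow check...                */
            return -1;         /* ...which -O2 may legally optimize out,    */
        return len + 16;       /* since signed overflow "cannot happen"     */
    }

    int main(void)
    {
        /* with INT_MAX this is undefined behaviour; an optimized build may
           return a large negative number instead of the expected -1 */
        printf("%d\n", guarded_add(2147483647));
        return 0;
    }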
Yet those optimization passes have been essential for keeping C alive. Optimizing C well depends on exploiting undefined behavior. And if not for optimizing compilers, I think C would have been replaced a long time ago.
For example, when everything can legally alias everything else (as in the case of "classic" C), it's hard for a compiler to prove anything about the contents of memory. This prevents a lot of seemingly-obvious optimizations. The problem with C is that you need to violate many programmers' assumptions about how the language operates in order to make it fast.
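A small sketch of the aliasing point: without some promise that the pointers don't overlap, the compiler must reload *b after every store through a; C99's restrict is the opt-in way to hand it that information.

    #include <stdio.h>

    /* With restrict, *b may be loaded once and kept in a register; without
       it, the store through a forces a reload because a and b might alias. */
    static void add_twice(float *restrict a, const float *restrict b)
    {
        *a += *b;
        *a += *b;
    }

    int main(void)
    {
        float x = 1.0f, y = 2.0f;
        add_twice(&x, &y);
        printf("%g\n", x);    /* prints 5 */
        return 0;
    }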
It's not a coincidence that a lot of C/C++ compiler developers have ended up moving on to other languages.
The long lists other users have posted in this thread of bugs in curl that wouldn't have been possible in another language suggests that in fact there's a lot that can go wrong with C.
Most of libcurl's users are in the embedded space, where they might not even be running an OS at all. So portability does still have to be a primary concern.
I think the strongest argument is "rewriting would introduce lots of new bugs that we don't have now". It's a lot easier to justify staying the course with C on a project that has its troubled youth long behind it, than it is to justify starting a new project in C now.
Curl had more vulnerabilities reported this past year than the previous two years combined. The number of bugs is growing year over year now that automated checking tools are mainstream. The number of problems revealed is accelerating, not slowing down as you seem to think.
Just looking at the numbers is misleading: curl had a security audit last year ( https://daniel.haxx.se/blog/2016/11/23/curl-security-audit/ ), which may mean much fewer are reported over the next few years, as they were all found last year.
> The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.
This statement is laughable nonsense. Shall we go into their bug history and point out counterexamples left and right? [Edit:user simias has done this; thanks!]
Every single bug you ever make interacts with the language somehow.
Even if you think some bug is nothing but pure logic, that logic is part of a program, embedded in the program's design, whose organization is driven by the language.
>There. The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.
That's wrong. A lot of the C mistakes are indeed "logical mistakes in the code", but most of them would nonetheless be fixed by changing to a language that prevents those mistakes in the first place.
In my view, the problem with C in general is that it's a loaded gun with no safety or trigger guard. It's trivial to shoot yourself (or someone else) in the foot, and it requires knowledge, meticulous care and lots of forethought to avoid getting shot.
I very much agree that rewriting existing, stable software written in C is likely not worth the trouble in many cases, but I can't accept claims that the limitations of C aren't the direct cause of tens of thousands of security vulnerabilities, either.
In Rust, even a less experienced developer can fearlessly perform changes in complicated code because the language helps make sure your code is correct in ways that C does not. And you can always turn off the safeties when you need to.
Experienced developers should feel all the more empowered by simply not having to always worry about things like accidental concurrent access, use-after-free, object ownership, null pointers or the myriad other trivial ways to cause your program to fail that are impossible in safe Rust. You get to worry about the non-trivial failure modes instead, which is much more productive.
While I'm definitely not suggesting we replace Curl with a rewrite in Rust (since the current Curl has had decades of good testing and auditing done on it), I am actually very curious how a rewrite in a safer language like Rust, OCaml, Haskell, or Go would fare in comparison in regards to performance and whatnot.
If I were ambitious enough, I'd do it myself in Haskell, but I think it'd be too much work for a simpler curiosity.
Most likely I/O will take more time than whatever code is running, so in that aspect it would make no difference. Memory overhead is the main concern here. Rust doesn't use GC, so you'll have full control and there shouldn't be much difference in that aspect. Other languages do, which means a more sophisticated runtime, less control and more overhead (or writing ugly code to avoid it). A libcurl written in Go/OCaml/Haskell would require anyone using it to also include the runtime of the language, which is usually rather large.
This seems like a no-brainer for a re-implementation in rust, but I wouldn't expect that someone would rewrite curl itself in rust, but a new library that does the same things.
Most languages already have HTTP client libraries. (In particular, Rust has Hyper. Ruby/Python/Node/Go have HTTP clients built-in in the stdlib, Haskell has http-client, etc.) Who uses libcurl really? (Spoiler alert… PHP.)
Of course libcurl does FTP and Gopher and all the things, but these aren't commonly required, most applications just need HTTPS.
People that write C and often C++ use libcurl. A better library for C/C++ developers would be nice and I believe it could be written in Rust, although that would be a bit of a pain because then you need to integrate Rust into your build system.
Do you? For building C projects I certainly don't have to build libcurl, it comes packaged and ready to use with my distribution. The same could be the case with a hypothetical HTTP library written in Rust.
libcurl is still heavily used even inside the rust community. Right now on crates.io the curl crate is sitting at about 220k downloads to Hyper's 925k. Sure, hyper has a lot more, but not to the level of "who uses libcurl really?"
Shorter code, since rust is a higher level of abstraction.
Safer code, so half of the vulnerabilities wouldn't have existed.
Rust's ecosystem (package manager and libraries).
Of course you lose portability and you probably appeal to fewer developers, at least for now. So there is a trade-off.
I wish Rust compiled to C. It would be my dream language. The only reason I can't choose rust half the time is because it doesn't support targets I need to support.
I don't know that you'd gain much in the real world though. Starting such a project now? Rust, for sure, for me anyway. But is it worth rewriting Curl? I agree with the author there, it most probably isn't.
I think the Rust community increasingly behaves like this[1]. They are big on suggesting better 'ideas' to others instead of implementing them themselves. So they keep using 'curl' and 'openssl' but tell others to rewrite their software in Rust.
Like, a lot of the rust community is putting the effort into rewriting things in Rust.
From within the community I don't see any coherent effort to tell folks to rewrite in Rust. We're very happy to see Rust rewrites, but not many people are pushing for it except the folks actually putting effort into it.
Yes, every time a vulnerability pops up someone will say "rewrite it in Rust", but half the time it's not even a Rust programmer (often, Rust programmers come and disagree and say "Rust wouldn't fix this", IME).
I would say a portion of the Rust community; that said, it is that portion that is the most visible, and I think the community understands well what that means.
I'm not sure the community well understands it at all. The usual rejoinder is something along the lines of "well, I don't see that in the Rust community I'm in."
Both /r/rust [1] and the official forum [2] are reasonably regulated, and I often see less informed members get warned about their use of aggressive or insulting language, often directed at non-Rust languages. There are surely other venues with less enforcement (they still commonly observe the Rust Code of Conduct, however), but at least I think the main venues and the corresponding community try hard not to be offensive.
I don't think C is a bad language, although I think it could use lists and dictionaries in standard library. std::vector and std::map are the only things that make me pick C++ in an instant, given the choice.
While C by itself is not safe, I would argue that no sane development environment uses C by itself. Over the decades of its production use dozens of tools have been developed that make it far safer: *grind suite, coverage tools, sanitizers, static analyzers, code formatters and so on. Those tools are external, otherwise they would make C slower. Something for something.
I think it's a bit weird that C and curl are used as the example here. If we look at C and OpenBSD or so, things might look a bit different.
Also one has a hard time comparing curl with another language, simply because something with curl's properties (take portability for example) doesn't exist.
And no, that isn't in defense of anything, just me thinking that the measurable points brought up in the discussions don't make sense or don't exist.
The topic is also a bit broader, as you can easily add in static code analysis, compiler flags, stuff like W^X, stuff like seccomp, capsicum, cloudabi, pledge, which might not work (well) in other cases.
It's a great philosophical discussion topic and I don't wanna stop anyone, just hoping people keep that in mind when they participate, so we don't end up with new dogmas that get thrown around for the next few years, without knowing the context or meaning of phrases.
Other than that: I really enjoy this discussion. :)
Are such extensions popular, and if not, why not? I assume there's always some performance hit, but that might not be a big deal in an HTTP client, for example.
CPython also has many vulnerabilities in the Python code rather than in the C code.
It's hilarious reading rust marketers talk about how people should use rust, and yet their software doesn't work as well. It has plenty of bugs.
Then they go on and on about issues which post modern C doesn't have. Guess what? C has a lot of tooling, and yes, it's been improving over the years too. CQual++ exists. AFL exists. QuickCheck exists.
Can your rust project from two years ago even compile? Does it have any users at all?
There's a formally proven C compiler. How's that LLVM swamp going you've built your castle on?
Rust brought a modern knife to a post modern gun fight -- and lost.
> Can your rust project from two years ago even compile?
For code written after 1.0's release date, which was just short of two years ago, the vast, vast majority should, yes. We've had one or two soundness fixes in that time that would take a trivial amount of updating, but those only hit a very small part of the ecosystem.
Quite recently I happily used libcurl for a C++ project rather than any of those C++ wrappers found on GitHub. Granted, there is some inelegance when you adapt C-style error codes to C++ exceptions, and a non-C++-idiomatic code style sits right next to any C lib (a rough sketch of that adaptation follows below). Yet libcurl is battle tested (AKA proven to be rather bug free) and has a nice, clean API.
IMHO it might eventually make sense to use another language/tech/whatever, but the bar is quite high and it will quite probably take some serious, sustained effort.
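For anyone who hasn't done that adaptation, here is roughly what it looks like: a small, hypothetical check() helper (the libcurl calls are the real API; the helper name, the flow, and the example URL are made up for illustration) that turns libcurl's C-style CURLcode returns into C++ exceptions:

    #include <curl/curl.h>
    #include <iostream>
    #include <stdexcept>
    #include <string>

    // Hypothetical helper: convert a C-style error code into an exception.
    static void check(CURLcode rc, const std::string& what) {
        if (rc != CURLE_OK)
            throw std::runtime_error(what + ": " + curl_easy_strerror(rc));
    }

    int main() {
        // (A real program would also call curl_global_init first.)
        CURL* handle = curl_easy_init();
        if (!handle)
            return 1;

        try {
            check(curl_easy_setopt(handle, CURLOPT_URL, "https://example.com"),
                  "setting URL");
            check(curl_easy_perform(handle), "performing request");
        } catch (const std::exception& e) {
            std::cerr << e.what() << "\n";
            curl_easy_cleanup(handle);   // cleanup still done by hand on this path
            return 1;
        }
        curl_easy_cleanup(handle);
        return 0;
    }

The manual curl_easy_cleanup on every path is exactly the kind of thing the RAII comment further down is about.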
> The plain fact, that also isn’t really about languages but is about plain old software engineering: translating or rewriting curl into a new language will introduce a lot of bugs. Bugs that we don’t have today.
Don't rewrites, even in the same language, usually lead to a better version of the software? I can't really imagine a seasoned C developer introducing completely new bugs into a code base they are already very familiar with.
What is everyone using curl for that it needs to be written in C (or Rust)?
If I think about my usage, it's mostly GET or POST something and see what the returned JSON looks like. If I need to download something, wget usually works without having to remember -O.
But higher level things like httpie are easier to deal with, sane defaults and all that. Maybe they use libcurl...
Are there any re-write userland in ${safe-high-level-lang} projects?
Maybe curl could be rewritten in C++ step by step, like mpd (https://musicpd.org) was. C++ has RAII for resource management, which can help a lot by itself; in my opinion the most hateful thing in C is freeing resources on all exit paths (see the sketch after this comment).
Although with curl in C++, the naming would become inappropriate...
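A hedged sketch of that RAII point, using the real libcurl C API with made-up wrapper and function names (an illustration of the idea, not a proposal for how a curl rewrite should look): a std::unique_ptr with a custom deleter guarantees curl_easy_cleanup runs on every exit path, early returns included.

    #include <curl/curl.h>
    #include <memory>

    // Ownership of the easy handle: curl_easy_cleanup runs automatically
    // when the unique_ptr goes out of scope, on every exit path.
    using CurlHandle = std::unique_ptr<CURL, decltype(&curl_easy_cleanup)>;

    static bool fetch(const char* url) {
        CurlHandle handle(curl_easy_init(), &curl_easy_cleanup);
        if (!handle)
            return false;                  // early exit: nothing leaks

        curl_easy_setopt(handle.get(), CURLOPT_URL, url);
        if (curl_easy_perform(handle.get()) != CURLE_OK)
            return false;                  // another exit path, still no leak

        return true;                       // and cleanup happens here as well
    }

    int main() {
        return fetch("https://example.com") ? 0 : 1;
    }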
I feel blaming a language for errors is like blaming a gun for killing people.
The fact is, mistakes will happen, but in general if you follow best practices you'll be fine. Failing to follow best practices means you could be a better programmer. Just because the language gives you the option to do something doesn't mean you should.
But in practice, people always make mistakes. Some guns/programming languages limit the damage of those mistakes more than others. All other things being equal, those guns/programming languages are better.
> mistakes will happen, but in general if you follow the best practices you'll be fine
Everyone fails to follow best practices at times, precisely because "mistakes will happen!" Encouraging people to be better programmers can't change that.
Therefore we want languages that catch as many mistakes as possible at build time and minimize the damage caused by other mistakes that slip through.
Maybe they should attempt writing it using the Isabelle/HOL transpilers to C from the seL4 project. I don't care if it is C or machine code as long as the proof of correctness is complete, down to at least the C library.
Curl is small enough to make it relatively easy and used widely enough to make it worthwhile.
This has probably been said, in this thread even, but if curl is insecure (for some value of "insecure") then its ubiquity and ease of embedding are a problem rather than a feature. Fuzzy thinking.
Software is almost a perfectly open market. If proponents of rust really think their preferred language is better in every way, they are free to rewrite the world in rust, and see the adoption numbers they get. After all, if rust is better in every way, we'd expect the adoption numbers to go up for their rust OS, with a rust http stack and rust web browser. Right?
Telling others to use their language instead of putting their money where their mouth is is truly what irks me about the rust community the most.
Want a rust world? Go write it and ship it.
Oh, and you don't get to complain about C until your PC runs more rust than C
I'm not sure why I can't complain about problems I have today -- if nobody complained about C ever, I doubt Rust, Go or for that matter, basically any other language project, would have ever gotten started.
It's okay to identify flaws in tools, that's how we make them better. It doesn't make sense to say '[car on fire] you can't complain about that Honda Civic until you develop your own better car, sir, and more people are driving it than not, now please leave the service center -- until then consider the fire normal'.
It is utterly amazing to me to see so many people's attitudes on this issue.
If I cut myself by hasty use of a knife, is it the fault of the knife maker? How is that even remotely rational? If you aren't willing (or don't know how) to use the tool correctly, don't use it.
It's not bad because it's old, it's bad because of how unsafe it is. There are older languages than C that are safer, but they didn't gain the popularity C did.
This is a great little read and encapsulates the other side of the 'rethink the way' trend-ism of some HN new lang advocacy. C is fine, C is good. It is widely understood, it is a systems staple, and it is not dangerous in knowledgeable hands. Rocking the boat is fashionable.
C isn't as dangerous if you really know what you're doing and you never make mistakes. I would bet fewer people know what they're doing than think they know what they're doing, and the set of people who never make mistakes is entirely empty.
Well, you know... that's just, like, your opinion, man.
Opting out of this site at this point. A bunch of 6-12 year idiots (recognizably) telling people what they think they know that they don't know. It's silly and recursive.
I highly doubt anyone in the Rust community looks at it this way. As a huge Rust fan, I'm also a C fan.
C existed before Rust; before Rust it was my favorite language, though I rarely chose it for work b/c of safety.
The strongest argument in this piece is that Curl is written in the most portable language available. That's a great reason! And it's been a wildly successful project! Kudos to the developer(s)!
The only problem with this piece is the claim about which of these are logic bugs vs. C bugs; but that should not detract from an excellent project. The real question being debated is this: if you were starting a new project like this, should you do it in a safe language to guard against security issues, or in the oldest, most portable language around?
I'd argue that by the time any new project written in Rust becomes successful, LLVM (and Rust) will probably have closed that portability gap.