I have no problem with Curl being written in C (I'll take battle-tested C over experimental Rust) but this point seemed odd to me:
>C is not the primary reason for our past vulnerabilities
>There. The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.
#61 -> uninitialized random : libcurl's (new) internal function that returns a good 32-bit random value was implemented poorly and overwrote the pointer instead of writing the value into the buffer the pointer pointed to.
#60 -> printf floating point buffer overflow
#57 -> cookie injection for other servers : The issue pertains to the function that loads cookies into memory, which reads the specified file into a fixed-size buffer in a line-by-line manner using the fgets() function. If an invocation of fgets() cannot read the whole line into the destination buffer due to it being too small, it truncates the output
This one is arguably not really a failure of C itself, but I'd argue that Rust encourages more robust error handling through its Options and Results, whereas C tends to abuse "-1" and NULL return values that need careful checking and usually can't be enforced by the compiler.
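To make that concrete, here's a minimal sketch (the lookup function is made up, not curl's API): in C a failure is just another integer or a NULL the caller is free to ignore, while in Rust the failure is part of the return type and has to be handled (or explicitly discarded) before the value can be used.

    fn find_port(service: &str) -> Result<u16, String> {
        // Hypothetical lookup; the point is the signature, not the body.
        match service {
            "http" => Ok(80),
            "https" => Ok(443),
            _ => Err(format!("unknown service: {service}")),
        }
    }

    fn main() {
        // Result is #[must_use]: silently dropping it is a compiler warning,
        // and the Ok value can't be reached without going through the Err arm.
        match find_port("gopher") {
            Ok(port) => println!("port {port}"),
            Err(e) => eprintln!("lookup failed: {e}"),
        }
    }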
#55 -> OOB write via unchecked multiplication
Rust has checked multiplication enabled by default in debug builds, and regardless of that the OOB wouldn't be possible.
#54 -> Double free in curl_maprintf
#53 -> Double free in krb5 code
#52 -> glob parser write/read out of bound
And I'll stop here; so far, 7 out of 11 vulnerabilities would probably have been avoided with a safer language. Looks like the vast majority of these issues wouldn't have been possible in safe Rust.
> Rust has checked multiplication enabled by default in debug builds, and regardless of that the OOB wouldn't be possible.
That bug is only triggered by an unrealistic corner case (username longer than 512MB). A run-time check in a debug build won't help unless you realize the possibility beforehand and put a unit test for it. I am more interested in the second part of your comment: "the OOB wouldn't be possible". How does Rust protect against such integer overflow caused by multiplication? Thanks in advance.
"That bug is only triggered by an unrealistic corner case (username longer than 512MB)."
What makes you call that an "unrealistic" case? You're probably imagining some sort of world where a randomly distributed set of usernames are sent to this function drawn from the distribution of "real usernames that people use". Since 0 of them are longer than 512MB, you intuitively assume a 0 probability of exploit.
But in the security world, that's the wrong distribution. You have to assume a hostile, intelligent adversary. It isn't hard at all to construct a case of some web service allowing you to specify remote resources accessible with a username & password of your choosing (not corresponding to the username you're using to log in), an attacker specifying one of their own resources, reading the incoming headers to notice that you're using a vulnerable version of libcurl, and stuffing 512MB+exploit into the username field of your web app. If you don't add any other size restrictions between the attacker and the libcurl invocation, they may well pass it right in. (And your same intuition will lead you to not put any size restrictions there; "why would anybody have a multi-megabyte username?" You won't even have thought the question explicitly, you just won't put a length restriction on the field because it'll never even cross your mind.) By penetration standards, that doesn't even rise to the level of a challenge.
I think you missed the point of the post you are responding to. That person understands the value of the bug, and is just pointing out that Rust runs with unchecked math in production and would therefore be just as vulnerable in production. The benefit would come from it running with checked math in debug mode, but you, as the parent poster noted, would have to have a unit test or an integration test that realized this vulnerability to begin with and tried to exploit it.
The use of "unrealistic" was meant in the sense that you wouldn't think of it, not that you shouldn't guard against it once known.
I think there are two components here: yes, this might lead to a _logic bug_, but it should never lead to a _memory safety_ bug. That is, to get that CVE, you need both, and Rust will still protect you from one.
In Rust overflows generally will cause a panic. So "2 + 2" will return 4 or panic if you're on a 2-bit system (they provide .saturating_add() and .wrapping_add() to get code that will never panic)
Thus, you could have caused a denial of service by crashing a rust-based Curl, but the crash would have been modeled and just uncaught.
1. Rust specifies that if overflow happens, it is a "program error", but it is well-defined as two's complement overflow.
2. Rust specifies that in debug builds, overflow must be checked, and panic if it happens.
In the future, if overflow checking ever has acceptable overhead, this allows us to say that it must always be checked. But for now, you will get a well-formed result.
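To make that concrete, a small sketch (names made up): debug builds panic, release builds wrap to a well-defined (if wrong) value, and the checked_* methods surface the overflow as an Option in either build.

    fn main() {
        let count: usize = usize::MAX / 2;

        // Debug build: panics with "attempt to multiply with overflow".
        // Release build (default): wraps to a well-defined but too-small
        // value -- a logic error, never undefined behaviour.
        // let size = count * 4 + 1;

        // Explicitly checked in every build:
        match count.checked_mul(4).and_then(|n| n.checked_add(1)) {
            Some(size) => println!("allocating {size} bytes"),
            None => eprintln!("refusing: size calculation overflowed"),
        }
    }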
I'm not a Rust user (yet), but I'm a little surprised that with its emphasis on safety Rust doesn't maintain all debug safety checks in production by default. You could then do some profiling and turn off only those few (or one or none) that actually turned out to provide enough real-world, provable benefit to be worth turning off in this specific piece of code.
Since you would turn them off one at a time explicitly, rather than having a whole set of them disappear implicitly, you would probably also tend to have a policy of requiring a special test suite to really push the limits of any specific safety issue before you would allow yourself to turn that one off.
Obviously, if this occurred to me at first glance, it occurred to the designers, who decided to do it the other way after careful consideration, so I'm just asking why.
Basically, overflow doesn't lead to memory unsafety in isolation. That's the key of it. The worst that can happen is a logic error, and we are not trying to stop all of those with the compiler :) Justifying a 20%-100% speed hit (that was cited in the thread) for every integer operation to guard against something that can't introduce memory unsafety is a cost we can't afford to pay.
EDIT: oh, one more thing of significance that you may or may not have picked up on: one reason why under/overflow in C and C++ is dangerous is that certain kinds are undefined behavior. It's well-defined in Rust. Just doing that alone helps, since the optimizer isn't gonna run off and do something unexpected.
I think you can opt in to checked arithmetic in release mode if you can stomach the performance cost.
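If I remember right, the opt-in is just a profile setting (a sketch of a Cargo.toml fragment; the same knob also exists as rustc's -C overflow-checks flag):

    # Cargo.toml: keep overflow checks on even in optimized builds
    [profile.release]
    overflow-checks = true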
Anyway, the buffer overflow itself is:
>If this happens, an undersized output buffer will be allocated, but the full result will be written, thus causing the memory behind the output buffer to be overwritten.
In Rust's case the output would go into a dynamically sized container, probably a Vec<>. Attempts to put more data in the Vec than it can hold would either cause a runtime error or cause the Vec<> to reallocate (which could cause a performance issue, or maybe even a DoS if you could make the system allocate GBs of memory, but it wouldn't allow access to invalid memory).
So maybe something like:
    /* XXX potential multiplication overflow */
    let mut buf = Vec::with_capacity(insize * 4 / 3 + 4);
    while let Some(input) = get_input() {
        // Reallocs when capacity is exceeded
        buf.extend_from_slice(input);
    }
So even if the multiplication overflow is not caught it's by design impossible to have a buffer overrun in Rust.
By default rust doesn't allow out of bounds indexing (indexing and iteration are checked, although bounds checks are often optimized out), you have to explicitly write unsafe code to read off the end of an array or vector.
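For what it's worth, a small sketch of the three flavours (the commented-out lines are the ones that can't happen silently in safe Rust):

    fn main() {
        let v = vec![10, 20, 30];
        let i: usize = 7;

        // Indexing is always bounds-checked in safe Rust; this would panic:
        // println!("{}", v[i]);

        // The non-panicking form surfaces the check in the type system:
        println!("{:?}", v.get(i)); // prints "None"

        // Skipping the check requires an explicit unsafe block:
        // let x = unsafe { *v.get_unchecked(i) };

        // Iteration needs no per-element check at all, which is one reason
        // the checks so often vanish from optimized loops:
        let sum: i32 = v.iter().sum();
        println!("{sum}");
    }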
Can you elaborate on bounds checks being optimized out? Is it only in situations where the compiler can prove that they are completely unnecessary? Otherwise, they can't be relied on, so what's the point of having them at all?
I think you are being too generous. All of those 11 vulnerabilities were caused by the language, its lack of memory safety, limited expressiveness, poor abstractions it encourages, etc.
That's not completely fair. It's true that there are some issues I've not counted that could be caused by the "limited expressiveness" of C. Notably the latest vulnerability, #62, which is caused by bogus error checking and would probably not have occurred in idiomatic Rust and could easily be blamed on C's terrible error handling ergonomics (or lack thereof).
However for others it's not immediately obvious how a safer language would've helped, for instance #59: Win CE schannel cert wildcard matches too much
This is clearly a logical error due to a badly implemented standard. There's no silver bullet here.
The patch that they put out for that issue combines multiple topics into one.
* A preprocessor #if statement is altered with defined(USE_SCHANNEL) && defined(_WIN32_WCE), which is OR-ed with the other conditions, so that code that was previously not compiled on WinCE is now potentially defined.
* A local buffer in the code is increased from 128 to 256 characters. A comment in the patch refers to a "buffer overread". So there is a C issue in here!
* In a call to what appears to be a Microsoft API function, CertGetNameString, a flag argument that was zero is now specified as CERT_NAME_DISABLE_IE4_UTF8_FLAG. Unless I'm misunderstanding something, the patch comment doesn't appear to remark on this at all.
* Code that was taking on the responsibility for doing some matching logic is replaced by something that appears to be using proper APIs within curl (Curl_cert_hostcheck).
Why didn't the programmer know about the existing function? One possibility is that it didn't exist yet at the time that code was written. Other such ad hoc matching code may have been refactored to use the matching function; this wasn't found. Or maybe the function did exist, but wasn't well documented. A review process isn't in place that would allow someone to raise a red flag "this should be calling Curl_cert_hostcheck and not itself using string matching at all, let alone be checking for a * wildcard and incrementing over it."
How a safer language helps with the non-obvious errors is:
* makes code smaller with less distracting repeated boilerplate, so things like that stand out more.
* programmers waste less time on fighting language-related ergonomic pains, and so more of their attention is available to spot these errors.
If your language is such that you shout "hooray" and pat yourself on the shoulder when it compiles and the code passes the address sanitizer and Valgrind and whatnot, then actual functional problems will slip through the cracks.
Permit me, also, to indulge in some argumentum ad lingua obscura: could we spot in a Brain##### program that some certificate wildcard matches too much? :)
It's easy to see the advantages of memory safety, since it eliminates an entire class of bugs that C programs tend to suffer from, but I think talking about language "ergonomics" is much more tenuous.
One thing that C programmers tend to love about C is that it's simple. C doesn't have that many language constructs and that makes C programs easier to reason about, read and audit, which also makes it easier to find these non-obvious errors. A C programmer approaching an unfamiliar codebase can feel assured that it is made out of the same concise set of constructs as any other C program. On the other hand, modern languages like Rust, C# and Go are a lot more complicated. They have most of the language constructs of C, plus some extra, so not only do you have to understand the concepts in C, you also have to understand things like ownership, variance or pattern matching. Every new feature increases cognitive burden on the programmer. Adding something like exception handling, for example, means that control flow suddenly becomes more complex. Now, instead of only leaving at return statements, a function has a potential exit point anywhere it calls another function. "Smart" features that encourage terse code and reduce boilerplate can also result in code that is totally obtuse to anyone other than the person who wrote it.
I'm not saying that C is at a sweet spot for language complexity (Brainfuck is probably too simple, but OTOH it only has eight symbols!). I'm just saying that it's important to understand why a lot of developers really like C, and C is often praised for its simplicity. Any language that intends to replace C needs to understand why C programmers use C.
I'd argue that talking about C's simplicity requires one big caveat... it gains that simplicity by delegating a lot of essential complexity to the underlying platform and the spec leaves a lot of details up to the compiler writers.
While I'm sure experienced C developers will be used to that, one cannot simply read The C Programming Language, set a language reference on their desk, and safely learn how C will behave through experience because of all the unspecified or counter-intuitive things which often don't even trigger compiler warnings with -Weverything.
While Rust may have more concepts to grasp and more grammar to remember, I find it far less of a mental burden to code in for the same reason that OOP proponents trumpet encapsulation... C forces me to constantly double-check that I haven't forgotten some detail of the language or GCC's implementation which is sensible if you understand the low-level implementation, but completely counter to my intuition. Rust allows me to audit the heck out of modules containing unsafe code to make sure the unsafety can't escape, then set it aside to think about another piece of the logic.
Rust also places a stronger emphasis on making it possible to reason locally, bolstered by things like hygienic macros.
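A tiny sketch of what hygiene buys you (macro and names invented for illustration): identifiers introduced inside the macro can't collide with or capture the caller's, so you can read the call site without reading the macro body.

    macro_rules! double {
        ($e:expr) => {{
            let tmp = $e; // the macro's own `tmp`, hygienically separate
            tmp * 2
        }};
    }

    fn main() {
        let tmp = 10;
        println!("{}", double!(tmp + 1)); // 22: the caller's `tmp` is only used in $e
        println!("{}", tmp);              // still 10, untouched
    }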
Finally, simplicity does not automatically make something intuitive. (That's something I hear quite commonly among HCI people bemoaning how Apple has brilliantly used simplicity to convince iOS users that, if they get stumped by the UI, it's their own fault, not a failing on Apple's part.)
> A C programmer approaching an unfamiliar codebase can feel assured that it is made out of the same concise set of constructs as any other C program.
Plus a shit load of user defined macros to make it look like a modern language.
Every major C repo will have its own macros for foreach, cleanup on exit, logerrorandreturn, etc. Another example is extensive use of attributes like _cleanup_ from GCC to simulate RAII; not plain C89 at all.
In the future, there will be no bugs because the languages will be so expressive that bugs cannot be expressed. The only mistake a programmer can make will be choosing the wrong language.
> Of course that leaves a share of problems that could’ve been avoided if we used another language. Buffer overflows, double frees and out of boundary reads etc, but the bulk of our security problems has not happened due to curl being written in C.
He addressed all of those points in the second short paragraph. None of those are C vulnerabilities; they were mistakes made on the part of the developers, not the language. The fact that a safer language would have avoided a problem doesn't mean that, when things happen, it's the language's fault.
> None of those are C vulnerabilities, they were mistakes made on the part of the developers, not the language.
The point of type system features, e.g. Option types instead of nulls, or linear/affine types avoiding use-after-free, is to make programmer mistakes turn into compiler errors. Nothing more. There is no point in talking about whether or not something is the "language's fault". We know C is like juggling knives. It's a tool. It has drawbacks and benefits. Being widespread and fast are the benefits. Not turning many forms of programmer error into compiler errors is the drawback. That means the programmer can't afford to make mistakes, because they will be shipped. But programmers invariably make mistakes.
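For example (a minimal sketch): the moral equivalent of a use-after-free is rejected before the program ever runs.

    fn main() {
        let buf = vec![1u8, 2, 3];
        drop(buf); // ownership given up; the allocation is freed here

        // Uncommenting the next line is a compile error, not a CVE:
        // error[E0382]: borrow of moved value: `buf`
        // println!("{:?}", buf);
    }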
The grandparent argued that for the sample of issues he looked at, a lot would in fact be avoided by the type system of e.g. Rust - contradicting the argument in the blog post (could be because of the small sample though).
I think it's perhaps less important to focus on the number of issues of each kind, and instead look at the severity of them. If the kinds of issues avoided by better type systems are typically trivial issues, but the kind of issues coming from logic errors are severe security issues - then perhaps the case for stronger type systems isn't so strong after all. But I doubt that's the case.
Discipline is precisely the thing which shouldn't be required to avoid disaster, if a language is to be well-designed (this is a feature of good design in general).
> He addressed all of those points in the second short paragraph. None of those are C vulnerabilities, they were mistakes made on the part of the developers, not the language.
They are absolutely the fault of the language, given other languages would have made these bugs impossible.
Errare humanum est is hardly a new concept; blaming humans for not being computers is inane, and in fact qualifies as perseverare diabolicum, as out of misplaced pride you persevere in the original error of using C.
I agree with the gist of your comment but I think it's a bit harsh to say that using C for curl was an error. If I can trust Wikipedia, the original release of the library was in 1997; at the time it was a perfectly reasonable choice IMO.
I doubt the library would have reached its current adoption levels if it had been written in any other language (and I presume another C library would've taken its place).
You're speaking past each other. You're talking about "blame" in a moralistic sense, but the other posters just mean simple causation -- "would this fault have occurred if a safer language was used?"
The former sense would be useful... if you want to sue someone or something, I guess? But the latter is more useful in real life, so that's what we're discussing.
Maybe, maybe not: from Wikipedia the first official validation of GNAT (the Free Ada compiler) was done in 1995.
So there were already safer languages available.
> and contradict the statement regarding zero policy static analysis errors.
Not necessarily - I've found that static analysis tools (Coverity etc.) have limitations, and I'd expect them to find fewer than half of these kinds of bugs - serious fuzzing tends to find more.
The static analysis tool has to work with the type system of the language, and C's type system isn't particularly helpful, so the tool has to find a balance between flagging almost every pointer dereference as "potentially a problem", most of them being false positives, and flagging only the small percentage of actual problems that can be unambiguously proven to be problems with the context available inside a single function definition.
(Just stating generalities, haven't looked at the fixes for these bugs.)
Double free is not necessarily a C issue; it can also be a program logic issue - I expect an object to exist, but it's already been deleted. So it's one of:
1. object should not exist and the second free is incorrect
2. object should exist and the first free is incorrect
3. object existence is uncertain and the second free must somehow check that
Although I can double-free only in unsafe languages, the wrong logic behind it can be the same in safe languages. It just has different consequences.
Of course it is a C issue, in the same sense that ALL logic errors are in the case you describe -- so either C issues do not exist or you did not manage to find the correct definition. For example, OOB access is caused by faulty program logic, and the consequences are dramatic in unsafe languages. That is an issue of the language, despite you being able to compute OOB indexes in any language. Same thing for double free: the language issue is that the result is catastrophic, not that you can write, for example, faulty logic attempting to free too early, or an extra time. (Because in safe languages, the result is not as catastrophic as in unsafe languages.)
> 1. object should not exist and the second free is incorrect
In C# I can have two variables pointing to the same object; I null only one of them. The second should be nulled too, but it's not. That's a logic error. So in C# I end up with some object that should not be there, but it is. Which is better - double-free or an undestroyed object - depends on the use case.
In fact I find double-free safer because it usually crashes (and in my code I do checks so it almost certainly crashes), while in C# I can happily use such an object without knowing it. But as I said, it depends on the specific use case.
> In fact I find double-free safer because it usually crashes (and in my code I do checks so it almost certainly crashes)
You don't know what undefined behavior is, do you? You cannot be sure it crashes, since the compiler is allowed to do anything under the assumption that it doesn't happen. It's absolutely legit for the compiler to remove all the code you added to check that a double-free didn't happen, because it is assuming that's dead code.
See this post[1] from the LLVM blog, which explains why you can't expect anything when you're triggering UB.
I know very well what UB is and I bet there is not a single big program which does not have undefined behaviour. I even rely on UB sometimes, because with well defined set of compilers and systems, it's in reality well defined.
I was talking in general about "unsafe" languages. I use c++ in my projects and use custom allocators everywhere, so there is no problem with UB there. The custom allocators also do the checking against double-free.
What do you mean by checking against double-free? Either you pay a high runtime cost, or use unconventional (and somehow impractical in C++) means (e.g. fancy pointers everywhere with a serial number in them), or you can't check reliably. Standard allocators just don't check reliably, and thus do not provide sufficient safety.
Anyway, double-free was only an example. The point is that a language can, or cannot, provide safety by itself - not just allow you to create your own enriched subset that is safer than the base language (because you are often interested in the safety of 3rd party components not written in your dialect and local idioms of the language).
In the case of C and C++, they are full of UB, and in the general case UB means you are dead. I find that extremely unfortunate, but this is the reality I have to deal with, so I don't pretend it does not exist...
> What do you mean by checking against double-free?
I pay small runtime cost for the check by having guard values around every allocation. At first I wanted to enable it only in debug builds, but I am too lazy to disable it in release builds, so it's there too. Anyway the overhead is small and I do not allocate often during runtime.
> Anyway, double-free was only an example. The point is that a language can, or not, provide safety by itself.
I can write safe code in modern C++ (and probably in C) and I can write unsafe code in e.g. Rust; the only difference is which mode is the default for the language. On the other hand, I have to be prepared to pay the performance (or other) price for safe code.
> In the case of C and C++, they are full of UB, and in the general case UB means you are dead.
I doubt there is a big C or C++ program without UB, does that mean they are all dead? I do not think so.
> I find that extremely unfortunate, but this is the reality I have to deal with, so I don't pretend it does not exist...
I do not like UB in C++ either, but mostly because it does not make sense on the platforms I use. On the other hand, I can understand that the language cannot make such platform-specific assumptions. I can pretend UB does not exist, with some restrictions. UB in reality does not mean that the compiler randomly does whatever it wants; it does whatever it wants, but consistently. But as I said twice, it depends on the use case. Am I writing for SpaceX or some medical instruments? Probably not a good idea to ignore UB. Am I writing a new Unreal Engine? Probably not a good idea to worry much about UB, since I would never finish.
> UB in reality does not mean that the compiler randomly does whatever it wants; it does whatever it wants, but consistently.
There is nothing consistent about UB. The exact same compiler version can one day transform one particular UB into something, the next day into something else because you changed an unrelated line of code 10 lines above or below, and the day after tomorrow, if you change your compiler version or even just any compile option, you get yet another result even when your source code did not change at all.
EDIT: and I certainly do find it extremely unfortunate that compiler authors are choosing to do that to us poor programmers, and that they mostly dropped the other, saner interpretation expressly allowed by the standard and practiced by "everybody" 10 years ago: that UB can also stand for non-portable but well-defined constructs. But, well, compiler authors did that, so let's live with it now.
> Yet, for years I am memmove-ing objects which should not be memmoved. Or using unions the way they should not be used.
There can be two cases:
A. you rely on additional guarantees of one (or several) of the language implementations you are using (ex: gcc, clang). Each compiler usually has some. They are explicitly documented, otherwise they do not exist.
B. you rely on undocumented internal details of your compiler implementation, that are subject to change at any time, and just have happened to not have changed for several years.
> Do you have any example?
I'm not sure compilers did "far" (not just intra-basic-block instruction scheduling) time-traveling constraint propagation on UB 10 or 15 years ago. For sure, some of them do now. This means you had better use -fno-delete-null-pointer-checks and all its friends, because that might very well save you completely in practice from constructs that are technically UB but not well known to your ordinary programmer colleague - and so likely to appear in lots of non-trivial code bases.
Simpler example: behavior of signed integer overflow. (Very?) old compilers simply translated to the most natural thing the target ISA did, so in practice you got 2s complement behavior in tons of cases and tons of programs started to rely on that. You just can't rely on that so widely today without special care.
More concerning is the specification of the << and >> operators. On virtually all platforms they should map to shifting instructions that interpret unsigned int a << 32 as either 0 or a (and the same thing for a >> 32), and so, regardless of the behavior, (a<<b) | (a>>(32-b)) should do a ROL op. Unfortunately, mainly because some processors do one behavior and others do the other (for a single shift), the standard specified it as UB. Now in the standard's spirit, UB can be the sign of something that is non-portable but perfectly well-defined. Unfortunately, now that compiler authors have collectively "lost" (or voluntarily burned) that memo, and are actively trying to trap other programmers and kill all their users, either it is already handled like all other UB in their logic (=> nasal daemons) or it is only an event waiting to happen...
Maybe a last example: out-of-bound object access was expected to reach whatever piece of memory is at the position of the intuitively computed address, in the classical C age. This is not the case anymore. Out-of-bound object accesses now carry the risk of nasal-daemon invocation, regardless of what you know about your hardware.
Other modern features of compilers also have an impact. People used to assume all kinds of safe properties at TU boundaries. Those were never specified in the standard, and they have been thrown out the window with WPO. It is likely that some code bases have "become" incorrect (become even in practice, given they always have been in theory under the riskiest interpretations of the standard, which compiler authors are now unfortunately using).
> Do you mean instead of signed integer overflow being UB it should be defined as 2's complement or something like that?
Maybe (or at least implementation-specified). I could be mistaken, but I do not expect even 50% of C/C++ programmers to know that signed overflow is UB, and what it means precisely on modern implementations. I would even be positively surprised if 20% of them know about that.
And before anybody throws them at me:
* I'm not buying the performance argument, at least for C, because the original intent of UB certainly was not for it to be wielded this way, but merely to specify the lowest common denominator of various processors -- it's insanely idiotic to not be able to express a ROL today because of that turn of events and the modern brain-fucked interpretation of compiler authors -- and more importantly because I happen to know how modern processors work, and I do not expect stronger and safer guarantees to notably slow down anything.
* I'm not buying the "specify the language you want yourself or shut up" argument either, for two at least reasons:
- I also have opinions about safety features in other aspects of my life, yet I'm not an expert in those areas (e.g. seat belts). I am an expert in CS/C/C++ programming/system programming/etc., and I'm a huge user of compilers, in some cases in areas where it can have an impact on people's health. Given that perspective, I think any argument to just specify my own language or write my own compiler would be plain stupid. I expect people actually doing that for a living (or as a main voluntary contributor, etc.) to use their brains and think of the risks they impose on everybody with their idiotic interpretations, because regardless of whether they or I want it, C and C++ will continue to be used in critical systems.
- The C spec is actually kind of fine, although now that compiler authors have proven they can't be trusted with it, I admit it should be fixed at the source. Had they been more reasonable, the C spec would have continued to be interpreted as in the classical days, and most UB would merely have been implementation-defined or "quasi-implementation-defined" (in some cases by defining all kinds of details, like a typical linear memory map, crashing the program on access to unmapped areas, etc.) in the sense you are thinking of (mostly deterministic -- at least far more than it unfortunately is today). The current C spec does allow that, and my argument would be that there is no excuse for doing otherwise (except if the performance price is unbearably high, but the classical implementations have proven it is not!). So I don't even need to write another, less dangerous spec; they should just stop writing dangerous compilers...
I don't want to jump on the "Rust fixes everything" train, but lifetimes, scope-based destructors and reference counted pointers seem like they help with this sort of thing beyond just preventing literal accesses of freed memory; they can make sure objects are around when you need them and that they're destroyed automatically when you don't anymore.
Of course, you don't get it all for free; you have to wrestle with mutability and lifetimes, which can get hairy.
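A small sketch of that (the type is made up): the destructor runs exactly once, when the last owner goes away, and there is no free() call left to get wrong.

    use std::rc::Rc;

    struct Conn;

    impl Drop for Conn {
        fn drop(&mut self) {
            println!("connection closed (exactly once)");
        }
    }

    fn main() {
        let a = Rc::new(Conn);
        {
            let b = Rc::clone(&a); // a second owner; no manual free anywhere
            println!("owners: {}", Rc::strong_count(&b)); // 2
        } // b goes out of scope here, but the Conn stays alive
        println!("owners: {}", Rc::strong_count(&a)); // 1
    } // last owner dropped: Drop runs once; a double free isn't expressible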
Catching these kinds of programmer mistakes is exactly what distinguishes a language as "safer". Be it because they are ruled out by the type system, the memory system, or at least run-time checks.
Yeah, even I only put that in there as a way to handle cases (like Rust) where there is a single implementation of the language, which makes "implementation bugs" and "language bugs" one and the same.
But you can't have your cake and eat it too. The language does not implement those checks because perhaps they're not so trivially implementable in the language.
Sure, Rust is "safer" than C, but is it a better language? That's arguable.
> Sure, Rust is "safer" than C, but is it a better language? That's arguable.
Obviously there are more aspects to a language than safety, so I'll give you that, but yes Rust is a better language than C in this aspect.
Don't forget that programming languages are meant for humans, not for computers. One of the primary goals of a programming language is to prevent humans from making dumb mistakes.
" ... a programming language designer should be responsible for the mistakes that are made by the programmers
using the language.
[...]
It's very easy to persuade the customers of your language that everything that goes wrong is their fault and not yours. I rejected that..."
- Tony Hoare
I think this is a particularly solid point by Tony Hoare (made during his talk on 'the billion dollar mistake', null).
We find it very easy to blame developers for mistakes that really shouldn't have been possible to make at all.
I think there needs to be a grain of salt with the billion dollar mistake thing. The really dangerous part isn't dereferencing the null so much as what happens afterwards. If you dereference a null and the program terminates, that sucks. But, you debug it and life moves on. If "something" happens that's a whole different much bigger issue. Rust doesn't have null but you can ".unwrap()" an option type. Which may terminate the program and force you to debug. If memory serves, Haskell doesn't enforce exhaustive patterns. So once again, the program terminates and you're stuck debugging. That's really just life.
Now, I do in general prefer languages "without" null as they generally have more warning about when you can actually encounter null. Though my personal experience says that in sensibly written programs dealing with null isn't that big a deal. Is it a billion dollar mistake? I would have little trouble believing that. However, that would likely make dynamic typing an order of magnitude or two larger in the mistake dept.
The quote wasn't really about null - that's just the talk he was giving. It was about checked array indexing.
I would also say there really is no comparison between null, which escapes typechecking, and unwrap, which does not. But that's not my point, nor is it Hoare's. It's that we blame people for problems that are better solved by languages.
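To make the contrast concrete (a sketch with a made-up lookup): with null, the "no value" case silently inhabits every pointer type; with Option it is a separate type the compiler forces you to peel off, and .unwrap() is the visible, greppable place where you chose not to.

    fn lookup(id: u32) -> Option<&'static str> {
        if id == 1 { Some("alice") } else { None }
    }

    fn main() {
        let user = lookup(42);

        // user.len() would not compile: Option<&str> has no `len`,
        // so the "forgot the null check" bug is unrepresentable.
        match user {
            Some(name) => println!("hello {name}"),
            None => println!("no such user"),
        }

        // Or explicitly accept a panic -- visible right at the call site:
        // let name = lookup(42).unwrap();
    }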
On the one hand, Curl is a great piece of software with a better security record than most, the engineering choices it's made thus far have served it just fine, and its developers quite reasonably view rewriting it as risky and unnecessary.
On the other hand, the state of internet security is really terrible, and the only way it'll ever get fixed is if we somehow get to the point where writing networking code in a non-memory-safe language is considered professional malpractice. Because it should be; reliably not introducing memory corruption bugs without a compiler checking your work is a higher standard than programmers can realistically be held to, and in networking code such bugs often have immediate and dramatic security consequences. We need to somehow create a culture where serious programmers don't try to do this, the same way serious programmers don't write in BASIC or use tarball backups as version control. That so much existing high-profile networking software is written in C makes this a lot harder, because everyone thinks "well all those projects do it so it must be okay".
In extreme cases, where you want to prove your network stack, you'd have to write it in C (or something equivalently dangerous), because you can't prove big runtimes.
But in mid-way applications, dangerous languages are dangerous, and should be avoided.
I'd think the most "correct" way to handle this is to create small bits of network code, with proofs of correctness, that hand down parsed and tagged (typed) data to code in high-level languages. In that case, curl-type code would still be in C.
They're making an oblique reference to Rust; it has the same level of runtime that C does.
Many people say "no runtime" to mean "small runtime like C or C++ have" rather than literally no runtime. This is due to the confusion around interpreters sometimes being called "runtimes."
I suspect it was something about compiled vs. interpreted.
The Rust runtime is larger than C's (I imagine it's not much larger than C++'s, but this is already way too large). AFAIK, nobody ever proved it works as designed. The fact that its design is not very stable is also a problem.
Now, with some more thought about it, C itself is not a great target for correctness proofs. Too much is left unspecified, and there is little agreement over some of the lower level stuff. A simplified version of Rust might be a better target than C, if somebody ever creates it, but a lot of the safety would have to be left out of it.
IIRC Rust by default brings in a few things (panic handling is the main one, and also jemalloc) that make its runtime larger than C (but not larger than C++), but they can be disabled with compile flags. Many folks do, for embedded stuff.
Rust also lets you write a custom runtime by specifying a start function if you want.
> The fact that its design is not very stable is also a problem.
> The Rust runtime is larger than C's (I imagine it's not much larger than C++'s, but this is already way too large).
I believe that it is smaller than C++'s but larger than C's, but I'm not a mega expert here. I used to know where those files were located in-tree, but don't anymore.
I think there's also several distinctions you can draw here; for example, strictly speaking, almost no runtime is required, but if you're using say, the standard library, you still might end up with a bunch of code. The actual requirements are very small though.
> AFAIK, nobody ever proved it works as designed. The fact that its design is not very stable is also a problem.
Sorry, I don't understand what you mean here. The runtime?
> A simplified version of Rust might be a better target than C, if somebody ever creates it, but a lot of the safety would have to be left out of it.
I _think_ with the above comments you mean the Rust language overall, not the runtime, right? So, Rust _is_ very stable; post 1.0 we have made extremely small breaking changes, corresponding to the way new C++ standards sometimes introduce minor technical backwards incompatibilities. As for correctness proofs, there are multiple academic institutions working on them; and they don't have to "leave the safety out" - that's what they're trying to prove!
https://github.com/rust-lang/rust/blob/1ca100d0428985f916eea... is the "runtime". sys::thread contains all the TLS stuff (C has TLS too, so it's unclear if this is any more of a runtime). catch_unwind is for toplevel panic handling (which can be nop'd with panic=abort).
Finally, we link to jemalloc, which is an additional bit of runtime of its own (which can be turned off with alloc_system)
With Rust this tradeoff of safe-with-runtime vs. unsafe-without-runtime is no longer your only choice. Now you can choose safe-with-strong-compiler-no-runtime...
As far as I am aware, Ada is not memory safe without a runtime in the presence of non-static memory. So you either have a runtime, no deallocation, or unsafety.
C also needs a runtime; it is called crt0 on UNIX, and it is what calls main(), sets up global initialization functions (constructor functions in GCC C), emulates floating point, ...
As for Ada, you can selectively disable which parts of the runtime actually land on the generated executable via compiler pragmas.
Deallocation can be done via pools or controlled types.
Yeah, I realized after posting that this would be a critique. I should have stipulated that it's a language with C-like syntax, which seems to be many people's preference based on the most successful languages out there.
I never did much with Ada or Modula, so I won't claim to know anything more than their syntax, etc.
Rust still has no support for true formal analysis. It is therefore pointless to use it for high safety. Use a real theorem checker instead.
Now Rust as a target for compiling proofs into a programming language is pretty nice. Safe code can have no aliasing due to borrow checker (which should also be formally verified). Threading issues are vastly simplified, though fairness still has to be verified, as well as a few other properties.
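The aliasing guarantee in one small sketch: safe Rust rejects holding a shared reference across a mutation, which is exactly the kind of invariant a prover would otherwise have to establish by hand.

    fn main() {
        let mut v = vec![1, 2, 3];
        let first = &v[0]; // shared borrow of v

        // v.push(4); // error[E0502]: cannot borrow `v` as mutable
        //            // because it is also borrowed as immutable

        println!("{first}");
    }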
> the only way it'll ever get fixed is if we somehow get to the point where writing networking code in a non-memory-safe language is considered professional malpractice.
So using Linux, Windows, BSD or macOS servers is malpractice? I think you might have overstated your case. So are you waiting for a memory-safe Hurd rewrite? A memory-safe OS of any kind will be decades away even if someone wanted to start tackling it now.
There are lawsuits open about windows 10 destroying user data. Windows of all flavors has given me nothing but pain. Microsoft is not the benchmark of quality software. Most of their real innovations are in the business model space.
In what system can you sue when something came at no cost, with no warranty (express or implied), and in a good-faith effort to help everyone at the personal expense of the developer?
If that system exists, it sounds disgusting and unethical. Paying for something changes the nature of the transaction.
And the last windows 7 update black screened my pc and left me in limbo till I was able to restore. What exactly is your point? Windows is not a standard to be held higher than..well anything.
The parent commenter used Windows 10 as an example and you used Windows...7. His point, which is that Microsoft actively researches and implements OS-level security improvements, hasn't been rebutted by your statement about Windows 7.
Windows 7 was released in 2009. There are substantial proactive improvements that Microsoft cannot feasibly backport to older OS versions even if they're still within the support period, and the corporate culture was simply different when Windows 7 was released.
Substantial proactive improvements not relevant to the same build|OS which comes full circle to the linux comment which is so hated. That is: anyone with source and knowledge can make it work and those without...blackscreen.
I think the vast majority of people would say Windows has been getting much better in security since the days of XP. People can choose Linux, BSD, MacOS or Windows and have a good choice that is reasonably secure.
Of course they've overstated their case. Needs differ. When I wrote networking code a couple of jobs ago it was in assembler, because a new frame will be along in 67ns whether you're ready or not.
> So using Linux, Windows, BSD or MacOS servers are malpractice?
If you sell it as "secure", yes, absolutely. And there are many applications where you would not even be considered as a contractor for doing so. There is an increasing number of situations where you want to provide hard guarantees (written into the contract) about the quality of the software and service you are providing. "We are using Linux" sometimes is not good enough.
I get the feeling that very few old school authors would come forward and say "Hey, my project code is not good and needs to be re-written." Most of the time the message is more like "Sure we're using C but we're very careful and have built in a lot of safeguards."
I like your idea of a default mindset of internet tools should be written in a safe language with proper engineering techniques. I'm not sure I would go as far as malpractice but it might be a good stick to use to force makers into better practices.
The first major security breach through a buffer overflow was in the late '80s.
So when Java came out, they played it "safe" - the language, despite having pointers, would be absolutely safe. NullPointerExceptions and ArrayOutOfBoundsExceptions would cause the program to crash rather than corrupting the stack.
Perfect.
Except it wasn't. It ended up being so "holey" that it's now banned in browsers.
So, everyone said to move to JS. Another "perfectly safe" language.
But it's too slow.
So JIT it.
Now it's no longer "perfectly safe".
Rinse and repeat.
And Rust won't help here, because while the "compiler" can be guaranteed safe, the code it outputs can't (think of a C compiler written in Rust).
Maybe the solution isn't to rely on the language (except for the kernel) but to make it easy to spawn OS processes that simply have no rights to call any syscalls and a limited amount of memory (or a whitelisted set of syscalls).
Take it like this:
Firefox (the browser) has full rights. It starts a process (which can only connect to the network to IP RemoteHost).
If the process dies (for whatever reason) or takes too long, tell the user "sorry, site's broken".
Now, malicious code causes the attacker to run arbitrary code? Who cares? You can't overwrite the browser's code and can't break out.
The browser just has to ensure that its subprocess gives you good output.
You're conflating the issues of running untrusted code with running trusted code on untrusted input. JS is untrusted code that you run on your computer. Your other examples generally do not fall into this camp.
Java being an allegedly safe server-side language has nothing to do with it being a very difficult language to sandbox when running untrusted code. It doesn't matter what language the malware I just downloaded was written in, because it's malware and it's running on my computer. What makes Java different from C in this case is that when I run a Java application on my server, I can be more confident in the fact that I will not have a buffer overflow, even when I expose the service to the entire internet.
>You're conflating the issues of running untrusted code with running trusted code on untrusted input. JS is untrusted code that you run on your computer. Your other examples generally do not fall into this camp.
No. The issue is that in C (or "release (not debug) Rust" (someone here left a comment that Rust programs in release mode don't check for array overflows)) untrusted input leads to untrusted code (through buffer overflow).
So you can compromise a computer through a bug in imagemagick.
>Java being an allegedly safe server-side language has nothing to do with it being a very difficult language to sandbox when running untrusted code
Java was (in the 90s) advertised as safe client-side, hence applets.
No, release Rust does not do this. That is, the integer may overflow, but the buffer will not, unless you specifically go out of your way to use the unsafe unchecked access functions.
That's actually how many apps on macOS have worked for years. Anything that ships in the Mac App Store runs in a sandbox with limited privileges. In fact, every instance of an Open/Save panel on macOS runs in a separate process from the main application and relays its visual appearance using XPC and NSRemoteView. It's pretty cool stuff!
That's also the direction Linux is heading with Flatpak and Flatpak portals.
I'm excited because there will finally be a reliable way for me to apply my preferred Open/Save dialogs to KDE and GTK+ apps alike. (It's being implemented at the toolkit level and you can run an app in Flatpak with sandboxing set to permit-all to get the portal hooking.)
> Maybe the solution isn't to rely on language (except for the Kernel) but to make of easy to spawn OS processes that simply have no rights to call any syscalls and limited amount of memory (or a white-listed amount of syscalls).
We have that [1] [2]. It's cumbersome to use, but it's based on the bpf JIT used for tcpdump etc.
We also have a variety of other mechanisms you can combine with that to tie down a process. But even then it's hard to get this right in a way that's both performant and secure, because you have to protect every user every time, while for an attacker a very low success rate can still be worth it.
This is ignoring the fact that sensitive data may actually be inside the (vulnerable, locked-down) process. Or checks inside the process may in some way be influenced (remember register_globals in PHP? This happens even when you execute PHP in a locked-down container).
A safe language greatly helps to prevent the sort of bug that compromises within-process security.
While this doesn't so much apply to libcurl (but see below), there is a third alternative to "write everything in C" or "write everything in <some other safer language>". That is: use a safer language to generate C code.
End users, even those compiling from source, will still only need a C compiler. Only developers need to install the safer language (even Curl developers must install valgrind to run the full tests).
Where can you use generated code?
- For non-C language bindings (this could apply to the Curl project, but libcurl is a bit unusual in that it doesn't include other bindings, they are supplied by third parties).
- To describe the API and generate header files, function prototypes, and wrappers.
- To enforce type checking on API parameters (e.g. all the CURL_EASY_... options could be described in the generator and then that can be turned into some kind of type checking code; see the sketch after this list).
- Any other time you want a single source of truth in your codebase.
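A toy sketch of the idea (invented names, not curl's actual generator; the option values are only illustrative): describe the options once, in typed data, and emit both the constants and type-checked setter prototypes from that single description.

    #[derive(Clone, Copy)]
    enum OptType { Long, CString }

    struct Opt { name: &'static str, id: u32, ty: OptType }

    const OPTS: &[Opt] = &[
        Opt { name: "CURLOPT_TIMEOUT", id: 13,    ty: OptType::Long },
        Opt { name: "CURLOPT_URL",     id: 10002, ty: OptType::CString },
    ];

    // Emit a C header from the single source of truth above.
    fn emit_c(opts: &[Opt]) -> String {
        let mut out = String::from("/* generated -- do not edit */\n");
        for o in opts {
            let cty = match o.ty {
                OptType::Long => "long",
                OptType::CString => "const char *",
            };
            out.push_str(&format!("#define {} {}\n", o.name, o.id));
            out.push_str(&format!("int set_{}(handle *h, {} value);\n",
                                  o.name.to_lowercase(), cty));
        }
        out
    }

    fn main() { print!("{}", emit_c(OPTS)); }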
Programmatically generating C code is not without problems. How can you prove that the C you're generating is free from the problems solved by the safer language? Cloudbleed came from computer-generated C code: https://blog.cloudflare.com/incident-report-on-memory-leak-c....
See quote from the author of Ragel in the comments:
> There is no mistake in ragel generated code. What happened was that you turned on EOF actions without appropriate testing. The original author most certainly never intended for that. He/She would have known it would require extensive testing. Legacy code needs to be tested heavily after changes. It should have been left alone.
> PLEASE PLEASE PLEASE take some time to ensure the media doesn't print things like this. It's going to destroy me. You guys have most certainly benefitted from my hard work over the years. Please don't kill my reputation!
And I'd like to add that what made this a catastrophic error was that different requests were served in the same address space, rather than using address space isolation as in process-per-request/fork() architectures of old. For years now many network daemon programs have been written in an event-based, single-address space style, but I have never seen the alleged process creation overhead quantified (except for maybe multi-threaded programs). Even OpenBSD's httpd disses eg. CGIs as "slowcgi" (when you'd expect the OpenBSD developers take pride in the fact that their httpd uses ASLR etc. features of the O/S rather than inventing their own ad-hoc mechanisms to defeat deterministic memory allocation in user space, and would take the opportunity to tune O/S process creation). I don't have facts to share either, I'm just puzzled that we're re-inventing O/S mechanisms in user space with performance arguments without backing this up by numbers (or are there any?).
Well, the general point still applies. The bug occurred using code that was written in a safe language and compiled to C. It's just that there are multiple ways for that to go wrong. The generator could have had a bug -- it's software, so it almost certainly does. Or, as in this case, the user didn't use it correctly. Either way, the idea that you can write code in a safe language and compile to C to eliminate the type of bugs that C allows isn't true.
Are such errors less likely? Possibly so, but they're not categorically eliminated. It becomes a risk assessment exercise rather than a simple thing that everyone should do. Note that it also opens the door to Java-style problems, where once the generator becomes ubiquitous, it becomes the most valuable target for exploit-hunting because a vulnerability in the generator gets the keys to all the houses.
You are arguing that no language X is safer than writing the program manually in Y when the program in X is compiled to Y. Because the compiler from X to Y may have bugs.
Therefore no code written in Rust (X) and executed on an x86 CPU (Y) is safer than manually written x86 assembly, because the Rust compiler (and LLVM) may have errors.
And well, we can actually go deeper. There is the CPU frontend that generates microcode, which may have bugs. There is also the CPU backend that executes microcode, which also may have bugs. All in all there is no hope in programming. There might be bugs everywhere, so you can never be sure what your program does.
That's not what I'm saying. I'm saying "rewrite it in Rust (or whatever)" isn't some silver bullet that fixes security problems. It's always about assessing risk -- both risk of security issues as well as risk of upsetting your users, etc. Basically exactly what the article says.
> Either way, the idea that you can write code in a safe language and compile to C to eliminate the type of bugs that C allows isn't true.
Is a bit different statement than:
> I'm saying "rewrite it in Rust (or whatever)" isn't some silver bullet that fixes security problems.
The first one is wrong, the second one is true.
Using a higher level language rules out some classes of programming errors which are possible in lower level languages. The fact that compilers have bugs does little to diminish those gains.
The semantics of Haskell do not allow you to express a program that generates a double free [0]. Perhaps one of the compilers will compile some Haskell code to a binary that frees memory twice. However, this bug in the compiler is far less likely than a programmer making this mistake in C. What's more, when this bug in the compiler is detected and fixed, the problem is fixed in all affected code bases without any need to change the original source code. Thus the chances of bugs are lower.
Nobody really argues that Rust (or OCaml, or Haskell, or whatever) is a silver bullet, i.e. a solution to all problems that will miraculously make programmers produce no bugs at all. Obviously we will have software bugs even with the most restrictive languages. No amount of formal proofs will save us from misunderstanding specifications or making typos. And then again we will also have bugs in the implementation of those high-level abstractions.
And for the record I am really annoyed with movement to rewrite everything in Rust.
[0] Yes, you can call free through FFI with whatever arguments you like, as many times as you like. But for sake of brevity let's assume this is not how you write your everyday Haskell.
The hope is writing a formal description of the required architecture functionality (a formal proof) and then validating the proof. Not 100% safe against non-deterministic or very complex issues, but good against most others.
So no code is safe? All code before execution has to be lowered to some evil, unsafe language, most commonly the assembly language of targeted CPU.
The mystical process of "programmatically generating code" is also known as compilation. The case you are describing is a compiler bug. The compiler wasn't able to generate target code (in this case C code) with the semantics and/or guarantees of the source language.
More generally, I don't understand this argument. Assuming you can trust the C compiler (a big if, but at least some validated (large subsets of) C compilers exist; see CompCert), I don't get why this would be worse than generating machine code in a safe language.
This is simply not true. C in this case is just an intermediate representation of the source program. Going through multiple intermediate representation of the source code is fairly standard practice when compiling anything. If anything it is easier to target C than directly generate target CPU assembly, because of the high level nature of C (you finish the compilation earlier, without last couple of lowering steps).
You're forgetting the elephant in the room: undefined behaviour.
Sure, you can target one compiler and be sure you'll be generating the desired machine instructions, but it can be much more difficult to ensure that your code will produce safe machine code when compiled with all possible C compilers, and the techniques used may result in a slower end result.
If you go straight from a high-level language to a compiler IR, you have a much lower risk of having to choose between either underspecifying your invariants or overspecifying them at the expense of performance.
TL;DR: C wasn't designed as a compiler IR and that complicates things.
I agree with that. It is tempting to top it by saying C wasn't designed to do anything well, and that has complicated things over the last 45 years. On the other hand it's not like there has been a traditional wealth of wonderful ready-made IRs with cross-architecture backends for your high-level language to choose from either, so I'm still not convinced that compiling to C is harder than compiling to machine code yourself, especially in the common case where you don't have to get the last ounce of possible performance out.
Well, we can agree to disagree about this, but in my experience third party tools (like helpful debugging symbols) in particular suffer when there are extra intermediate languages. Extra metadata needs to be passed through more layers of abstraction.
And as a human I have had the same issues acting as a meat-implemented debugger. I had to drill through more layers to figure out why low level things happened.
Of course metadata is lost if not encoded anywhere on the way. The argument was made regarding code generation being more complex when code is saved on the intermediate level.
You're understating the problem a bit. There's no standard way to mark up C code as mapping back to the original source code so that metadata (source lines, memory mapping back to data structures) can be passed on to the compiled binaries. If the original language generated DWARF-encoded objects, then debuggers would just work, etc.
Compiling X to C and then C to assembly is not more complex than compiling X straight to assembly. In your original comment you wrote that the complexity of such a setup is bigger, to which I responded: no, not really.
Yes, C was not designed to be an intermediate compilation step and this yields losses of some information (e.g. debugging metadata, but also some semantics of the source language may get lost). I never argued with that. I never said that this is a perfect setup that doesn't introduce any new problems. I just said that compiling to C is very close to what actually happens inside a compiler targeting assembly from a higher-level language.
You just have a narrower scope of what counts as complexity. Mine includes things that complicate humans and debuggers understanding and analyzing the ultimate bytecode.
The techniques and difficulty in implementing the compiler itself are related but not really the same subject.
No we cannot prove that. However it is still better than the "write it in C" option because once you fix a bug in the generator, it's fixed in all current and future generated code. In other words, we no longer make the same mistakes over and over again.
How is that different from just writing it in another language? End users who need to compile will be able to regardless of the generated C code, but the end users who need to do a _little_ modification will be given ugly generated C code! Seems strictly worse to me...
In the libguestfs generator (first link above) the generated C code is required to be completely readable. It must look like it was written by hand (albeit by a programmer who is impossibly consistent and perfect). So reading the generated C code is fine. Modifying the generated code is of course not fine except for tiny test hacks, but we also include in the generated code comments reflecting where in the generator the code comes from.
I've created a number of code generators in my projects. Invariably, developers say exactly what you just wrote, "how do I modify the generated code"?
The answer is not to modify the generated code. Modify the input to the code generator to make changes.
Even when I output a warning to this effect - that all modifications to the target code will get overwritten, and not to check the target code into version control since the source code is already checked in - invariably developers modify the target code right under the comment that says not to, then they check it into version control. They then wonder why there are bugs, and why their modified target code no longer works after it gets regenerated on the next build.
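For illustration, this is roughly the kind of banner and provenance comment such a generator can emit at the top of every output file (the file names and function here are made up, a sketch rather than any real project's output):

    /*
     * GENERATED FILE - DO NOT EDIT.
     * Generated from generator/api.def (hypothetical path); edit that file
     * and re-run the generator instead. Manual changes here will be lost.
     */
    #include "api.h"

    /* from generator/api.def, entry "version" */
    const char *api_version_string(void)
    {
        return "1.2.3";
    }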
It's almost as though you can't solve the problem of programmers making errors by having a different set of programmers whom you tell to not make errors.
The impetus wasn't that programmers make errors but to solve the problem of repeatability. Many instances of issues can be solved once. There is no need to recreate the solution a number of times if it is already solved.
A code generator allows one to focus on the actual meta-problem, which is often smaller and easier to solve.
For something like curl, where the library is as popular as the command line tool, preserving the C ABI compatibility is probably the strongest reason.
Rust could expose a C ABI while keeping safe internals. The interface itself would be unsafe of course. There are a few things that rust doesn't handle natively (like varargs functions IIRC) but other than that you could probably write a Rurl that would be completely backward compatible with Curl.
Well, we can call into vararg functions, but not define them.
Since a vararg function has the same ABI as the corresponding function taking only one of the variadic arguments, one idea I've always had is to write a macro that lets you write a one-arg function and have it desugar via asm hax.
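To make the varargs point concrete: the best-known variadic entry point a drop-in replacement would have to keep is curl_easy_setopt. A rough sketch of the C-facing surface it would need to preserve (simplified; the real declarations use the CURLcode and CURLoption enums):

    typedef void CURL;   /* opaque handle, simplified from the real typedef */

    CURL *curl_easy_init(void);
    int   curl_easy_setopt(CURL *handle, int option, ...);  /* the variadic one */
    int   curl_easy_perform(CURL *handle);
    void  curl_easy_cleanup(CURL *handle);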
Not only is curl based on C, but so are operating systems, IP stacks and network software, drivers, databases, Unix userland tools, web servers, mail servers, parts of web browsers and other network clients, language runtimes and libs of higher-level languages, compilers and almost all other infrastructure software we use daily.
I know there's a sentiment here on HN against C (as evidenced by bitter comments whenever a new project dares to choose C) but I wish there'd be a more constructive approach, acknowledging the issue isn't so much new software but the large collection of existing (mostly F/OSS) software not going to be rewritten in eg. Rust or some (lets face it) esoteric/niche FP language. Even for new projects, the choice of programming language isn't clear at all if you value integration and maintainability aspects.
I think there's two major against-C groups: those of us who have worked with C for decades and those who never worked with it. I'll try and speak for those of us who've used it for decades. The popular high-level languages that have arrived since ~1995 (Java, Python, JS, C# and friends) are excellent productivity increases. In general, they sacrifice memory and performance in favor of robustness and security. For enormous software problem domains, we just don't need C's complexity or error-proneness.
Until Rust, there's been very close to zero serious competitors for C if I wanted to write a bootloader, OS, or ISR. Not even C++ could do those (without being extremely creative on how it's built/used). The ~post-2000 languages (golang, swift, D etc) can't do that (perhaps D's an exception but it wasn't an initial goal AFAICT). This is huge, IMO.
We've groaned and grumbled about how hard it is to parse C/C++ code for decades. This is a big deal for tooling. Because of the language's design, even if you use something "simple" like libclang to parse your code, you still have to reproduce the entire build context just to sanely make an AST. All of those other new languages above probably address this problem but also add all kinds of other stuff which we can't have for specialized problem domains (realtime/low-latency requirements, OSs, etc).
> collection of ... software not going to be rewritten in eg. Rust or some (lets face it) esoteric/niche FP language
IMO it's not appropriate to lump Rust in with "niche FP language"s. And don't look now but lots of stuff is being rewritten in Rust. Fundamental this-is-the-OS-at-its-root stuff: coreutils [1], "libc" [2], kernels [3], browser engines [4].
> IMO it's not appropriate to lump Rust in with "niche FP language"s.
Maybe I should have expressed it better, but I didn't intend to lump these together.
>And don't look now but lots of stuff is being rewritten in Rust.
I'm myself cautiously optimistic re Rust, but having been burnt by C++ in the past I'm not enthusiastic about fighting language idiosyncrasies (though modern C++ certainly deserves a second look). Then there's the issue (some might argue it's a plus) that Rust is at the same time a language, a lib, and the only compiler implementation (unlike C or C++ which give you choice).
The Rust coreutils is an excellent example of the issues of having such a mess of abstractions; the resulting binaries are literally an order of magnitude larger than the busybox equivalents.
> issues of having such a mess of abstractions, the resulting binaries are literally an order of magnitude larger
They're significantly larger, yes -- it's a fair complaint of rust. But it's mostly because of static linkage AFAIK [1] and not "a mess of abstractions".
Actually, the culprit is Rust's decision to statically link its standard library and all its dependencies by default.
Things like libunwind, libbacktrace, embedded debugging symbols for backtraces, and the jemalloc allocator aren't free.
If you ask for dynamic linkage (with the caveat that Rust doesn't have a stable ABI yet), you get a ~8K Hello World binary.
It's also possible to prune down the statically-linked size by opting out of various conveniences like jemalloc. (They're working toward making the system allocator default but don't want to regress Servo in the interim.)
...and if you opt into static linking with GCC and G++ (and ask Rust to make its link to libc static), Rust can actually outdo them on a Hello World.
> Actually, the culprit is Rust's decision to statically link its standard library and all its dependencies by default.
No it really isn't; static linking does not imply bloat, contrary to the commonly perpetuated belief.
> It's also possible to prune down the statically-linked size by opting out of various conveniences like jemalloc
Try this: opt out of everything except the standard library, create something somewhat trivial and idiomatic in both rust and c, compile and see what you get.
> Rust can actually outdo them on a Hello World.
Hello world is hardly a use of the standard library.
> No it really isn't, static linking does not imply bloat as commonly perpetuated.
I never said it implied bloat. I said that, if you ask Rust to link dynamically despite the lack of a stable ABI, you'll get binaries of a size similar to C and C++.
> Try this: opt out of everything except the standard library, create something somewhat trivial and idiomatic in both rust and c, compile and see what you get.
I'll need you to be a bit more specific than "somewhat trivial", given that "Hello world" uses println! or printf() but you consider it ineligible.
> Hello world is hardly a use of the standard library.
println! aside, it's a data point and that's all I meant by it.
Not wyldfire, and I think that claim is a mischaracterization, but the main obstacle to using C++ in the kernel is that some of its language features require runtime support (new/delete, globals/statics with constructors, exceptions).
You can of course just ignore those when writing kernel code- they get ignored in application code much of the time! But I suppose at that point it could be argued that you're just writing C with a C++ compiler?
I mean, if you're writing a kernel in Rust you have the same issue. In that case you'd use no_std, which takes away the part of the stdlib that depends on allocation and such (also threads and other niceties).
You can lose new/delete and .bss statics and still write reasonable, even "safe", C++. Rust doesn't have .bss statics by design (lazy_static emulates this for you though). new isn't necessary for the "modern" C++ safety stuff and you can write pretty good modern C++ without new. All new gets you is a nice wrapper around allocation, and when writing a kernel you can't and shouldn't allocate anyway. In Rust, too, you would not be allocating, either via memmap/malloc or via Box::new().
So it wouldn't be "C with a C++ compiler", it would be "C++ without allocations", which is a restriction from the problem statement anyway.
I don't get it. AFAIK you can implement all of those things in your abstraction, and then use it like canonical C++. I think you are wrong. Please correct me.
I might have to walk that back. It seemed to me that no_std was "more straightforward" and/or "more formalized" than "#pragma interrupt" (etc). But I could be wrong there -- if so, mea culpa (the post is no longer editable).
Rustc did recently get a "x86-interrupt" calling convention, but that's unrelated to #[no_std], and only works on x86. Either way, "#pragma interrupt" should work just as well in C++ as in C, since C++ doesn't really change any aspect of the language that matters there.
Further, even in C I rarely see use of "#pragma interrupt"-like tools- rather, everyone still seems just to use per-platform assembly glue code. (To be fair, my experience is mostly in kernel code for things like Linux, rather than standalone embedded applications where "#pragma interrupt" would be more valuable.)
no_std is more formalized, though C++ enforces the same thing by failing to link if you try using malloc (or whatever) when writing a kernel. no_std also means that it's very easy to tell if a crate works without the stdlib, so you can use code from the ecosystem instead of rolling your own.
Ultimately the Rust OSes resort to some handwritten assembly as well. I think that's going to be a constant of writing a kernel. Rust is working to minimize it (e.g. with things like `extern "x86-interrupt" fn`), but at a kernel level there are just some kernel specific asm instructions (like all of the TLB stuff) that either compiler will probably never support generating without inline asm.
So while Rust may be better than C++ at writing OSes (I'm not sure! I haven't looked at all the stuff you need to write an OS in C++), I do think they're in the same ballpark, close enough that if Rust is a "serious" competitor C++ probably is too :)
> Until Rust, there's been very close to zero serious competitors for C if I wanted to write a bootloader, OS, or ISR … We've groaned and grumbled about how hard it is to parse C/C++ code for decades.
I honestly think that Common Lisp can do this quite well. It was designed to be a high-level language, but it's completely capable of working at the machine level, pleasantly and easily. Unlike C, most of the time one has safety, but one can disable safety when necessary with a simple (declare (optimize (safety 0))).
Performance is extremely good with modern compilers, although I don't know how good they would have been back in the old days.
From what I can tell of Rust, it doesn't look easier to parse than C (but I've not looked deeply); certainly, it's orders of magnitude more difficult to parse than Lisp.
I believe that Standard ML or OCaml could do similar things as well, albeit at the cost of being more difficult to parse. Smalltalk is maybe a little less capable, but somewhat easier to parse.
Yes, it's harder than lisp, but it's still much easier than C. C and C++ have issues due to ambiguities that make them context-sensitive. C++ has it worse because parsing is dependent on typechecking because of templates.
Rust is not 100% context-free, but the feature that is non-context-free (raw strings, a rarely used feature) is still pretty easy to parse, and even if you capped it at 6-level raw strings you'd probably be able to parse all the Rust code out there.
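For anyone wondering what the C ambiguity looks like in practice, here's a tiny sketch: the same token sequence parses as either a declaration or an expression depending on whether a typedef is in scope, which is why a C parser needs full type context (the names are arbitrary).

    #include <stdio.h>

    typedef int T;    /* remove this line and "T * x;" is no longer a declaration */

    int main(void)
    {
        T * x;             /* with the typedef: declares x as a pointer to T  */
        int value = 42;    /* without it: would parse as "T times x" instead  */
        x = &value;
        printf("%d\n", *x);
        return 0;
    }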
> I honestly think that Common Lisp can do this quite well
I haven't used any lisp dialects for decades, so I have naive questions: is there really sufficient support from compilers+linkers to write a bootloader in lisp? Do I have to do a lot of bootstrapping in assembly to bring up lisp interpreter before I can execute the lisp code or does the ahead-of-time-build result in executable machine code? Can I do inline assembly (not required but a really key benefit IMO)? Are there numerous examples where someone's already written one in lisp?
https://github.com/dym/movitz is a Common Lisp system that runs on bare metal x86. The source code is quite readable.
The rest of this post is an excerpt from an email I sent 6 years ago.
The following comments on runtime systems are partially based on a long c.l.l thread with posts by Lucid, Symbolics, and Franz alumni.
Franz uses a 3-layer approach: CL, a low-level Lisp, and C.
Lucid started with Lisp that generated assembler but reluctantly added some C.
Symbolics Lisp Machines used bootstrap code in a Pascal-level language with prefix syntax. A Symbolics alum said that in retrospect they should have used C.
Most Lisp implementations have subprimitives - low-level functions that can circumvent the type system, often with a prefix such as % or :.
Assembly language integration dates to Lisp 1.5 and there are several common approaches.
1. turn the optimizer off - this is easy to use and implement.
2. optimize the assembler block - Naughty Dog GOAL did this.
3. annotate the assembly with its input and output operands, with constraints (do they have to be certain kinds of registers) and whether anything has surprising side effects. This sounds like where GCC got its inline asm concept from.
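Approach 3 is essentially what GCC's extended asm looks like today. A minimal sketch (x86-64 only, and the instruction choice here is arbitrary):

    #include <stdio.h>

    /* Operands are annotated with constraints ("r" = any general-purpose
       register); the compiler slots the block into the surrounding code. */
    static unsigned long add_one(unsigned long x)
    {
        unsigned long result;
        __asm__ ("leaq 1(%[in]), %[out]"
                 : [out] "=r" (result)    /* output operand */
                 : [in]  "r"  (x));       /* input operand  */
        return result;
    }

    int main(void)
    {
        printf("%lu\n", add_one(41));     /* prints 42 */
        return 0;
    }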
Of all the complaints you could make about C/++, parsing is IMO rather bikesheddy. Parsing is a solved problem. Modern compilers can parse millions of LoC per second. And most of the specific parsing-related complaints (pointer dereference or multiplication?) about C/++ are also true of Rust. (Edit: Nope, brain fart on my part, see below.) And, AFAIK, all C/++ parsing is well-defined, if counterintuitive in certain edge cases.
> most of the specific parsing-related complaints (pointer dereference or multiplication?) about C/++ are also true of Rust.
This should not be true, and we fought hard to keep it that way. There's one spot of Rust's grammar that's context-sensitive, for something very rarely used, and other than that, it's all much simpler.
You're right. AFAIK types and identifiers are always unambiguous in Rust. I was thinking visually (same operator) instead of in terms of specification and implementation. Shows me to make flippant comments from the toilet!
My larger point is that there are plenty of very good reasons to criticize C/++, and parsing is a minor one since parsing is fast, and even if the creation of the AST isn't context-sensitive, verifying its correctness (is this identifier in scope?) still is.
What is the context-sensitive spot in Rust's grammar?
> plenty of very good reasons to criticize C/++, and parsing is a minor one
Ok, fair bit, it's a frustration for me but admittedly not as important as the other differences.
I mention it because it's a wart in C's language design and I figured Rust's safety features are already well-known and heavily discussed. If I want to write a simple tool "ask this tree of .c files how often they use an identifier with name 'X' or type 'Y'", I have to find out the include paths, defines, all kinds of other "noise" just to find out what could be a relatively simple query of the source base.
This also means that autocomplete tools usually need to be taught how to build a project. YCM has this whole conf file where you specify the header locations and stuff and it's like rewriting half the makefile.
Please, connect the dots for me. An initial skim of the commit messages did not yield any egregious "Utter Disregard for Git Commit History" [1]. Even if it did, it may just mean that the maintainer is focused more on results and robustness than preserving a pristine history of the project.
Don't think that was where I was going. It was more that none of this stuff is really done, it is hard in this language, and help would be appreciated... as understood through the project splash page and then examined through the commits?
I hope Rust doesn't face the same fate as other ambitious projects by Mozilla. Rust has quite an unusual syntax compared to any other systems programming language. Also, there is a big learning curve. Keeping all the benefits aside, I really hoped Rust had a simpler syntax. I really think one day a language will borrow the good parts of Rust with a simpler syntax and get ahead of it. Rust in its current form will never be as successful as C/C++.
I am not against Rust. Rust has some great ideas and intent. I just feel they should have created simpler syntax. A more complex and unusual syntax doesn't have any real benefits IMO.
You still haven't pointed out what syntax is problematic exactly. I've never programmed in Rust, but I don't have any trouble reading it coming from a C and C# background.
What exactly is the problem with the example you linked to?
Not having programmed in Rust, the fact that Rust requires all type parameters to be used, thus ruling out proper phantom types, was semantically surprising to me. But I don't understand what syntactic issue the other poster had with that example.
In my understanding, it has to do with variance. This happened a very long time ago, before 1.0, and so I don't know where the discussion happened, off the top of my head.
It still seems bizarre to me that a purely type-level expression is forced by an effectively non-existent term. That RFC specifically states that the main problem is that the results of variance inference are largely erased by assuming invariance. That seems like a sensible default for unused type parameters too.
It seems from the conclusion of that post that PhantomData only survived because this was the smallest change they had to make to get this all to work better, and because some of this PhantomData could be used for other analyses in the compiler (although it's not clear if better type information could have replaced these uses anyway).
That's quote, a library that generates code for you at compile time. It takes code as input and has its own syntax. It's like compile-time reflection.
It's not code Rust programmers would normally write; I'm one of those programmers. I'm glad some libraries like serde, rocket and diesel are using it to generate code instead of doing run-time analysis.
> the large collection of existing (mostly F/OSS) software not going to be rewritten in eg. Rust or some
It is happening and will keep happening, and it is really necessary at some point. Sure, I don't expect a large project to be rewritten overnight, but every large project gets redone eventually. Especially when C becomes the main source of problems. And you can introduce better languages gradually.
Many of those are often written in userspace, where you are free to use any language.
Several of them will provide sufficient performance.
> databases, Unix userland tools, web servers, mail servers, parts of web browsers and other network clients, language runtimes and libs of higher-level languages, compilers and almost all other infrastructure software we use daily.
For all of those you'll find Rust implementations. Some are work in progress, some are already widely used.
C was originally a watered down version of BCPL because the computers they were targeting had very limited capabilities compared to other computers of the time. The first version of C didn't even have floating point numbers.
I am a student and I like C. I've tried Rust and Go, but I like the feeling of C. Maybe that sentiment will change later, but for now, I like C. It's simple and sharp and there are lots of docs/books.
C seems relatively simple, but it's also incredibly easy to do the wrong thing usually without realising it. That's the problem. I suggest you read up about 'undefined behaviour'.
Not OP, but I actually think string manipulation in C is really elegant. Many people who complain about it have too many allocations in their code and are trying to port the allocation-heavy non-C way of thinking to C. The C way I know focuses mainly on character at a time iteration with emphasis on not copying the source string.
I'm reminded of a time a colleague needed something like string.split, and working in c++ he filled a std::vector<std::string> with the result. Using a more C way he'd really only have needed a couple of pointers on the stack.
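As a sketch of what that looks like (a hypothetical helper, not from any real codebase): walk the string with a couple of pointers and never copy anything.

    #include <stdio.h>
    #include <string.h>

    /* Print each field of s, separated by sep, without allocating or copying. */
    static void print_fields(const char *s, char sep)
    {
        const char *start = s;
        for (;;) {
            const char *end = strchr(start, sep);
            if (!end) {
                printf("[%s]\n", start);    /* last field runs to the end */
                return;
            }
            printf("[%.*s]\n", (int)(end - start), start);
            start = end + 1;
        }
    }

    int main(void)
    {
        print_fields("GET /index.html HTTP/1.1", ' ');
        return 0;
    }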
It's a little naïve to think it works on English without the coöperation of the users. (Also, less glibly, things like emoji seem to be becoming more and more popular.)
You can, and then you get a 4-byte long character 1-byte before the end of your data, you skip over the null-terminator and into the stack, and bang.
Yes, you can avoid this if you're careful and you understand the intricacies of utf-8 (or some other multi-byte encoding), but it very quickly stops being elegant.
What do you mean by "character"? If you mean code point or "unicode scalar value", sure, but if you mean user-visible character (grapheme), it's much more complicated: even something "simple" like ö could be one or two code points.
This is not true. A zero-byte in a utf-8 string is the null-terminator and utf-8 strings can be treated exactly like C strings in terms of where the string ends.
What you do need to look out for is malformed utf-8, for example, 1 byte before the null terminator you get a lead byte saying the next character is 4-bytes long.
If you're not checking each byte for null and just skipping based on the length indicated by the lead byte then you're in for a crash.
Where utf-8 strings differ from C strings is slicing. You can't just slice the string at some random point without doing extra validation to make sure you only slice on codepoint boundaries.
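A small sketch of that boundary check (the helper name is made up): continuation bytes always look like 10xxxxxx, so you back up until you leave one.

    #include <stdio.h>

    /* Move pos backwards until it points at a lead byte or an ASCII byte. */
    static size_t utf8_floor_boundary(const unsigned char *s, size_t pos)
    {
        while (pos > 0 && (s[pos] & 0xC0) == 0x80)
            pos--;
        return pos;
    }

    int main(void)
    {
        const unsigned char s[] = "caf\xC3\xA9";     /* "café", 5 bytes */
        printf("%zu\n", utf8_floor_boundary(s, 4));  /* 4 is inside 'é'; prints 3 */
        return 0;
    }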
> A zero-byte in a utf-8 string is the null-terminator and utf-8 strings can be treated exactly like C strings in terms of where the string ends.
No, the parent was correct: UTF-8 encodes NUL (i.e. \0) as a single zero byte (e.g. in contrast, Modified UTF-8[1] uses an overlong for NUL, so there's never any possibility of an internal zero). Of course, an application/library can choose to restrict itself to only handling UTF-8 that doesn't contain internal NULs, but the spec itself allows for zero bytes in a string.
The point is, if you handle strings the C way, you're not in conformance with UTF-8.
If someone passes you a text file that is verified to be valid UTF-8 and contains, say, access permissions, then you better not stop parsing it at the first '\0' character.
None of this is a huge problem, but it's something to be aware of. C string handling is incompatible with UTF-8.
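A quick sketch of the pitfall: the buffer below is perfectly valid UTF-8, yet the C string functions silently ignore everything after the embedded NUL.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char data[] = "allow\0deny";   /* valid UTF-8 containing U+0000 */
        printf("%zu\n", strlen(data));       /* prints 5: stops at the NUL    */
        printf("%zu\n", sizeof data - 1);    /* prints 10: the actual length  */
        return 0;
    }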
File processing and string processing are not the same. If you have a file that has a specific data format outside of the encoding, and that format includes NUL bytes as part of the data, then obviously process the file based on that format.
That's separate from string handling.
UTF-8 was originally designed to be compatible with NUL terminated strings and keep NULs out of well formed text.
In fact it was the first point in the 'Criteria for the Transformation Format', mentioned in the initial proposal for utf8.
>File processing and string processing are not the same
The UTF-8 spec doesn't make that distinction as far as I know. There's a simple fact: A valid UTF-8 byte sequence can contain nul characters. So you can't naively use C string handling functions on it. And as someone else has correctly pointed out, the same is true for ASCII.
I'm just pointing out a potential pitfall and a source of security issues. Some might assume that after validating UTF-8 text input, you could just dump it in a C string and process it using C's string functions. But that's not the case.
Unless you have U+0000, there isn't any other sequence of code points that produces a 0x00 byte in UTF-8. I don't see this as a huge problem.
If you really do need it there are some C language libraries that use "pascal-ish" structs to do strings. UNICODE_STRING in Windows comes to mind. Doing strings in C doesn't force you to use C strings, it's just the most common thing to do.
So it's somehow C's fault that Unicode uses variable-length encoding, which is automatically going to be harder to process correctly at a byte-by-byte level than a fixed-length method, and also included known-C-incompatible null bytes?
> So it's somehow C's fault that Unicode uses variable-length encoding
Parent said string handling in C was elegant. My point is that it becomes fraught with (even more) issues once you throw non-English language at it.
It is C's decision to handle strings in this way, and the decision of many C programmers to treat all strings as if they are just iterable character pointers.
I am the parent you are talking about. I've made this argument many times with people: Unicode is crazy complicated in any programming language. People think that widening the char width will help - well you seem to be somebody who knows Unicode so you probably know the horrors of surrogates, combining characters vs. pre-composed diacritics, zero-width joiners, Han unification, variation selectors, BiDi... This is in no way just a C thing to deal with all that nonsense. I've not seen any language or library that I'd say does it "well" and saves individual programmers from considering the above. They all punt the issue to the programmer.
I've heard (mostly here) that Swift does something different and treats glyphs as the basic unit. I haven't had a chance to look at precisely what that does. Given all the issues I've seen elsewhere I'm skeptical that someone, anyone can pull that off correctly.
UTF-8 at least has one elegance (there's that word again) in the design in that you can do some "dumb" ASCII things and if your code does not know what to do with fancy unicode, you can check the high bit of any given octet and "safely" skip over it and any adjacent nonascii sequence if you don't know what it means. This may or may not be applicable to a task at hand.
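A sketch of that trick: any byte with the high bit clear is plain ASCII, and anything else belongs to a multi-byte sequence you can step over without understanding it (assuming the input is valid UTF-8).

    #include <stdio.h>

    /* Print only the ASCII bytes of a valid UTF-8, NUL-terminated string. */
    static void print_ascii_only(const char *s)
    {
        for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
            if (*p < 0x80)
                putchar(*p);    /* high bit clear: ASCII, keep it            */
            /* else: part of a multi-byte sequence, skip without decoding    */
        }
        putchar('\n');
    }

    int main(void)
    {
        print_ascii_only("na\xC3\xAFve caf\xC3\xA9");   /* prints "nave caf" */
        return 0;
    }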
> This is in no way just a C thing to deal with all that nonsense. I've not seen any language or library that I'd say does it "well" and saves individual programmers from considering the above.
This is true, however even something as simple as storing the (byte) length as part of the string reduces the complexity and the likelihood for bugs.
Other languages also prevent accidental buffer overruns so while they still need to deal with all the same Unicode problems you mentioned, the program likely won't crash if the programmer gets things wrong. The same is not necessarily true of C.
FWIW in Rust you also tend to avoid allocations, since all string manipulation is done via slices -- safe (ptr, len) pairs. It's pretty neat.
IIRC C++ is getting slices too, so it might be able to get better APIs around string manip. But I've seen decent string manip code that avoided allocations.
> Using a more C way he'd really only have needed a couple of pointers on the stack.
This is pretty much how it's done in Rust too via slices. For example, the standard way to split a string is to create an iterator and it won't do any allocations.
A lot of the mentioned software was started many years ago, when other languages were hardly viable options. And those programs are good enough now, so there's not much movement to replace them. It's not a good argument for C, IMO. It's like saying that Windows is awesome because so many users use it. But when people started from scratch (the mobile world), it turned out that Windows is not the best OS.
Actually when I'm reading about new software, it's very rare to encounter C. Usually it's something else.
Didn't know that curl was stuck back on C89, that's really optimizing for portability.
If anyone is confused by the "curl sits in the boat" section header, that's basically a Swedish idiom being translated straight to English. That rarely works, of course, and I'm sure Daniel knows this. :)
The closest English analog would be "curl doesn't rock the boat", I think the two expressions are equivalent (if you sit, you don't rock the boat).
> In the curl project we’re deliberately conservative and we stick to old standards, to remain a viable and reliable library for everyone. Right now and for the foreseeable future. Things that worked in curl 15 years ago still work like that today. The same way. Users can rely on curl. We stick around. We don’t knee-jerk react to modern trends. We sit still in the boat. We don’t rock it.
I see a lot of inertia in there. While it's a great record to maintain 15-year consistency, in an era of ever-changing InfoSec outlook it could become legacy and baggage if the authors resist change. One thing we know for sure is that humans will make mistakes, no matter how skillful you are. In the context of writing a fundamental piece of software in an unsafe programming language, that means we are guaranteed to have memory-safety induced CVE bugs in curl in the future.
Some of the other points that the author raised are valid too. If there is a trade-off where we can have a safer piece of fundamental software by almost eliminating a whole category of memory safety related bugs, with the downside of less compatibility with legacy systems, more dependencies etc., perhaps we should consider it? I believe the trade-off is well worth it in the long run and the option is ripe to explore.
How is the author resistant to change? He specifically said new code should be written in a language that meets the priorities for that code. He specifically said someone has or would write a competitor to curl in Rust or some other safer language and that a good one will take off. He welcomed that.
What he doesn't welcome is rewriting something that's had those bugs and the types of logic bugs not related to the language already worked out. There's a saying about a baby and bathwater.
Not everything is a dichotomy, and you shouldn't be reading the article as if the author is against newer languages. He specifically says that given a fresh start with the availability of these languages he might use something besides C. Carefully weighing options is wise. Throwing away years of actual progress for the appearance of quick progress is foolish.
I specifically quoted the section head "curl sits in the boat", and the entire section ends with "We sit still in the boat. We don’t rock it". Now read it again, and then tell me if that's welcoming change or resisting change.
> He specifically said someone has or would write a competitor to curl in Rust or some other safer language and that a good one will take off. He welcomed that.
Sure there might already be some alternatives out there. But those are not curl, they are at most forks.
> He specifically says that given a fresh start with the availability of these languages he might use something besides C.
Nope, he used the words "Maybe. Maybe not." "Might" is a stronger word.
I see "we don't rock the boat" as completely synonymous with "I am resistant to change".
(Note that I suspect you think that I or the original poster are suggesting that he's resistant to all change, not just change within curl. I don't believe he's resistant to all change, but I do believe he's resistant to change within curl, which is what we're talking about here.)
He's not resistant to change in the problem space. He's resistant to very particular types of change in one very particular codebase, and for very sound reasons.
He doesn't want to break the ecosystem around curl, which is huge, while getting back to feature parity and compatibility during a full rewrite. Something that comes along and replaces curl externally needn't be completely compatible and therefore is freer to leverage their new, fresh start much more fully. He welcomes a competitor, which means even a possible complete replacement. That's not a resistance to change. That's being very judicious about what one changes and why.
> He doesn't want to break the ecosystem around curl, which is huge, while getting back to feature parity and compatibility during a full rewrite
Very good point! However, a rewrite doesn't have to break the existing ecosystem, and it can happen on a parallel track, right? (curl has had 1.5k contributors so far, so the community should have enough support to maintain the existing codebase while developing a new version in a new programming language, hypothetically speaking.)
In fact, I would argue that the "it works now and has been working for 15 years so we don't rock the boat" attitude is negatively impacting the curl ecosystem in the long run. I'm a Python developer; a good example I can give to support my view is the Python 2.x to py3k transition (if "better unicode support" is analogous to "memory safety bug avoidance").
Python 2 to Python 3 was basically world-breaking for many projects. Most every non-trivial piece of code needed to be modified to work with 3. Many people maintain older code on 2 to this day even with newer code being written on 3 by the same people.
> it could be a legacy and baggage if the authors resist to change.
Every few years there is a new batch of programming languages that come out and they all gain a small passionate community that tries to convince the internet how much better that language is.
They inevitably use the argument that code not written in the new language is 'legacy' and 'resistant to change'.
Neither of those assertions are accurate or enlightening unless you can provide a proposed replacement and prove the superiority of the new code.
Simply telling other programmers to rewrite their code in xyz language with such arguments is primarily a case of armchair development.
If you really think it could be done better then do it and prove it.
> If you really think it could be done better then do it and prove it.
What I said is my belief and opinion, obviously. I don't have the skill set nor the time to do/prove it, unfortunately. But that shouldn't prevent me from speaking my opinion, should it? Just like a lot of people really think space travel could be done better, but not all of them have the capability and resources to do it and prove it.
> I don't have the skill set nor the time to do/prove it, unfortunately. But that shouldn't prevent me from speaking my opinion, should it?
Given the fact that programming language fanboy noise is a constant problem in language threads, it would be nice to see fewer posts, of higher quality than the same arguments that are always made. It's not really going to change the mind of anyone who has experience, and the people who do argue are usually inexperienced, with vapid counter-arguments.
So I guess the answer is no. If you have an uninformed opinion then it's better not to add noise and to let the people who really know their topic enlighten us with a well-argued, not very noisy discussion.
I think the value of HN is very much quality over quantity.
It's extremely simple. If you think Curl would be better in another language then port it, release your alternative, and maintain it for a long time.
Even if your language (Rust, Erlang, LISP, Go) is "better", it's still a minimal part of the equation. A maintainer is what makes the tool. It's hard work to decide which PRs to accept (and worse yet, reject), to backport fixes to platforms for which you can't get a reliable contributor, coordinating fundraising/donations, keeping up with evolving standards...
Anyway. Thank you, thank you, thank you Daniel Stenberg. Use whatever damn language you want.
I wouldn't presume to speak for Daniel, but I got the feeling that he just wanted to publish this to point people to rather than send the same canned response to inquiries about porting to Rust et al.
Drakma sounds great. Tone is tough to get right online. I don't like people doing drive-by suggestions like "you should rewrite X in Y". But if people are really willing to roll up their sleeves, write the tool (in any language) and keep it going for the long haul, I applaud them. I just have great respect and empathy for project maintainers, I think some don't appreciate what a huge PITA it is to BDFL a successful project.
> A library in another language will add that language (and compiler, and debugger and whatever dependencies a libcurl written in that language would need) as a new dependency to a large amount of projects that are themselves written in C or C++ today. Those projects would in many cases downright ignore and reject projects written in “an alternative language”.
Why would I be vendoring my own copy of libcurl in my project? Who does? This is how I (or rather, the FFI bindings my language's runtime uses) consume libcurl:
dlopen("libcurl.so")
I rely on a binary libcurl package. The binary shared-object file in that package needed a toolchain to build it, but I don't need said toolchain to consume it. That would still be true even if the toolchain required for compiling was C++ or Rust or Go or whatever instead of C, because either the languages themselves, or the projects, ensure that the shared-object files they ship export a C-compatible ABI.
An example of a project that works the way I'm talking about: LLVM. LLVM is written in C++, but exports C symbols, and therefore "looks like" C to any FFI logic that cares about such things. LLVM is a rather heavyweight thing to compile, but I can use it just fine in my own code without even having a C++ compiler on my machine.
(And an example of a project that doesn't work this way: Qt. Qt has no C-compatible ABI, so even though it's nominally extremely portable, many projects can't or won't link Qt. Qt fits the author's argument a lot better than an alternate-language libcurl would.)
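To spell out the dlopen() point above: the consumer only ever touches exported C symbols, so it neither knows nor cares what toolchain produced the .so. A minimal sketch (curl_version() is a real libcurl entry point; everything else here is just the standard dlopen/dlsym dance):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        void *lib = dlopen("libcurl.so.4", RTLD_NOW);
        if (!lib) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }

        /* look up a C symbol; the implementation language is invisible here */
        char *(*version)(void) = (char *(*)(void))dlsym(lib, "curl_version");
        if (version)
            puts(version());

        dlclose(lib);
        return 0;
    }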
Agreed 100%. Definitely going to be trotting this article out next time I see someone blindly arguing for rewriting xyz in Rust.
I particularly like the mention of portability. No other language comes even remotely close to the portability of C. What other language runs on Linux, NT, BSD, Minix, Mach, VAX, Solaris, plan9, Hurd, eight dozen other platforms, freestanding kernels, and nearly every architecture ever made?
I mean, sure, and if you have users running VAX or the Hurd, that matters. But it turns out that most of us use one of Linux, NT or OS X. And even if you add BSD and Solaris (and a few other Unixes) you can still find languages without C's known problems that cover 100% of users. "But embedded." Embedded can maintain their own software, they do all the time. How long are we going to insist that end users run software that cannot be secure because of the lowest common denominator of programming languages?
I think this is a flawed mindset for a number of reasons.
First, I'd rather appeal to every user than most users. That one user I didn't have to appeal to is going to be a much more faithful and grateful user than the "normal" ones. Most of my software work is open source (remember this context is a discussion about curl), and this encourages active collaboration with users with niche situations. If I choose technologies that make using my software attainable for these people, odds are they aren't going to stop at just porting it to their platform.
Limiting your platforms to Linux, OSX, and NT also stifles innovation. These platforms are all deeply flawed. Their popularity isn't due to having the best design, but rather to having a good enough design and being entrenched. They're old platforms, we've learned a lot since they were started. New or niche platforms bring a lot of value to the table. The BSDs are a great example, as it's the best suited platform for a wide variety of applications.
All a new platform has to do to be able to run nearly all general purpose software is port a C compiler. Not even that - they just need a cross compiler. This is a great thing, IMO.
>Embedded can maintain their own software, they do all the time
This is a pretty silly argument. Most embedded developers don't ship their own implementation of HTTP, they ship curl!
> Their popularity isn't due to having the best design, but rather to having a good enough design and being entrenched. They're old platforms, we've learned a lot since they were started.
I think one could say the same thing about C's popularity as a language.
C was well-designed for its time, but "extremely" well-designed is a stretch given the much better designs that came immediately before (ALGOL 60 and 68, Pascal, Scheme) or after (Ada, Modula, ML) it. C was optimized to be fast to implement (and won out for that reason — "worse is better," and because UNIX was the first usable OS written in a high-level language), not for the best practices in safety or even performance, even as understood at the time.
I'd really disagree. All those languages are both safer and more expressive (if more verbose in the case of those with Pascal-like syntax) than any version of C, and, except in the case of ML, Scheme and ALGOL 68 with the optional garbage collection, there's no reason they couldn't be as fast or faster than C. Their main fault was simply in being too ahead of their time: too difficult or impossible to implement well on a PDP-11.
(I deleted the part about FORTRAN 77; seems I was confusing it with F90, which is the version that first allowed identifiers longer than 6 characters, dynamic memory allocation and user-defined types).
There are cases where you need to be close to the hardware -- the kernel, graphics drivers, low-latency graphics and audio. Why does using a URL to retrieve a file over the network require being close to the hardware?
- Var parameters instead of pointers for out parameters
- Real modules with type encapsulation
- Type safe function pointers
- Language support for concurrency
- Open arrays for variable length parameters
- Exceptions
All of this available in 1978.
By no means anything exceptional; Niklaus Wirth took his inspiration from the programming language Mesa, used by Xerox PARC to create the Pilot OS and the Star workstation, as Xerox wanted to move away from BCPL in 1977.
Also many of these features were already available in Algol.
I find it rather implausible that we've managed to learn a lot of new things about operating system design and nothing about language design when the last major new operating system design to see significant adoption was probably NT in 1993, and we've had boatloads of new languages see adoption since then. Because the talent and effort is going to go where the rewards are, and if designing new languages is more productive than developing new operating systems, I would think that's where most of the energy is going. The inverse of Sturgeon's Law is that 20% of everything isn't crap, and the more of something you have, the larger that 20% is.
The main difference is that operating systems are complicated and programming languages (are supposed to be) simple. The biggest strength of C is its simplicity - there's not much that can go wrong with such a small feature set. I find Go to be pretty strong for similar reasons.
C is not simple. It is incredibly complex, due to the way the standard specifies all operations in terms of an abstract VM and offers absolutely no guidance on what to do when code goes out of a small set of behaviors. Because undefined behavior is so easy to trigger, essentially all large production C code relies on undefined behavior.
It depends on whether you actually want to know precisely what your code does. For me, that's essential to writing reliable software.
In my view, one of the reasons why we consistently fail to produce reliable software is that we continue to use a language from the 1970s that makes it very hard to determine what the meaning of a program is.
"Classic" C was actually simpler and safer than modern C.
Before optimizers, C was a WYSIWYG language. Yes, you can shoot your foot (gets) but you know what's happening where, and can manually check everything.
Modern C with language lawyering can "optimize out" your safety checks, leading to exploits.
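The canonical example of a check that this language lawyering can remove: signed overflow is undefined, so the compiler is allowed to assume it never happens and fold the guard away (whether it actually does depends on the compiler and flags).

    #include <stdio.h>

    int guarded_add(int len)
    {
        if (len + 16 < len)    /* intended overflow check...                */
            return -1;         /* ...which -O2 may legally optimize out,    */
        return len + 16;       /* since signed overflow "cannot happen"     */
    }

    int main(void)
    {
        /* with INT_MAX this is undefined behaviour; an optimized build may
           return a large negative number instead of the expected -1 */
        printf("%d\n", guarded_add(2147483647));
        return 0;
    }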
Yet those optimization passes have been essential for keeping C alive. Optimizing C well depends on exploiting undefined behavior. And if not for optimizing compilers, I think C would have been replaced a long time ago.
For example, when everything can legally alias everything else (as in the case of "classic" C), it's hard for a compiler to prove anything about the contents of memory. This prevents a lot of seemingly-obvious optimizations. The problem with C is that you need to violate many programmers' assumptions about how the language operates in order to make it fast.
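A small sketch of the aliasing point: without some promise that the pointers don't overlap, the compiler must reload *b after every store through a; C99's restrict is the opt-in way to hand it that information.

    #include <stdio.h>

    /* With restrict, *b may be loaded once and kept in a register; without
       it, the store through a forces a reload because a and b might alias. */
    static void add_twice(float *restrict a, const float *restrict b)
    {
        *a += *b;
        *a += *b;
    }

    int main(void)
    {
        float x = 1.0f, y = 2.0f;
        add_twice(&x, &y);
        printf("%g\n", x);    /* prints 5 */
        return 0;
    }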
It's not a coincidence that a lot of C/C++ compiler developers have ended up moving on to other languages.
The long lists other users have posted in this thread of bugs in curl that wouldn't have been possible in another language suggests that in fact there's a lot that can go wrong with C.
Most of libcurl's users are in the embedded space, where they might not even be running an OS at all. So portability does still have to be a primary concern.
I think the strongest argument is "rewriting would introduce lots of new bugs that we don't have now". It's a lot easier to justify staying the course with C on a project that has its troubled youth long behind it, than it is to justify starting a new project in C now.
Curl had more vulnerabilities reported this past year than the previous two years combined. The number of bugs is growing year over year now that automated checking tools are mainstream. The number of problems revealed is accelerating, not slowing down as you seem to think.
Just looking at the numbers is misleading: curl had a security audit last year ( https://daniel.haxx.se/blog/2016/11/23/curl-security-audit/ ), which may mean much fewer are reported over the next few years, as they were all found last year.
> The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.
This statement is laughable nonsense. Shall we go into their bug history and point out counterexamples left and right? [Edit:user simias has done this; thanks!]
Every single bug you ever make interacts with the language somehow.
Even if you think some bug is nothing but pure logic, that logic is part of a program, embedded in the program's design, whose organization is driven by the language.
>There. The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.
That's wrong. A lot of the C mistakes are indeed "logical mistakes in the code", but most of them would nonetheless be fixed by changing to a language that prevents those mistakes in the first place.
In my view, the problem with C in general is that it's a loaded gun with no safety or trigger guard. It's trivial to shoot yourself (or someone else) in the foot, and it requires knowledge, meticulous care and lots of forethought to avoid getting shot.
I very much agree that rewriting existing, stable software written in C is likely not worth the trouble in many cases, but I can't accept claims that the limitations of C aren't the direct cause of tens of thousands of security vulnerabilities, either.
In Rust, even a less experienced developer can fearlessly perform changes in complicated code because the language helps make sure your code is correct in ways that C does not. And you can always turn off the safeties when you need to.
Experienced developers should feel all the more empowered by simply not having to always worry about things like accidental concurrent access, use-after-free, object ownership, null pointers or the myriad other trivial ways to cause your program to fail that are impossible in safe Rust. You get to worry about the non-trivial failure modes instead, which is much more productive.
While I'm definitely not suggesting we replace Curl with a rewrite in Rust (since the current Curl has had decades of good testing and auditing done on it), I am actually very curious how a rewrite in a safer language like Rust, OCaml, Haskell, or Go would fare in comparison in regards to performance and whatnot.
If I were ambitious enough, I'd do it myself in Haskell, but I think it'd be too much work for a simpler curiosity.
Most likely I/O will take more time than whatever code is running, so in that aspect it would make no difference. Memory overhead is the main concern here. Rust doesn't use GC, so you'll have full control and there shouldn't be much difference in that aspect. Other languages do, which means a more sophisticated runtime, less control and more overhead (or writing ugly code to avoid it). A libcurl written in Go/OCaml/Haskell would require anyone using it to also include the runtime of the language, which is usually rather large.
This seems like a no-brainer for a re-implementation in rust, but I wouldn't expect that someone would rewrite curl itself in rust, but a new library that does the same things.
Most languages already have HTTP client libraries. (In particular, Rust has Hyper. Ruby/Python/Node/Go have HTTP clients built-in in the stdlib, Haskell has http-client, etc.) Who uses libcurl really? (Spoiler alert… PHP.)
Of course libcurl does FTP and Gopher and all the things, but these aren't commonly required, most applications just need HTTPS.
People that write C and often C++ use libcurl. A better library for C/C++ developers would be nice and I believe it could be written in Rust, although that would be a bit of a pain because then you need to integrate Rust into your build system.
Do you? For building C projects I certainly don't have to build libcurl, it comes packaged and ready to use with my distribution. The same could be the case with a hypothetical HTTP library written in Rust.
libcurl is still heavily used even inside the rust community. Right now on crates.io the curl crate is sitting at about 220k downloads to Hyper's 925k. Sure, hyper has a lot more, but not to the level of "who uses libcurl really?"
Shorter code, since rust is a higher level of abstraction.
Safer code, so half of the vulnerabilities wouldn't have existed.
Rust's ecosystem (package manager and libraries).
Of course you lose portability and you probably appeal to fewer developers, at least for now. So there is a trade-off.
I wish Rust compiled to C. It would be my dream language. The only reason I can't choose rust half the time is because it doesn't support targets I need to support.
I don't know that you'd gain much in the real world though. Starting such a project now? Rust, for sure, for me anyway. But is it worth rewriting Curl? I agree with the author there, it most probably isn't.
I think the Rust community increasingly behaves like this[1]. They are big on suggesting better 'ideas' to others instead of implementing them themselves. So they keep using 'curl' and 'openssl' but tell others to rewrite their software in Rust.
Like, a lot of the rust community is putting the effort into rewriting things in Rust.
From within the community I don't see any coherent effort to tell folks to rewrite in Rust. We're very happy to see Rust rewrites, but not many people are pushing for it except the folks actually putting effort into it.
Yes, every time a vulnerability pops up someone will say "rewrite it in Rust", but half the time it's not even a Rust programmer (often, Rust programmers come and disagree and say "Rust wouldn't fix this", IME).
I would say a portion of the Rust community; that said, it is that portion that is the most visible, and I think the community understands well what that means.
I'm not sure the community well understands it at all. The usual rejoinder is something along the lines of "well, I don't see that in the Rust community I'm in."
Both /r/rust [1] and the official forum [2] are reasonably regulated, and I often see less informed members get warned about their use of aggressive or insulting language, often directed at non-Rust languages. There are surely other venues with less enforcement (they still commonly observe the Rust Code of Conduct, however), but at least I think the main venues and the corresponding community try hard not to be offensive.
I don't think C is a bad language, although I think it could use lists and dictionaries in standard library. std::vector and std::map are the only things that make me pick C++ in an instant, given the choice.
While C by itself is not safe, I would argue that no sane development environment uses C by itself. Over the decades of its production use dozens of tools have been developed that make it far safer: *grind suite, coverage tools, sanitizers, static analyzers, code formatters and so on. Those tools are external, otherwise they would make C slower. Something for something.
I think it's a bit weird that C and curl are used as the example here. If we look at C and OpenBSD or so, things might look a bit different.
Also one has a hard time comparing curl with another language, simply because something with curl's properties (take portability for example) doesn't exist.
And no, that isn't in defense of anything, just me thinking that the measurable points brought up in the discussions don't make sense or don't exist.
The topic is also a bit broader, as you can easily add in static code analysis, compiler flags, stuff like W^X, stuff like seccomp, capsicum, cloudabi, pledge, which might not work (well) in other cases.
It's a great philosophical discussion topic and I don't wanna stop anyone, just hoping people keep that in mind when they participate, so we don't end up with new dogmas that get thrown around for the next few years, without knowing the context or meaning of phrases.
Other than that: I really enjoy this discussion. :)
Are such extensions popular, and if not, why not? I assume there's always some performance hit, but that might not be a big deal in an HTTP client, for example.
CPython also has many vulnerabilities in the Python code rather than in the C code.
It's hilarious reading rust marketers talk about how people should use rust, and yet their software doesn't work as well. It has plenty of bugs.
Then they go on and on about issues which post modern C doesn't have. Guess what? C has a lot of tooling, and yes, it's been improving over the years too. CQual++ exists. AFL exists. QuickCheck exists.
Can your rust project from two years ago even compile? Does it have any users at all?
There's a formally proven C compiler. How's that LLVM swamp going you've built your castle on?
Rust brought a modern knife to a post modern gun fight -- and lost.
> Can your rust project from two years ago even compile?
For code written after 1.0's release date, which was just short of two years ago, the vast, vast majority should, yes. We've had one or two soundness fixes in that time that would take a trivial amount of updating, but those only hit a very small part of the ecosystem.
Quite recently I happily used libcurl for a C++ project rather than any of those C++ wrappers found on GitHub. Granted, there is some inelegance when you adapt C-style error codes to C++ exceptions, and a non-C++-idiomatic code style sits right next to any C lib (a rough sketch of that adaptation follows below). Yet libcurl is battle tested (AKA proven to be rather bug free) and has a nice, clean API.
IMHO it might eventually make sense to use another language/tech/whatever, but the bar is quite high and it will quite probably take some serious, sustained effort.
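For anyone who hasn't done that adaptation, here is roughly what it looks like: a small, hypothetical check() helper (the libcurl calls are the real API; the helper name, the flow, and the example URL are made up for illustration) that turns libcurl's C-style CURLcode returns into C++ exceptions:

    #include <curl/curl.h>
    #include <iostream>
    #include <stdexcept>
    #include <string>

    // Hypothetical helper: convert a C-style error code into an exception.
    static void check(CURLcode rc, const std::string& what) {
        if (rc != CURLE_OK)
            throw std::runtime_error(what + ": " + curl_easy_strerror(rc));
    }

    int main() {
        // (A real program would also call curl_global_init first.)
        CURL* handle = curl_easy_init();
        if (!handle)
            return 1;

        try {
            check(curl_easy_setopt(handle, CURLOPT_URL, "https://example.com"),
                  "setting URL");
            check(curl_easy_perform(handle), "performing request");
        } catch (const std::exception& e) {
            std::cerr << e.what() << "\n";
            curl_easy_cleanup(handle);   // cleanup still done by hand on this path
            return 1;
        }
        curl_easy_cleanup(handle);
        return 0;
    }

The manual curl_easy_cleanup on every path is exactly the kind of thing the RAII comment further down is about.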
> The plain fact, that also isn’t really about languages but is about plain old software engineering: translating or rewriting curl into a new language will introduce a lot of bugs. Bugs that we don’t have today.
Don't rewrites, even in the same language, usually lead to a better version of the software? I can't really imagine a seasoned C developer introducing completely new bugs into a code base they are already very familiar with.
What is everyone using curl for that it needs to be written in C (or Rust)?
If I think about my usage, it's mostly GET or POST something and see what the returned JSON looks like. If I need to download something, wget usually works without having to remember -O.
But higher level things like httpie are easier to deal with, sane defaults and all that. Maybe they use libcurl...
Are there any re-write userland in ${safe-high-level-lang} projects?
Maybe curl could be rewritten in C++ step by step, like mpd (https://musicpd.org) was. C++ has RAII for resource management, which can help a lot by itself; in my opinion the most hateful thing in C is freeing resources on all exit paths (see the sketch after this comment).
Although with curl in C++, the naming would become inappropriate...
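A hedged sketch of that RAII point, using the real libcurl C API with made-up wrapper and function names (an illustration of the idea, not a proposal for how a curl rewrite should look): a std::unique_ptr with a custom deleter guarantees curl_easy_cleanup runs on every exit path, early returns included.

    #include <curl/curl.h>
    #include <memory>

    // Ownership of the easy handle: curl_easy_cleanup runs automatically
    // when the unique_ptr goes out of scope, on every exit path.
    using CurlHandle = std::unique_ptr<CURL, decltype(&curl_easy_cleanup)>;

    static bool fetch(const char* url) {
        CurlHandle handle(curl_easy_init(), &curl_easy_cleanup);
        if (!handle)
            return false;                  // early exit: nothing leaks

        curl_easy_setopt(handle.get(), CURLOPT_URL, url);
        if (curl_easy_perform(handle.get()) != CURLE_OK)
            return false;                  // another exit path, still no leak

        return true;                       // and cleanup happens here as well
    }

    int main() {
        return fetch("https://example.com") ? 0 : 1;
    }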
I feel blaming a language for errors is like blaming a gun for killing people.
The fact is, mistakes will happen, but in general if you follow best practices you'll be fine. Failing to follow best practices means you could be a better programmer. Just because the language gives you the option to do something doesn't mean you should.
But in practice, people always make mistakes. Some guns/programming languages limit the damage of those mistakes more than others. All other things being equal, those guns/programming languages are better.
> mistakes will happen, but in general if you follow the best practices you'll be fine
Everyone fails to follow best practices at times, precisely because "mistakes will happen!" Encouraging people to be better programmers can't change that.
Therefore we want languages that catch as many mistakes as possible at build time and minimize the damage caused by other mistakes that slip through.
Maybe they should attempt writing it using the Isabelle/HOL transpilers to C from the seL4 project. I don't care if it is C or machine code as long as the proof of correctness is complete, down to at least the C library.
Curl is small enough to make it relatively easy and used widely enough to make it worthwhile.
This has probably been said, in this thread even, but if curl is insecure (for some value of "insecure") then its ubiquity and ease of embedding are a problem rather than a feature. Fuzzy thinking.
Software is almost a perfectly open market. If proponents of rust really think their preferred language is better in every way, they are free to rewrite the world in rust, and see the adoption numbers they get. After all, if rust is better in every way, we'd expect the adoption numbers to go up for their rust OS, with a rust http stack and rust web browser. Right?
Telling others to use their language instead of putting their money where their mouth is is truly what irks me about the rust community the most.
Want a rust world? Go write it and ship it.
Oh, and you don't get to complain about C until your PC runs more rust than C
I'm not sure why I can't complain about problems I have today -- if nobody complained about C ever, I doubt Rust, Go or for that matter, basically any other language project, would have ever gotten started.
It's okay to identify flaws in tools, that's how we make them better. It doesn't make sense to say '[car on fire] you can't complain about that Honda Civic until you develop your own better car, sir, and more people are driving it than not, now please leave the service center -- until then consider the fire normal'.
It is utterly amazing to me to see so many people's attitudes on this issue.
If I cut myself by hasty use of a knife, is it the fault of the knife maker? How is that even remotely rational? If you aren't willing (or don't know how) to use the tool correctly, don't use it.
It's not bad because it's old, it's bad because of how unsafe it is. There are older languages than C that are safer, but they didn't gain the popularity C did.
This is a great little read and encapsulates the other side of the 'rethink the way' trend-ism of some HN new lang advocacy. C is fine, C is good. It is widely understood, it is a systems staple, and it is not dangerous in knowledgeable hands. Rocking the boat is fashionable.
C isn't as dangerous if you really know what you're doing and you never make mistakes. I would bet fewer people know what they're doing than think they know what they're doing, and the set of people who never make mistakes is entirely empty.
Well, you know... that's just, like, your opinion, man.
Opting out of this site at this point. A bunch of 6-12 year idiots (recognizably) telling people what they think they know that they don't know. It's silly and recursive.
I highly doubt anyone in the Rust community looks at it this way. As a huge Rust fan, I'm also a C fan.
C existed before Rust; before Rust it was my favorite language, though I rarely chose it for work b/c of safety.
The strongest argument in this piece is that Curl is written in the most portable language available. That's a great reason! And it's been a wildly successful project! Kudos to the developer(s)!
The only problem with this piece is the claim about which of these are logic bugs vs. C bugs; but that should not detract from an excellent project. The real question being debated is this: if you were starting a new project like this, should you do it in a safe language to guard against security issues, or in the oldest, most portable language around?
I'd argue that by the time any new project written in Rust becomes successful, LLVM (and Rust) will probably have closed that portability gap.