I have talked to GCC developers about integrating a Rust frontend, and they said the main blocker is that the language specification isn't stable and is a fast-moving target.
As long as the Rust language specification isn't stable, GCC's Rust implementation would just fall behind all the time.
As an alternative, it has been suggested to provide a GCC frontend for the LLVM-IR language generated by the Rust compiler, since that specification is stable.
If Rust's specification becomes stable at some point, it would probably be a good idea to start a Bountysource campaign (like we recently did for the m68k [1] and AVR backends [2]) to help get the Rust frontend brushed up and ready to be merged into the GCC source tree.
I do not think that LLVM-IR is stable either, so I do not think that is a good option. I think this project does the right thing by using the MIR, which, while not stable either, should be more stable than Rust and no worse than LLVM-IR, while retaining more high-level information (e.g. generating LLVM-IR loses MIR's knowledge of restrict).
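As a small illustration of the aliasing information at stake (a quick sketch, not code from the project): two &mut references in Rust are guaranteed not to alias, which a backend can exploit the same way C compilers exploit restrict.

    // Because x and y are both &mut, they cannot alias. A backend that
    // receives this guarantee (LLVM noalias, GCC restrict) may assume the
    // write through y cannot change *x and fold the final read of *x into
    // the constant 1.
    fn exclusive(x: &mut i32, y: &mut i32) -> i32 {
        *x = 1;
        *y = 2;
        *x
    }

    fn main() {
        let (mut a, mut b) = (0, 0);
        assert_eq!(exclusive(&mut a, &mut b), 1);
    }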
I think the instability the GCC devs are concerned about is the release schedule. Rust has 9 releases per year while GCC has 4-5 releases. Unless you update your compiler quickly, you won't be able to use much of the crates.io ecosystem, as crates are quick to require a new compiler version. Often the implementation of a new language feature is still getting last-minute fixes up to 6 weeks before the release, and sometimes even after that beta period.
Compare that to C/C++, which gets new releases every 3 years. A compiler that only supports C++14 is still perfectly usable for most C++ codebases out there. In fact, the Godot engine still hasn't migrated to C++11 yet, and it's not an exception to the norm at all. Even if your compiler has parity with Rust 1.31, released one year ago, you'll have trouble with most projects: even if the project itself doesn't depend on newer compiler releases, one of the transitive dependencies will.
I could be wrong since I am not a Rust developer but I do not think the MIR changes 9 times per year. I would imagine most releases do not touch the MIR these days.
I think we might be talking past each other, because I never claimed otherwise. My point was that LLVM-IR is not a solution to anything, because LLVM-IR would have the same problem: LLVM has a separate release schedule from GCC. I think using the MIR is the best alternative that does not require writing and maintaining a separate frontend from rustc.
I thought your reply was referring to the idea mentioned by cbmuser to have a rust frontend in the proper GCC project.
Because then you are bound by the release schedule of GCC. Of course one can think about separate projects like the gcc-rust one where you can have whatever release schedule you want. But then it'd be a separate project building on GCC and not a part of the larger GCC project itself.
LLVM having a separate release schedule from rustc isn't a problem: rustc can skip an LLVM release without issue. A rustc 1.35.0 built with LLVM 5 can compile a rustc 1.36.0 built with LLVM 7. However, GCC can't skip releases: the rustc 1.36.0 frontend needs at least a 1.35.0 frontend to compile itself. And while most Rust programs in the ecosystem work with a rustc built against older LLVM releases, most Rust programs that have dependencies do need newer rustc releases.
As for the stability of MIR: currently, I think MIR is serialized to disk in a glorified-mmap way, basically following the Rust memory representation. That's great if both the creator of the MIR and the part that reads the MIR are written in Rust. Furthermore, there are libraries describing how memory layout should look that are provided to codegen backends. Those libraries are written in Rust and not really usable outside of Rust. So currently, unless someone serializes MIR using e.g. bincode and provides C bindings for those layout libraries, there are good reasons to write the codegen backend in Rust itself, at least the part that translates MIR to the next stage.
In this respect, LLVM IR is different, as it allows a multitude of languages to communicate with it.
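To make the bincode idea concrete, here is a minimal sketch using made-up, heavily simplified MIR-like types (the real rustc data structures are far richer); it assumes the serde (with derive) and bincode crates.

    // Hypothetical, heavily simplified stand-ins for MIR data structures;
    // the real types live inside rustc and are much more involved.
    use serde::{Deserialize, Serialize};

    #[derive(Serialize, Deserialize, Debug, PartialEq)]
    enum Terminator {
        Return,
        Goto { target: usize },
    }

    #[derive(Serialize, Deserialize, Debug, PartialEq)]
    struct BasicBlock {
        statements: Vec<String>, // real MIR statements are structured, not strings
        terminator: Terminator,
    }

    #[derive(Serialize, Deserialize, Debug, PartialEq)]
    struct MirBody {
        blocks: Vec<BasicBlock>,
    }

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let body = MirBody {
            blocks: vec![BasicBlock {
                statements: vec!["_0 = const 42_i32".into()],
                terminator: Terminator::Return,
            }],
        };

        // Serialize to a portable byte format instead of relying on Rust's
        // in-memory layout; a consumer written in another language could
        // then parse this stream instead of linking against rustc's types.
        let bytes = bincode::serialize(&body)?;
        let roundtrip: MirBody = bincode::deserialize(&bytes)?;
        assert_eq!(body, roundtrip);
        Ok(())
    }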
Considering that it's more difficult to install upstream GCC than it is to install upstream rustc, there's ample opportunity for the distro snapshot that users would see to fall behind.
Ah, interesting. But can this implementation actually be built for architectures not supported by LLVM? I looked at the build instructions and the first build step involves building parts with the standard Rust compiler.
I also just realized that it's not the same backend as the one written by "redbrain" here [1].
gcc-rust currently needs LLVM to build because that's the only good way to build Rust code, and gcc-rust (unlike, say, mrustc) uses Rust code. Once gcc-rust is on track, Rust code used by gcc-rust can be built by gcc-rust, and LLVM becomes unnecessary.
Sounds very good. I'll keep an eye on the project. I would love to see such a frontend merged into GCC upstream and I would definitely shell out some money for a Bountysource campaign to support the effort.
Interesting. This is the first I've heard of Cranelift being used as anything other than a code generator for wasm runtimes. When would Cranelift be used instead of LLVM, and vice versa?
Were you able to get a sense of what counts as stable in this case? Is it about the rate at which new features are added, or more about keeping old code working?
I believe all old Rust code is kept 100% compatible with compiler updates. It's more about new features being added to the language at a rapid clip compared to languages like C++.
That will slow down I'm sure, but the Rust developers and users are aware of a number of things which are still holding the language back and need to be designed and implemented in the next few years. Big functionality stuff and usability stuff. But also a lot of work on the tooling and ecosystems which won't have anything to do with any language changes.
But I would expect in a few more years we'll see the rate of Rust language change will be significantly lower.
The title is somewhat misleading as to the scope of this project, as this looks to be a project for adding GCC backend support to the existing Rust compiler (though this isn't a criticism, that's exactly the approach that I would take!).
The (simplified) way that rustc works is that it translates Rust source code into a custom intermediate language called MIR, does a bunch of analysis and a few optimizations, and at the very end translates MIR to LLVM IR and passes that into LLVM. This project looks to add support for translating MIR to GCC's own IR instead of LLVM IR, allowing it to reuse all the other work that has been done writing rustc.
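To make that concrete, here is a tiny function and a rough approximation of the MIR rustc produces for it (simplified; the real output of rustc --emit=mir includes overflow checks in debug builds and changes between compiler versions).

    // A trivial Rust function...
    pub fn add_one(x: i32) -> i32 {
        x + 1
    }

    // ...and roughly what rustc --emit=mir prints for it (simplified):
    //
    // fn add_one(_1: i32) -> i32 {
    //     let mut _0: i32;              // return place
    //     bb0: {
    //         _0 = Add(_1, const 1_i32);
    //         return;
    //     }
    // }
    //
    // A codegen backend (LLVM today, GCC's IR with this project) starts
    // from this representation rather than from Rust source.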
For a from-scratch reimplementation of (most of) a Rust frontend, see mrustc ( https://github.com/thepowersgang/mrustc ), which is written in C++, and served to demonstrate that (a prior version of) rustc did not feature a reflections-on-trusting-trust attack (via diverse double compilation).
Lots of things are "in progress" (including rust support for gcc code generation, heh). But gcc has production-ready, mature backends for a ton of architectures that LLVM doesn't. It's a real feature.
Although note that the Cranelift backend has quite different goals from GCC and LLVM, being optimized for compilation speed over runtime performance, and is very unlikely to support many of the obscure backends that GCC can target.
Not likely, as the community is trying to get Rust ISO 26262 certified. If they changed the compiler, they would have a whole lot more work to get it certified.
GCC supports several targets like alpha, hppa, ia64, m68k, sh, v850 and vax (and more) that the LLVM-based Rust does not support because LLVM lacks support for these targets.
* Alpha is more than a decade past end of life.
* Hppa is EOL as of 6 years ago.
* ia64 goes EOL next month.
...and the list goes on. These aren't going concerns outside of some hobbyist work, and GCC has had difficulty retaining maintainers for some of this hardware and has threatened to deprecate it (in fact, IA-64 is facing this exact situation and is going to be marked deprecated in GCC 10 and vanish in 11: https://gcc.gnu.org/ml/gcc/2019-06/msg00125.html). With instruction set support in that state, the odds are reasonable that bugs will be exposed and that supporting and fixing them will be difficult (exacerbated by the scarcity of hardware).
Adding support for obsolete hardware is not much of a good argument for taking on the work of supporting two compilers.
There are some back-end changes coming to GCC, and no one had ported m68k to them (presumably there is no maintainer for m68k in GCC?). The back-end code the m68k port was relying upon was going away, and it needed to be ported to use the modern one. Someone has stepped up and done that work, so it gets to live another day.
You mean like LLVM and GCC? Both compilers for C and C++ (and a lot more with various front ends). I can only imagine they're both maintained because people enjoy working on them and find them interesting projects to develop. When I find something interesting to work on, I certainly don't care that someone else is doing something similar somewhere else.
LLVM got where it is primarily because of GCC's licensing (and because GCC's design is deliberately compromised to support that licensing). It's unlikely that LLVM would have been maintained otherwise, and I would expect GCC to fall by the wayside in due course.
> I would expect GCC to fall by the wayside in due course
This currently shows no sign of happening. What do you expect to change such that this happens?
> It's unlikely that LLVM would have been maintained otherwise
Again, why? You're making this assertion as if it's fact, but there are a huge number of maintained pieces of software that do the same task as other maintained pieces of software. A great many pieces of software exist that didn't even get started until after similar software was already established. Do you have any logic behind your thinking beyond the fact that they're two pieces of software that do similar tasks?
I think your parent is exaggerating, but they do have a point. I can't point you to a concrete example off the top of my head, but there is quite a bit of anti-GPL sentiment in (parts of) the LLVM community. Apple I think is somewhat invested in spreading FUD about the GPL. I once spoke to some LLVM developers from Apple who said that they didn't believe they were allowed to look at the assembly code emitted by GCC, for copyright reasons. Which is strange since GCC explicitly does not claim copyright on its output.
> I think your parent is exaggerating, but they do have a point. I can't point you to a concrete example off the top of my head, but there is quite a bit of anti-GPL sentiment in (parts of) the LLVM community.
That's why the LLVM people won't switch over to GCC, sure. But what do the GCC people think of it? If they're content to keep working on GCC, what's going to change such that they want to stop? I could imagine that if the LLVM group gets orders of magnitude ahead (seems unlikely - thus far, they seem to be mostly keeping up with each other in terms of performance and so on - but it could happen), GCC might start to be seen as a niche piece of legacy software and start to head towards retirement. Any other obvious pathways for the end of GCC?
Sorry, I should clarify: I think the OP had a point about licensing being a big factor in LLVM's uptake and especially its adoption by the likes of Apple and Google. I don't agree with the OP that this implies that GCC will be abandoned.
> This currently shows no sign of happening. What do you expect to change such that this happens?
Just natural turnover. I expect GCC to struggle to attract new developers. When it was clearly the best open-source optimizer there was a certain cachet to it, but now all the advantages are with LLVM.
> Do you have any logic behind your thinking beyond the fact that they're two pieces of software that do similar tasks?
Apple in particular are documented as having tested the limits of GCC's licensing before funding work on LLVM; I believe other corporate contributors have made comments along the same lines.
As far as I remember, mrustc is not a full Rust compiler/frontend, i.e. it is only usable with Rust code which has been "proven" to be valid/correct Rust by another compiler, so you can't really use it for development. It's still useful for bootstrapping rustc, as it's a Rust compiler not written in Rust. It can probably also be useful for compiling Rust for some exotic hardware architecture which has no LLVM support.
That's correct. Moreover, it's only attempting to compile rustc. Which means it could be completely broken for some other existing rust code. It is not recommended to use it for anything else than bootstrapping rustc (but this is already an incredible achievement).
It seems like it would be easier to make a WebAssembly frontend rather than a MIR one. Just compile Rust to WASM and use GCC to compile that to unusual instruction sets. There might be a small performance hit, but a WASM frontend could likely be upstreamed into GCC.
WASM is designed for a particular sandbox environment, which currently is 32-bit-only. AFAIK WASM has its own calling conventions and sections, so recompilation back to something that works with platform-specific C would involve difficult guesswork or glue code.
MIR is Rust's accurate low-level representation with much more type information, and it's aware of native target's specification, so it can be optimized better, and has better interoperability with native code.
I agree with you that there would be glue code, and I don't doubt you're right that MIR can be optimized more.
But Rust in the general case is agnostic about 32- vs 64-bit pointers and explicitly targets WASM.
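For instance (a trivial example): pointer-sized integers are spelled usize/isize, so the same source adapts to wasm32's 4-byte pointers and x86_64's 8-byte pointers.

    // usize follows the target's pointer width: this prints 4 when built
    // for wasm32-unknown-unknown and 8 on x86_64 targets.
    fn main() {
        println!("pointer width: {} bytes", std::mem::size_of::<usize>());
    }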
I'm not familiar with GCC's IR, but unsandboxed AOT WASM compiled thru LLVM IR is astoundingly fast.
> This average benchmark has speed in microseconds and is compiled using GCC -O3 -march=native on WSL. “We usually see 75% native speed with sandboxing and 95% without. The C++ benchmark is actually run twice – we use the second run, after the cache has had time to warm up. Turning on fastmath for both inNative and GCC makes both go faster, but the relative speed stays the same”, the official website reads.
> “The only reason we haven’t already gotten to 99% native speed is because WebAssembly’s 32-bit integer indexes break LLVM’s vectorization due to pointer aliasing”, the WebAssembly researcher mentions. Once fixed-width SIMD instructions are added, native WebAssembly will close the gap entirely, as this vectorization analysis will have happened before the WebAssembly compilation step.
It's important for core infrastructure to have multiple competing implementations. On a related note, does Rust have a standard yet or are they still doing the reference implementation thing?
> I was under the impression that llvm is better than gcc?
And I thought that tabs were better than spaces, BSD beat Linux, Emacs was the one true god... what were we arguing about again?
My impression is that gcc produces smaller code that is roughly comparable with clang in terms of runtime performance (with a slight advantage when compiling the codebase I care about the most). Gcc has better debug info, especially when optimizing. I don't know about compile speeds. Clang has better infrastructure for writing static analysis tools. Clang is a much more realistic alternative to msvc on windows than gcc is. I don't know about their development velocities. Clang seems to have the edge in mindshare.
I'd be curious to know whether this would provide cross-language outlining during LTO using gcc. I believe some form of this is possible with llvm?
My understanding is that LTO (and thus any cross-language inlining) takes place using a low level IR where language barriers aren't relevant. The GCC and LLVM backends both have full support as far as I know. Hypothetically it's simple to implement support in a given frontend, but apparently it proved to be a bit tricky in practice for Rust (http://blog.llvm.org/2019/09/closing-gap-cross-language-lto-...).
I don't think Rust is even defined by a reference implementation, given that they release a new compiler every six weeks.
For many practical purposes I think the closest thing to a language definition is the set of testsuites visible to Crater.
(That is: when the compiler people are considering a change, they don't say "we can't change this because we're past 1.0 and the change is technically backwards-incompatible", or "we can't change this because the Reference specifies the current behaviour"; they say "let's do a Crater run and see if anything breaks".)
This is not correct. We do use crater to help with questionable cases, but we often say “we can’t change this because we’re past 1.0 and the change is backwards incompatible.”
Which is still not the same as having a spec or even a reference implementation.
It’s a bit weird how laser-focused the Rust community is on backwards compatibility, not seeming to believe that forward compatibility is also important.
e.g., if I write code targeting C++17, I can be reasonably sure it compiles with an older version of the compiler, as long as that version also claims to support C++17, modulo bugs. Not the case if I write code targeting Rust 2015 as they’re still adding features to that from time to time. Let alone Rust 2018 which changes every 6 weeks.
Will there ever be a version of Rust that the community agrees “OK, this language definition is not changing unless we find some major soundness bug” ?
This is a big blocker for mainstream adoption in Linux distributions since the maintainer wants to be able to land one specific version of rustc in the repositories, not rely on people downloading new versions with rustup continuously. But old versions of rustc are effectively useless due to the lack of forward compatibility guarantees.
It's funny you cite C++, which has the best example of forward compatibility breakage in terms of impacting people.
g++ 4.4 implemented several key parts of C++11, including notably rvalue references, and adapted libstdc++ to use rvalue references in C++11 mode. However, the committee had to make major revisions to rvalue references subsequent to this implementation, to the point that you can't use libstdc++ 4.4's header files (such as <vector>) with a compliant C++11 compiler. So when you try to use newer clang (which prefers to use system libstdc++ for ABI reasons) on systems with 4.4 installed (because conservative IT departments), the result is lots and lots of pain.
Furthermore, it absolutely is the case that newer versions of compilers will interpret old language standards differently than older versions of the compiler. You don't notice it for the most part because the changes tend to revolve around pretty obscure language wordings involving combinations of features that most people won't hit. Compilers are going to try hard not to care about language versions past the frontend of the compiler--if the specification change lies entirely in the middle or backend, then that change is likely to be retroactively applied to past versions because otherwise the plumbing necessary is too difficult.
g++ 4.4 came out before the C++11 standard was released. So I’m not sure how it’s a counterexample. C++11 was a standard under active development and so obviously you can expect changes; its status at that time was comparable to rust nightly.
It's not that the Rust community ignores forward compatibility, it's just that right now is not the right time. The things recently landing in Rust are not new; most of them were designed back in, like, 2017. It just took years to stabilize them.
When is the right time, then? I would have thought the move to the 2018 Edition should have been the perfect time to declare the 2015 Edition as stable and unchanging, no? But it is still receiving major changes like non-lexical lifetimes.
Editions are not designed to be unchanging snapshots of the Rust compiler at a specific moment in time. By design, all Rust editions share the same middle-end and backend, and only differ in the frontiest parts of the frontend. The idea is that people should be able to update their version of the compiler without being required to update code that compiled prior to the 2018 edition.
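A small example of such a frontend-only difference: async became a keyword only in the 2018 edition, so the following compiles under --edition 2015 but is rejected under 2018, even though both editions share the same MIR, codegen, and ABI.

    // Valid in the 2015 edition, where async is an ordinary identifier;
    // a hard error in the 2018 edition, where async is a reserved keyword.
    fn async() -> u32 {
        42
    }

    fn main() {
        println!("{}", async());
    }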
Yes, I understand that. I'm saying I don't understand why the Rust community has made that choice, since it essentially makes old versions of the compiler useless and therefore makes it difficult for software written in Rust to be part of a typical Linux distribution.
(And has a number of other disadvantages too, like constant cognitive load having to re-learn the language every 6 weeks).
Also, editions could be a snapshot of the language definition at a point in time, without being a snapshot of the compiler. There are still new versions of Clang and GCC coming out with new bugfixes, better optimizations, improved error messages, support for different hardware, WIP support for future language editions, etc., without changing the C++17 standard.
> I don't understand why the Rust community has made that choice, since it essentially makes old versions of the compiler useless
I mention the reason in my prior comment: to allow people to continually upgrade their compiler version without needing to change any code. Rust doesn't have a stable ABI, so all crates in a Rust project ultimately need to be built with the same compiler (and furthermore, crates must always be able to interoperate regardless of which edition they're on). That means that every new version of the compiler needs to support every old edition, because the alternative is to have users stuck on old versions of not just the compiler but also on old versions of dependencies that have since begun using features only supported by newer compilers. In Rust's case avoiding such a fundamental fracture in the community was more important, since, after all, there's still nothing stopping anyone from voluntarily sticking with an older version of the compiler if they're willing to deliberately endure such a situation.
> (And has a number of other disadvantages too, like constant cognitive load having to re-learn the language every 6 weeks).
This is quite hyperbolic. Rust introduces no more features than any other language, it simply rolls them out on a more fine-grained schedule. Furthermore, Rust hardly requires re-learning every six weeks; a "feature" introduced by a new version is often nothing more than a new convenience method in the standard library. The fact that we have established that Rust goes out of its way even to keep "old" code compiling and compatible with the rest of the ecosystem should demonstrate how little it demands that users re-learn anything.
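For example, one such "feature" was simply the dbg! macro stabilized in 1.32, a small debugging convenience rather than a language change.

    fn main() {
        let x = 2;
        // dbg! prints the file, line, expression, and value to stderr and
        // returns the value, so it can be dropped into existing expressions.
        let y = dbg!(x * 3) + 1;
        assert_eq!(y, 7);
    }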
How come, given that C++ currently takes three years to release a new standard and then about the same again until the latest release is supported across all major compilers?
Right now with C++20 around the corner, C++14 is still the safest bet for portable code, whereas in Rust we still see relevant crates that depend on nightly.
Here is the list of all the major features added to Rust since 1.0:
* The new lifetime model (non-lexical lifetimes)
* Async fn
* Procedural macros
* ? operator
* Import name resolution changes
* impl Trait
* C-style unions
Here's the list of features added in C++17 and C++20:
* constexpr if
* Modules
* Structured binding
* Type deduction helpers
* Coroutines
* <=> operator
* Concepts
* Expanded the set of expressions and statements that qualify as constexpr to the point that it's a very different feature from what it was in C++11.
In the same amount of time, C++ has added roughly the same number of features, but I would qualitatively say that C++'s feature additions are more impactful than Rust's feature additions, especially in terms of making newer code unrecognizable to programmers used only to the old version.
That's what I meant by pace versus cadence--overall, C++ has changed more, but it tends to change in triennial bursts instead of every six weeks.
I write standard compliant C++ code that only works with the latest compiler versions because older ones incorrectly claim support for C++ standard, but their implementation is too buggy.
Which is the main reason why no usable alternative implementations exist yet and why Rust hasn't found its way to more low-level software projects like the Linux plumberland yet.
Rust dearly needs a stable specification, it is the main blocker why the language hasn't been more widely adopted.
I agree. That, and there's no need to rush new features; stabilize old ones and fix their bugs instead. I am still waiting for RFC0066[1] to be fixed. It is from 2014-05-04! Here is its GitHub issue: https://github.com/rust-lang/rust/issues/15023. Backstory: I started writing a relatively simple Rust program many years ago when I ran into this issue. It was my first attempt at writing Rust, and I did not like the workaround.
Rust has LTS-like releases based on years, but the last one didn't have async/await, so everyone just kept tracking. I think the next LTS-type release may snag some people and slow stuff down. Cargo should handle that fine, but I'm not sure about maintainers. I can't get Cargo in my day job, just the rustc 1.3x shipped with RHEL, so I'm already standing still. RHEL's version might make a good de facto LTS in the absence of Cargo, but the thin std lib makes that hard, and random old rustc versions sometimes don't play nice with every crate.
Rust doesn't have LTS releases. Editions are both a way to market the new features introduced in the past years as a "bundle" to outside users, and a way for us to introduce some breaking changes for the users opting into the new edition. The release an edition is stabilized in (1.31 for Rust 2018) does not get any special treatment from the Rust team though, and that includes no LTS support (for example we won't backport bug fixes, even security ones).
Yikes! I had no idea. It takes my company years sometimes to approve a new point release of software. An RHEL subscription was my backdoor to get Rust at work. I knew I'd never get Cargo access or a mirror approved, but I thought some day I'd push for an LTS release. I don't see my company ever doing software approval and transfer to our development network faster than once a year.
As I pointed out in another comment, the definition of the 2015 edition is still changing (i.e., features from 2018 are getting backported to 2015), severely limiting the usefulness of the "edition" concept.
E.g., if someone thinks "I'm going to target 2015 because I want my code to build with the rustc shipped by various slow-moving Linux distros", it doesn't help: their code still might not build with an older rustc unless they specifically target an older version of rustc, which nobody does.
Editions solve an entirely separate problem, they were never meant to be LTS language snapshots. For example, C++ is considering adopting them in addition to their current versioning scheme: http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1881r0....
There has been discussion of a Rust LTS channel alongside stable/beta/nightly, which would try to solve that problem, but it has not been prioritized yet: https://github.com/rust-lang/rfcs/pull/2483
An actual frozen language is also a possibility, but probably won't happen until more work happens on an independent specification. Which, in fact, people are also working on: https://ferrous-systems.com/blog/sealed-rust-the-pitch/
I would say that Rust doesn't have a formal specification; what it does have is close to, or better than, many languages' "specs", in the form of the "Rust Reference": https://doc.rust-lang.org/stable/reference/
It really depends on how strictly you define the term specification. The Rust Reference is not required to be accurate. Though many other language compilers/implementations don’t fully implement their respective specs so, :shrug:.
The Rust Reference is very far from being complete, or even correct in what it does cover.
If I have a question whose answer isn't obvious, it's far more likely that I have to go trawling around in RFCs than that there's an answer in the reference.
I think most languages of a similar age (eg Go, Swift) are doing better.
Rust is younger than Go (1.0 released in 2015 vs. 2012) and way more ambitious. Rust 1.0 in particular was released as something of an MVP, and many things have changed since then, which made maintaining such a reference an issue. The pace of change is slowing nowadays (that's especially visible if you look at new and accepted RFCs), so I hope the reference will catch up eventually.
There are some people studying Rust with formal verification, for example in this paper: https://plv.mpi-sws.org/rustbelt/rbrlx/paper.pdf. However, I do not know if the whole language is covered or only a core subset.
Llvm is better than gcc at modularity and extensibility (or at least it was when llvm was released, I haven't followed gcc evolutions in a while). People who work on new languages typically use llvm because it's designed to make such things simple.
Now, in terms of end results, llvm and gcc each have their qualities. When llvm was released, gcc typically produced faster binaries but llvm optimizations were easier to understand. Since then, both have evolved and I haven't tried to catch up.
Bottom line: having two back-ends for rust or any other language is good. Among other things, it's a good way to check for bugs within an implementation of the compiler, it can be used to increase the chances of finding subtle bugs in safety-critical code, etc.
One thing where GCC excels over LLVM is the quality of debug information. If you switch from Clang to GCC, you will see fewer "optimized out" values in GDB. This is pretty much guaranteed.
And here are plenty that support the converse ;) I don't think there's anything that points to one being definitively better than the other in performance.
GCC does some things better than LLVM. It supports more architectures, has a non-broken implementation of restrict (which should be useful for Rust), and optimizes some code better. They both have their own pros and cons.
I'm actually surprised that Rust enabled noalias usage with this known outstanding issue. When I worked on Rust years ago, it was definitely common knowledge on the compiler "team" that this was broken.
I'm equally surprised that GCC had that bug, since their pointer aliasing model is equipped to correctly handle this situation (and is why they were able to fix it quickly).
Eh, it’s not as one-sided as that. GCC has a larger number of targets, but LLVM supports several newer targets that GCC doesn’t, like WebAssembly and eBPF (although the latter is coming in GCC 10). But it would certainly be nice for Rust to support both sets of targets.
In theory, both GCC and LLVM take a front-end (in this case rust) and compile it down to an intermediate representation (IR). There will likely be some differences between the output from a front-end, but after successive optimisations have been applied this will likely disappear. By the time you get to generating assembly, you can't really tell the difference anymore so the semantics of the original language don't make an impact.
I'm sure there are a number of "reasonable" assumptions that aren't true: probably things like the number of bits in a byte, the size of a particular integral type, or support for a particular platform behavior.
> The C standard uses the term byte to mean the minimum addressable unit in the implementation, which is char, which means a byte on these targets is 16 bits. This is in conflict with the widespread use of byte to mean 8 bits exactly. This is an unfortunate disagreement between C terminology and widespread industry terminology that TI can't do anything about.
Absolutely not. A byte is the smallest block of memory with an address. E.g. you can't take the address of 7 combined bits on x86, but you can for 8.
In the past, architectures differed wildly in the number of bits per byte, e.g. 36 for the machine where the Pascal language was created.
Today, the industry has mostly standardized on 8 bits per byte, but see e.g. the PIC architecture for an example relevant today with a different choice: 8-bit bytes for data, but 10-bit bytes for instructions.
> A byte is the smallest block of memory with an address. E.g you can't take the address of 7 combined bits on x86 but you can for 8.
I think that's an anachronistic/incorrect usage. A lot of machines (including several with 36-bit words that you mentioned) supported larger basic addressable units of memory, but didn't call these larger units "bytes", and distinguished between "bytes" and "words". In fact, one of the elements of the early RISC philosophy was that CPU support for byte accesses (as opposed to word accesses) was extraneous, based on statistics gathered from real programs. Early MIPS/Alpha/etc. machines did not support byte addressing, but the people using them still called 8 bits a byte.
Arguably the first Alphas could have had a C compiler with 64 bit bytes but that would have made porting hard. Even then they were forced to add byte operations pretty early on.
Byte is also often defined as the smallest addressable unit in a computer. Nowadays that is most commonly 8 bits, to the point where you can generally assume it, but this was different in the past (6 and 9 bits being especially common alternatives) and is still different in some niches like DSPs, which sometimes can only work on wider types. But at least those are then typically powers of two, which makes it easier for many tools.
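For what it's worth, Rust itself bakes in the 8-bit assumption, which is one of the things a port to an unusual GCC target would have to reconcile; a small illustrative check:

    fn main() {
        // u8 is defined as exactly 8 bits, and sizes are counted in 8-bit bytes.
        assert_eq!(std::mem::size_of::<u8>(), 1);
        assert_eq!(u8::MAX, 255);
        // A target whose smallest addressable unit is 16 bits (as on some
        // DSPs) has no direct representation for this.
        assert_eq!(std::mem::size_of::<u16>(), 2);
    }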
> I've been under the impression that GCC still has much better hardware optimizations than LLVM has.
That is my experience too.
GCC, for code with a high level of nesting, meaning high potential for inlining (typically C++), is close to unbeatable, even compared to highly optimised compilers like Intel ICC.
GCC has a reputation for having a confusing architecture. It is a very hard project to work on. LLVM is typically considered cleaner and more understandable. GCC is still known, in 2019, to have a rather slight performance advantage.
LLVM also has a stable IR, LLVM IR itself, while GCC has refused to provide one over the decades for political and strategic reasons.
> while GCC refuses to do so over the decades for political and strategic reasons.
That was a long time ago. Since GCC 4.5 (released in 2010) GCC supports external plugins. [3,4] These plugins, like the rest of GCC, use GENERIC and GIMPLE as their IR.
Having worked with both, I don't know what you mean by "confusing architecture". Both are OK to work with, but both have some glaring holes in their documentation. LLVM's data structures are typically nicer to use than GCC's linked lists in a lot of places, that much is true.
[1] https://www.bountysource.com/issues/80706251-m68k-convert-th...
[2] https://www.bountysource.com/issues/84630749-avr-convert-the...