Minimizing Rust Binary Size (github.com/johnthagen)
149 points by xanthine on June 12, 2020 | 46 comments



Another post on size-constrained coding in Rust:

https://www.codeslow.com/2020/01/writing-4k-intro-in-rust.ht...


I am surprised that dynamically linking to Rust libstd and other common libraries is not mentioned. Rust produces (as far as Rust code is concerned, not sure about libc if used) fully static binaries by default, right?


1. Operating systems don't ship Rust's stdlib (unlike C's), so such a binary would work only on the developer's own machine.

2. Rust doesn't want to commit to a stable ABI yet, so even if an OS wanted to ship libstd, it'd be logistically difficult due to the libstd version having to exactly match the compiler version.


It would still make a lot of sense to dynamically link the particular libstd used for the compilation—and ship it alongside the package—if one wanted to ship a package containing many executables, the way that e.g. ImageMagick works.
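rustc already supports this for the exact toolchain used for the build; a sketch using -C prefer-dynamic (the libstd filename hash varies per toolchain):

  # Link libstd dynamically instead of statically into the final binary:
  cargo rustc --release -- -C prefer-dynamic

  # The binary now depends on the toolchain's libstd-<hash>.so, found under
  #   $(rustc --print sysroot)/lib
  # which would have to be shipped alongside the executables.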


I thought that in recent times ImageMagick acted like busybox, with all the executables symlinked to one master one, and dispatching functionality based on name. Certainly that's the way the fork GraphicsMagick works.


1. seems like a chicken-and-egg problem. I'm sure that the Debian developers wouldn't mind shipping it if it made sense (see 2).

2. is a fair point. At this point in Rust's life I see how avoiding binary compatibility questions makes sense. Is anybody even working on it, though? Otherwise Rust could be making system-provided, dynamically linked libraries almost impossible due to early design decisions made without consideration for BC.


> Is anybody even working on it, though?

There is no work on MVPs or RFCs for now, but stable ABIs are being discussed currently: https://internals.rust-lang.org/t/a-stable-modular-abi-for-r...


> Otherwise Rust could be making system-provided, dynamically linked libraries almost impossible

They're very much possible; they're just limited to using the C ABI when interacting with dynamically-linked code. More complex features can be implemented by providing thin wrappers as part of language-specific bindings.
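A minimal sketch of that pattern (names are illustrative): build the crate as a cdylib and expose unmangled C-ABI functions that any caller, Rust or otherwise, can link against:

  // lib.rs
  // `extern "C"` pins the calling convention to the stable C ABI;
  // `#[no_mangle]` keeps the exported symbol name predictable.
  #[no_mangle]
  pub extern "C" fn add(a: i32, b: i32) -> i32 {
      a + b
  }

with the matching crate type in Cargo.toml:

  [lib]
  crate-type = ["cdylib"]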


As Rust becomes more prevalent in utilities, routine apps, and workflows using process-based parallelism, there will be a memory advantage to using shared libraries so that stdlib only needs to be resident once instead of once for each process. So long as it is packaged so multiple versions can coexist, this would make sense for packagers.


Rust's dynamic linking story isn't really good because the ABI is highly unstable and Rust makes heavy use of non-erased generics.


Reified Generics?


Yeah, that's the formal term. Thank you. Languages like Java have erased generics, where you can't even put a value of type T on the stack because the generated code has to stay generic over the size of T. Super annoying to work with.

On the other hand, reified generics increase binary size. The advantage is that the code is more digestible for the optimizer, enabling inlining and type-specific optimization, which gives you more predictable performance than the alternative of devirtualization; but of course the extra size can also have negative consequences, like slower compile times. One good example is this recent PR that improved compile time by making parts of the Vec implementation in the standard library non-generic: https://github.com/rust-lang/rust/pull/72013
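The trick in that PR is essentially to factor type-independent work out of generic functions so it is compiled once rather than once per instantiation; a simplified sketch (illustrative names, not the actual Vec code):

  use std::mem::size_of;

  // Monomorphized per element type, but deliberately tiny: it only
  // supplies the type-dependent value and forwards on.
  fn required_bytes<T>(len: usize) -> usize {
      required_bytes_raw(len, size_of::<T>())
  }

  // Non-generic: compiled exactly once, however many element types
  // call the wrapper above.
  fn required_bytes_raw(len: usize, elem_size: usize) -> usize {
      len.checked_mul(elem_size).expect("capacity overflow")
  }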


I was very much under the impression that Rust used monomorphization, and that there was no concept of generics at runtime (unlike, say the CLR). Am I missing something here?


Rust supports both monomorphized and type-erased implementations, the latter via trait objects (the `dyn` keyword applied to a trait, similar to an interface in Java).
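A quick sketch of the two side by side:

  trait Greet {
      fn greet(&self) -> String;
  }

  // Monomorphized: a specialized copy is generated for each concrete T
  // it is called with (static dispatch, larger binary).
  fn greet_generic<T: Greet>(x: &T) -> String {
      x.greet()
  }

  // Type-erased: compiled once; calls go through a vtable at runtime
  // (dynamic dispatch, smaller binary).
  fn greet_dyn(x: &dyn Greet) -> String {
      x.greet()
  }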


They support both, but the general rule of thumb I've found from the community is to use the former by default unless you have to use the latter.


Or monomorphized generics.


Not fully static by default; it links dynamically against glibc. If you want a fully static binary you can compile against musl.
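For example, on x86_64 Linux (assuming rustup):

  rustup target add x86_64-unknown-linux-musl
  cargo build --release --target x86_64-unknown-linux-musl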


>Cargo defaults its optimization level to 3 for release builds, which optimizes the binary for speed. To instruct Cargo to optimize for minimal binary size, use the z optimization level in Cargo.toml

In what scenarios is optimizing for binary size preferred over optimizing for speed?


Sometimes a binary optimized for size is faster than one optimized for speed. (Due to cache residency, as mentioned in another comment.)

SQLite recommends optimizing for size, because it yields a significantly smaller binary with minimal speed impact: https://www.sqlite.org/footprint.html


Maybe Windows-specific, but if you run an exe from a shared directory or removable drive, then in some cases[1] Windows will copy the entire exe before running it.

In that case having a big exe file can really hurt start-up performance, so optimizing for that might be preferable if your application is not very speed sensitive.

[1]: https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-... (IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP or IMAGE_FILE_NET_RUN_FROM_SWAP)
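Those characteristics map to the MSVC linker's /SWAPRUN option; a sketch of setting it from cargo (untested):

  cargo rustc --release -- -C link-arg=/SWAPRUN:NET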


If it makes the difference between fitting the whole binary in L1 cache or not, then the wins on memory latency could beat the losses from branch misprediction.


>In what scenarios is optimizing for binary size preferred over optimizing for speed?

64kb intros:

https://www.reddit.com/r/rust/comments/597hhv/logicoma_elysi...

https://www.pouet.net/prod.php?which=69658


However, 8 kB (the binary size mentioned in the article) is still drastically too much overhead in a 64k intro, and 4k intros are obviously out of the question.


Any service or daemon that hardly takes any CPU time regardless of optimization.


In WebAssembly, for example.


Embedded systems.


Embedded systems frequently have flash size limitations in the megabytes or even kilobytes.


When loading the larger binary from storage takes longer than the execution speed gain justifies.


What about `RUSTC_FLAGS=-Z opt=-Os`? (This won't work, I'm sure -- it's a total mishmash of things I've seen before, but -Os is the relevant part.) Building your executable, your dependencies, and/or libstd with -Os could really pay off -- especially if you already have LTO enabled.
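Cargo's supported spelling for this is the opt-level setting in the release profile ('s' roughly maps to -Os, 'z' to -Oz):

  [profile.release]
  opt-level = 's'   # optimize for size (roughly -Os)
  # opt-level = 'z' # optimize for size more aggressively (roughly -Oz)
  lto = true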


Not all of these should be used unless you're striving for absolutely minimal size.


Is there an additional reason?

I see some of them change behaviour or make things complex, but I don't see a performance impact or anything that indicates software will be more crashy.


Optimizing for size versus performance often eliminates optimizations like unrolling loops.

At the assembly level, it's sometimes true that a single instruction or a smaller sequence of instructions takes more CPU cycles.

Analyze the different instruction sequences here:

https://www.nxp.com/docs/en/supporting-information/MC680X0OP...
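The same trade-off is easy to provoke from Rust source; a sketch (exact codegen depends on the LLVM version):

  // At opt-level=3, LLVM will typically unroll and vectorize this loop:
  // more instructions, fewer iterations. At opt-level='z' it tends to
  // stay a compact scalar loop -- smaller, but often slower per element.
  pub fn sum(xs: &[u32]) -> u32 {
      xs.iter().sum()
  }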


Most of the options listed have a disadvantage which is why they aren't enabled by default. The LTO and codegen unit options increase compile time, removing std is worse for the development experience, using xargo (or cargo's build-std feature) requires a C compiler to be present and increases compile time (as std is being compiled).
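For reference, a sketch of cargo's build-std flow (nightly only; the flags may shift since the feature is unstable):

  rustup component add rust-src --toolchain nightly
  cargo +nightly build --release -Z build-std=std,panic_abort \
      --target x86_64-unknown-linux-gnu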


Well, not being able to use the standard library certainly makes it more difficult to write safe and performant programs.


Sadly there is no before/after result example.


I was curious about this, so applied the easy ones for an existing tool (https://github.com/brianm/wsf).

This was not an exhaustive test of optimization combinations, just a single stack, but the results are interesting!

To read the table: default is no changes in the release profile, i.e.:

  [profile.release]
  # opt-level = 'z'
  # lto = true
  # panic = 'abort'
  # codegen-units = 1
After that each additional line gets uncommented and rebuilt, then sizes recorded before and after cargo-strip, so the final line is all four optimizations applied.

Results:

                       as generated    after cargo-strip
  default              8675776         5069328
  opt-level = 'z'      9023200         4676112
  lto = true           5943312         3586584
  panic = 'abort'      5062456         3135928
  codegen-units = 1    4747000         3013048
Tests run on ubuntu 20.04 (Linux d2836c103a22 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020 x86_64 GNU/Linux) with

  rustc 1.43.1 (8d69840ab 2020-05-04)  
  cargo 1.43.0 (2cbe9048e 2020-05-03)  
  cargo-strip - reduces the size of binaries using the `strip` command 0.2.2  
Fascinatingly, opt-level='z' produced a LARGER binary than the default, before stripping. That was unexpected.

-Brian


Do you think you could get it down to 300kb?


I doubt it, the dependency chain is pretty big: https://gist.github.com/brianm/066797531d8cc1f1c6c563ea8db7b...


I haven't been following Rust's development much lately, but I'm interested in understanding what the state of ABI stability is...


Like C++, they basically seem to take the stance of "no".


There are some subtleties there, on both the C++ and Rust sides. I won't speak to the C++ stuff, but on the Rust side, it's more "not yet, and we don't know when, and maybe never, we'll see" than it is "no."


It would be accurate to say that the only stable ABI Rust currently supports is the C FFI ABI, right?


I would say that "the Rust ABI is not stable, but Rust also supports other ABIs." https://doc.rust-lang.org/stable/reference/items/external-bl...
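A small sketch of what that looks like at the function level:

  extern "C" fn c_abi() {}            // stable C calling convention
  extern "system" fn system_abi() {}  // e.g. stdcall on 32-bit Windows
  fn rust_abi() {}                    // implicitly extern "Rust": unstable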


Do you think that there will ever be reified generics on the ABI boundary?


No idea, ABIs aren't my area of expertise.


A lot of cruft and performance issues in the C++ standard library remain because the committee is too afraid of breaking ABI compatibility, so it's not so much a "no" as an "oh no." Even MSVC hasn't broken ABI in a while.



