Minimizing Rust Binary Size (github.com/johnthagen)
149 points by xanthine on June 12, 2020 | 46 comments



Another post on size-constrained coding in Rust:

https://www.codeslow.com/2020/01/writing-4k-intro-in-rust.ht...


I am surprised that dynamically linking to Rust libstd and other common libraries is not mentioned. Rust produces (as far as Rust code is concerned, not sure about libc if used) fully static binaries by default, right?


1. Operating systems don't ship Rust's stdlib (unlike C's), so such a binary would work only on the developer's own machine.

2. Rust doesn't want to commit to a stable ABI yet, so even if an OS wanted to ship libstd, it'd be logistically difficult due to the libstd version having to exactly match the compiler version.


It would still make a lot of sense to dynamically link the particular libstd used for the compilation—and ship it alongside the package—if one wanted to ship a package containing many executables, the way that e.g. ImageMagick works.
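rustc already supports this for the exact toolchain used for the build; a sketch using -C prefer-dynamic (the libstd filename hash varies per toolchain):

  # Link libstd dynamically instead of statically into the final binary:
  cargo rustc --release -- -C prefer-dynamic

  # The binary now depends on the toolchain's libstd-<hash>.so, found under
  #   $(rustc --print sysroot)/lib
  # which would have to be shipped alongside the executables.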


I thought that in recent times ImageMagick acted like busybox, with all the executables symlinked to one master one, and dispatching functionality based on name. Certainly that's the way the fork GraphicsMagick works.


1. seems like a chicken-and-egg problem. I'm sure that the Debian developers wouldn't mind shipping it if it made sense (see 2).

2. is a fair point. At this point in Rust's life I see how avoiding binary compatibility questions makes sense. Is anybody even working on it, though? Otherwise Rust could be making system-provided, dynamically linked libraries almost impossible due to early design decisions made without consideration for BC.


> Is anybody even working on it, though?

There is no work on MVPs or RFCs for now, but stable ABIs are being discussed currently: https://internals.rust-lang.org/t/a-stable-modular-abi-for-r...


> Otherwise Rust could be making system-provided, dynamically linked libraries almost impossible

They're very much possible; they're just limited to using the C ABI when interacting with dynamically-linked code. More complex features can be implemented by providing thin wrappers as part of language-specific bindings.
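A minimal sketch of that pattern (names are illustrative): build the crate as a cdylib and expose unmangled C-ABI functions that any caller, Rust or otherwise, can link against:

  // lib.rs
  // `extern "C"` pins the calling convention to the stable C ABI;
  // `#[no_mangle]` keeps the exported symbol name predictable.
  #[no_mangle]
  pub extern "C" fn add(a: i32, b: i32) -> i32 {
      a + b
  }

with the matching crate type in Cargo.toml:

  [lib]
  crate-type = ["cdylib"]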


As Rust becomes more prevalent in utilities, routine apps, and workflows using process-based parallelism, there will be a memory advantage to using shared libraries so that stdlib only needs to be resident once instead of once for each process. So long as it is packaged so multiple versions can coexist, this would make sense for packagers.


Rust's dynamic linking story isn't really good because the ABI is highly unstable and Rust makes heavy use of non-erased generics.


Reified Generics?


Yeah, that's the formal term. Thank you. Languages like Java have erased generics, where you can't even put a value of type T on the stack because the generated code has to stay generic over the size of T. Super annoying to work with.

On the other hand, reified generics increase binary size. The advantage is that the code is more digestible for the optimizer, enabling inlining and type-specific optimization, which gives you more predictable performance than the alternative of devirtualization; but of course the extra size can also have negative consequences, like slower compile times. One good example is this recent PR that improved compile time by making parts of the Vec implementation in the standard library non-generic: https://github.com/rust-lang/rust/pull/72013
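The trick in that PR is essentially to factor type-independent work out of generic functions so it is compiled once rather than once per instantiation; a simplified sketch (illustrative names, not the actual Vec code):

  use std::mem::size_of;

  // Monomorphized per element type, but deliberately tiny: it only
  // supplies the type-dependent value and forwards on.
  fn required_bytes<T>(len: usize) -> usize {
      required_bytes_raw(len, size_of::<T>())
  }

  // Non-generic: compiled exactly once, however many element types
  // call the wrapper above.
  fn required_bytes_raw(len: usize, elem_size: usize) -> usize {
      len.checked_mul(elem_size).expect("capacity overflow")
  }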


I was very much under the impression that Rust used monomorphization, and that there was no concept of generics at runtime (unlike, say the CLR). Am I missing something here?


Rust supports both monomorphized and type-erased implementations, the latter via trait objects (the `dyn` keyword applied to a trait, similar to an interface in Java).
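A quick sketch of the two side by side:

  trait Greet {
      fn greet(&self) -> String;
  }

  // Monomorphized: a specialized copy is generated for each concrete T
  // it is called with (static dispatch, larger binary).
  fn greet_generic<T: Greet>(x: &T) -> String {
      x.greet()
  }

  // Type-erased: compiled once; calls go through a vtable at runtime
  // (dynamic dispatch, smaller binary).
  fn greet_dyn(x: &dyn Greet) -> String {
      x.greet()
  }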


They support both, but the general rule of thumb I've found from the community is to use the former by default unless you have to use the latter.


Or monomorphized generics.


Not fully static by default; it links dynamically against glibc. If you want a fully static binary you can compile against musl.
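For example, on x86_64 Linux (assuming rustup):

  rustup target add x86_64-unknown-linux-musl
  cargo build --release --target x86_64-unknown-linux-musl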


>Cargo defaults its optimization level to 3 for release builds, which optimizes the binary for speed. To instruct Cargo to optimize for minimal binary size, use the z optimization level in Cargo.toml

In what scenarios is optimizing for binary size preferred over optimizing for speed?


Sometimes a binary optimized for size is faster than one optimized for speed. (Due to cache residency, as mentioned in another comment.)

SQLite recommends optimizing for size, because it yields a significantly smaller binary with minimal speed impact: https://www.sqlite.org/footprint.html


Maybe Windows-specific, but if you run an exe from a shared directory or removable drive, then in some cases[1] Windows will copy the entire exe before running it.

In that case having a big exe file can really hurt start-up performance, so optimizing for that might be preferable if your application is not very speed sensitive.

[1]: https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-... (IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP or IMAGE_FILE_NET_RUN_FROM_SWAP)
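Those characteristics map to the MSVC linker's /SWAPRUN option; a sketch of setting it from cargo (untested):

  cargo rustc --release -- -C link-arg=/SWAPRUN:NET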


If it makes the difference between fitting the whole binary in L1 cache or not, then the wins on memory latency could beat the losses from branch misprediction.


>In what scenarios is optimizing for binary size preferred over optimizing for speed?

64kb intros:

https://www.reddit.com/r/rust/comments/597hhv/logicoma_elysi...

https://www.pouet.net/prod.php?which=69658


However, 8 kB (the binary size mentioned in the article) is still drastically too much overhead in a 64k intro, and 4k intros are obviously out of the question.


Any service or daemon that hardly takes any CPU time regardless of optimization.


In WebAssembly, for example.


Embedded systems.


Embedded systems frequently have flash size limitations in the megabytes or even kilobytes.


When loading the larger binary from storage takes longer than the execution speed gain justifies.


What about `RUSTC_FLAGS=-Z opt=-Os`? (This won't work, I'm sure -- it's a total mishmash of things I've seen before, but -Os is the relevant part.) Building your executable, your dependencies, and/or libstd with -Os could really pay off -- especially if you already have LTO enabled.
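Cargo's supported spelling for this is the opt-level setting in the release profile ('s' roughly maps to -Os, 'z' to -Oz):

  [profile.release]
  opt-level = 's'   # optimize for size (roughly -Os)
  # opt-level = 'z' # optimize for size more aggressively (roughly -Oz)
  lto = true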


Not all of these should be used unless you're striving for absolutely minimal size.


Is there an additional reason?

I see some of them change behaviour or make things complex, but I don't see a performance impact or anything that indicates software will be more crashy.


Optimizing for size versus performance often eliminates optimizations like unrolling loops.

At the assembly level, it's sometimes true that a single instruction or a smaller sequence of instructions takes more CPU cycles.

Analyze the different instruction sequences here:

https://www.nxp.com/docs/en/supporting-information/MC680X0OP...
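The same trade-off is easy to provoke from Rust source; a sketch (exact codegen depends on the LLVM version):

  // At opt-level=3, LLVM will typically unroll and vectorize this loop:
  // more instructions, fewer iterations. At opt-level='z' it tends to
  // stay a compact scalar loop -- smaller, but often slower per element.
  pub fn sum(xs: &[u32]) -> u32 {
      xs.iter().sum()
  }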


Most of the options listed have a disadvantage which is why they aren't enabled by default. The LTO and codegen unit options increase compile time, removing std is worse for the development experience, using xargo (or cargo's build-std feature) requires a C compiler to be present and increases compile time (as std is being compiled).
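For reference, a sketch of cargo's build-std flow (nightly only; the flags may shift since the feature is unstable):

  rustup component add rust-src --toolchain nightly
  cargo +nightly build --release -Z build-std=std,panic_abort \
      --target x86_64-unknown-linux-gnu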


Well, not being able to use the standard library certainly makes it more difficult to write safe and performant programs.


Sadly there is no before/after result example.


I was curious about this, so applied the easy ones for an existing tool (https://github.com/brianm/wsf).

This was not an exhaustive test of optimization combinations, just a single stack, but the results are interesting!

To read the table: default is no changes in the release profile, i.e.:

  [profile.release]
  # opt-level = 'z'
  # lto = true
  # panic = 'abort'
  # codegen-units = 1
After that each additional line gets uncommented and rebuilt, then sizes recorded before and after cargo-strip, so the final line is all four optimizations applied.

Results:

                       as generated    after cargo-strip
  default              8675776         5069328
  opt-level = 'z'      9023200         4676112
  lto = true           5943312         3586584
  panic = 'abort'      5062456         3135928
  codegen-units = 1    4747000         3013048
Tests run on ubuntu 20.04 (Linux d2836c103a22 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020 x86_64 GNU/Linux) with

  rustc 1.43.1 (8d69840ab 2020-05-04)  
  cargo 1.43.0 (2cbe9048e 2020-05-03)  
  cargo-strip - reduces the size of binaries using the `strip` command 0.2.2  
Fascinatingly, opt-level='z' produced a LARGER binary than the default, before stripping. That was unexpected.

-Brian


Do you think you could get it down to 300kb?


I doubt it, the dependency chain is pretty big: https://gist.github.com/brianm/066797531d8cc1f1c6c563ea8db7b...


I haven't been following Rust's development much lately, but I'm interested in understanding what the state of ABI stability is...


Like C++, they basically seem to take the stance of "no".


There are some subtleties there, on both the C++ and Rust sides. I won't speak to the C++ stuff, but on the Rust side, it's more "not yet, and we don't know when, and maybe never, we'll see" than it is "no."


It would be accurate to say that the only stable ABI Rust currently supports is the C FFI ABI, right?


I would say that "the Rust ABI is not stable, but Rust also supports other ABIs." https://doc.rust-lang.org/stable/reference/items/external-bl...
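A small sketch of what that looks like at the function level:

  extern "C" fn c_abi() {}            // stable C calling convention
  extern "system" fn system_abi() {}  // e.g. stdcall on 32-bit Windows
  fn rust_abi() {}                    // implicitly extern "Rust": unstable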


Do you think that there will ever be reified generics on the ABI boundary?


No idea, ABIs aren't my area of expertise.


A lot of cruft and performance issues in the C++ standard library remain because the committee is too afraid of breaking ABI compatibility, so it's not so much a "no" as an "oh no." Even MSVC hasn't broken ABI in a while.



