> Is this ever a real issue, even on any embedded system in the last 20 years?
Ask Cisco when they cut the Linksys routers' RAM in half a few years ago. Every byte counts. Component cost savings add up when you make a few million of them.
Probably yes, because Intel knows this is the code every compiler outputs for zeroing a register.
Also, the reason it is "faster" is the encoding: "xor ebp, ebp" is 2 bytes, vs. 7 bytes for "mov rbp, 0" - a REX prefix, the opcode, a ModRM byte, and a 4-byte sign-extended immediate (the full 64-bit-immediate form is even longer, at 10 bytes).
Technically you could get by with 5 bytes for "mov ebp, 0".
Another reason it was faster is that the processor recognized the idiom and avoided partial-flags stalls after an "inc". But in 64-bit code you rarely use "inc" at all, so that matters less. On the other hand, a few years ago XOR had a false dependency on the register you're clearing; I'm not sure whether that's still the case on more recent processors.
`malloc`'s not really that bad. There are a few different approaches you can take, but none of them are terribly complicated, since the two basic memory allocation interfaces, `sbrk` and `mmap`, are fairly simple to use for generic allocations. But getting it all working and bug-free still takes time. Same with stuff like `printf` and `scanf` (Though I'd actually argue those are harder to write than `malloc` if you're looking to be feature-complete. `printf` has billions of features and I'm pretty sure `scanf` requires some extra black magic internally).
There's no doubt that this is a fun project though - if you or someone else enjoys this type of stuff, you should definitely try your hand at writing a simple Unix kernel or similar, you'd probably enjoy it.
On that note though, the writer's aversion to inline assembly is unfortunate. It's a necessary evil for this type of programming. The syntax is ugly, but it's not really that hard to get used to (Especially since the large majority of inline assembly is just a few lines long, or even just one line long). In particular, the syscall wrappers can be done in a one-line piece of inline assembly, and then you can avoid the function-call overhead for the syscall by placing the inline assembly in a `static inline` function in your headers (Or a macro if you prefer), as well as avoid the extra .S file (Which IMO is the better part - it's always easier when you don't have to mix different languages like that).
I would also add that, while I used to share the author's aversion to AT&T asm syntax, virtually all of the assembly code out there related to Linux is written in AT&T, so it's worth getting used to it and at least being able to read it. That said, you can use Intel syntax in inline assembly if you prefer, so even if you hate AT&T with a passion you can still write inline assembly ;)
You can get surprisingly far without using libc's malloc/free. E.g. TeX, the typesetting system by Knuth, implements its own dynamic memory handling. It has a large static array of bytes, and allocates from that when needed.
Arenas are really nice if you're allocating a lot of objects of the same size, whereas malloc() must be prepared to handle a lot of different memory usage patterns.
I don't think it is. Differently sized objects can be allocated and released individually. Have a look at part 9 of [1]. In an arena based allocator you typically deallocate all the objects in an arena at once.
TeX basically uses a special purpose implementation of malloc/free, with a static array as backing instead of memory requested from the OS with mmap(2) or sbrk(2). The main reason is portability (the original version was released in 1978 using WEB/Pascal).
FreeRTOS also provides a few malloc implementations backed by static arrays (not dependent on sbrk), which can be useful for running malloc-based test code on embedded platforms without native malloc: http://www.freertos.org/a00111.html
While one of the benefits of an arena allocator is to be able to deallocate everything at once, it's not that unusual to have an arena allocator that you can deallocate from "early" if needed.
For short-lived processes that do lots of allocations and where you can rely on the OS to release resources, just leaving out the deallocations is often faster.
Of course, you need to be careful: if you write code like that in a language without garbage collection, it's inherently not reusable. Retrofitting deallocation is often really painful, because when you never have to ensure things can be deallocated in the right order, it's easy to adopt patterns that make object ownership unclear.
Not necessarily. Many programs are written in a way that allocates all the needed heap space at startup and just reuses it forever. And those are overrepresented in minimal-system kinds of environments.
malloc() isn't particularly hard; K&R provides a working implementation using a freelist and sbrk in about a page of code. It's printf() that's the horrendous feature-crammed nightmare.