Selfie – A tiny RISC-V C compiler, emulator and hypervisor (cs.uni-salzburg.at)
187 points by peter_d_sherman on Jan 16, 2023 | 18 comments


From the project’s README [1]:

> Selfie is implemented in a single (!) file and kept minimal for simplicity. There is also a simple in-memory linker, a RISC-U disassembler, a garbage collector, L1 instruction and data caches, a profiler, and a debugger with replay as well as minimal operating system support in the form of RISC-V system calls built into the emulator and hypervisor. The garbage collector is conservative and even self-collecting. It may operate as library in the same address space as the mutator and/or as part of the emulator in the address space of the kernel.

I was a bit skeptical reading that, but I’m very impressed right now [2]...

[1]: https://github.com/cksystemsteaching/selfie/blob/5de675a0f08...

[2]: https://github.com/cksystemsteaching/selfie/blob/5de675a0f08...
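The README calls the garbage collector conservative. The core idea, in the spirit of Boehm's collector, is that the collector cannot tell pointers from integers, so any word whose value lands inside an allocated block keeps that block alive. Below is a minimal sketch of just that check over a hypothetical fixed-size arena (`arena`, `conservative_mark`, and the block sizes are made up for illustration). It is not selfie's collector, and it skips the transitive tracing and sweeping a real one does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap: NUM_BLOCKS fixed-size blocks in one arena. */
#define NUM_BLOCKS 4
#define BLOCK_SIZE 32

static unsigned char arena[NUM_BLOCKS][BLOCK_SIZE];

/* Conservative marking over a root set: every root word is treated as a
   potential pointer. If its value falls anywhere inside a block (interior
   addresses included), that block is marked live, even if the word is
   "really" just an integer. */
void conservative_mark(const uintptr_t *roots, size_t n_roots,
                       int marked[NUM_BLOCKS]) {
    assert(n_roots == 0 || roots != NULL);
    for (size_t b = 0; b < NUM_BLOCKS; b++)
        marked[b] = 0;
    for (size_t r = 0; r < n_roots; r++) {
        uintptr_t word = roots[r];
        for (size_t b = 0; b < NUM_BLOCKS; b++) {
            uintptr_t start = (uintptr_t)arena[b];
            if (word >= start && word < start + BLOCK_SIZE)
                marked[b] = 1; /* looks like a pointer into block b */
        }
    }
}
```

For example, roots holding `&arena[0][0]`, the interior address `&arena[2][5]`, and the plain integers 42 and 7 leave blocks 0 and 2 marked and blocks 1 and 3 free for reclamation. The price of being conservative is that an integer which happens to equal a heap address would also keep a block alive.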


This part of the README explains what exactly Selfie can be used for:

> a self-compiling compiler called starc that compiles a tiny but still fast subset of C called C Star (*) to a tiny and easy-to-teach subset of RISC-V called RISC-U,

> a self-executing emulator called mipster that executes RISC-U code including itself when compiled with starc,

> a self-hosting hypervisor called hypster that provides RISC-U virtual machines that can host all of selfie, that is, starc, mipster, and hypster itself, and a tiny C* library called libcstar utilized by selfie.
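To make "an emulator that executes RISC-U code" concrete, here is a toy fetch-decode-execute loop for three real RV64I instructions (`addi`, `add`, `sub`) using their standard encodings. It is a hypothetical sketch, not mipster: the helper names are invented, the program counter is just an array index, and RISC-U proper also needs loads, stores, branches, jumps and system calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode addi rd, rs1, imm (I-type, opcode 0x13, funct3 0). */
uint32_t enc_addi(unsigned rd, unsigned rs1, int32_t imm) {
    return ((uint32_t)(imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x13u;
}

/* Encode add/sub rd, rs1, rs2 (R-type, opcode 0x33, funct3 0);
   funct7 0x00 selects add, 0x20 selects sub. */
uint32_t enc_rtype(unsigned funct7, unsigned rd, unsigned rs1, unsigned rs2) {
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (rd << 7) | 0x33u;
}

/* Execute n instructions from code on register file x (x[0] stays 0).
   Returns 0 on success, -1 on an instruction this toy does not support. */
int toy_run(const uint32_t *code, size_t n, uint64_t x[32]) {
    assert(code != NULL && x != NULL);
    for (size_t pc = 0; pc < n; pc++) {
        uint32_t ins = code[pc];                 /* fetch */
        uint32_t opcode = ins & 0x7F;            /* decode fields */
        uint32_t rd = (ins >> 7) & 0x1F;
        uint32_t funct3 = (ins >> 12) & 0x7;
        uint32_t rs1 = (ins >> 15) & 0x1F;
        uint32_t rs2 = (ins >> 20) & 0x1F;
        uint32_t funct7 = ins >> 25;
        if (opcode == 0x13 && funct3 == 0) {                       /* addi */
            int64_t imm = (int64_t)((ins >> 20) & 0xFFF);
            if (imm & 0x800) imm -= 0x1000;      /* sign-extend 12-bit imm */
            if (rd != 0) x[rd] = x[rs1] + (uint64_t)imm;
        } else if (opcode == 0x33 && funct3 == 0 && funct7 == 0x00) { /* add */
            if (rd != 0) x[rd] = x[rs1] + x[rs2];
        } else if (opcode == 0x33 && funct3 == 0 && funct7 == 0x20) { /* sub */
            if (rd != 0) x[rd] = x[rs1] - x[rs2];
        } else {
            return -1;                           /* unsupported */
        }
    }
    return 0;
}
```

A program of `addi x5, x0, 20`, `addi x6, x0, -3`, `add x7, x5, x6`, `sub x8, x5, x6` leaves 17 in `x7` and 23 in `x8`. Anything outside the three supported instructions (such as `ecall`) makes `toy_run` return -1.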


Digging further, there's a related course (https://cksystemsteaching.github.io/CS4All/) and a book (https://github.com/ckirsch/book).


The design of the selfie compiler is inspired by the Oberon compiler of Professor Niklaus Wirth from ETH Zurich. RISC-U, the target language of the selfie compiler, is inspired by the RISC-V community around Professor David Patterson from UC Berkeley. The selfie garbage collector is inspired by the conservative garbage collector of Hans Boehm. The design of the selfie microkernel is inspired by microkernels of Professor Jochen Liedtke from University of Karlsruhe.

That's quite the pedigree! This is a very interesting project to dig into, starting with the book.


According to the GitHub README, selfie's emulator and hypervisor can only support programs in RISC-U, "a tiny and easy-to-teach subset of RISC-V".[1] So... not the full RISC-V ISA.

For what it's worth, the selfie compiler can build RISC-U programs for you. Still a really cool project! I've been meaning to learn more about compiler backend development. I usually spend time on the frontend.

[1] https://github.com/cksystemsteaching/selfie


Wow, really happy to see this project get so much attention. It was my favorite of all the courses there!


Why do they need both

> uint64_t SIZEOFUINT64 = 8; // in bytes

and

> SIZEOFUINT64 = (uint64_t) ((uint64_t*) SELFIE_URL + 1) - (uint64_t) SELFIE_URL;

in init_library()?


From reading the code [1], only one of them is needed: the value computed by `SIZEOFUINT64 = (uint64_t) ((uint64_t*) SELFIE_URL + 1) - (uint64_t) SELFIE_URL;` is what always ends up being used once the library is initialized. An optimizing compiler would likely fold the calculation to a constant anyway. My guess is that it's predefined at the top simply for clarity and explicitness (remember that this is built for educational purposes).

[1]: https://github.com/cksystemsteaching/selfie/blob/5de675a0f08...
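The pattern described above, a global with a default followed by a recomputation in `init_library()`, can be sketched as follows. `init_library_sketch` is a hypothetical stand-in, not selfie's function, and instead of casting `SELFIE_URL` it measures the distance between two adjacent elements of a local `uint64_t` array, which keeps the pointer correctly aligned. The `assert` makes the agreement between the two values explicit rather than implied.

```c
#include <assert.h>
#include <stdint.h>

/* Global with a "sane default", as in the snippet quoted above. */
uint64_t SIZEOFUINT64 = 8; // in bytes

/* Hypothetical stand-in for the relevant part of init_library():
   recompute the size with pointer arithmetic instead of sizeof. */
void init_library_sketch(void) {
    uint64_t probe[2]; /* only the addresses are used, never the values */
    uint64_t computed =
        (uint64_t)(uintptr_t)(probe + 1) - (uint64_t)(uintptr_t)probe;
    assert(computed == SIZEOFUINT64); // default and computed value must agree
    SIZEOFUINT64 = computed;          // from here on, the computed value is used
}
```

After the call, `SIZEOFUINT64` holds the computed value, which is 8 wherever bytes are 8 bits wide.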


It's really scary and I struggle to understand what is wrong with just

    const size_t SIZEOFUINT64 = sizeof (uint64_t);
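In standard C (C11 or later), that one-liner can also carry a compile-time check, so there is a single source of truth and no way for a default and a computed value to drift apart. This is only an option for code that never has to pass through starc, since C* reportedly lacks `sizeof`.

```c
#include <assert.h> /* C11: provides the static_assert macro */
#include <stddef.h>
#include <stdint.h>

/* One definition, fixed at compile time. */
static const size_t SIZEOFUINT64 = sizeof(uint64_t);

/* Refuse to compile if code elsewhere relies on a hard-coded 8 and that
   assumption is wrong. uint64_t is exactly 64 bits wide, so this holds
   whenever CHAR_BIT == 8. */
static_assert(sizeof(uint64_t) == 8, "expected an 8-byte uint64_t");
```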


I think it's because their subset of C [1], which they call C Star (C*), doesn't support the `sizeof` operator. Since Selfie is supposed to be able to compile itself, it seems they've restricted the code to the grammar that C* supports.

[1]: https://github.com/cksystemsteaching/selfie/blob/50b5fec8378...


C Star (C*) ?!?

Oh, well.

https://en.wikipedia.org/wiki/C*

=> "It was developed in 1987 as an alternative language to Lisp and CM-Fortran for the Connection Machine CM-2 and above. The language C* adds to C a "domain" data type and a selection statement for parallel execution in domains. "


Aah of course. Didn't read that far. Thanks.


Their compiler doesn't support sizeof. Maybe it's like that to ease bootstrapping?


The first one looks like a declaration in the global scope with a sane default; the second one determines the actual size.

Haven't done C in a long long time, so I may be talking out of my ass.


I don't believe it one bit that you haven't done C in 2^64 seconds.

EDIT: technically 2^63-1 seconds.
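For the record, the edit checks out: `LLONG_MAX` is 2^63 - 1 wherever `long long` is 64 bits wide (the C standard only guarantees at least that range). A small sketch that also converts it to years, assuming 365.25-day years; the function names are illustrative.

```c
#include <assert.h> /* static_assert */
#include <limits.h>
#include <stdint.h>

/* Document the platform assumption: long long is exactly 64 bits here. */
static_assert(LLONG_MAX == INT64_MAX, "expected a 64-bit long long");

/* Largest value a signed long long can hold: 2^63 - 1. */
long long max_seconds(void) { return LLONG_MAX; }

/* The same span in years of 365.25 days. */
double max_years(void) {
    return (double)LLONG_MAX / (365.25 * 24 * 60 * 60);
}
```

That comes to roughly 292 billion years.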


Lol! I actually chuckled at this. I completely missed that.


I think it’s more likely this is a minor/latent bug.

Initializing the value and later assigning it a new one risks having some code use the value of one expression and other code use the value of another expression (presumably, hopefully, identical to the first).

That’s not a risk you want to take.

The name also doesn’t suggest they considered making the code agnostic to the size of that variable.


Your take seems accurate. That is my reading as well.



