LKML discusses merging Rust access to filesystem APIs

arnsholt · on Feb 20, 2024

I reread John Regehr's "The Problem with Friendly C" [0] a while back, and a comment towards the end made me think of the Rust in Linux effort. He speculates that "an influential group such as the Android team could create a friendly C dialect" which in turn can drive adoption and consensus on what a better C language should look like. Of course, we know that hasn't happened since 2015 when that article was written, but what has happened is the Android team's rewrite of the Binder in Rust. So I think the main thrust of the argument is correct: interested parties are converging on an answer to the challenges of writing secure C. It's just that rather than evolving C itself, the consensus that seems to be developing is that Rust is a better answer.

0: https://blog.regehr.org/archives/1287

actionfromafar · on Feb 20, 2024

I think Rust is an improvement over C, but I also think it's unfortunate that there's nothing which is similar to C, but safer. C is incredibly easy to grok. (Even if the "idealized machine model" has grown a bit more involved with newer standards and optimizations. I'm probably not using the correct terminology but I'm referring to what happens with undefined behaviour.)

Remind me again why we can't use Pascal? (Except that it's corny, looks funny and I scoffed at it as a youngun.)

I wonder what "C with x" would look like, where x is:

- real composable types

- checked arrays

- maybe some pointer checks?

But no:

- inheritance

- objects

And also very compatible with plain C, not sure if 100% language compatible would be a good idea. Keeping compatibility with C was one contributing reason C++ became so complicated.

But keeping linker and header compatibility would be a godsend.

Maybe realistically what I want is Zig, I realise it's the closest existing fit, even though it has metaprogramming and the syntax is a bit different from C.

I should really learn it to see if I would be happy with it. Currently my go-to languages are C++ if I have the time, C# if I don't and Python if I just want to explore thoughts.

bananapub · on Feb 20, 2024

> C is incredibly easy to grok.

what does this mean, though? writing good, secure C is so hard that literally no one can do it at scale. not Linus, not Theo, not even djb.

why do you find "learning enough to write bad code isn't very hard" to be valuable?

I'm not even really convinced that C is that useful as an approximation of "how computers work" anymore - caches and NUMA and speculative execution and mmap and large numbers of varying ability cpu cores and asynchronous code make C more of an approximation of "how the PDP worked in the 1970s" (with an abstraction layer that is an endless rolling nightmare of critical security issues), which I guess is a bit useful, but not as the thing we base the world on. and a C with array bounds checking and length-prefix strings would still have a shitty type system and no meta programming and not even an iteration protocol (or ability to add a nice one).

> Remind me again why we can't use Pascal?

I would guess it's because Unix-alikes (and for a while Windows, though I guess it had less C and more C++ earlier) ate the world and using the language (almost all of) the OS was written in had huge network effects.

actionfromafar · on Feb 20, 2024

Writing good, secure C is not that hard, if you are writing business code, not super optimized kernels. But it requires a discipline I only acquired by using other languages.

- Don't use libc string functions.

- Use a lot of structs.

- Write a lot of manipulator functions for these structs so you don't get inconsistent data.

- Write in a functional style, don't toss pointers around nilly-willy.

When you follow such self-imposed rules, it's not hard. But you find yourself writing a lot of code doing manually what a more advanced language would just do for you, out-of-the-box. And then you start questioning why you wrote it in C in the first place. (And by "you", I mean "I".)

But what I have discovered is that writing in C is so tedious, it makes me think really hard about the problem to solve so it forces me to do really the bare minimum to solve the problem, which can be a relief if you are prone to over-thinking code and architecture. In C# with an IDE it's so easy to just throw in some abstractions and options and what-not "just in case" which would be a right pain in C.

bananapub · on Feb 20, 2024

> When you follow such self-imposed rules, it's not hard.

this is an extremely funny reply, thanks.

actionfromafar · on Feb 20, 2024

Glad to be of at least amusement. :-D

Edit:

I think C is easy like the game of Chess. The rules are easy to grok...

chris_wot · on Feb 20, 2024

It sounds like interesting advise. Why the liberal use of structs?

actionfromafar · on Feb 20, 2024

After looking at too many char arrays with fields indexed by various #define SOMEINDEX 6, and such, I have come to this conclusion. Code like that, where the knowledge of the data structure is "sprinkled out" in lots of code with if-s and other conditionals, which all have to be updated all over the code base, should one want to change what the data structure looks like.

I don't want to point fingers so I won't name names, but I took an interest in a certain file system for embedded flash because it was very good, struck a nice balance between features, size and speed, had lots of regression tests etc. In many ways a high quality product.

But wow was it coded very much like that, where to read the code you had to have a full mental model of the layout on "disk" and the layout of the data structures in RAM.

If it had set/get/manipulate_or_update functions operating on structs, then serializing code for the structs to disk, it would have been much easier to read. (And I don't mean heavy serializing, structs with appropriate packing instructions to the compiler would have probably sufficed.)

Edit:

Yet another reason to use structs, don't use naked arrays, put a size_t and the array in a struct.

https://news.ycombinator.com/reply?id=39440071&goto=threads%...

stefan_ · on Feb 20, 2024

The efforts of language nerds to make ever more convoluted syntax really has nothing to do with the safety and security of code. Hence C#, Python et al.

actionfromafar · on Feb 20, 2024

Not sure what you are getting at, but one class of bugs (crashing and run-foreign-code-in-memory exploit bugs) are just side-stepped in C#. On the other hand, PHP, C# and Python protects nothing against logic bugs and stuff like tripping up path validation and other things which are often exploited in other ways.

arnsholt · on Feb 20, 2024

I'm sure there are lots and lots of options in the design space here that would have been viable, but for better or worse this is the one we got. I dunno why Pascal didn't happen (but large portions of developers scoffing at the language shouldn't be discounted: this is as much a social consenus building exercise as a purely technical one), but for the "simpler C" versus Rust question it seems to me that there's some kind of local optimum around C, in the sense that the people interested in safety wanted more than is possible with small changes to C and the people who wanted to keep C weren't sufficiently motivated to actually make the changes (Regehr touches on this in the first part of the article, getting any kind of consensus on what Friendly C should actually be turned out to be very hard).

actionfromafar · on Feb 20, 2024

Just adding checked arrays would make a big difference I feel. Maybe it's not too late to change culturally, I think both GCC and LLVM has compile options for that.

I also think if Pascal existed with a C syntax front-end, 0-index arrays and ditching the Pascal libraries and going for libc by default, I would probably have switched.

ljosifov · on Feb 20, 2024

This one thing https://digitalmars.com/articles/C-biggest-mistake.html would have helped a lot. Imo given 1) GCC and LLVM have had extensions like that for a long time now, 2) most arrive at that/similar destination at some point (b/c it's so useful), and 3) afaik standardization was supposed to standardize current best/popular practice that is widespread but non-standard (and not invent new untried untested stuff), I think that C standardization committee dropped the ball (and kept dropping it for a very long time) as far as C arrays (and matrices and ndarrays etc) go.

hnfong · on Feb 20, 2024

> going for libc by default

I understand this isn't relevant to the original discussion since the kernel doesn't use libc, but in userland, the flawed design of libc is as much a liability as the language itself IMHO.

For example, maybe you know when to use strcpy(), strncpy(), strlcpy(), memcpy(), but TBH I don't remember the appropriate contexts to use them and I don't write enough C code to warrant actively keep all the nuances in my head.

dannymi · on Feb 20, 2024

>0-index arrays

On each pascal array you specify both bounds.

So nothing stops you from doing `array[0..4] of Integer` or whatever.

arnsholt · on Feb 20, 2024

I mean, in some sense it's never too late to change? But I do think that as technical people our profession has a strong tendency to underestimate (and undervalue!) the social component of this kind of change. If you have agreement on a path forward, getting it done is mostly a question of elbow grease. The hard part most of the time is getting an agreement on which path to choose out of many mutually exclusive options. And the Linux Rust effort is a good example of this: I think getting agreement that this is good and desireable is at least as hard as the technical work of writing the code. As the LWN article notes, there's been lots of change in this over just the last few years but there's still quite strong resistance it seems.

pjmlp · on Feb 20, 2024

That would be Zig in a way, Modula-2 (1978) for C folks.

cb321 · on Feb 20, 2024

If you liked Pascal and like Python and prioritize the things you mentioned and are considering languages as young/rough as Zig not directly compatible with C but with easy declare-and-go FFIs, then Nim https://nim-lang.org/ may really be the language you are looking for.

While inheritance/objects (and even multi-method dynamic dispatch) do exist in the language, they are not widely relied upon in the stdlib or ecosystem in general. Nim has much more lisp-like metaprogramming / syntax macros than C++ (but its richness of syntax for expressiveness compared to sexp's makes such metaprogramming quite a bit more work). Another way to start to describe it might be a far more terse, less factored Ada with more modern lexical aesthetics.

While I know of no specific Linux kernel module-in-Nim project, since Nim can compile to C, it should require no hand holding|permission-slip for Linux integration, although I'm sure it could be made "nicer", as can anything. (People are also writing a few OS kernels in Nim.)

pjmlp · on Feb 20, 2024

> I think Rust is an improvement over C, but I also think it's unfortunate that there's nothing which is similar to C, but safer.

Modula-2 exists since 1978, the only downside versus C in terms of safety, is the use after free still being an issue, everything else was already covered.

Modula-2 was inspired by Mesa, created by Xerox PARC as they desired a secure systems programing to move away from BCPL.

In many regards, Zig is Modula-2 with a syntax more appealing to C minded folks.

actionfromafar · on Feb 20, 2024

Now I have to try Zig.

pjc50 · on Feb 20, 2024

> C is incredibly easy to grok

No it really isn't. Newbies struggle with pointers all the time, before they even get to the question of pointer lifetime and what is and what is not safe pointer arithmetic.

> "C with X"

I think the most interesting areas here are Zig and D, although I'm not too familiar with them myself.

Really what we want is an end to "can't spell CVE without C": a language which makes it as difficult as possible to write software which can be compromised remotely.

rini17 · on Feb 20, 2024

I was coming from Pascal but hardest to grok were not pointers but that everything in C is an expression possibly with side effects. Unlike Pascal statements.

jstimpfle · on Feb 21, 2024

What side effects are you talking about that are hard to grok?

jstimpfle · on Feb 21, 2024

The problem is that we also need a language that makes it easy to write the software in the first place.

I'm still sceptical that Rust is this language. It is a somewhat successful languages with lots of people tinkering around, and some mildly successful software (what is the most successful software written in it, btw?). But in a team I'm working in (doing a distributed filesystem), there was an effort do bring Rust into the team and it still isn't easy to gain adoption in the team after a few years. And this is partly credited due to the difficulty of writing in the language.

What I suspect is that a lot of the success is explained by the tooling around the language, making it very easy (especially compared to C) to rely on existing infrastructure. But not necessarily that this infrastructure is particularly easy to write or maintain.

steveklabnik · on Feb 21, 2024

> what is the most successful software written in it, btw?

Depends on how you define successful, but I’d probably put money on it being “whataver the few million lines inside of Facebook are doing” or “the few million (probably, I don’t actually know on this one) lines inside of Amazon that they say powers S3, EC2, and CloudFront. Might be the various CloudFlare products, given how likely a given GET request is to touch their infrastructure.

actionfromafar · on Feb 20, 2024

You are probably right. :-) C is incredibly easy to grok if you know assembler. kicking goalpost a little

pjc50 · on Feb 20, 2024

I sometimes think we'd be better off with a "typesafe macro assembler" for some of the C uses. People keep using C for this when it really isn't, and then get kicked in the face by UB/optimizer interactions deleting security critical code.

samatman · on Feb 20, 2024

LLVM IR is an acceptable meta-assembler for most purposes. The degree to which it's used directly is roughly the degree to which people find that sort of tool useful, and it is used directly to some degree.

It would be a worthy project to write a language with the goal of providing the minimum affordances to LLVM IR to make it pleasant to program in, without fully obscuring what you're doing. I'd play around with it.

actionfromafar · on Feb 20, 2024

Yes, djb wanted a C flavour with no UB. That could also work. A macro assembler could fall behind optimizations, but a C with no UB could still optimize pretty well I reckon. Maybe there is a small market for an optimized language with no UB.

pjc50 · on Feb 20, 2024

"No UB" is really hard to do without sacrificing some portability. It took C decades to specify that arithmetic was twos-complement, for example. However "UB means the optimizer is free to change the semantics on the assumption that you don't hit UB" is a huge source of surprises.

jstimpfle · on Feb 21, 2024

If I got kicked in the face by UB/optimizer in the last 5 years then I didn't notice.

k8svet · on Feb 20, 2024

Literally the best programmers and biggest companies in the world have given up on C because no one can actually write it securely. I promise, you're not as good at C as you've convinced yourself.

Trust me, I've read this opinion from a certain group before. Whose compositor segfaults on me weekly.

And Rust has neither objects nor inheritance, so.

actionfromafar · on Feb 20, 2024

I don't mean to be snarky, and I may misunderstand some nuance of "the best" and "no one". But I don't think C fell out of favour because of security. I think it was because of productivity. (Of course, the two are linked, but not all that firmly as WordPress can attest to.)

crq-yml · on Feb 20, 2024

The thing that is provoking the switch is really in the "disruptive 10x better" areas, which Rust does offer around certain aspects of resource management. Zig is aligned around a different axis of "better", that axis being making a small-core, bootstrapping, C-compatible system. For the kernel project, the build-and-bootstrap problems were solved in an internal sense eons ago - there is a documented process. The process might not be particularly pretty or elegant, but switching won't make it 10x better.

On the other hand, small apps that need to "do things with bytes" and want to link to a bunch of dependencies have an appealing answer in Zig because it proposes solving their problem in systematic fashion.

I like Pascal, but the 10x isn't there: it could probably be 2-3x better than C for a lot of things, in the Dephi or FPC variants. It has the bits that people will "write C in C++" for, with much less WTF.

nercury · on Feb 20, 2024

I am Rust developer. My advice: favor less abstractions. Especially when interfacing with C, you can always make almost 1:1 rust interface. Start with that. Then, when common patterns start to emerge, do the data abstractions first, i.e. define data structures shared by different implementations.

There is no reason why you can't write simple code in Rust. Well, except that it's tempting to over complicate things. Start using traits and generics everywhere, and you will enter meta-programming in generics hell. Soon after, you will start demanding new compiler features to survive there.

lifthrasiir · on Feb 20, 2024

Great advice, but the resulting one-to-one mapping is not necessarily safe (in terms of Rust), which is in part why people jump early on advanced Rust features. So I would rephrase yours as follows: do not reach for abstractions to achieve safety without much consideration and justification.

nercury · on Feb 20, 2024

I admit I don't know enough about kernel development, but from general experience one common example is resource cleanup.

Say, you have several resources that must be cleaned up in certain order. In C, you would just call the appropriate free functions. When abstracting over that with rust, you have to make an annoying choice:

Do what's in C, don't implement automatic Drop, but mark your functions unsafe. You get leanest zero-cost implementation, it is straightforward to understand, but needs additional maintenance care to prevent bugs.

Wrap resources in unsafe structs that have automatic drop (when goes out of scope). IMO this is a terrible choice, because the maintainer suddenly has to know about which structures have this unsafe cleanup going on. Simple scopes suddenly matter, order of items suddenly matter, it's a mess.

Use reference-counting wrappers to track usage and drop the items when they are no longer in use. Most libraries do this, but it's no longer the leanest possible API.

There might be another choice, to achieve both zero-cost execution and correct cleanup using metaprogramming, be it macros, generics, or both. That's exactly what I fear the most.

dragonelite · on Feb 20, 2024

Thats why im also keeping an eye out for zig and check how they will develop the next 5 years or so. Hope they don't go so meta programming heavy.

actionfromafar · on Feb 20, 2024

Zig has the especially interesting feature that it can compile C code.

pjmlp · on Feb 21, 2024

That is more a property of shipping the whole LLVM/Clang package with the zig compiler than anything, though.

actionfromafar · on Feb 21, 2024

Whatever the mechanism, it has big implications for how Zig will be used in practice.

pjmlp · on Feb 21, 2024

A mechanism available to any language that feels like shipping clang.

It is also how Swift gets its C and Objective-C interop, and will have its C++ interop as well.

Or how the Java Panama project extracts header file information for native interop.

chris_wot · on Feb 20, 2024

The problem with the rust developers creating an API based on a file system that is a “toy” (i.e. a limited subset of features) is that the API abstraction does not cater for all file systems.

The advise provided is good - develop a file system module in rust for ext2 and then propose a new API.

mustache_kimono · on Feb 21, 2024

> The advice provided is good - develop a file system module in rust for ext2 and then propose a new API.

Although I think the point re: the API is not really worth quibbling too much about, I think "Go rewrite ext2 in Rust" is pretty goofy advice, when the purpose is something like "so we will know what the API should look like."

The minute someone implements a filesystem in Rust used by real people is when the rubber will meet the road. Rust filesystems may be different. Everyone is on notice -- APIs aren't stable in the kernel. They especially aren't stable when no one is using them. Something should be there to avoid 8 different implementations, but this endless -- go reimplement an existing facility, as an exercise, teaches us very little, and is mostly a waste of resources (which I'm beginning to believe is the point).

rightbyte · on Feb 20, 2024

"There is little in the way of type safety here; a function cannot know that it was actually passed a pointer to the right sort of inode."

Is there no type tag in the inode structs?

Edit: Looking at its definition I don't see the problem.

dezgeg · on Feb 20, 2024

Let's have a concrete example, link() in struct inode_operations is defined as:

    int (*link) (struct dentry *,struct inode *,struct dentry *);

Let's then look at the ext4 implementation of the link() operation:

    static int ext4_link(struct dentry *old_dentry,
           struct inode *dir, struct dentry *dentry)
    {
    ...
       if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
     (!projid_eq(EXT4_I(dir)->i_projid,
   EXT4_I(old_dentry->d_inode)->i_projid)))
  return -EXDEV;
    ....

In particular that calls EXT4_I():

    static inline struct ext4_inode_info *EXT4_I(struct inode *inode)
    {
        return container_of(inode, struct ext4_inode_info, vfs_inode);
    }

Which is basically just direct unchecked pointer type conversion. So, calling ext4_link() on any non-ext4 created struct inode will blow up badly.

actionfromafar · on Feb 20, 2024

Maybe it meant know at compile time?

keikobadthebad · on Feb 20, 2024

From Jan 15th