There's no reason to place an arbitrary cap on the number of file descriptors a process can have. It's neither necessary nor sufficient for limiting the amount of memory the kernel will allocate on behalf of a process. On every Linux system I use, I bump the FD limit to maximum everywhere.
It's always a good thing to have resource limits, to constrain runaway programs or guard against bugs. Low limits are unfortunate, but extremely high limits or unbounded resource acquisition can lead to many problems. I'd rather see "too many open files" than my entire machine freezing up when a program misbehaves.
There's a certain kind of person who just likes limits and will never pass up an opportunity to defend and employ them.
For example, imagine a world in which Linux had a RLIMIT_CUMULATIVE_IO:
"How else am I supposed to prevent programs wearing out my flash? Of course we should have this limit"
"Of course a program should get SIGIO after doing too much IO. It'll encourage use of compression"
"This is a security feature, dumbass. Crypto-encrypters need to write encrypted files, right? If you limit a program to writing 100MB, it can't do that much damage"
Yet we don't have a cumulative write(2) limit and the world keeps spinning. It's the same way with the limits we do have --- file number limits, vm.max_map_count, VSIZE, and so on. They're relics of a different time, yet the I Like Limits people will retroactively justify their existence and resist attempts to make them more suitable for the modern world.
Your entire machine won't freeze if you have a sensible limit on the direct cause of that freeze (which would be e.g. memory or CPU %, not some arbitrary number of descriptors).
It's not that it's a proxy for memory use; an FD is its own resource. I've seen a lot of software with FD leaks (i.e. they open a file and forget about it, and if they need the file again they open another FD), so this limit can be a way to tell that something is leaking. Whether that's a good idea or necessary depends on the application.
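If you want to spot that kind of leak without leaning on the limit itself, here's a minimal Linux-only sketch: poll /proc/self/fd and watch whether the count keeps climbing while the workload is otherwise steady (count_open_fds is just an invented helper name):

```c
#include <dirent.h>
#include <stdio.h>

/* Count this process's open descriptors by listing /proc/self/fd
 * (Linux-specific). A count that grows without bound under a steady
 * workload is a decent hint of an FD leak. */
static int count_open_fds(void)
{
    DIR *d = opendir("/proc/self/fd");
    if (!d)
        return -1;

    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] != '.')   /* skip "." and ".." */
            n++;
    }
    closedir(d);
    return n - 1;                  /* opendir() itself holds one fd */
}

int main(void)
{
    printf("open fds: %d\n", count_open_fds());
    return 0;
}
```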
Then account for the _kernel memory used_ by file descriptors and charge it like any other ulimit. Don't impose a cap on file descriptors in particular. These caps distort program design throughout the ecosystem.
I'm not really sure, but I've always assumed early primitive UNIX implementations didn't support dynamically allocating file descriptors. It's not uncommon to see a global fixed size array in toy OSs.
One downside to your approach is that kernel memory is not swappable in Linux: the OOM failure mode could be much nastier than leaking memory in userspace. But almost any real-world code is going to allocate some userspace memory to go along with the FD, and that will cause an OOM first.
ENOMEM is already one of the allowed error conditions of `open`. Classically you hit this if it's a pipe and you've hit the pipe-user-pages-hard limit. POSIX is a bit pedantic about this but Linux explicitly says the kernel may return this as a general failure when kernel memory limits are reached.
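So callers already have to handle resource-style failures from open(2) today, and kernel-memory accounting wouldn't add a new error surface. A sketch of that surface, using only the documented errnos (open_or_report is a hypothetical wrapper):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Open a file and report which kind of resource limit was hit, if any. */
int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        switch (errno) {
        case EMFILE:   /* per-process descriptor limit (RLIMIT_NOFILE) */
        case ENFILE:   /* system-wide file table full                  */
        case ENOMEM:   /* kernel ran out of memory for the object      */
            fprintf(stderr, "open(%s): resource exhausted: %s\n",
                    path, strerror(errno));
            break;
        default:
            fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
        }
    }
    return fd;
}
```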
My guess is it was more about partitioning resources: if you have four daemons and 100 static global file descriptors, in the simplest case you probably want to limit each one to using 25. But I'm guessing, hopefully somebody who knows more than me will show up here :)
No, it's way simpler than that. The file descriptors are indices into an array containing the state of the file for the process. Limiting the max size of the array makes everything easier to implement correctly.
For example, consider opening and closing file descriptors concurrently. If the array never resizes, the searches for free fds and close operations can happen without synchronization.
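Roughly what that looks like, as a single-threaded toy sketch (OPEN_MAX_TOY and struct file_state are invented names; real kernels differ):

```c
#define OPEN_MAX_TOY 64            /* the fixed cap *is* the limit */

struct file_state {
    int   in_use;                  /* slot allocated?             */
    long  offset;                  /* current file position       */
    void *inode;                   /* whatever the kernel tracks  */
};

/* Per-process table; the descriptor is just the index, and the
 * array is never resized. */
static struct file_state fd_table[OPEN_MAX_TOY];

/* POSIX requires handing out the lowest-numbered free descriptor. */
int alloc_fd(void)
{
    for (int fd = 0; fd < OPEN_MAX_TOY; fd++) {
        if (!fd_table[fd].in_use) {
            fd_table[fd].in_use = 1;
            return fd;
        }
    }
    return -1;                     /* EMFILE: the table is full */
}

void free_fd(int fd)
{
    if (fd >= 0 && fd < OPEN_MAX_TOY)
        fd_table[fd].in_use = 0;
}
```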
I meant the existence of ulimit was about partitioning resources.
Imagine a primitive UNIX with a global fixed size file descriptor probing hashtable indexed by FD+PID: that's more what I was getting at. I have no idea if such a thing really existed.
> If the array never resizes, the searches for free fds and close operations can happen without synchronization.
No, you still have to (at the very least) serialize the lookups of the lowest available descriptor number if you care about complying with POSIX. In practice, you're almost certain to require more synchronization for other reasons. Threads share file descriptors.
There's one unfortunate historical reason: passing FDs >=1024 to glibc's "select" function doesn't work, so it would be a breaking change to ever raise the default soft limit above that. It's perfectly fine for the default hard limit to be way higher, though, and for programs that don't use "select" (or that use the kernel syscall directly, but you should really just use poll or epoll instead) to raise their own soft limit to the hard limit.
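For a program that avoids select(), raising its own soft limit is only a few lines; a minimal sketch using plain getrlimit/setrlimit (no extra privileges needed, since a process may raise its soft limit up to the hard limit):

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE to whatever the hard limit already allows. */
int raise_fd_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;                 /* soft := hard */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    printf("fd soft limit is now %llu\n",
           (unsigned long long)rl.rlim_cur);
    return 0;
}
```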
> There's no reason to place an arbitrary cap on the number of file descriptors a process can have
I like to think that if something is there, then there's a reason for it; it's just that I'm not smart enough to see it :) Jokes aside, I could see this as a security measure? Malware that tries to encrypt your whole filesystem in a single shot could be blocked, or at least slowed down, by this limit.
You've been downvoted, but I also wonder about that.
If you write a program that wants to have a million files open at once, you're almost certainly doing it wrong. Is there a real, inherent reason why the OS can't or shouldn't allow that, though?
> If you write a program that wants to have a million files open at once
A file descriptor is just the name of a kernel resource. Why shouldn't I be able to have a ton of inotify watches, sockets, dma_buf texture descriptors, or memfd file descriptors? Systems like DRM2 work around FD limits by using their own ID namespaces instead of file descriptors, and thereby make the system uglier and more bug-prone. Some programs that regularly bump up against default FD limits are postgres, nginx, the docker daemon, watchman, and, notoriously, JetBrains IDEs.
I honestly don’t know. Maybe there’s a great reason for it that would be obvious if I knew more about the low-level kernel details, but at the moment it eludes me.
Like, there's not a limit on how many times you can call malloc() AFAIK, and the logic for limiting the number of those calls seems to be the same as for open files. “If you call malloc too many times, your program is buggy and you should fix it!” isn't a thing, and yet allocating an open file is locked down hard.
A million FDs on a process is not weird. I used to run frontends with that many sockets, on Intel Clovertown Xeons. That was a machine that came out 20 years ago. There is absolutely no reason whatsoever that this would indicate "doing it wrong" in the year 2025.
How else am I supposed to service a million clients, other than having a million sockets?
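The usual answer is one readiness API watching all of them. A minimal level-triggered epoll echo-server sketch; error handling is trimmed, and the port and buffer size are arbitrary. Unlike select(), nothing here cares whether a descriptor number is above 1024:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Listening socket on an arbitrary port. */
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family      = AF_INET,
                                .sin_port        = htons(9000),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, SOMAXCONN);

    /* One epoll instance watches every socket, however many there are. */
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    struct epoll_event events[1024];   /* batch size, not an fd cap */
    for (;;) {
        int n = epoll_wait(ep, events, 1024, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == lfd) {                       /* new client */
                int cfd = accept(lfd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN,
                                           .data.fd = cfd };
                epoll_ctl(ep, EPOLL_CTL_ADD, cfd, &cev);
            } else {                               /* echo data back */
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0)
                    close(fd);                     /* epoll drops it too */
                else
                    write(fd, buf, (size_t)r);
            }
        }
    }
}
```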
This isn't a real issue though. Usually you can just set the soft limit to the often much higher hard limit; at worst, you have to reboot with a big number for max fds. "Too many open files" is a clear indicator of a missing config, and off we go. The default limits are small, and that usually works because most of the time a program opening 1M fds is broken.
Kind of annoying when Google decides their container optimized OS should go from soft and hard limits of 1M to soft limit 1024, hard limit 512k though.
> Is there a real, inherent reason why the OS can't or shouldn't allow that, though?
Yes, because you are not alone in this universe. A user usually runs more than one program, and all programs should have access to resources (CPU time, memory, disk space).
People often run programs that are supposed to use 98% of the resources of the system. Go ahead and let the admin set a limit, but trying to preemptively set a "reasonable" limit causes a lot more problems than it solves.
Especially since most of these resources come down to memory anyway. If you want a limit, limit memory. Don't make it overcomplicated.
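A sketch of capping the resource that actually matters, using the ulimit-style RLIMIT_AS (the 512 MiB figure is arbitrary; a cgroup memory limit is the more modern and more accurate tool):

```c
#include <sys/resource.h>

/* Cap this process's address space instead of its descriptor count.
 * Allocations (malloc/mmap) start failing once the cap is reached. */
int cap_memory(void)
{
    struct rlimit rl = {
        .rlim_cur = 512UL * 1024 * 1024,   /* soft limit: 512 MiB */
        .rlim_max = 512UL * 1024 * 1024,   /* hard limit: 512 MiB */
    };
    return setrlimit(RLIMIT_AS, &rl);
}
```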
But back to: why is that a problem? Why is there a limit on max open files such that process A opening one takes away from how many process B can open?