A deep dive into Debian 13 /tmp: What's new, and what to do if you don't like it

fh973 · 2025-08-29T05:37:14 1756445834

Swap on servers somewhat defeats the purpose of ECC memory: your program state is now subject to complex IO path that is not end-to-end checksum protected. Also you get unpredictable performance.

So typically: swap off on servers. Do they have a server story?

abrookewood · 2025-08-29T05:50:23 1756446623

That's a really good point that had never occurred to me.

Edit: I think that the use of ZFS for your /tmp would solve this. You get Error Corrected memory writing to an check-summed file system.

yjftsjthsd-h · 2025-08-29T06:34:47 1756449287

ZFS /tmp is probably fine, but swapping to ZFS on Linux is dicey AIUI; there's an unfortunate possibility of deadlock https://github.com/openzfs/zfs/issues/7734

abrookewood · 2025-08-29T07:39:18 1756453158

Ah, thanks for pointing that out - wasn't aware.

cromka · 2025-08-29T08:46:29 1756457189

So maybe another filesystem with heavy checksums could be used? Btrfs or dm-crypt with integrity over ext4?

Vogtinator · 2025-08-29T11:58:14 1756468694

swapfile on linux must be directly mapped, bypassing any filesystem level checksums (see https://btrfs.readthedocs.io/en/latest/Swapfile.html)

tatref · 2025-08-29T08:58:09 1756457889

Why not dm-integrity?

yjftsjthsd-h · 2025-08-29T15:47:07 1756482427

https://wiki.archlinux.org/title/Dm-integrity

> It uses journaling for guaranteeing write atomicity by default, which effectively halves the write speed.

That seems like a poor fit for swap IMO.

https://www.kernel.org/doc/html/latest/admin-guide/device-ma... says,

> There’s an alternate mode of operation where dm-integrity uses a bitmap instead of a journal. If a bit in the bitmap is 1, the corresponding region’s data and integrity tags are not synchronized - if the machine crashes, the unsynchronized regions will be recalculated. The bitmap mode is faster than the journal mode, because we don’t have to write the data twice, but it is also less reliable, because if data corruption happens when the machine crashes, it may not be detected.

It's not clear to me if that would be okay for swap (as long as you don't hibernate, maybe) or if it's sufficiently protected from corruption.

ars · 2025-09-03T00:14:41 1756858481

If you have checksum errors reading data from disk, you have much worse issues than ram corruption. Any program you launch will probably be corrupted.

Although if you do swap on a server (and you should), the swap needs to be on a raid, otherwise your server will crash on a disk error.

Swap on a server is not meant for handling low memory issues, instead there's tons of data on a server that's almost never used, so instead swap that out and make more room for cache.

m463 · 2025-08-29T06:12:27 1756447947

I can see it now: pro ecc sata and m.2 ssds

justsomehnguy · 2025-08-29T14:18:34 1756477114

Well, SATA do have a basic CRC and you would see an increase in CRC transfer errors in SMART if the path (usually the cables) aren't good.

blueflow · 2025-08-29T10:55:18 1756464918

First, having no swap means anonymous pages cannot be evicted, named pages must be evicted instead.

Second, the binaries of your processes are mapped in as named pages (because they come from the ELF file).

Named pages are generell not understood as "used" memory because they can be evicted and reclaimed, but if you have a service with a 150MB binary running, those 150MB of seemingly "free" memory are absolutely crucial for performance.

Running out of this 150MB of disk cache will result in the machine using up all I/O capacities to re-fetch the ELF from disk and likely become unresponsive. Having swap does significantly delay this lock-up by allowing anonymous pages to be evicted, so the same memory pressure will cause less stalls.

So until the OOM management on Linux gets fixed, you need swap.

Scaevolus · 2025-08-29T12:56:58 1756472218

Swapping anonymous pages can bring the system to a crawl too. High memory pressure makes things very slow with swap, while with swap off high memory pressure is likely to invoke the oom killer and lets the system violently repair.

blueflow · 2025-08-29T13:35:36 1756474536

The "bug" with the OOM killer that i implied is that what you describe does not happen. Which is not surprising because disk cache thrashing is normal mode of operation for serving big files to the network. An OOM killer acting on that alone would be problematic, but without swap, that's where the slowdown will happen for other workloads, too.

Its less a bug but an understood problem, and there aren't any good solutions around yet.

Trixter · 2025-08-29T16:34:43 1756485283

earlyoom is what we use to address this. We can't tolerate any kind of swapping at all in our workloads, where it is better for the system to kill one process to save the others, than for the system to slow down or lock up.

goodpoint · 2025-08-29T09:54:32 1756461272

That's not how swap is meant to be used on servers.

dooglius · 2025-08-29T11:39:38 1756467578

The purpose of ECC has nothing to do with being "end-to-end". A typical CPU path to/from DRAM will not be end-to-end either, since caches will use different encodings. This is generally considered fine since each I/O segment has error detection in one form or another, both in the CPU-to-memory case and the memory-to-disk case. ECC in general is not like cryptographic authentication where it protects against any possible alteration; it's probabilistic in nature against the most common failure modes.

nrdvana · 2025-08-29T04:29:55 1756441795

The third mitigating feature the article forgot to mention is that tmpfs can get paged out to the swap partition. If you drop a large file there and forget it, it will all end up in the swap partition if applications are demanding more memory.

m463 · 2025-08-29T06:10:26 1756447826

what swap partition?

I meant this sort of jokingly. I think have a few linux systems that were never configured with swap partitions or swapfiles.

edoceo · 2025-08-29T06:17:22 1756448242

I'm with you. I don't swap. Processes die. OOM. Linux can recover and not lose data. Just unavailable for a moment.

marginalia_nu · 2025-08-29T07:21:08 1756452068

The Linux OOM killer is kinda sketchy to rely on. It likes to freeze up your system for long periods of time as it works out how to resolve the issue. Then it starts killing random PIDs to try to reclaim RAM like a system wide russian roulette.

It's especially janky when you don't have swap. I've found adding a small swap file of ~500 MB makes it work so much better, even for systems with half a terabyte of RAM this helps reduce the freezing issues.

wahern · 2025-08-29T07:33:42 1756452822

Yeah. I always disable overcommit (notwithstanding that Linux cannot provide perfectly accurate strict memory accounting), and I'd prefer not to use swap, but Linux VM maintainers have consistently stated that they've designed and tuned the VM subsystem with swap in mind. Is swap necessary in the abstract? No. Is swap necessary on Linux? No. But don't be surprised if Linux doesn't do what you'd expect in the absence of swap, and don't expect Linux to put much if any effort into improving performance in the absence of swap.

I've never ran into trouble on my personal servers, but I've worked at places that have, especially when running applications that tax the VM subsystem, e.g. the JVM and big Java apps. If one wonders why swap would be useful even if applications never allocate, even in the aggregate, more anonymous memory than system RAM, one of the reasons is the interaction with the buffer cache and eviction under pressure.

throw0101c · 2025-08-29T12:01:24 1756468884

> […] but Linux VM maintainers have consistently stated that they've designed and tuned the VM subsystem with swap in mind.

There is a citation for this that can be shown to skeptics?

mnw21cam · 2025-08-29T08:26:26 1756455986

Install earlyoom or one of its near-equivalents. That mostly solves the problem of it freezing up the system for long periods of time.

I haven't personally seen the OOM killer kill unproductively - usually it kills either a runaway culprit or something that will actually free up enough space to help.

For your "even for systems with half a terabyte of RAM", it is logical that the larger the system, the worse this behaviour is, because when things go sideways there is a lot more stuff to sort out and that takes longer. My work server has 1.5TB of RAM, and an OOM event before I installed earlyoom was not pretty at all.

marginalia_nu · 2025-08-29T09:00:50 1756458050

> For your "even for systems with half a terabyte of RAM", it is logical that the larger the system, the worse this behaviour is, because when things go sideways there is a lot more stuff to sort out and that takes longer. My work server has 1.5TB of RAM, and an OOM event before I installed earlyoom was not pretty at all.

I meant it more in the sense that it doesn't have to be more than a few hundred MB even for large RAM. It's not the size of the swap file that makes the difference, but its presence, and advice of having it be proportional to RAM are largely outdated.

justsomehnguy · 2025-08-29T13:59:59 1756475999

> I haven't personally seen the OOM killer kill unproductively

Ah, the classical linux fan adage: "never happened to me means never happens ever to anyone".

My favourite things to see with OOM:

killing mysql on the machine which hosts only mysql and is THE production;

and the best one - killing sshd. Of course I can report on that only after seeing it on the tty0 through the BMC/IPMI console or KVM console of a VM.

eMPee584 · 2025-08-29T09:59:09 1756461549

nohang also has been a good one for desktops, with friendly notifications under memory stress and sane defaults.

Aside these complementary tools, the amount of systemd traps (OOM adjustment score defaults & restrictions, tmux user sessions killed by default etc etc) associated to OOM has really been taking a toll on my nerves over the years.. And kernel progress on this also has been underwhelming.

Also, why has firefox switched off automatic tab unloading when memory is low ONLY FOR LINUX? Much better ux since I turned on browser.tabs.unloadOnLowMemory ...

natebc · 2025-08-29T11:01:04 1756465264

it's anecdata but I've had the linux OOM Killer take out OVS (Open Virtual Switch) on a kubernetes node several times.

Made me really not mind having a little swap space setup just in case.

marginalia_nu · 2025-08-29T11:17:08 1756466228

OOMKiller, as far as I understand it, will just pick a random page, figure out who owns it, and then kill that process, repeating until enough memory is available. This will bias toward processes with larger memory allocations, but may kill any process.

menaerus · 2025-08-29T11:41:56 1756467716

> If it ever becomes necessary for the OOM Killer to kill processes, the decision of which processes to kill will be made based on something called the OOM score. Each process has an OOM score associated with it.

> Every running process in Linux has an OOM score. The operating system calculates the OOM score for a process, based on several criteria - the criteria are mainly influenced by the amount of memory the process is using. Typically, the OOM score varies between -1000 and 1000. When the OOM Killer needs to kill a process, again, due to the system running low on memory, the process with the highest OOM score will be killed first!

https://learn.redhat.com/t5/Platform-Linux/Out-of-Memory-Kil...

Balinares · 2025-08-29T06:40:15 1756449615

Swapping still occurs regardless. If there is no swap space the kernel swaps out code pages instead. So, running programs. The code pages then need to be loaded again from disk when the corresponding process is next scheduled and needs them.

This is not very efficient and is why a bit of actual swap space is generally recommended.

adrian_b · 2025-08-29T10:38:05 1756463885

Unlike swapping, freeing code pages does no writing to HDD/SSD, but it only needs to reload the pages when they are needed again in the future, therefore it is more efficient than swapping.

I have stopped using swapping on all my Linux servers, desktops and laptops more than 20 years ago. At that time it was a great improvement and since then it has never caused any problems. However, I have been generous with the amount of RAM I install, for any computer having at least the NUC size there are many years since I have never used less than 32 GB, while for new computers I do not intend to use less than 64 GB.

With recent enough Linux kernels, using tmpfs for /tmp is perfectly fine. Nevertheless, for decades using tmpfs for /tmp had been dangerous, because copying a file through /tmp would lose metadata, e.g. by truncating file timestamps and by stripping the extended file attributes.

Copying files through /tmp was frequent between the users of multi-user computers where there was no other directory where all users had write access and the former behavior of Linux tmpfs was very surprising for them.

TiredOfLife · 2025-08-29T09:37:14 1756460234

Using Desktop mode on SteamDeck before they increased the swap was fun. Launch a game, everything freezes, go for an hour long walk, see that the game has finally killed, make and drink cofee while system becomes usable again.

guappa · 2025-08-29T07:57:29 1756454249

Fedora did this long before debian. I remember doing wget of an .iso file on /tmp and my entire wayland session being killed by the OOM killer.

I still think it's a terrible idea.

nolist_policy · 2025-08-29T08:06:26 1756454786

Use `/var/tmp` of you want a disk backed tmp.

1718627440 · 2025-08-29T09:19:11 1756459151

I thought /var/tmp is for applications while /tmp is for the user.

Hendrikto · 2025-08-29T09:48:07 1756460887

> /tmp/

> The place for small temporary files. This directory is usually mounted as a tmpfs instance, and should hence not be used for larger files. (Use /var/tmp/ for larger files.) This directory is usually flushed at boot-up. Also, files that are not accessed within a certain time may be automatically deleted.

Source: https://uapi-group.org/specifications/specs/linux_file_syste...

guappa · 2025-08-29T12:01:45 1756468905

But that was written after the change was made :D

styanax · 2025-08-29T11:56:35 1756468595

Trivia: CIS Guidelines (security tasks applied to a server to pass an enhanced security audit to be compliant with a standard, in a soundbite) has an item requiring /var/tmp to be a bind mount to /tmp (as well as setting specific security options on /tmp). A server attempting to pass CIS audits (very common in my work-related experience w/Enterprises) may well not have a unique /var/tmp.

throw0101c · 2025-08-29T12:04:05 1756469045

> I thought /var/tmp is for applications while /tmp is for the user.

/tmp is for stuff that is 'absolutely' temporary, in that on many/most systems it is nuked between reboots. /var/tmp is 'relatively' temporary in that applications can put stuff there that they're working on, but if there is a crash, the contents are not deleted and can be recovered across reboots.

buckle8017 · 2025-08-29T04:31:58 1756441918

Which is a great reason to have a big swap file now.

gnyman · 2025-08-29T05:24:43 1756445083

Note though that if you don't have swap now, and enable it, you introduce the risk of thrashing [1]

If you have swap already it doesn't matter, but I've encountered enough thrashing that I now disable swap on almost all servers I work with.

It's rare but when it happens the server usually becomes completely unresponsive, so you have to hard reset it. I'd rather that the application trying to use too much memory is killed by the oom manager and I can ssh in and fix that.

[1] https://docs.redhat.com/en/documentation/red_hat_enterprise_...

mnw21cam · 2025-08-29T08:35:15 1756456515

That's not true. Without swap, you already have the risk of thrashing. This is because Linux views all segments of code which your processes are running as clean and evictable from the cache, and therefore basically equivalent to swap, even when you have no swap. Under low-memory conditions, Linux will happily evict all clean pages, including the ones that the next process to be scheduled needs to execute from, causing thrashing. You can still get an unresponsive server under low memory conditions due to thrashing with no swap.

Setting swappiness to zero doesn't fix this. Disabling swap doesn't fix this. Disabling overcommit does fix this, but that might have unacceptable disadvantages if some of the processes you are running allocate much more RAM than they use. Installing earlyoom to prevent real low memory conditions does fix this, and is probably the best solution.

k_bx · 2025-08-29T05:47:55 1756446475

Disabling swap on servers is de-facto standard for serious deployments.

The swap story needs a serious upgrade. I think /tmp in memory is a great idea, but I also think that particular /tmp needs a swap support (ideally with compression, ZSWAP), but not the main system.

throw0101c · 2025-08-29T12:05:48 1756469148

> Disabling swap on servers is de-facto standard for serious deployments.

I guess I have not been deploying seriously over the last couple of decades because the (hardware) systems that I deploy all had some swap, even if it was only a file.

k_bx · 2025-08-29T12:36:00 1756470960

What's your swappiness ?

ravetcofx · 2025-08-29T05:54:50 1756446890

Swap always seemed more meant for desktop use. Servers you need to give the real memory expected of the application stack.

finaard · 2025-08-29T06:29:21 1756448961

Pretty much all the guidelines about swap partitions out there reference old allocator behaviour from way over a decade ago - where you'd indeed typically run into weird issues without having a swap partition, even if you had enough RAM.

Short (and inaccurate) summary was that it'd try to use some swap even if it didn't need it yet, which made sense in the world of enough memory being too expensive, and got fixed at the cost of making the allocator way more complicated when we started having enough memory in most cases.

Nowadays typically you don't need swap unless you work on a product with some constraints, in which case you'd hand tune low memory performance anyway. Just don't buy anything with less than 32GB, and you should be good.

someothherguyy · 2025-08-29T06:20:07 1756448407

plenty of footguns in that general advice, local in memory storage services with default config, etc

bmacho · 2025-08-29T09:42:10 1756460530

So the ideal behaviour would be:

  - for most processes no SWAP
  - for tmpfs, use RAM until a quota
  - for tmpfs, start using a swapfile above that quota

ChatGPT doesn't think it is achievable, though it thinks cgroup2 can achieve something similar.

baq · 2025-08-29T05:37:26 1756445846

This is why I’m running with overcommit 2 and a different ratio per server purpose.

…though I’m not sure why we have to think about this in 2025 at all.

worthless-trash · 2025-08-29T06:29:13 1756448953

I'm assuming that you monitor the service closely for OOM then adjust with demand ?

baq · 2025-08-29T09:01:40 1756458100

yeah pretty much, also configuring memory limits everywhere where apps allow it. some software also handles malloc failures relatively gracefully, which helps a whole lot (thank you postgres devs)

worthless-trash · 2025-08-31T06:54:56 1756623296

Ive spent the last day thinking about that, I really can't see any big negative side effects, the only issue that I'd have is being notified of OOM conditions, and that would just be a syslog regex match. Great plan.

computatrum · 2025-08-29T05:28:10 1756445290

The mentioned periodic clean up of tmp files is not enabled out-of-the-box in case of a upgrade from previous Debian versions, see https://www.debian.org/releases/trixie/release-notes/issues.... .

GCUMstlyHarmls · 2025-08-29T03:48:16 1756439296

Actually quite handy and practical to know about, specifically in the context of a "low end box" where I personally would prefer that RAM exist for my applications and am totally fine with `/tmp` tasks being a bit slow (lets be real, the whole box is "slow" anyway and slow here is some factor of "vm block device on an ssd" rather than 1990s spinning rust).

greatgib · 2025-08-29T06:58:32 1756450712

I'm surprised to discover that it was not already the case for a long time for tmpfs to be used for /tmp, and that change is nice.

But the auto-cleanup feature looks awful to me. Be it desktop or servers, machine with uptime of more than a year, I never saw the case of tmp being filled just by forgotten garbage. Only sometimes filled by unzipping a too big file or something like that. But it is on the spot.

It used to be the place where you could store cache or other things like that that will hold until next reboot. It looks so arbitrary and source of random unexpected bugs to have files being automatically deleted there after random time.

I don't know where this feature comes from, but when stupid risky things like this are coming, I would easily bet that it is again a systemd "I know best what is good for you" broken feature shoved through our throats...

And if coming from systemd, expect that one day it will accidentally delete important filed from you, something like following symlinks to your home dir or your nvme EFI partition...

mrweasel · 2025-08-29T07:28:47 1756452527

> I never saw the case of tmp being filled just by forgotten garbage.

It might have more to do with the type of developers I've worked with, but it happens all the time. Monitoring complains and you go into check, and there it is gigabytes of junk dumped there by shitty software or scripts that can't cleanup after themselves.

The issue is that you don't always knows what's safe to delete, if you're the operations person, and not the developer. Periodically auto-cleaning /tmp is going to do break stuff, and it will be easier to demand that the operations team disable auto-cleanup than getting the issue fixed in the developers next sprint.

snmx999 · 2025-08-29T09:00:40 1756458040

Autocleaning: get the last accessed time from a file and only auto-clean files not accessed in the last n hours, e.g. 24 hours? Should be reasonably safe.

brainzap · 2025-08-29T09:51:09 1756461069

devs throw everything into tmp, so it also accumulates a lot of privacy data

loa_in_ · 2025-08-29T07:18:27 1756451907

I tried out variations on this on my daily driver setups. The design choices here were likely threefold:

Store tmpfs in memory: volatile but limited to free ram or swap, and that writes to disk

Store tmpfs on dedicated volume: Since we're going to write onto disk anyway, make it a lightweight special purpose file system that's commited to disk

On disk tmpfs but cleaned up periodically: additional settings to clean up - how often, what should stay, tie file lifetime to machine reboot? The answers to these questions vary more between applications than between filesystems, therefore it's more flexible to leave clean up to userspace.

In the end my main concern turned out to be that I lost files that I didn't want to lose, either to reboot cleanup, on timer cleanup, etc. I opted to clean up my temp files manually as needed.

cl3misch · 2025-08-29T07:57:06 1756454226

A tmpfs itself is basically a ramdisk by definition. I assume you mean /tmp when you say tmpfs?

loa_in_ · 2025-08-29T08:56:31 1756457791

Yes. I'm not careful lately.

hnlmorg · 2025-08-29T07:20:11 1756452011

If you’ve got swap set up then stale files will get written back to disk so at least you’re not RAM indefinitely just because they’re stored on tmpfs.

It’s still not an ideal solution though.

PhilipRoman · 2025-08-29T07:07:22 1756451242

I agree about the auto cleanup, I discovered it a few days after using /tmp as a ramdisk for Yocto build. Lost a few patches but nothing significant.

palmfacehn · 2025-08-29T03:55:49 1756439749

If I am satisfied with my disk speed, why would I want to use system memory? What are the specific use cases where this is warranted?

margalabargala · 2025-08-29T04:20:31 1756441231

Computers like a Raspberry Pi, where the OS is on a sdcard, will hugely benefit.

jauntywundrkind · 2025-08-29T04:27:14 1756441634

Yup. There's lots of advice about how to reduce cycle count, increase lifetime of sd cards out there. This post has a bunch of ideas, and tmpfs is definitely on the list. https://raspberrypi.stackexchange.com/a/186/32611

techjamie · 2025-08-29T04:11:20 1756440680

Technically it'll have some impact on the number of write cycles your disk goes through, and marginally reduce the level of wear.

Most disks have a lot of write cycles available that you'll be fine anyway, but it's a tiny benefit.

1vuio0pswjnm7 · 2025-08-29T13:40:02 1756474802

I havent used a non-tmpfs (disk-based) /tmp in over 15 years

Didnt need it on NetBSD, memory could go to zero and system would (thrash but) not crash. When I switched to Linux the OOM issue was a shock at first but I learned to avoid it

I use small form factor computers, with userland mounted in and running from memory, no swap; I only use longterm storage for non-temporary data

https://www.kingston.com/unitedkingdom/en/blog/pc-performanc...

worthless-trash · 2025-08-29T06:32:08 1756449128

I'm still a fan of poly instantiated /tmp and PrivateTmp (systemd). This may confuse/annoy admins who are not aware of namespaces, but I know that it definitely closes the attack vector of /tmp abuse by bad actors.

https://www.redhat.com/en/blog/polyinstantiating-tmp-and-var...

ars · 2025-08-29T04:24:50 1756441490

File is tmpfs will swap out if your system is under memory pressure.

If that happens, reading the file back is DRAMATICALLY slower than if you had just stored the file on disk in the first place.

This change is not going to speed things up for most users, it will slow things. Instead of caching important files, you waste memory on useless temporary files. Then the system swaps it out, so you can get cache back, and then it's really slow to read back.

This change is a mistake.

saurik · 2025-08-29T04:30:28 1756441828

Why is reading the data back from swap be slower at all -- much less "DRAMATICALLY" so -- than saving the data to disk and reading it back?

cwillu · 2025-08-29T04:38:35 1756442315

Because swapping back in happens 4kb at a time

mnw21cam · 2025-08-29T08:50:07 1756457407

It's also because a filesystem is much more likely to have consecutive parts of a file stored consecutively on disc, whereas swap is going to just randomly scatter 4kB blocks everywhere, so you'll be dealing with random access read speed instead of throughput read speed.

blueflow · 2025-08-29T11:49:06 1756468146

Valid argument with FAT on spinning rust, invalid with ext4 on ssd. ext4 is extent-based so the fragmentation overhead doesn't happen.

Suzuran · 2025-08-29T12:41:37 1756471297

The swap partition does not have a filesystem, it is a linear list of blocks.

blueflow · 2025-08-29T12:50:17 1756471817

Neither i nor parent said that. Confusion?

cwillu · 2025-08-30T02:44:06 1756521846

ext4 is irrelevant to what happens when a file is backed by swap; even with swapfiles, the mm subsystem more or less goes behind the back of the filesystem to access the disk corresponding to the swapfile.

The overhead of making (size-of-read / 4kb) requests (potentially stalling the reading process for every page) is relevant even on an ssd; there are costs to random access beyond moving a disk head and waiting for a platter to spin into position, and those costs are still relevant with solid-state storage.

ars · 2025-08-29T17:40:52 1756489252

You wrote your comment like it was a rebuttal of the person above you, but the text supports what they said: A filesystem is faster than swap for this.

What was your intent?

cycomanic · 2025-08-29T06:21:09 1756448469

worthless-trash · 2025-08-29T06:29:49 1756448989

Because of page size. Its treated like any other page.

marginalia_nu · 2025-08-29T07:26:23 1756452383

This doesn't really make sense. If /tmp was an on-disk directory the same memory pressure that caused swapping would just evict the file from the page cache instead, again leading to a cache miss and a dramatically slower read.

ars · 2025-08-29T17:41:21 1756489281

Reading it back from a filesystem is much much faster than reading it back from swap.

imp0cat · 2025-08-29T04:31:45 1756441905

Most systems probably aren't having problems with insufficient RAM nowaday though, do they? And this will reduce wear on your SSD.

Also, you can easily disable it: https://www.debian.org/releases/trixie/release-notes/issues....

magicalhippo · 2025-08-29T05:09:09 1756444149

If you're running it in a VM you might not have all that luxurious RAM.

When my Linux VM starts swapping I have to either wait an hour or more to regain control, or just hard restart the VM.

imp0cat · 2025-08-29T05:38:51 1756445931

Right, but if it's a VM, it's probably provisioned by something like ansible/terraform? If so, it's quite easy to add an init script that will disable this feature and never have to worry about it again.

mhitza · 2025-08-29T06:09:54 1756447794

What distro are you running? systemd-oomd kills processes a bit quicker than what came before (a couple minutes of a slow, stuttery system). Still too slow for a server you'd want to have back online as quickly as possible.

At least now when I run out of memory it kills processes that consume the most memory. A few years back it used to kill my desktop session instead!

mnw21cam · 2025-08-29T08:54:13 1756457653

Right, that's traditionally been because the X server has typically had a fairly large footprint, and therefore has been very attractive for the oom killer. But in the last 15 years or so, some heuristics have been applied to deliberately discourage the oom killer from killing "important things".

I install earlyoom on systems I admin. It prevents the low-memory thrashing by killing things while the system is still responsive, instead of when the system is in a state that means it'll take hours to recover.

ars · 2025-08-29T17:42:21 1756489341

It's not about insufficient ram, it's about reserving ram for much more important things: cache.

This changes puts the least important data in ram - temp files - while evicting much more important cache data.

pixelesque · 2025-08-29T06:35:32 1756449332

On small VPS systems with 512 MB or 1 GB you're more likely to notice (if /tmp is actually used by what's running on the sytem).

renewiltord · 2025-08-29T04:19:48 1756441188

Why is there no write through unionfs in Linux? Feels like a very useful tool to have. Does no one else need this? Have half a mind to write one with an NFS interface.

EDIT: Thank you, jaunty. But all of these are device level. Even bcachefs was block device level. It doesn't allow union over a FUSE FS etc. It seems strange to not have it at the filesystem level.

jauntywundrkind · 2025-08-29T04:23:49 1756441429

Dm-cache! https://www.kernel.org/doc/Documentation/device-mapper/cache...

jona-f · 2025-08-29T06:44:44 1756449884

Do you mean that you can mark files for which still the underlying filesystem is used? As far as I remember there were experiments with that about 20 years ago, but it was decided that the added complexity wasn't worth it. The implementation that replaced all of that has been very stable (unlike the ones before) and i'm using it heavily, so i think they had a point. Some write-through behavior can be scripted on top of that.

EDIT: So, wikipedia lists overlayfs and aufs as active projects and unionfs predates both. Maybe unionfs v2 is what replaced all that? Maybe I'm hallucinating...

renewiltord · 2025-08-29T20:49:50 1756500590

Overlayfs doesn't write through, and I believe unionfs and aufs no longer support write-through.

What I want is pretty much like how a write-through cache would work.

1. Write to top-level FS? The write cascades down but reads are fast immediately

2. Data not available in top-level FS? The read goes down to the bottom level and then reads up to the top so future reads are fast.

ComputerGuru · 2025-08-29T04:27:48 1756441668

I feel like this is mixing agendas. Is the goal freeing up /temp more regularly (so you don’t inadvertently rely on it, to save space, etc) or is the goal performance? I feel like with modern nvme (or just ssd) the argument for tmpfs out of the box is a hard one to make, and if you’re under special circumstances where it matters (eg you actually need ram speeds or are running on an SD or eMMC) then you would know to use a tmpfs yourself.

(Also, sorry but this article absolutely does not constitute a “deep dive” into anything.)

Incipient · 2025-08-29T05:23:39 1756445019

Using the example from the article, extracting an archive. Surely that use case is entity not possible using in-memory? What happens if you're dealing with a not-unreasonable 100gb archive?

Who runs around with 100gb+ of swap?!

nolist_policy · 2025-08-29T06:57:10 1756450630

Use `/var/tmp` of you want a disk backed tmp. Not sure why the article omits that.

rwmj · 2025-08-29T09:52:36 1756461156

And so you now have to make a decision, is this file small or large? This pushes the problem to users and programs. (A very real problem too, we made large changes throughout libguestfs to sort out "small" and "large" files and put them into /tmp or /var/tmp. Entirely unnecessary if /tmp wasn't tmpfs on some systems).

Suzuran · 2025-08-29T12:39:04 1756471144

/var/tmp is not allowed if you wish to pass security audits, it MUST be a bind mount to /tmp

NekkoDroid · 2025-08-29T15:02:31 1756479751

What is the rational behind requiring it being a bind mount (or also whatever other options are allowed)?

perlgeek · 2025-08-29T05:34:30 1756445670

Who runs around with a 100gb+ /tmp partition?

Our default server images come with a 4.4GB /tmp partition...

wielebny · 2025-08-29T06:14:04 1756448044

I run a script that rotates my /tmp/ each day, so I can access yesterday's tmp files at /tmp/20250828/ and so on.

My /tmp is my default folder for downloads and temporary work. It will grow 100GB+ easily.

cycomanic · 2025-08-29T06:33:45 1756449225

Sure, but note that your usecase goes specifically against fhs and posix specs:

>Programs must not assume that any files or directories in /tmp are preserved between invocations of the program.

>Although data stored in /tmp may be deleted in a site-specific manner, it is recommended that files and directories located in /tmp be deleted whenever the system is booted.

https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s18.htm...

Now you can obviously use your Filesystem whichever way you like, but I would say Debian shouldn't have to take into consideration uses which are outside the general recommendations/specs.

1718627440 · 2025-08-29T09:23:20 1756459400

Programs shouldn't assume that about /tmp, the user advising this is fine.

pessimizer · 2025-08-29T11:13:22 1756466002

The user wasn't "advising" this, or asking if it was fine. They're just doing it. Everything that they want to do with their own computer is permissible.

The person you're replying to is saying that tmp is meant for temporary storage that could disappear between reboots. A permanent archive of the past states of the tmp directory is not temporary.

1718627440 · 2025-08-29T12:56:49 1756472209

I meant from the perspective of the program. You can't assume /tmp to be stable, but when the user tells you it's fine, then it is fine.

> A permanent archive of the past states of the tmp directory is not temporary.

From the perspective of a program it's still volatile, since the files don't stay in their original place.

Mashimo · 2025-08-29T08:52:23 1756457543

For a long time my default download folder was /dev/shm. It is / was? the memory tmpfs and everything would just be gone after a reboot. Now I can just use /tmp

Even used something similar on my windows pc, had a B:/ disk 1GB in size that was my download folder. Automated cleanup made easy.

KronisLV · 2025-08-29T06:49:59 1756450199

This reminded me of the spacebar heating xkcd: https://xkcd.com/1172/

(not making fun of the workflow or anything, it's just that changes like tmpfs breaking stuff very much holds true)

bmacho · 2025-08-29T09:50:39 1756461039

And even your use case will benefit from any change, since you'll just move to /tmp2 instead of /tmp, which will actually behave you want it to.

So 99.9% of the users + you benefit from the change. I'm sure there are people that really rely on unconventional usages, but they are silent atm.

perlgeek · 2025-08-29T06:34:54 1756449294

Your use case sounds more like "scratch" folder, not really what /tmp is meant for.

probably_wrong · 2025-08-29T07:44:27 1756453467

The part that's more likely to bite people here and that's easily overlooked is that files in /var/tmp will survive a reboot but they'll still be automatically deleted after 30 days.

Paul_S · 2025-08-29T10:58:55 1756465135

I read the article and I really don't understand. Linux already buffers files into RAM if there's any unused, why would you do this?

blueflow · 2025-08-29T11:07:52 1756465672

The "so you wont need to read files from disk" argument is bullshit because tmpfs data can be evicted to swap. If memory pressure is high you will still be reading from disk.

And high memory pressure is also what makes disk-backed /tmp slow. No improvement at all.

rwmj · 2025-08-29T07:06:45 1756451205

'systemctl mask tmp.mount' - the most important command to run in these situations.

It's a really bad idea to put /tmp into memory. Filesystems already use memory when necessary and spill to the filesystem when memory is under pressure. If they don't do this correctly (which they do) then fix your filesystem! That will benefit everything.

nolist_policy · 2025-08-29T07:12:59 1756451579

You'd think that, but in ext4 the first write to a new file will hit the disk (the code mentions it is a workaround for something). Btrfs does it correctly.

rwmj · 2025-08-29T07:21:27 1756452087

Sounds like fixing that ext4 problem could make a lot of things go faster.

tgtweak · 2025-08-29T17:14:25 1756487665

This is precisely what /dev/shm is for... And it can be used explicitly without any gotchas. If someone really wanted tmp to be in memory (to reduce SSD endurance writes or speed it up nominally) they can edit their mounts.

This feels like a very unnecessary change and nothing in that article made a convincing argument for the contrary.

noobermin · 2025-08-29T07:40:36 1756453236

As someone who sort of needs to juice all my ram, this is annoying but at least it can be turned off.

apexalpha · 2025-08-29T11:11:40 1756465900

I thought /dev/shm was ramdisk?

Does /dev/shm stay? Surely it does but it is also capped at 50% RAM. Does that mean /dev/shm + /tmp can now get to 100% RAM? Or do they share the same ram budget?

rubatuga · 2025-08-29T13:48:29 1756475309

Same budget

gigatexal · 2025-08-29T05:31:09 1756445469

Why this change? Writing to it will be faster than disk but idk if am is a precious commodity I’d rather it was just a part of the disk I was writing to.

rwmj · 2025-08-29T07:09:43 1756451383

It's a dumb idea that came from the systemd people. They've never explained properly why it's a good idea, but it's the systemd default and for some reason distros defer to that.

egorfine · 2025-08-29T09:43:54 1756460634

Not at all. tmpfs precedes systemd for like a decade.

rwmj · 2025-08-29T09:50:59 1756461059

It only became the default on Fedora and other Linux distros following systemd because it was the default in systemd.

It was a bad idea on Solaris too, but at least back in those days the trade-off between RAM and disk storage was very different from today now we have NVME drives and such.

egorfine · 2025-08-29T10:07:42 1756462062

Oh no, tmpfs was introduced and used way before systemd.

blueflow · 2025-08-29T11:46:51 1756468011

Which was contested by no-one. He complained about systemd switching /tmp to tmpfs, not systemd people making or causing tmpfs in general.

rwmj · 2025-08-29T10:14:23 1756462463

I'm well aware. I used it on Solaris.

egorfine · 2025-08-29T10:20:02 1756462802

Solaris/SunOS. Oh those were good times. Cheers, mate!

celeryd · 2025-08-29T08:28:45 1756456125

We did this song and dance in RHEL. It's fine. Just use /var/tmp if you need persistent tmp storage. Gnome and X and tmux will not make you swap and if they do run xfce instead.