Booting embedded Linux in 0.37 seconds on an ARMv7-A CPU at 528 MHz (github.com/eerimoq)
187 points by eerimoq on June 17, 2020 | 83 comments



One interesting reason there's been a bunch of progress in this space is that automotive systems are required to show the rear-view camera within a certain amount of time. Progress driven by the oddest of things.


My car takes around 10 seconds (maybe more, I'm not sure) for the reverse camera to work after starting the engine. It's annoying having to wait, that's for sure.


Is it pulling a fresh Docker image each time?


Gold


I've been deep in the interview cycle lately and I just had a dream last night that I was asked, in a non-technical interview with a product manager, "what are the 5 ways to dockerize an application?"

I then said that I could only think of one way, and he responded, "how do you not know Docker if you are applying for a Java developer job?"

Thankfully I woke up


Well, my company deals with Docker as well, but most of my coworkers have nothing to do with it; it's fully automated. The only problem we had was when we introduced long-running jobs that can be started by clicking a button inside our UI, which kicks off a k8s job. That was hairy for my coworker, but with enough shell scripts it got easier and easier.


That's cool :) I mean, it's fine to not know the answer to everything. Usually it's not a deal breaker, especially if you provide a general answer of "here's how I think it works".

But what's always hilarious to me is that 3 seconds after they ask the Docker question, they'll then ask "ok so tell me a bit more about how CompletableFutures, Consumers, and Threadpools work together and why you would want to use them"

or my personal favorite, the predictable trifecta of

"whats the difference between an abstract class and an interface"

"ok tell me how garbage collection works"

"ok and whats the difference between final, finalize, and finally?"


Well, Docker should not be your problem; the latter are things a Java developer should have heard about. Tbf, CompletableFuture is relatively new in Java, though not in other languages of course (e.g. Scala, C#, etc.).


Which car make and model?


Seconded - which make and model please, so I can never buy one?


Not OP, but from experience it could be a 2009 Prius.


The rear-view camera system almost always runs independently of the main Head Unit (HU) or In-Vehicle Infotainment (IVI) system. In most cases the rear-view camera view is a single application specifically coded for the target microprocessor (SuperH, for example) and is the only thing running on that microprocessor. The HU and the rear-view camera share the display. While you are in reverse (R) the HU is booting Linux or QNX, and when you shift to drive (D) the screen switches to the HU. The rear-view camera application keeps running uninterrupted.


What you said is (or used to be) right, but GP is also probably right; hardware companies love to migrate distributed, reliable systems into overcomplicated but integrated Linux contraptions.


Why not just wire the Linux box to the car battery? Most aftermarket dash cams do this to capture accidents while parked. Surely a low-power device is enough?


You would be surprised how even the most innocuous electrical systems can draw down a car battery if the car doesn't move for extended periods, especially in cold weather and with a battery that's already degraded or only gets charged by short drives.

I have an old car (with a modern <2 year old battery BTW) sitting in the driveway that rarely gets driven. For a long time, every time I wanted to drive the car the battery charge would be so low it would fail to crank the engine enough to start it. I would jump-start it and drive it for at least an hour, and if I would drive it again within a week or two it would be fine, but after ~3 to ~4 weeks the battery would be dead again.

When I finally got around to diagnosing the problem and measured the leak current while removing the fuses one by one, I found out that the tiny light in the glovebox was not turning off because the lid switch did not engage properly. The current was something like 100 mAh, but it was apparently enough to drain a less-than-full battery within only a few weeks...


>current was something like 100 mAh

Current was 100mA, not mAh.


But that makes sense, right? 0,1 Amps = 2,4 amps/day; many cars have a ~60 Ah battery, so that would be empty in 25 days (and probably well before that it wouldn't have enough power to start the engine).


I think you mean 2.4Ah/day


"h" stands for per hour, but it was already converted into per day, so it got correctly dropped.
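
For what it's worth, the numbers work out. A throwaway sketch, assuming a constant 100 mA drain and a 60 Ah battery:

    #include <stdio.h>

    int main(void)
    {
        const double drain_a = 0.1;      /* constant leak current, amps */
        const double capacity_ah = 60.0; /* typical car battery, amp-hours */

        double per_day_ah = drain_a * 24.0;              /* 2.4 Ah/day */
        double days_to_empty = capacity_ah / per_day_ah; /* 25 days */

        printf("%.1f Ah/day -> empty in %.0f days\n", per_day_ah, days_to_empty);
        return 0;
    }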


They probably want a reboot after a crash to be quick as well.


I don't know if it's a requirement that comes from our customers or from the law, but the rear-view camera must be available within 4 seconds of the door opening.


I'm impressed. I don't know of too much work going on upstream to optimize boot times, other than some of the Clear Linux stuff: https://www.phoronix.com/scan.php?page=news_item&px=Clear-Li...

There are folks looking into improving boot times on Android; turns out init and kernel drivers are a tangly mess of {dependency} spaghetti. Loading kernel modules can induce delays in processing relocations.

The kernel patches disable a bunch of stuff, including Ethernet by the looks of it. Most of the kernel changes comment out blocks of code, or trade long delays for shorter delays with more iterations.


Yeah, this appears to take advantage of the fact that if you know exactly what hardware you're working with, you can skip a whole lot of detection and general support. That helps quite a bit for embedded, but less so on e.g. your laptop. Honestly, while there may well be dependency issues, I was under the impression that at least the kernel side of things is generally pretty well optimized, and that in most cases you're paying for flexibility.


dracut on RHEL7+ builds the initramfs in "hostonly" mode, which attempts to strip it down to just the kernel modules you need at boot time.

That is also somewhat annoying because it completely trashes portability, which can be really irritating in cloud environments.

Ubuntu has a similar capability (and I'm guessing Debian upstream does too), but you have to specifically enable it; by default it ships the full module set in the initramfs.

I would imagine that in most places outside the embedded world, and maybe microVMs, this stuff isn't that valuable.


Slight correction: by default Debian/Ubuntu put all modules potentially required for initial boot into the initramfs. That's still a very small percentage of all modules, i.e. you only need the modules that get you far enough into the boot process to access the root partition, which holds all the remaining modules.

E.g. if you want to network-boot over wifi, you'll have to add an initramfs hook script that puts the wifi modules for your hardware into the initramfs [1]. They are not included by default.

[1] http://www.marcfargas.com/posts/enable-wireless-debian-initr...


You're right, I misinterpreted what "most" meant in the mkinitramfs config. Interesting. I've not seen any difficulties with porting Ubuntu between different hardware configurations, so it seems to include a reasonable amount of them.

Every now and then I'm tempted to try "dep" instead of "most", but then I realise there just isn't enough benefit!


Is there a way to specify when running mkinitramfs whether kernel modules are stored there or not?


Vaguely related: there's Firecracker, which boots in 125 ms on x86, but that's as a VM, so it's an apples-to-oranges comparison. From what I recall, Firecracker powers AWS Lambda, so it's an interesting project in that respect too.

https://firecracker-microvm.github.io/


I know Google Cloud VMs use kexec for faster launches, because we had some awful toolchain-related bugs to fix there. Debugging wasn't very fun; at least on x86 this part of the kernel is called "purgatory", because there is literally no runtime (not even the kernel's "runtime" is available mid-kexec).


I remember an article from over a decade ago about booting Linux on an embedded device in 1 second. The key was to modify driver init to bring up the critical stuff first, such as disk and graphics, and get to user space as fast as possible, then worry about networking and so on.
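
For built-in drivers, the usual trick as far as I know is to demote the non-critical ones to a later initcall level so the critical path runs first. A minimal sketch with a hypothetical mydrv driver:

    #include <linux/init.h>
    #include <linux/printk.h>

    /* Hypothetical non-critical driver: registering at late_initcall level
     * instead of the usual module_init/device_initcall defers its init until
     * after the critical-path drivers (disk, graphics) have come up. Only
     * applies to code built into the kernel, not loadable modules. */
    static int __init mydrv_init(void)
    {
        pr_info("mydrv: deferred init\n");
        return 0;
    }
    late_initcall(mydrv_init);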


I'd imagine that if init tried to use the network before the kernel's networking was initialized, you could deadlock, though that'd be considered a bug in the init dependencies. The hard part is that even visualizing the init dependencies is tricky. I don't know if that's something systemd explicitly solves, but I try to stay out of userspace (unless the compiler is borked; narrator: it is).


Does anyone know why x86 systems are so slow to start up? I got an X570 mobo recently and it still takes about 5 seconds to get to GRUB.


Before GRUB you might have POST, and possibly an intentional delay to allow for user input to enter the EFI configuration.


Yes, most BIOSes have a "POST Boot Delay" setting, which intentionally waits a couple of seconds to let the user interact with the BIOS, e.g. to enter the settings page or change the boot drive.


It's performing various hardware tests before showing you the BIOS screen. And even then, it usually pauses the BIOS screen for about a second to give the user enough time to hit F12 or ESC if they want to change any settings. You could probably disable both of those if you really wanted to.


I've seen server boards wait something like 30 seconds before going into the BIOS. Consumer PCs care less about waiting for PSUs and VRMs to warm up, but that could be a reason why it's not super fast.


I've seen wildly varying start-up times on x86 hardware. I had a Samsung Ativ Book 9+ running stock Debian that, from a cold start, had fully started X and showed the login prompt before the backlight turned on (about 1 second). I've also had the "pleasure" of managing some Dell servers that took an impressive 2 minutes just to get past the BIOS.

(Well I did turn off the 5 second delay in GRUB to make that laptop boot time possible.)


The reason it varies wildly is that the hardware and the possible settings vary wildly on a DIY PC. A few things I noticed a few years ago while troubleshooting slow POST:

- Having an HDD connected to the additional SATA ports provided by an onboard controller (not the chipset) incurred a delay because the controller was initialized later (slower?), the HDD would power up later, and the POST sequence wouldn't finish until the HDD finished spinning up. Checking SMART on boot just made it worse.

- Switching between the 2 GPUs I had available at the time (one Nvidia, one AMD) consistently made a couple of seconds of difference.

- Using XMP made POST much slower too (don't remember by how much).

- Updating the FW on my SSD shaved a bit of time.

- Devices connected to USB during POST also increased POST time.

And as a side note, my cheaper, simpler mobos would always POST faster. Gaming/OC mobos these days are loaded and it all adds up to what the PC has to do to initialize in POST.

Bottom line: it's easier to optimize for a short POST when your config is locked in place (like a phone) than on a machine that could have any number of possible permutations of hardware and settings. Today that's x86.


> some Dell servers that took an impressive 2 minutes just to get past the BIOS.

I'm pretty sure I've seen worse, but in fairness you can only check 1 TB of RAM so fast :) (These were monster database servers.)



Memory controller training takes a fraction of a second.


Not on some AMD boards. It's a common complaint; my previous board had a pre-BIOS delay of 5 to 10 seconds with XMP enabled, and still up to 5 without XMP. I changed boards and it's now a lot better, but still slower than Intel.

ps: XMP is probably an Intel-only name, but I can't remember the AMD one.


DOCP for Asus, and one of the others has EOCP. I think it's a licensing thing where they don't want to pay for XMP.


It seems to be a lot of scanning of external buses.

If I toss a SATA optical drive on, it definitely slows things down.

I suspect a single NVMe drive and only keyboard and mouse plugged in would boot faster.


<joke>

Perhaps it's because minix in the Management Engine has to boot before the BIOS can run so you can take it over remotely? OS bootup usually takes a while.

</joke>


I know this was meant in jest, but isn't that part of the system always "hot" as long as there is power, seeing as the entire point is to provide remote management capability?


Memory training is slow. On X570 you should have a debug code indicator; follow the codes there to get an impression of where the time is spent.


I wonder how much time is spent getting the processor from 16-bit real mode up to the full 64-bit mode.


A few tens of microseconds, most likely.


It is interesting how even people in this industry are sometimes orders of magnitude off when thinking about how quickly a modern computer is able to do something. Personally, I think it is because web development has trained people to believe that computers are slow.


Just switching is a few tens of instructions. Tens of microseconds should be more than enough to also build a non-trivial page table.


> Networking takes by far the longest time to get ready. The main reason is that Ethernet auto-negotiation takes a significant amount of time, about 1 to 3 seconds.

Is this a fundamental limitation of how auto-negotiation works? Is there a way to speed it up?


Set fixed values for the speed (e.g. force 1 Gbit/s if you know there will be a capable cable). I'm not sure if the feature that detects crossover cables can be switched off from userspace.

As for the timing, as long as you don't specify speed / crossover detection you will always have some sort of physical link training and negotiation...
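
In user space, forcing a fixed speed looks roughly like this. A minimal sketch using the legacy ETHTOOL_GSET/ETHTOOL_SSET ioctls; "eth0" and 100 Mbit/s full duplex are assumptions, newer kernels prefer ETHTOOL_GLINKSETTINGS/ETHTOOL_SLINKSETTINGS, and note the comment elsewhere in the thread that gigabit always negotiates:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    int main(void)
    {
        struct ifreq ifr;
        struct ethtool_cmd ecmd;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd < 0) { perror("socket"); return 1; }

        memset(&ifr, 0, sizeof(ifr));
        memset(&ecmd, 0, sizeof(ecmd));
        strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
        ifr.ifr_data = (void *)&ecmd;

        ecmd.cmd = ETHTOOL_GSET;                 /* read current settings */
        if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) { perror("GSET"); return 1; }

        ethtool_cmd_speed_set(&ecmd, SPEED_100); /* fixed 100 Mbit/s */
        ecmd.duplex = DUPLEX_FULL;
        ecmd.autoneg = AUTONEG_DISABLE;          /* skip auto-negotiation */

        ecmd.cmd = ETHTOOL_SSET;                 /* write settings back */
        if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) { perror("SSET"); return 1; }
        return 0;
    }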


> Physical link training and negotiation

That still shouldn't take that long, though, should it? 3s sounds like some O(N^2) process is happening.

Keep in mind that this stuff is happening close to the metal, on a nowadays-unshared medium (no Ethernet hubs around any more), with negligible speed-of-light delays because the nearest switch is probably ~100ft away at most. If some high-level protocol like Steam Link can have no perceivable latency, then certainly PHY negotiation shouldn't.

My naive guess would be that the medium is speed-tested in order, first seeing if it works at 1Mbps, then 10Mbps, then 100Mbps, and finally 1Gbps; and alternating in the crossover-cable versions of those tests; satisficing with the last-achieved line rate when the next up-clocking fails.

If that's the case, then I have a feeling that modern hardware could get a bit of an advantage just from doing things in the opposite order: 1. optimistically assuming everything is set up for 1Gbps, and then, if not, ratcheting down the link-speed until the link starts working; and 2. only doing the crossover-cable tests after all the non-crossover tests fail.

You'd still have the same worst-case performance (3s) as before, but now that worst-case would be for old 1Mbps crossover cables: not a common case!
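
A toy model of that proposal (try_link() is a hypothetical primitive standing in for real PHY configuration, not an actual kernel API):

    /* Try all straight-through configs from fastest to slowest,
     * then the crossover ones; settle on the first that comes up. */
    int try_link(int speed_mbps, int crossover); /* hypothetical */

    static const int speeds_mbps[] = { 1000, 100, 10, 1 };

    int bring_up_link(void)
    {
        for (int crossover = 0; crossover <= 1; crossover++)
            for (unsigned i = 0; i < sizeof(speeds_mbps) / sizeof(speeds_mbps[0]); i++)
                if (try_link(speeds_mbps[i], crossover))
                    return speeds_mbps[i];  /* first working config wins */
        return 0;                           /* no link established */
    }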


There is a thing called compatibility.

Even though you don't have a hub anymore and Ethernet is not shared anymore, it is just an "anymore", which means it still needs to respect those old setups and test for them.

BTW, even the claim that Ethernet is not a shared medium anymore is wrong. In industrial and automotive Ethernet we are back to SPE (Single Pair Ethernet) and working shared media, because switched Ethernet is way too expensive.


You don’t test for the medium being shared/unshared; Ethernet is just a protocol that assumes a shared medium, and does https://en.wikipedia.org/wiki/Carrier-sense_multiple_access, even when there’s no benefit to it.

The reason that Ethernet can afford to do that even in entirely switched deployments, though, is that Ethernet’s CSMA is very aggressive/optimistic, meaning that there’s almost no overhead to it in the case that there really is nothing else sharing the medium. In fact, Ethernet’s “1-persistent” CSMA is effectively designed for low contention, falling over at high [100+ TXers] contention—which is why we don’t just use Ethernet over shared-medium WANs like a cable ISP’s (pre-fibre-backhaul) coax, but instead protocols like https://en.wikipedia.org/wiki/Asynchronous_transfer_mode.
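
(To illustrate what "1-persistent" means, a toy sketch where channel_idle() and transmit() are hypothetical primitives: sense the medium and, the moment it is idle, transmit with probability 1. On an idle switched link the loop never spins, so the overhead is essentially zero.)

    int channel_idle(void);  /* hypothetical: is the medium free? */
    void transmit(void);     /* hypothetical: send the frame */

    void csma_1_persistent_send(void)
    {
        while (!channel_idle())
            ;           /* busy medium: keep sensing */
        transmit();     /* idle: transmit immediately (probability 1) */
    }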

My point with bringing up the low contention of modern media wasn’t that modern devices could somehow skip CSMA sense-idle altogether; but rather that, due to the aggressive nature of Ethernet’s CSMA, Ethernet when on a low-contention or no-contention media should have basically zero sense-idle overhead, which means one less thing standing in the way of fast Ethernet PHY autonegotiation in an archetypal modern deployment; and so one less reason to privilege the hypothesis of “it’s the laws of physics making PHY autonegotiation slow” over “Ethernet controllers are doing something dumb.”

Here’s something to chew on: USB is also a shared-medium PHY with many layers of legacy compatibility. And yet, on every OS I know of, a USB3 analog input device (e.g. a microphone) can go from “off/unplugged” to “negotiated, registered, driver up, and transmitting data to the host, that the host has an open buffer for such that it will acknowledge and process the data within the soft-real-time window”, all with 0.5s of delay or less.

Heck, the entire Bluetooth stack plus connections to pre-paired devices can come up faster than Ethernet—and Bluetooth sits on top of USB! Bluetooth comes up fast enough, that Apple bothers on its desktops to bring up the Bluetooth stack within EFI, finishing quickly enough that Bluetooth peripherals can be used to signal an interrupt to the EFI boot process within its ~1s interaction window. (We all know, meanwhile, what EFI under Ethernet control looks like: the modern server mainboard’s 6+ second “IPMI autoconfig” delay.)


Comparing USB to Ethernet is difficult. Any USB device talks USB 1.1 at startup, so negotiation is basically transferring a data packet with the capabilities at a pre-defined data rate.

Ethernet, on the other hand, has to negotiate the number of wires, full duplex vs. half duplex (which depends on the number of wires), and the line code, like Manchester vs. 4B5B.

The main difference is that in USB the host decides what to talk; in Ethernet there is no such authority.


You're forgetting delays on the switch side, like spanning tree checks, ARP table population, and DHCP. But yeah, there's no reason it shouldn't be improved.


None of which are required for link-level autonegotiation, however. Those are later steps.


1 Megabit Ethernet was never a thing.


Pedantic but... 802.3e


Oh, you are right! Do any modern network cards support this?


Not a chance. The claim that Ethernet starts at 10 Mbit is basically correct. It's not like StarLAN got the kind of adoption that thicknet or thinnet got. It came out after those two standards but was 1/10th the speed, and AT&T put out StarLAN 10 just a year later.


At 1Gb/s and higher link speeds, there really is no such thing as disabling auto-negotiation. What you are setting are the _advertised_ speeds, but auto-negotiation still happens.


Could it not start before the filesystem is available?


> A reboot is even faster, only 0.26 seconds from issuing the reboot to entering user space.

Curious; it doesn't say how that works. Could be kexec, but if it's a real reboot then I'd be interested to know why it's faster. Can you still skip some hardware initialization somehow?


I'm curious as well, and yes, by reboot I mean calling "reboot(RB_AUTOBOOT)". I was under the impression that the ROM code has to start over from the beginning, which I think it does. Maybe it can skip some initialization since the SoC is already up and running, or maybe there is some initial hardware setup that is only done at power-on. It's hard to tell, as this part of the boot sequence runs closed-source NXP code.
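
For reference, the call in question as a minimal sketch (needs CAP_SYS_BOOT, and sync() first, since reboot(2) itself does not flush anything):

    #include <unistd.h>
    #include <sys/reboot.h>

    int main(void)
    {
        sync();               /* flush pending filesystem writes */
        reboot(RB_AUTOBOOT);  /* LINUX_REBOOT_CMD_RESTART */
        return 1;             /* only reached if the reboot call failed */
    }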


A soft boot doesn't have to read the bootloader from ROM again.


> Start with MMC clock frequency at 52 MHz instead of 400 kHz.

Whoa there!

Spec violation. Not guaranteed to work. Might work some days but not others.


The goal is to not configure the MMC at all in Linux, but to rely on the bootloader having already configured it.

Btw, where can I find the spec? And where in the spec can I read about this? Is this true even if we know the MMC supports 52 MHz?


https://www.jedec.org/sites/default/files/docs/JESD84-B451.p...

You can use http://bugmenot.com/view/jedec.org to get free-to-use credentials.

The section in question is "A.6.1 Bus initialization".

I'd say that if your MMC works fine going full speed out of the gate and the IC on the board supports this speed, then you're unlikely to encounter any issues.


Thank you very much. Very helpful answer. I'll continue to use 52 MHz until I encounter problems (if any).


The spec says initialization shall be done at 400 kHz or less. You can Google for the SD spec.


Will fix it at some point =)


I have an Intel system that takes 20x as long just to beep that there's no keyboard.


Are super-quick boots vulnerable to having a (presumably?) lower entropy pool exploited, or do the steps taken to mitigate low entropy across freshly minted cloud images also help here?


Isn't that more a question of how repeatable the process is?


The hardware looks awesome! It looks like it hasn't been updated in 2 years though; has anyone produced these PCBs based on the Jiffy?

Is that an eMMC socket on there?


Why can't network boot up be done asynchronously?


You are right, it probably could. I should try compiling the fec driver into the kernel and making it asynchronous. It should save a couple of milliseconds, and hopefully not delay entering user space.


Isn't network bring-up done asynchronously in distros these days? I see my network interface isn't up and running even after the system has fully booted.


I'm just referring to the fec driver kernel module, not the user space software. But I might of course be wrong. I've done lots of iterations trying to optimize the boot time on this tiny embedded system, and it's not always easy to remember all the details. It could be that the fec driver is asynchronously probed. I guess I have to try it again at some point. =)


“Async MMC and FEC (Ethernet) driver probes to do other initialization in parallel.”

I think it already is.
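
For reference, opting a driver into async probing looks roughly like this. A sketch with a hypothetical mydrv platform driver, not the actual fec code:

    #include <linux/module.h>
    #include <linux/platform_device.h>

    static int mydrv_probe(struct platform_device *pdev)
    {
        /* slow hardware bring-up can happen here without
         * blocking the rest of kernel init */
        return 0;
    }

    static struct platform_driver mydrv_driver = {
        .probe = mydrv_probe,
        .driver = {
            .name = "mydrv",
            /* let the driver core probe this device in parallel */
            .probe_type = PROBE_PREFER_ASYNCHRONOUS,
        },
    };
    module_platform_driver(mydrv_driver);

    MODULE_LICENSE("GPL");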



