This actually makes malware less of an issue, because an attacker would need to compromise both the desktop and the mobile device of a particular user in order to MITM. The only thing that happens 'automatically' is the request being sent to the user's mobile device, and the entire process can be verified cryptographically to ensure an attacker can't send a false request.
(HTTPS)
Client -> Server: LOG ME IN PLEASE
Server -> Client: Do you have a cookie?
Client -> Server: NO, BUT HERE'S MY PASSWORD
Server -> Client: I've just sent you a request for an additional auth token. Please confirm it to log in. And please stop shouting.
(Email, SMS, HTTPS)
Server -> Client's Mobile Device: Here's a token. Sign it with some crap you've already got and give it back to me.
The client's mobile device (CMD) generates a new token after confirming the origin of the original request; it uses software, or talks to a hardware token.
(App, Website)
CMD -> User: IS THIS REQUEST OKAY? CAN I SEND THIS THING?
User -> CMD: Yeah sure, I requested this login. *pushes button*
(HTTPS)
CMD -> Server: HERE'S YOUR NEW SIGNED TOKEN
Server -> CMD: Thanks. Here's a cookie to remember me by.
Server -> Client: Page refresh! We got the right token from your mobile device, so here's your new cookie, you're logged in.
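To make the exchange above concrete, here's a rough sketch of the four stages in Python. It's only an illustration under some loud assumptions: the `Server` class, `device_sign`, and `DEVICE_KEY` are names I made up, and HMAC over a shared key stands in for "sign it with some crap you've already got". A real version would more likely use a private key on the device and a proper signature.

```python
import hashlib
import hmac
import secrets

# Assumption: a key shared between server and mobile device at enrollment.
# HMAC is a stand-in here; a real design would likely use public-key signatures.
DEVICE_KEY = secrets.token_bytes(32)


class Server:
    def __init__(self, device_key):
        self.device_key = device_key
        self.pending = {}  # challenge -> username waiting for approval

    def start_login(self, username, password_ok):
        """Stage 1: password accepted, so issue a one-time challenge (sent in stage 2)."""
        if not password_ok:
            return None
        challenge = secrets.token_hex(16)
        self.pending[challenge] = username
        return challenge

    def finish_login(self, challenge, response):
        """Stage 4: accept the signed token once, then hand out a cookie."""
        username = self.pending.pop(challenge, None)  # one shot, no replays
        if username is None or response is None:
            return None
        expected = hmac.new(self.device_key, challenge.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, response):
            return None
        return "session-" + secrets.token_hex(16)


def device_sign(device_key, challenge, user_approved):
    """Stage 3: the device signs only if the user pushed the button."""
    if not user_approved:
        return None
    return hmac.new(device_key, challenge.encode(), hashlib.sha256).hexdigest()
```

Calling `start_login`, then `device_sign` with `user_approved=True`, then `finish_login` yields a cookie; skipping the button press or replaying an old challenge yields `None`.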
The only major security pitfalls here are in how you deliver either an app or a website to a user remotely, but there are plenty of good ways to do that already. You can even use boring old existing PKI.
You're not sending authorization "blind", though. You have to request the login from one device with a password, and approve it from another with a button push. This way, only logins that you requested get sent to you, and only requests you approve get sent back, so an attacker would have to hack both devices to intercept either part of the authentication process. Well, unless they have both your cracked password and malware injected on your mobile. Perhaps putting the button-pressy part on the hardware keyfob is safer :)
Even if none of this were cryptographically signed, compare it with typical SMS two-factor. A user attempts a login and a text is sent to them with a code. The user then types this code back into the login screen and presses return. If they could simply press a button on their phone, they could approve the login without having to type the code in. Of course, we want to require a button press, because with an automatic approval anyone who initiates the login (like an attacker with a stolen password) would get logged in. What I'm proposing is basically this, but with actual secure connections instead of SMS or e-mail, and the ability to confirm on the device that the request was legit and not part of an elaborate phishing expedition.
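Roughly, the decision on the phone could look like the sketch below. Everything in it is hypothetical (`LoginRequest`, `handle_request`, the idea of a set of enrolled origins); it's only meant to show the two checks described above: the request has to come from a site you've actually enrolled, and nothing gets approved without a button press.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginRequest:
    origin: str     # site claiming to ask, e.g. "https://example.com"
    challenge: str  # one-time token issued by the server


def handle_request(request, enrolled_origins, button_pressed):
    """Decide what the device does with an incoming login request."""
    # Check 1: requests from sites the user never enrolled are dropped,
    # which is the "not part of a phishing expedition" confirmation.
    if request.origin not in enrolled_origins:
        return "rejected: unknown origin"
    # Check 2: never approve automatically; an attacker with a stolen
    # password can start a login but cannot press the user's button.
    if not button_pressed:
        return "pending: waiting for user"
    return "approved"
```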
All of this would work without a data connection because you could use QR codes to enter the challenge token into the device and it would spit out the response in a variety of formats.
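As a sketch of the offline case: the QR code only needs to carry the challenge string, and the device can print the response in whatever form is convenient. The function name `offline_response` and the three formats are my own choices for illustration, and HMAC again stands in for whatever signing the device really does. Note the short numeric form throws away most of the MAC, so it's easier to type but weaker.

```python
import base64
import hashlib
import hmac


def offline_response(device_key: bytes, challenge: str) -> dict:
    """Compute a response to a challenge scanned from a QR code.

    Returns the same MAC in several formats: full hex, base32 (handy for
    re-encoding as a QR code), and a short 8-digit code for typing by hand.
    """
    mac = hmac.new(device_key, challenge.encode(), hashlib.sha256).digest()
    return {
        "hex": mac.hex(),
        "base32": base64.b32encode(mac).decode().rstrip("="),
        # Truncated form: the first 4 bytes reduced to 8 decimal digits.
        "digits": f"{int.from_bytes(mac[:4], 'big') % 10**8:08d}",
    }
```

The server would compute the same value from its copy of the key and compare, just as in the online flow.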
I don't think FIDO conflicts with what you want to do. It simply doesn't mandate it.
If you wanted a 2nd channel confirmation like that, I suppose you could set up the service to link a mobile device (there might even eventually be some FIDO app that accepts FIDO challenge push notifications in a way that's not service-specific). Then FIDO would be used exclusively on the second channel (parenthetical stages 2, 3, 4 in your example). The web interface in that kind of setup would have nothing to do with FIDO.
I don't understand the MITM malware threat model you're envisioning: one that can hijack an in-band public-key crypto exchange, but can't simply hijack the browser session after you've (validly) completed the out-of-band verification.
Some of the complexity of FIDO is there to enable reasonably secure single-channel authentication (as might be expected, there are trade-offs between easy SSO and subdomain/origin policy security). But I think FIDO would still be valuable if you wanted to do out of band confirmation without a browser. The hardware dongles are standardized (although the requirements to unlock a key can vary from a button press like on the NEO to, hypothetically, biometrics in advanced dongles). Mobile apps would also be standardized, much like TOTP is now, so you could use any FIDO app with any FIDO dongle (or no dongle, if an app offers less secure on-phone secret storage). Without a hardware standard, you either have to store secrets on the mobile device, or you run into the existing problem of needing N hardware dongles for N services.
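To illustrate the N-dongles-for-N-services point: one way a single device can serve many services without reusing a key is to derive a separate key per service from one device secret. The sketch below is just that general idea with an HMAC-based derivation; `service_key` is a made-up name, and this is not a description of what the FIDO specs or any particular dongle actually do.

```python
import hashlib
import hmac


def service_key(device_secret: bytes, service_id: str) -> bytes:
    """Derive a per-service key from one secret held on the device.

    Each service sees a different key, so one dongle can cover N services
    and a leak at one service reveals nothing about the others' keys.
    """
    return hmac.new(device_secret, service_id.encode(), hashlib.sha256).digest()


# Placeholder secret for the example; a real one never leaves the hardware.
device_secret = b"\x01" * 32
bank_key = service_key(device_secret, "https://bank.example")
mail_key = service_key(device_secret, "https://mail.example")
```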