
If you have systemd, you could do this:

    [Unit]
    Description=look ma, no autossh
    After=network.target
    
    [Service]
    Type=exec
    ExecStart=/usr/bin/ssh -o ServerAliveInterval=60 -o ExitOnForwardFailure=yes -Nn -R 7070:localhost:22 pc 'sleep 20m'
    Restart=always
    RestartSec=20
    RuntimeMaxSec=30m
    
    [Install]
    WantedBy=default.target



This is no better than ssh in a loop, which is trivially done by a shell script - no systemd needed.

However, when you have shitty NAT routers (SonicWall, any AT&T fiber device, for instance), the connections will be timed out or will die and there'll be long periods where you're waiting for the next iteration of the loop, and/or sometimes it'll get stuck and never try again.

autossh deals with this by actually passing traffic and taking action if traffic doesn't move.


> autossh deals with this by actually passing traffic and taking action if traffic doesn't move.

The `ServerAliveInterval` option above achieves this.
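For reference, the same options can live in `~/.ssh/config` instead of on the unit's command line (a sketch; the `pc` host alias matches the unit above, and the values are illustrative):

```
Host pc
    ServerAliveInterval 60
    ServerAliveCountMax 3
```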


No, it actually doesn't, or at least not properly. It's not hard to get ssh sessions that are wedged.


Have you reported this as a bug then?


Probably works as intended.


If you read what the person wrote, you'll see a ServerAliveInterval.

If ServerAliveCountMax (default 3) consecutive keepalive probes go unanswered, the ssh client exits. And systemd will restart it.

Today you learned. Nice. I dropped autossh years ago and you can too, even on flaky connections.


Today I learned that some people make mistakes, but I already knew that ;) ServerAliveInterval doesn't do this properly and consistently.

I've used my own autossh type script for two decades now. It's mostly used to give access to machines behind shitty NAT, and/or that have addresses that constantly change, and/or for systems on CGNAT, like Starlink.

If ServerAliveInterval works so well and negates the need for something like autossh to exist, then why do sessions created by my script, which sets ServerAliveInterval (and ServerAliveCountMax), still get hung up now and then, so that the script has to kill the old ssh connection and create a new one? My script logs each timeout, each session hang, and each new connection, and depending on the network, it can happen often.

Please read the bit where it's explained how autossh sends test data back and forth. Do you think you just magically and cleverly discovered ServerAliveCountMax and that the autossh people have no idea that it exists?

Or perhaps they know it exists, they know it's not perfect, and they used another mechanism to make up for the shortcomings of what ssh offers out of the box?


The README has this text:

> For example, if you are using a recent version of OpenSSH, you may wish to explore using the ServerAliveInterval and ServerAliveCountMax options to have the SSH client exit if it finds itself no longer connected to the server. In many ways this may be a better solution than the monitoring port.


Just to clarify that we're talking about the same thing in case I misunderstood something: autossh (style) scripts do these things:

1. fake data to keep a connection "fresh" for shitty middleware

2. detect connections which are stuck (state = open, but no data can actually round trip) and kill them

3. restart ssh when that happens

Is that what we're talking about here? I think people are saying that points 1 and 2, but not 3, are covered by SSH's ServerAlive* options. And that's also how OpenSSH advertises and documents those options, and apparently even how autossh talks about it in their own readme.

You're saying that those options don't actually solve points 1 and 2, while (your/their/etc) autossh does properly detect it.

Correct so far?

If so that seems like a bug in OpenSSH (or whatever implementation) which should get appropriate attention upstream. Has anyone reported this upstream? Is there a ticket to follow?

PS: I think we're all in agreement that option 3 is out of scope for stock OpenSSH (regardless of what other tools do)
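To make the division of labor concrete, here is a minimal sketch in the spirit of the loops discussed above: ssh's ServerAlive* options cover points 1 and 2 (the client exits when keepalive probes stop round-tripping), and a plain restart loop covers point 3. The host alias `pc` and all option values are illustrative.

```shell
#!/bin/sh
# Sketch: ssh handles dead-connection detection via ServerAlive*;
# this loop only handles restarting when the client exits.

run_until_ok() {
    # Re-run the given command until it exits 0, with a short pause
    # between attempts so a hard failure doesn't spin the loop.
    while ! "$@"; do
        echo "connection dropped; retrying in 1s" >&2
        sleep 1
    done
}

# Usage (illustrative):
# run_until_ok ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=3 \
#     -o ExitOnForwardFailure=yes -N -R 7070:localhost:22 pc
```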


I haven’t revisited this issue in years but on a project for thousands of similar devices we found autossh much more reliable.

I believe the issue is that the connections often fail or get wedged in other network layers; the only way to be sure that your ssh tunnel isn't a) lossy enough to “keep alive” but too lossy to send data, b) stuck forever waiting on TCP retry backoff, or c) etc., is to use the tunnel to transmit actual data at the application level.


> is to use the tunnel to transmit actual data at the application level.

Isn't that exactly what ServerAliveInterval does? The man page says: "ssh(1) will send a message through the encrypted channel". A plain TCP keepalive wouldn't count as being "through the encrypted channel".


Honestly, at this point I'm out of date, but autossh also takes care of bugs or connection issues within the ssh link itself.


So does ssh now.

So much smoke & obfuscation. Autossh itself mentions ServerAliveInterval. It's worked flawlessly on all kinds of dodgy connections for me.

If anyone has any damned bug reports, link them.


I don't know if I would call it smoke and obfuscation. At the time, systemd was not widely deployed and ssh's functionality was not as developed, so it made sense to use autossh. Now it sounds like it doesn't make sense anymore. It happens.


You summarized things well. #2 is the primary reason that ssh in a loop doesn't work as well or as reliably as autossh (the program discussed here; it's just coincidental that my own automatic ssh script is also called autossh).


This approach works very well. I've had dozens of extremely remote systems hooked up this way for about 8 years. The only problem I've seen is that occasionally the server ssh process will get stuck, so you have to log in to the server and kill it. It seems to happen when a remote goes offline and reconnects without closing the old connection first.

If I were doing it now, I'd probably use WireGuard. But this is simpler to set up and works great.
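For comparison, a minimal WireGuard config for the machine behind NAT might look like this sketch (keys, addresses, and the endpoint are placeholders, not from the thread):

```ini
# /etc/wireguard/wg0.conf on the machine behind NAT (illustrative)
[Interface]
PrivateKey = <laptop-private-key>
Address = 10.0.0.2/32

[Peer]
PublicKey = <server-public-key>
Endpoint = server.example.com:51820
AllowedIPs = 10.0.0.1/32
# Send a keepalive every 25s so the NAT mapping stays open
PersistentKeepalive = 25
```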


Can't you just add something like ServerAliveCountMax to help with stale connections?

Something like this should solve that:

    [Unit]
    Description=look ma, no autossh
    After=network.target
    
    [Service]
    Type=exec
    ExecStart=/usr/bin/ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes -Nn -R 7070:localhost:22 pc 'sleep 20m'
    Restart=always
    RestartSec=20
    RuntimeMaxSec=30m
    
    [Install]
    WantedBy=default.target


The default of ServerAliveCountMax is already 3.


> The only problem I've seen is that occasionally the server ssh process will get stuck, so you have to log in to the server and kill it.

You also need ClientAliveInterval on the server side (in addition to ServerAliveInterval on the client). In other words, both the client and the server need to be configured to monitor the connection. With this setup I had no issues with reconnections.
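The server side lives in sshd_config rather than on the command line; a sketch with illustrative values:

```
# /etc/ssh/sshd_config on the server:
# probe the client every 60s over the encrypted channel, and drop the
# connection (freeing its reverse forwards) after 3 unanswered probes
ClientAliveInterval 60
ClientAliveCountMax 3
```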


> ssh process stuck

systemd's RuntimeMaxSec should help in this case, but I've never had trouble with sshd personally.

To add more context: I use the above service to ssh from my phone to my laptop via my desktop PC. The service runs on my laptop and binds my laptop's port 22 to port 7070 on my PC, but WireGuard would probably work similarly.


RuntimeMaxSec would have systemd kill a live forwarded connection though?


Closing ssh doesn't close the forwarded ports if they are being used, at least with ControlMaster. You need to run something like this to force the ssh master process to close the port:

    ssh -O cancel -L 4102:localhost:4000 pc
But if the ControlMaster is stuck, maybe autossh is better in that case. Or use this:

    Host *
        ServerAliveInterval 11


This is quite similar to how the RIPE Atlas software probe (Debian package) maintains a persistent SSH command/control session to the anchors and RIPE infrastructure. As I recall, it installs itself as a systemd service.

https://atlas.ripe.net/docs/howtos/software-probes.html


No passphrase for the key? What about a spotty connection? Doesn't WantedBy block startup on this starting properly? (I'm pretty sure I've been soft-locked out of my computer when Comcast decides to do Comcast things.)


No. WantedBy will have no impact on startup. Before= or After= would, but not WantedBy.


This is quite clean and tidy


OK, now how do you tell whether the connection is up from the systemd status output? Because this will show as active even when the connection is down or trying to reconnect.


Been doing this since 2012... autossh wasn't the solution even back then.

You want ServerAliveCountMax too, but the default is 3.


What is the reason to run 'sleep 20m'?


It exits (and so restarts) every 20 minutes, ensuring there's no hung sshd on the other side for longer than that.

IIRC, if there's an active connection on the forwarded port, that ssh command won't exit until the forwarded connection is closed, so this won't interrupt an active forwarded connection every 20 minutes.


I think this is actually superior to autossh. Doesn’t autossh not restart after crash/reboot?


You could run autossh as a systemd service that starts on boot. :-)


I think you meant this as a joke, but this is what we landed on about a decade ago and it was the most reliable setup we found.


It doesn't by default, but you can set the AUTOSSH_GATETIME environment variable to 0 so that autossh retries even if the first connection attempt fails.
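Put together, a hypothetical unit for running autossh under systemd might look like this sketch (paths and values are illustrative; `-M 0` disables autossh's monitor port in favor of the ServerAlive* options):

```ini
[Unit]
Description=autossh reverse tunnel
After=network.target

[Service]
# Retry even if the very first connection attempt fails
Environment=AUTOSSH_GATETIME=0
ExecStart=/usr/bin/autossh -M 0 -N \
    -o ServerAliveInterval=60 -o ServerAliveCountMax=3 \
    -o ExitOnForwardFailure=yes -R 7070:localhost:22 pc
Restart=always
RestartSec=20

[Install]
WantedBy=default.target
```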


That's so much better than bourne/bash, which requires this monstrous wart of a code blob:

    autossh() {
        # Tiny delay after failure in case of connection errors
        while ! ssh "$@"; do echo Restarting ssh "$@"...; sleep 1; done
    }


[flagged]


Do we really still have to turn every conversation into systemd friction?


No. Some people use ssh while not running Linux, and not by running something exotic; macOS is widely popular.



