This is no better than ssh in a loop, which is trivially done by a shell script - no systemd needed.
However, when you have shitty NAT routers (SonicWall, any AT&T fiber device, for instance), the connections will time out or die, and there'll be long periods where you're waiting for the next iteration of the loop; sometimes it'll get stuck and never try again.
autossh deals with this by actually passing traffic and taking action if traffic doesn't move.
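For concreteness, the naive loop looks something like this (a sketch; the host and forward are illustrative):

    #!/bin/sh
    # Naive reconnect loop: restart ssh whenever it exits.
    # The failure mode described above is exactly that ssh does NOT
    # exit when the connection wedges open, so the loop never fires.
    while true; do
        ssh -N -R 2222:localhost:22 user@jumphost
        sleep 10
    done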
Today I learned that some people make mistakes, but I already knew that ;) ServerAliveInterval doesn't do this properly and consistently.
I've used my own autossh-type script for two decades now. It's mostly used to give access to machines behind shitty NAT, and/or that have addresses that constantly change, and/or for systems on CGNAT, like Starlink.
If ServerAliveInterval works so well and negates the need for something like autossh to exist, then why do sessions created by my script, which sets ServerAliveInterval (and ServerAliveCountMax), still get hung up now and then, forcing the script to kill the old ssh connection and create a new one? My script logs each timeout, each session hang, and each new connection, and depending on the network, it can happen often.
Please read the bit where it's explained how autossh sends test data back and forth. Do you think you just magically and cleverly discovered ServerAliveCountMax and that the autossh people have no idea that it exists?
Or perhaps they know it exists, they know it's not perfect, and they used another mechanism to make up for the shortcomings of what ssh offers out of the box?
> For example, if you are using a recent version of OpenSSH, you may wish to explore using the ServerAliveInterval and ServerAliveCountMax options to have the SSH client exit if it finds itself no longer connected to the server. In many ways this may be a better solution than the monitoring port.
Just to clarify that we're talking about the same thing in case I misunderstood something: autossh(-style) scripts do these things:
1. send fake data to keep a connection "fresh" for shitty middleware
2. detect connections which are stuck (state = open, but no data can actually round-trip) and kill them
3. restart ssh when that happens
Is that what we're talking about here? I think people are saying that points 1 and 2, but not 3, are covered by SSH's ServerAlive* options. And that's also how OpenSSH advertises and documents those options, and apparently even how autossh talks about it in their own readme.
You're saying that those options don't actually solve points 1 and 2, while (your/their/etc) autossh does properly detect it.
Correct so far?
If so that seems like a bug in OpenSSH (or whatever implementation) which should get appropriate attention upstream. Has anyone reported this upstream? Is there a ticket to follow?
PS: I think we're all in agreement that option 3 is out of scope for stock OpenSSH (regardless of what other tools do)
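For reference, points 1 and 2 are supposed to map onto stock OpenSSH roughly like this (a sketch; the host and values are illustrative), with point 3 left to the loop or supervisor:

    # ~/.ssh/config
    Host jumphost
        # probe through the encrypted channel every 15s of silence (point 1)
        ServerAliveInterval 15
        # give up and exit after 3 unanswered probes (point 2)
        ServerAliveCountMax 3
        # fail fast if the port forward can't be established
        ExitOnForwardFailure yes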
I haven't revisited this issue in years, but on a project with thousands of similar devices we found autossh much more reliable.
I believe the issue is that the connections often fail or get wedged in other network layers; the only way to be sure that your ssh tunnel isn't a) lossy enough to "keep alive" but too lossy to send data, b) just always waiting on TCP retry backoff, or c) etc., is to use the tunnel to transmit actual data at the application level.
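Concretely, that's what autossh's monitoring port is for: it loops test data out through the tunnel and back, and restarts ssh if the round trip stops. A typical invocation (the monitor port and forward are illustrative):

    # -M 20000 makes autossh round-trip echo data over ports 20000/20001
    # through the tunnel itself; if the echo stops, it restarts ssh.
    autossh -M 20000 -f -N -R 2222:localhost:22 user@server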
> is to use the tunnel to transmit actual data at the application level.
Isn't that exactly what ServerAliveInterval does? The man page says: "ssh(1) will send a message through the encrypted channel". A plain TCP keepalive wouldn't count as being "through the encrypted channel".
I don't know if I would call it smoke and obfuscation. At the time, systemd was not widely deployed and the ssh functionality was not as developed, so it made sense to use autossh. Now it sounds like it doesn't make sense anymore. It happens.
You summarized things well. #2 is the primary reason that ssh in a loop doesn't work as well or as reliably as autossh (the program discussed here; it's just coincidental that my own automatic ssh script is also called autossh).
This approach works very well. I've had dozens of extremely remote systems hooked up this way for about 8 years. The only problem I've seen is that occasionally the server ssh process will get stuck, so you have to log in to the server and kill it. It seems to happen when a remote goes offline and reconnects without closing the old connection first.
If I were doing it now, I'd probably use wireguard; it's simpler to set up and works great.
> The only problem I've seen is that occasionally the server ssh process will get stuck, so you have to log in to the server and kill it.
You also need ClientAliveInterval on the server side (in addition to ServerAliveInterval on the client). In other words, both the client and the server need to be configured to monitor the connection. With this setup I had no issues with reconnections.
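Something like this in the server's sshd_config (the values are illustrative):

    # /etc/ssh/sshd_config on the server
    # probe the client every 30s of silence
    ClientAliveInterval 30
    # drop the session after 3 unanswered probes
    ClientAliveCountMax 3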
systemd's RuntimeMaxSec should help in this case, but I've never had trouble with sshd personally.
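A sketch of that, assuming the tunnel already runs as a unit (RuntimeMaxSec= needs systemd 229 or newer):

    [Service]
    # hard-stop the tunnel after 20 minutes; Restart= brings it back,
    # so a wedged ssh can't outlive that window
    RuntimeMaxSec=20min
    Restart=always
    RestartSec=5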
To add more context: I use the above service to ssh from my phone to my laptop via my desktop PC. The service runs on my laptop and binds port 22 of my laptop to port 7070 of my PC, but wireguard would probably work similarly.
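That is, roughly this, run on the laptop (assuming `pc` is an ssh alias for the desktop):

    # Expose the laptop's sshd (port 22) as port 7070 on the desktop.
    ssh -N -R 7070:localhost:22 pc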
Closing ssh doesn't close the ports if they are being used, at least with ControlMaster. You need to run something like this to force the master process to close the port:
    ssh -O cancel -L 4102:localhost:4000 pc
But if ControlMaster is stuck, maybe autossh is better in that case, or use this:
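One possibility, purely as an illustration (`pc` is the host alias from the cancel command above): force the master process itself to exit.

    # Ask the ControlMaster for `pc` to exit, tearing down every
    # forwarded port it holds.
    ssh -O exit pc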
This is not very dissimilar from how the RIPE Atlas software probe (debian package) maintains a persistent SSH command/control session to the anchors and RIPE infrastructure. As I recall it installs itself as a systemd service.
No passphrase for the key?
What about a spotty connection? Doesn't WantedBy block startup on this starting properly? (I'm pretty sure I've been soft-locked out of my computer when Comcast decides to do Comcast things.)
Ok, now how do you tell if the connection is up from the systemd status output? Cos this will show as active even when the connection is down or trying to reconnect.
It exits (and so restarts) every 20 min, ensuring there's no hung sshd on the other side for longer than that.
IIRC if there's an active connection on the forwarding thingy, that ssh command won't exit until the forwarded connection is closed, so this won't interrupt an active forwarded connection every 20min.
It doesn't by default, but you can set the AUTOSSH_GATETIME environment variable to 0 so that autossh retries even if the first connection attempt fails.
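In a systemd unit that might look like this (the autossh path and the forward are illustrative):

    [Service]
    # without GATETIME=0, autossh gives up if the very first attempt
    # fails (e.g. the network isn't up yet at boot)
    Environment=AUTOSSH_GATETIME=0
    # -M 0 disables the monitor port; pair it with ServerAlive* instead
    ExecStart=/usr/bin/autossh -M 0 -N -R 7070:localhost:22 pc
    Restart=always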