How TCP backlog works in Linux

Animats · on Aug 1, 2015

The article kind of misses the point. The reason for having a separate queue for connections in a SYN-RECEIVED state is to provide a defense against SYN flooding attacks.[1] An incoming SYN has a source IP address, but that may be faked. In a SYN flooding attack, large numbers of SYN packets with fake source addresses are sent. The connection will never reach ESTABLISHED, because the reply ACK goes to the fake source address, which didn't send the SYN and won't complete the handshake.

Early TCP implementations allocated all the resources for a connection, including the big buffers, when a SYN came in. SYN flooding attacks could tie up all of a server's connection resources until the connection attempt timed out after a minute or two. So now, TCP implementations have to have a separate pool of connection data for connections in SYN-RECEIVED state. There's no data at that stage, so buffers are not yet needed, and a minimum amount of state has to be kept until the 3-way handshake completes. Once the handshake completes, full connection resources are allocated and the connection goes to ESTABLISHED state.

This has nothing to do with behavior of established connections, or connection dropping.

[1] https://en.wikipedia.org/wiki/SYN_flood

nly · on Aug 2, 2015

> There's no data at that stage

There can be. The initial SYN can contain a payload.

Animats · on Aug 3, 2015

True, although few implementations send data with SYN, because the BSD socket interface, which everybody uses, forces a full handshake before sending data. Some firewalls treat data with SYN as an attack.

ised · on Aug 1, 2015

"The solution suggested by Stevens... The problem with this is..."

I see no problem with it. But perhaps I am missing something.

"... an application is expected to tune the backlog..."

Two simple applications I use everday called tcpserver/sslserver and tcpclient meet this expectation.

See "-b" and "-c" switches.

Has the author looked at Stevens' own example?

http://www.icir.org/christian/sock.html

pests · on Aug 1, 2015

There was a recent video or article posted here discussing the poor interaction between Nagle's Algorithm/Delayed ACK/TCP slow-start and how it results in increased latency, especially for the first few packets.

From a first read it sounds like the decisions made in both BSD and Linux could also be adding to the latency problem for the first initial packets.

Have OSes checked how their TCP backlog implementation affects the various congestion control algorithms being used?

Animats · on Aug 1, 2015

Different problem.

The bad interaction between the Nagle Algorithm and Delayed ACK still irks me. I designed one, somebody at Berkeley designed the other, and by the time I found out about it, I was out of network architecture and doing something else for a different company.

That 200ms fixed timer in delayed ACK was a bad idea. The whole delayed ACK thing was a hack to reduce overhead for character-by-character Telnet, which mattered to Berkeley back then because Berkeley used a lot of dumb terminal servers. The fixed time delay is based on human response time and the time UNIX needed to process a character echo. If a typed character has to be echoed, the ACK can be piggybacked on the reply packet with the echo. This is one of the few cases in which delayed ACK is a win. It might also be a win with some quick request-reply APIs.

Really, delayed ACKs should be off by default, and should only turn on when the connection has been showing a consistent pattern of "packet received, ACK sent, application quickly transmitted reply so ACK could have been combined with reply."

John Nagle

dboreham · on Aug 1, 2015

I recall emailing you about this in 1996 when I worked on LDAP at Netscape and being rather surprised to receive a reply! Good times...

bbrazil · on Aug 1, 2015

I ran into the overflow behaviour with our source repository provider, as they'd get hammered on the top of every minute by all the continuous integration servers and silently drop connections. The specific version of SSH we were running didn't send the client banner until it received the server banner, so the connection just hung for 2 hours on the client.

After much debugging and reading of kernel source this was all figured out, and the provider adjusted things on their end so this wouldn't happen.

Moral of the story: You probably should set tcp_abort_on_overflow to 1.