Is it really the 30th anniversary coming up? :mind-blown:
I'm trying to finish draft 1 of my C Guide (which has turned into a monster--never attempt to write a _comprehensive_ language guide), and it's comments like these that really keep me going. Your kind words literally bring tears to my eyes.
I'm just genuinely so glad that it has been so helpful, and I never would have guessed it would have been useful for so long. And I'm smugly happy to contribute to the information-sharing ad-free small-web Internet in my own little way.
And thank you to everyone who has read it and to everyone who has sent in corrections and bug reports. I still do update it!
hey beej, your voice, clarity and depth are a shining light in programming tutorials. The fact that these are up for free is a little mindblowing - thanks mate!
I'll never not recommend this book. Fantastic and free, but you can get an official paperback book these days too. It's well written if you want to learn basic networking with the BSD sockets API. It's also the funniest software book I've ever read. A lot of programming book authors have some charm to their writing, but Beej is on another level.
I came into the thread to comment I'll never not smile when I see beej's guide pop up.
Back in the late 90s/early aughts it was absolutely the best way to learn network programming using BSD sockets. I originally picked it up to better understand the CircleMUD code in college; it will always hold a special place in my heart.
When I read this through 25 years ago I learned more about networking than I think I knew in total up until that point, and that was nearing the end of an A level (English further education) Computing course. It's a really comprehensive guide that laid it out exactly the way I needed it for me to absorb it. I still recommend it to people that might be new to network programming as the sockets API really doesn't change that much whether you're using C or Python or some other language.
I think anyone who wants to get into network programming, even if they don't plan on doing it in C, should read this. It's what helped things finally "click" well over a decade ago when I first read it.
Started reading... I'm trying to understand the difference between connectionless (datagram sockets) and persistent connection (stream sockets).
The thing I've realised is that I don't understand what a connection actually is. So I don't understand this bit
"Why are they connectionless? Well, basically, it’s because you don’t have to maintain an open connection as you do with stream sockets. You just build a packet, slap an IP header on it with destination information, and send it out. No connection needed"
How can anything be sent with no connection? What is a connection?
A connection in the context of TCP is essentially the state related to the handshake.
With UDP, you build a packet, slap an IP header on it, and send it out in the hopes that the other side receives it.
With TCP, you can't just send data, you have to perform a three-way handshake first: send a packet with the SYN flag set, receive a SYN-ACK, and if you received a SYN-ACK send an ACK back.
Stateful firewalls, for instance, track the connection state for Network Address Translation (NAT) or firewalling purposes. When a TCP connection is opened (SYN), the connection is considered 'new'. After the handshake is completed (SYN-ACK, ACK), the state changes to 'established'. The lifetime of the connection ends after a TCP packet with the RST or FIN flags set, and the state changes to 'closed'.
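To make the new / established / closed lifecycle above concrete, here is a toy sketch in Python. It only assumes the simplified rules described in this comment; the names `ConnState` and `track` are made up for illustration, and real trackers such as Linux conntrack keep many more states and timers.

```python
from enum import Enum


class ConnState(Enum):
    NEW = "new"
    ESTABLISHED = "established"
    CLOSED = "closed"


def track(flag_sequence):
    """Return the tracked state after each observed TCP segment.

    Each item in flag_sequence is a set of flag names, e.g. {"SYN", "ACK"}.
    Simplification: any RST or FIN closes the connection immediately.
    """
    state = None            # nothing seen yet
    seen_syn_ack = False
    history = []
    for flags in flag_sequence:
        flags = frozenset(flags)
        if "RST" in flags or "FIN" in flags:
            if state is not None:
                state = ConnState.CLOSED
        elif state is None and flags == {"SYN"}:
            state = ConnState.NEW            # first step of the handshake
        elif state is ConnState.NEW and flags == {"SYN", "ACK"}:
            seen_syn_ack = True              # second step, still "new"
        elif state is ConnState.NEW and seen_syn_ack and flags == {"ACK"}:
            state = ConnState.ESTABLISHED    # third step completes it
        history.append(state)
    return history
```

Feeding it `[{"SYN"}, {"SYN", "ACK"}, {"ACK"}, {"FIN", "ACK"}]` walks through new, new, established, closed.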
Cool, related things to read up on: Linux conntrack, TCP reordering and retransmission, Stream Control Transmission Protocol (SCTP), Multi-Path TCP (MPTCP), Internet Control Message Protocol (ICMP, the protocol that "ping" uses).
So is a connection really just the maintained state in both the sender and receiver machines? What is maintained in that state? The ACK flags and the IP of the other machine?
The IP, port and sequence numbers (which is basically a starting random number + number of bytes sent/received) of both sides of the connection, and this is maintained by both sides.
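As a rough picture of that per-connection state, here is a minimal Python sketch. The class and field names are hypothetical, and it deliberately ignores details such as the SYN and FIN flags each consuming a sequence number; it only shows "addresses, ports, and a sequence number that is the random start plus bytes sent", wrapping at 32 bits.

```python
from dataclasses import dataclass

SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class TcpEndpointState:
    """Hypothetical view of what one side remembers about a connection."""
    local_addr: tuple      # (ip, port) of this side
    remote_addr: tuple     # (ip, port) of the peer
    initial_seq: int       # randomly chosen starting sequence number
    bytes_sent: int = 0

    def next_seq(self):
        """Sequence number the next outgoing byte will carry."""
        return (self.initial_seq + self.bytes_sent) % SEQ_MODULUS

    def record_send(self, payload):
        """Account for a sent payload and return the sequence number it used."""
        seq = self.next_seq()
        self.bytes_sent += len(payload)
        return seq
```

The other side keeps a mirror-image record, which is why both ends can tell which bytes have been sent and acknowledged.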
TCP - connection centered (without going into nitty-gritty)
It sets up a connection between two communicating computers that has "state": connection setup, connection ongoing, and connection teardown. The data may arrive out of order, but the "receiver" of a stream will rearrange the packets within a limited window so they are in order, and will acknowledge reception of the packets as part of the protocol (TCP). Higher layers like your software can generally rely on it for that.
UDP: connectionless, just label the packet with the destination and send it off. No acknowledgement at the UDP layer that the packet was ever received. No order guarantees. The upper layer (usually your software) must manage reception of the packet, ordering, AND request the packet be sent again in those cases where they -must- be received (like a file). Some applications don't require that. Say you're talking on a phone call and 1 out of 1000 sound packets drop: it doesn't matter, as the packet is useless if it's outside a few milliseconds of being sent. It's simply dropped, and no request is sent back to try and get it again.
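To illustrate what "the upper layer must manage ordering and re-requests" can look like, here is a small Python sketch. The 4-byte sequence-number header is a made-up format for this example, not part of UDP, and the function names are hypothetical.

```python
import struct

# Made-up application header: one 4-byte big-endian sequence number.
HEADER = struct.Struct("!I")


def pack_datagram(seq, payload):
    """Prefix the payload with its application-level sequence number."""
    return HEADER.pack(seq) + payload


def unpack_datagram(datagram):
    """Split a datagram into (sequence number, payload)."""
    (seq,) = HEADER.unpack_from(datagram)
    return seq, datagram[HEADER.size:]


def reassemble(datagrams, total):
    """Rebuild a file from datagrams that arrived in any order.

    Returns (data, []) when all pieces 0..total-1 are present, or
    (None, missing) listing the pieces the app would ask for again.
    """
    chunks = {}
    for datagram in datagrams:
        seq, payload = unpack_datagram(datagram)
        chunks[seq] = payload          # duplicates simply overwrite
    missing = [seq for seq in range(total) if seq not in chunks]
    if missing:
        return None, missing
    return b"".join(chunks[seq] for seq in range(total)), []
```

A file transfer would loop until `missing` is empty; a voice call would skip this bookkeeping and just play whatever arrives in time.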
I actually remember being a bit confused by this too. Hopefully I can help:
These refer to 2 different protocols at the same layer: UDP, and TCP. Below these protocols there are layers that route little messages around local networks (like Ethernet) and that route little messages across the Internet (like IP), and UDP and TCP are another layer of abstraction on top of that.
TCP adds information about the ordering of messages, it has a mechanism for acknowledging receipt of a message, it has logic for resending a message, etc. UDP does not include these concepts, so if a message gets lost somewhere in the network, an application using UDP might not even notice.
The "connection" in this case, really refers to the state about these messages that is kept. It's a virtual / logical connection, not a physical connection.
Ah this does make sense. I'm guessing that when the sending machine starts a new TCP connection to another machine, the receiver sees that it is a TCP message and starts a TCP "session" (I'm not sure that's correct terminology) which maintains required state on its side, and the sender maintains its own state too?
"Session" is a good way to think of it. If it's inactive for too long, it can be closed and forgotten, etc. But "connection" is the correct terminology. You just sometimes need to be clear that you're talking about a TCP connection and not a physical connection, or even the logical connection that makes IP packets routable. It's one more layer above that.
But yes - both sides maintain state about the connection. Senders will re-send packets if they go too long without being acknowledged. They will slow down large batches of sending if lots of messages are being dropped (essentially throttling in case the network is being overwhelmed and they're making it worse). And receivers will acknowledge packets as they're received and assemble messages back in the right order if packets arrive out-of-order, or bits in the middle are missing.
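The receiver half of that can be sketched in a few lines of Python. This is a toy model, not real TCP: it assumes segments never partially overlap, ignores window limits, and the class name `ReorderingReceiver` is made up. It buffers early segments, delivers bytes in order, and returns a cumulative acknowledgement (the next byte it expects).

```python
class ReorderingReceiver:
    """Toy in-order reassembly with cumulative acknowledgements."""

    def __init__(self, initial_seq=0):
        self.expected = initial_seq   # next byte offset we need
        self.pending = {}             # seq -> payload that arrived early
        self.delivered = bytearray()  # bytes handed to the application

    def receive(self, seq, payload):
        """Accept one segment and return the cumulative ACK number."""
        if seq >= self.expected and seq not in self.pending:
            self.pending[seq] = payload   # old duplicates are ignored
        # Deliver everything that is now contiguous.
        while self.expected in self.pending:
            chunk = self.pending.pop(self.expected)
            self.delivered += chunk
            self.expected += len(chunk)
        return self.expected
```

If the ACK number stops advancing, the sender can infer that the segment starting at that offset went missing and re-send it.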
As far as I understand it, a "connection" in this context is a line of communication between you and some other machine that is ready to send or receive data.
Think of sending a text (connectionless) vs making a phone call (connection-oriented. why it's not called "connectionful" is beyond me).
When you send a text, you hope that the recipient gets it. If they don't, you send it again until they do.
Whereas when you make a phone call, the call is live the minute you dial the number. You're not always talking to the recipient, though; you might be waiting for them to pick up, trying again because the number is busy, in call waiting, etc.
(Packets transmitted via VoIP phone calls are usually UDP and texts _can be_ sent over TCP IIRC, but that's neither here nor there :D)
The responses you've got so far aren't wrong, but there's an easier way to think about this:
Connectionless: Drop a postcard in the mail and stop thinking about it. Maybe it gets there, maybe not.
Connection-oriented: Start a correspondence. If you don't get a response after a while, send your letter again. If you get parts 1, 2, and 4 you say "Hey, I missed #3, send that again please!" You keep thinking about the flow of conversation whether you have a message in hand right now or not.
The "connection" is in your memory. On a computer, that may involve persistent memory use on network hardware, by your OS kernel, and in the program sending the messages while the connection exists.
A TCP connection is a virtual connection. The two sides of the connection maintain state and use this to turn packets into a stream of data. Since IP doesn't guarantee delivery at all, much less in-order delivery, this requires some work, and maintaining state on both ends.
UDP is a stateless protocol. If you listen on port 12345, you will get every packet that comes in on port 12345. Some packets sent won't make it, others might arrive twice, others might arrive out-of-order.
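Here is a minimal Python sketch of that "listen on a port, get whatever arrives" behavior, using the loopback interface. The function name is made up, port 0 just asks the OS for a free port, and it assumes loopback delivers every datagram (which it nearly always does, but UDP makes no promise).

```python
import socket


def collect_from_senders(messages, timeout=2.0):
    """Send each message from its own UDP socket to one bound socket."""
    received = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))      # port 0: OS picks a free port
        receiver.settimeout(timeout)         # don't hang if a datagram is lost
        target = receiver.getsockname()
        senders = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
                   for _ in messages]
        try:
            for sock, message in zip(senders, messages):
                sock.sendto(message, target)  # no connect(), no handshake
            for _ in messages:
                data, sender_addr = receiver.recvfrom(2048)
                received.append((data, sender_addr))
        finally:
            for sock in senders:
                sock.close()
    return received
```

The receiver never accepted anything from anyone; it just sees datagrams from different source addresses show up on the same socket.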
I think that's referring to a TCP connection, right? Basically with UDP you can just start spamming out packets wherever you want, but TCP requires a bit of setup with the handshake before you start sending the actual data. Networking definitely gets confusing with the billion different layers of connections.
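For contrast, a small Python sketch of the TCP side on loopback, assuming nothing beyond the standard `socket` module; the function names are made up. Sending on a stream socket that was never connected fails, while `connect()` triggers the handshake (done by the kernel) and only then can data flow.

```python
import socket


def send_without_connect_fails():
    """Return True if sending on an unconnected TCP socket raises OSError."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.send(b"hello")
        except OSError:
            return True
    return False


def tcp_roundtrip(payload, timeout=2.0):
    """Listen, connect, accept, then send payload over the connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: OS picks a free port
        listener.listen(1)
        listener.settimeout(timeout)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.settimeout(timeout)
            client.connect(listener.getsockname())  # handshake happens here
            conn, _peer = listener.accept()
            with conn:
                conn.settimeout(timeout)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)     # signal end of stream
                chunks = []
                while True:
                    data = conn.recv(4096)
                    if not data:                    # peer finished sending
                        break
                    chunks.append(data)
                return b"".join(chunks)
```

Note the receiver reads a stream until end-of-file rather than one message per call, which is the other practical difference from datagrams.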
Your comment is emblematic of the silliness prevalent on HN when it comes to studying from textbooks. Publishing date has nothing to do with core concepts in any field. Only pedagogy with an eye to clarity and appeal to intuition matters. All newer concepts are built over existing fundamental layers which is what one should study first. My recommendation (i have substantial experience in the area of network protocols and programming) is excellent in that regard; the only one that i know of which deals with different types of networks and the applicable theory (eg. queuing, scheduling, flow control etc.) for each.
The core Internet protocols (IP, TCP, UDP) haven't changed much, other than IPv6. We now have some amazing TCP refinements for congestion control and packet loss. In general, anything you learned about this stuff 30 years ago would still apply.
Most (all?) modern technology is built atop things that came before. Many texts like this one are still useful (although I can't speak to this specific book).
To understand a TCP connection, e.g., I imagine a book from the 1980s would be perfectly fine.
> How can anything be sent with no connection? What is a connection?
Think of a wired network (TCP) vs radio/television broadcast (UDP). There really never was a "connection"; it is just a logical concept/abstraction that means an endpoint can be reached. When no data is sent/received, there is no connection.
Broadcast or UDP is a "connection" but more like a tuner. UDP datagrams are sent out and don't need to be ACK'd and may never be received; they just go where you tell them to go. They might even be received out of order, in which case you can discard older ones if they aren't needed. Note: You can do Reliable UDP (RUDP) and ACK any important messages with a UDP datagram back. Most highly available real-time systems and games use UDP with some sprinkle of RUDP when needed. Example: player positions or actions across the level don't matter to you, and can be received and rendered or not. Global state like level starting or level ending you want to ACK back, to unify the simulation on important states. Any critical message you mark for ACK, thus the "reliable" part; it also handles discarding out-of-order messages, which happen with UDP.
Wired or TCP is more like a stream and has a "connection" and handles all the ordering, verification and ACK backs for you. This has lots of overhead and isn't great for gaming beyond like turn based or simple networking, it works great for sending a web page or file though because all parts are necessary.
Streaming, or SCTP-like, is really RUDP with more standards around it. It is a combination of TCP where needed and UDP, and can be direct or broadcast.
All types of "connections" are really virtual/logical connections not actual connections.
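The "mark critical messages for ACK, drop stale ones" idea from the UDP paragraph above can be sketched in Python like this. It is only an illustration of the bookkeeping, with hypothetical class names and no actual sockets or timers; real RUDP-style libraries differ in their details.

```python
class ReliableSender:
    """Tracks which messages were marked reliable and are still unACKed."""

    def __init__(self):
        self.next_id = 0
        self.unacked = {}   # message id -> payload awaiting an ACK

    def send(self, payload, reliable=False):
        """Return the (id, reliable, payload) tuple that would go on the wire."""
        msg_id = self.next_id
        self.next_id += 1
        if reliable:
            self.unacked[msg_id] = payload
        return (msg_id, reliable, payload)

    def on_ack(self, msg_id):
        """The peer confirmed this message; stop tracking it."""
        self.unacked.pop(msg_id, None)

    def to_resend(self):
        """Messages to send again, e.g. when a resend timer fires."""
        return [(msg_id, True, payload)
                for msg_id, payload in sorted(self.unacked.items())]


class LatestOnlyReceiver:
    """Accepts an update only if it is newer than anything seen so far."""

    def __init__(self):
        self.newest_id = -1

    def accept(self, msg_id):
        if msg_id <= self.newest_id:
            return False    # stale or duplicate, e.g. an old player position
        self.newest_id = msg_id
        return True
```

Player positions would go out with `reliable=False` and pass through something like `LatestOnlyReceiver`, while "level starting" would be sent with `reliable=True` and re-sent until ACKed.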
Gaffer on Games also has some great overviews on these topics and is a must read like Beej's.
I love this guide! I gave it as a reference to my students when I was teaching networks a few years ago. At the time it didn't take IPv6 into account! I'm so glad it does now, especially since I'm going to teach system and network programming again next semester =).
The other reference that I really like to give students about networks is Michal Zalewski's Silence on the wire [1]. A really great introduction that can be read cover to cover — almost as a novel — despite being really technical.
I was in college 95-99 and this was THE source for this type of information. As soon as you asked a question you'd be told "beej". The specific assignment that I remember we had was to write a web server, in C, I think on Solaris boxes back then. All of what is now in Section 5 was the perfect source here. The only alternative at the time I recall was manpages which are good, but don't explain it as well especially how the calls relate to each other. We didn't have IPv6 to deal with then but the IP block/ordering stuff was also quite useful.
This and W. Richard Stevens's TCP/IP Illustrated (which covers the Layer 2/3/4 protocols in much more depth) are all you need. Okay, probably not _all_, but a really good chunk.
Absolutely would not have made it through my university capstone project without this document! The team even donated our pathetic amount of spare cash when we were done.