I am currently looking for ways to build a service that can handle around 100k-200k active concurrent websocket connections in production. It's wild seeing this article here. Does anyone know of any alternative ways to do this? Most people seem to suggest using Elixir, but I wonder if I can achieve the same using a more "conventional" language such as Java or Golang.
Elixir is well suited to highly concurrent systems and work like this. I'm big on the whole Elixir ecosystem though so I haven't explored other options.
I don't see why there would be anything stopping Go from being similarly capable, as it also has a good reputation for concurrency and, from what I hear, does preemptive scheduling.
Java can probably do anything except be fun and lightweight, so assuming you're willing to figure out the hoops to jump through, I assume it could.
Elixir can do it with the ergonomics and expressiveness of Python/Ruby. If you enjoy that level of abstraction I recommend it.
Do you have any pointers, preferably a book, for starting an exploratory Elixir project? I don't have any objective apart from giving the ecosystem a taste.
If you really want a book, pick one from here [0]. The first one is good.
Personally I think just following the official guide [1] will give you all you need to get a taste of the language and the platform and decide if you like it or not.
If you were talking about websockets in particular, I guess realistically most people use Phoenix Channels [2], which give you websockets in ten lines of code.
We did this with Node.js and uWebSockets, and it scaled easily to a few million websockets on ~10 machines, so I can confirm the stack works in practice.
We used the C++ version of uWebSockets to replace a legacy node app. We went from four fully loaded cores to about 20% of a single core and a fraction of the memory usage. It's a great library.
It's unlikely you'd want to connect IoT devices to a backend using websockets; I'd use a UDP-based protocol for that, e.g. QUIC. But for web clients it makes sense.
Honestly, what matters is (a) what you're going to be doing with those connections and (b) your hardware.
As a generalization (again, it really depends what you're going to be doing), I'd expect people to get a lot further with a Go- or Java-based implementation. Specifically, if those connections are interacting with each other in any meaningful way, I think shared data is still too useful to pass up.
I've written websocket server implementations in Zig (1) and Elixir (2).
> Specifically, if those connections are interacting with each other in any meaningful way, I think shared data is still too useful to pass up.
What does this mean? What are some scenarios where connections interact with each other? I work with dotnet. To me, every request is standalone and doesn't need to know any other request exists. At most, I can see doing some kind of caching, where if someone does a GET /person/12345 and someone else does the same, I may be able to do some caching. However, I don't think this is what you meant by shared data.
Did you mean like if someone does a PUT /person/12345/email hikingfan@gmail.com, then instead of the next GET request reaching out to the database, you keep it in application memory and just use it?
Or am I completely missing the point and you’re talking about near real-time stuff like calls and screen sharing?
This is in the context of a websocket (which is what the original story is about). Presumably, websocket is being used because HTTP isn't enough, namely, you want to receive pushes from the server. This _often_ comes in the form of data that multiple connections are interested in: game state, chat, collaborative editing. At scale, this data, or a copy of it, often stays in memory. E.g. a chat system might keep a list of room + brief chat history + user list in memory. This memory is being mutated by concurrent connections.
Many languages (e.g., Node.js) won't even let you share code between threads. So you can't really do stuff like run hundreds of threads without being very careful about the size of your application code, because each thread will get its own copy.
Pretty much any modern runtime (Java/Go/Node w/ native bindings) can handle that many connections per machine. You'd probably want to scale horizontally with Kafka or similar eventually, but a single machine will work to start.
Considering someone ran 100k+ idle connections on a Raspberry Pi with Java/Netty, yeah, you could get to a million today with mid-tier hardware and some Linux tuning pretty easily.
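The "Linux tuning" part is mostly about file-descriptor limits and connection backlogs. A sketch of the commonly adjusted knobs (values are illustrative, not a recommendation for any specific workload):

```
# /etc/sysctl.d/99-websockets.conf -- illustrative values only
fs.file-max = 2097152                # system-wide open-file limit
fs.nr_open = 2097152                 # per-process hard-limit ceiling
net.core.somaxconn = 65535           # accept() backlog
net.ipv4.tcp_max_syn_backlog = 65535 # half-open connection backlog
net.core.netdev_max_backlog = 65535  # packets queued before the kernel

# Plus a matching per-process limit, e.g. in a systemd unit:
#   LimitNOFILE=2097152
```

Each idle connection costs one file descriptor plus socket buffers, so with mostly-idle connections memory, not CPU, is usually the first ceiling you hit.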
.NET 7 and Kestrel are likely able to pull this off if properly configured. Kestrel/ASP.NET Core routinely shows up in the top 10 of the TechEmpower web benchmarks.
Node might be faster to write but harder to maintain in the long run, and it's not as reliable as Go or Rust. I'd personally pick Rust because I have experience with it, but AFAIK Go has a very good reputation; the main "difference" from Rust is the GC (I put "difference" in quotes because Go's performance is not that far off from Rust's, and Go also seems easier to write than Rust).
Also, IMHO it's better to have a strongly typed language behind your project if it will be big; dynamic languages and big projects tend to be a nightmare for me.
Would you mind unpacking how, in your view, Go/Rust/compiled strongly typed languages lead to more *reliable* software? I can see how performance and maintainability* are sort of self-evident arguments in favour of them, but I'm not sure how reliability could be a feature inherent to a language/runtime.
* As a build/compile-time concern, using Node doesn't preclude strong typing, so maintainability is also not a strong argument against the runtime itself, given you can use e.g. TypeScript.
I think this blog post[0] describes what level of reliability you can achieve with Rust, specifically:
> In fact, Pingora crashes are so rare we usually find unrelated issues when we do encounter one. Recently we discovered a kernel bug soon after our service started crashing. We've also discovered hardware issues on a few machines, in the past ruling out rare memory bugs caused by our software, even after significant debugging, was nearly impossible.
For sure, not everyone will achieve that on their first try or when getting started, but it is possible. With Node I'm not confident enough to say that; it certainly works for hacking something together quickly and putting it online. With Rust it takes longer, and there are not many platforms yet where you can easily deploy your app.
This article covers Node.js for me, I guess.