Tailscale simp here, been using this feature since it launched in beta, can't believe it didn't exist earlier.
This solved every last remaining problem of my CGNAT'd devices having to relay through DERP servers (with the QoS hit being noticeable); now they just route through my own nodes.
Not OP. Personal opinion on why it is a somewhat hard problem. The main issue is using the available compute correctly and productively across two very different types of tasks that were previously solved independently: generating responses with LLM inference engines, and modifying weights with training code. A training step updates the weights, so the inference engines have to refresh theirs, but we're talking about 750B parameters and multiple inference servers. Stale weights can be used instead, but only for a tiny bit, and the data generated from them needs special corrections that also cost serious compute/memory. Your inference engines had better be deterministic (for a given pseudo-RNG; this clashes with parallelism), or you need a way to correct the probability streams. Ideally inference and training should match everything at the bit level when they handle the same context, but we don't live in that world yet. And of course, GPUs break, for no great reason other than the tiny scale of their features making them fragile. And because you're operating at scale, you need to handle failures gracefully and efficiently.
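The weight-sync part of this can be sketched in a few lines. This is purely my own illustration (names like `WeightStore` and `max_staleness` are made up, not from any real framework): the trainer publishes versioned weights, and each inference server decides when its copy is too stale to keep using.

```python
import threading


class WeightStore:
    """Toy versioned weight store: the trainer publishes, inference
    servers pull. All names here are illustrative."""

    def __init__(self, weights):
        self._lock = threading.Lock()
        self._weights = weights
        self._version = 0

    def publish(self, new_weights):
        """Called by the trainer after each optimizer step."""
        with self._lock:
            self._weights = new_weights
            self._version += 1

    def snapshot(self):
        with self._lock:
            return self._version, self._weights


class InferenceServer:
    """Generates rollouts with possibly-stale weights, refreshing only
    when staleness exceeds a bound (the 'tiny bit' of allowed lag)."""

    def __init__(self, store, max_staleness=2):
        self.store = store
        self.max_staleness = max_staleness
        self.version, self.weights = store.snapshot()

    def maybe_refresh(self):
        latest_version, _ = self.store.snapshot()
        if latest_version - self.version > self.max_staleness:
            # Pull fresh weights. Rollouts generated before this point
            # are off-policy and need correction at training time.
            self.version, self.weights = self.store.snapshot()
```

In a real system even the `publish` step alone is hard at 750B parameters (you are broadcasting terabytes to many servers), which is why bounded staleness plus off-policy corrections is one common compromise.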
Surely you could just pre-generate rollouts with slightly stale weights and then cheaply verify the rollout when up-to-date weights stream in by treating the former solution as speculative decoding. Sounds quite trivial to me, perhaps I'm missing something.
Cheap verification via speculative decoding only works for a few tokens at a time. Long generations (thousands to tens of thousands of tokens in typical rollouts for thinking models) are dominated by distribution drift on stale weights (because slightly wrong probabilities multiply over long streams), and off-policy RL training methods don't work well (high variance) for such high-dimensional problems.
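A toy calculation (my own illustration, not from any paper) shows how fast the multiplication compounds. Suppose every token's probability under the stale policy is off by the same small relative factor `eps`:

```python
def sequence_ratio(eps, num_tokens):
    """Sequence-level importance ratio when each per-token probability
    is off by a relative factor (1 + eps). Toy model: per-token errors
    compound multiplicatively over the rollout."""
    return (1.0 + eps) ** num_tokens
```

A 0.1% per-token drift over a 10,000-token rollout gives a sequence-level ratio of roughly e^10, about 22,000, so per-sequence importance weights explode (or collapse toward zero). That is exactly the variance problem with naive off-policy corrections.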
I don't think it's possible to separate any open source contribution from the ones that came before it, as we're all standing on the shoulders of giants. Every developer learns from their predecessors and adapts patterns and code from existing projects.
Exactly that. And all the books about, for instance, operating systems are totally based on the work of others: their ideas were collected and documented, the exact algorithms, and so forth. All human culture has worked this way. Moreover, there is a strong pattern of the most prolific / well-known open source developers NOT being against the fact that their code was used for training: they can't speak for everybody, but it is a signal that for many this use is within the scope of making source code available.
Yeah, documented *and credited*. I'm not against the idea of disseminating knowledge, and even with my misgivings about LLMs, I wouldn't have said anything if this blog post was simply "LLMs are really useful".
My comment was in response to you essentially saying "all the criticisms of LLMs aren't real, and you should be uncompromisingly proud about using them".
> Moreover there is a strong pattern of the most prolific / known open source developers being NOT against the fact that their code was used for training
I think it's easy to get "echo-chambered" by who you follow online with this. My experience has been the opposite, and I don't think it's clear what the reality is.
If you fork an open source project and nuke the git history, that's considered to be a "dick move" because you are erasing the record of people's contributions.
The hard truth is that if you're big enough (and the original creator is small enough) you can just do whatever you want and to hell with what any license says about it.
To my understanding, the expensive lawyers hired by the biggest people around, filtered through layers of bureaucracy and translated to software teams, still result in companies mostly avoiding GPL code.
I’ve been thinking that information provenance would be very useful for LLMs. Not just for attribution (git authors), but the LLM would know (and be able to control) which outputs are derived from reliable sources (e.g. Wikipedia vs a Reddit post; also which outputs are derived from ideologically-aligned sources, which would make LLMs more personal and subjectively better, but also easier to bias and generate deliberate misinformation).
“Information provenance” could (and I think most likely would, although I’m very unfamiliar with LLM internals) be which sources most plausibly derive an output, so even output that exists today could eventually get proper attribution.
At least today if you know something’s origin, and it’s both obvious and publicly online, you have proof via the Internet Archive.
> I don't think it's possible to separate any open source contribution from the ones that came before it, as we're all standing on the shoulders of giants. Every developer learns from their predecessors and adapts patterns and code from existing projects.
Yes, but you can also ask the developer (whether on Libera IRC, or at any FOSS talk if it's a FOSS project) which books and blogs they followed for code patterns and inspiration, and just talk to them.
I do feel like some aspects of this are gonna get eaten away by the black box if we do spec-development imo.
Well, it's not without issues. The actual motivation was not that DHCP is the suxxors, but to promote a model where the assigned prefix was free and highly dynamic.
The goal was to support multiple prefixes, to handle the common case of multiple internet connections, and more importantly to allow providers to shuffle the address space around without having to coordinate with the end organization. This was perceived to be necessary to keep the v6 address space from fragmenting.
It's funny that "handle the common case of multiple internet connections" just doesn't work at all with IPv6, yet works much better under IPv4 NAT. With IPv6, each machine makes its own routing decisions due to having two addresses, which means I can't fail over on the router when an ISP goes down. The machine will keep trying to use the ISP that is having 100% packet loss. I can't prioritize sending traffic out of one ISP either, because I'd need to configure that on each machine. With IPv4, the router can handle those rules, since it's doing NAT for all machines on the network, so it gets to choose.
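For contrast, here is roughly what the IPv4 NAT version of failover looks like on a Linux router (interface names and gateway addresses are made up for illustration); the point is that the decision lives in exactly one place:

```shell
# Both uplinks are NATed, so clients only ever see the router.
iptables -t nat -A POSTROUTING -o wan0 -j MASQUERADE
iptables -t nat -A POSTROUTING -o wan1 -j MASQUERADE

# Normal operation: default route via the primary ISP.
ip route replace default via 203.0.113.1 dev wan0

# ISP 1 dies: one command on the router moves everyone, no client config.
ip route replace default via 198.51.100.1 dev wan1
```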
The controller is annoying and changes completely every 6 months, and at home I use basically none of its features beyond configuring the AP. Virtually all the issues I've had with Unifi APs were controller bugs: the controller telling the AP firmware to do stupid things when it could have done literally nothing.
That said, I have some concerns that the OpenWRT AP firmware is not as optimized as the Unifi firmware is for that specific hardware. Mostly for wireless performance, but I also don’t want to hit some weird CPU bottleneck.
One thing I like about using OpenBSD for my home router is almost all the necessary daemons being developed and included with the OS. DHCPv4 server/client, DHCPv6 client, IPv6 RA server, NTP, and of course SSH are all impeccably documented, use consistent config file formats/command-line arg styles, and are privilege-separated with pledge.
Also it's a really well trodden path. You aren't likely to run into an OpenBSD firewall problem that hasn't been seen before.
Regarding any BSD used for any purpose, BSD has a more consistent logic to how everything works. That said, if you're used to Linux then you're going to be annoyed that everything is very slightly different. I am always glad that multiple BSD projects have survived and still have some real users, I think that's good for computing in general.
The recent addition of dhcp6leased is a great example: Built into the base system, simpler to configure than either dhcp6c or dhcpcd, and presumably also more secure than either.
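For anyone curious how simple: going from my memory of the man page (interface names are just examples), requesting a delegated prefix on the WAN interface and handing a /64 from it to the LAN can be about one line:

```
# /etc/dhcp6leased.conf
request prefix delegation on em0 for { em1/64 }
```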
Compared to working with iptables, PF is like this haiku:
A breath of fresh air,
floating on white rose petals,
eating strawberries.
Now I'm getting carried away:
Hartmeier codes now,
Henning knows not why it fails,
fails only for n00b.
Tables load my lists,
tarpit for the asshole spammer,
death to his mail store.
CARP due to Cisco,
redundant blessed packets,
licensed free for me.
pf has been ported to Debian/kFreeBSD, but afaik no effort has been made to port it to the Linux kernel. A lot of networking gear already runs a BSD kernel, so my guess is the really high-level network devs don't bother because they already know BSD so well.
I assume in this case they already had a bunch of firewall rules for PF, and switching from OpenBSD to FreeBSD is a much easier lift than going to Linux because both BSDs use PF, although IIRC there are some differences between the two implementations.
PF is really nice. (Source: me. CISSP and a couple decades of professional experience with open source and proprietary firewalls.)
And if they are already using it on openbsd, it’s almost certainly an easier lift to move from one BSD PF implementation to another versus migrating everything to Linux and iptables.
I've gotta me-too this. I've written any number of firewall rulesets on various OSes and appliances over the years, and pf is delightful. It was the first and only time I've seen a configuration file that was clearly The Way It Should Be.
I'm pretty die-hard Linux, but I had a client who needed to do traffic shaping on hundreds or thousands of this ISPs users. I've tried multiple times to get anything more than the most simple traffic shaping working under Linux, with pretty bad luck at it. I set them up with a FreeBSD box and the shaping config, IIRC, was a one-liner and just worked, I never heard any complaints about it.
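For flavor, a dummynet setup really is about that size. This is a generic sketch from memory (the interface name em0 is an example, and this is not the parent's actual config):

```shell
# Load dummynet if it isn't compiled into the kernel, then shape all
# traffic leaving em0 down to 10 Mbit/s.
kldload dummynet
ipfw pipe 1 config bw 10Mbit/s
ipfw add pipe 1 ip from any to any out via em0
```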
I've run a lot of Linux firewalls over the decades, but FreeBSD's shaping is <chef's kiss>
What features have you used for shaping with pf/FreeBSD? I remember (around 8 years ago) using dummynet with pf, but it wasn't supported out of the box and I used some patches from the mailing lists for that purpose. It wasn't perfect, and at times buggy. Back then ipfw had better support for such features, but I disliked its syntax just as much as iptables'. I eventually settled on Linux as I grew to understand iptables (I hate that nftables is the brand-new thing with an entirely different syntax to learn again... and it even requires more work upfront because the basic chains aren't preconfigured...), but traffic shaping sucked big time on Linux; I never understood the tc tool well enough to be effective, it's just too arcane. I always admired PF, especially on OpenBSD since it had more features, but its single-threaded nature killed it for any serious usage for me.
The user interface is literally 1000x better. That's all
Linux is enormously higher performance but it is a huge pain in the ass to squeeze the performance out AND retain any level of readability
which is why there are like a dozen vendors selling various solutions that quietly compile their proprietary filter definitions to bpf for use natively in the kernel netfilter code...
Too many random changes, too fiddly to maintain, too much general flakiness. Especially for simple single-purpose devices that you want to set up once and leave alone for years, BSD is generally much nicer than Linux. I'd actually flip your question: why would you ever use Linux rather than FreeBSD?
Do you have any specific examples where a Linux-based firewall was too "random" or "fiddly" or "flaky"? Or provide examples of ways that BSD "much nicer"?
It sounds to me like you picked a bad Linux distro for your use case.
I've seen plenty of single-purpose Linux-based network appliances, and none of them have come across as flaky or unreliable because of the OS. In fact they can be easier to use for people who have more operational experience using Linux already.
> Do you have any specific examples where a Linux-based firewall was too "random" or "fiddly" or "flaky"?
They switched out ifconfig for some other thing. There have been about 3 different firewall systems that you've had to migrate between. Some of the newer systems (Docker and I think maybe flatpak/the other one) bypass your firewall rules by default, which is a nasty surprise. A couple of times I did a system upgrade and my system wouldn't boot because drivers or boot systems or what have you had changed. That stuff doesn't happen on FreeBSD.
I'm sure to someone who lives and breathes Linux, or who works on this stuff, it's all trivial. But if it's not something you work on day-to-day, it's something you want to set and forget as an appliance, Linux adds pain.
> It sounds to me like you picked a bad Linux distro for your use case.
Were there any grounds at all in what I said for thinking that, or did you just make it up out of blind tribalism?
> In fact they can be easier to use for people who have more operational experience using Linux already.
Of course, but that's purely circular logic. Whatever OS you use for most of your systems, systems using that OS will be easier for you to use.
tcp_pass = "{ 22 25 80 110 123 }"
udp_pass = "{ 110 631 }"
block all
pass out on fxp0 proto tcp to any port $tcp_pass keep state
pass out on fxp0 proto udp to any port $udp_pass keep state
Note that the last matching rule wins, so you put your catch-all, "block all", at the top. Then in this case fxp0 is the network interface. So they're defining where traffic can go from the machine in question: to any destination, as long as it's to port 22, 25, 80, 110, or 123 for TCP, or to 110 or 631 for UDP.
<action> <direction> on <interface> proto <protocol> to <destination> port <port> <state instructions>
The BSDs still tend to use device-specific names versus the generic ethX or location-specific ensNN, so if you have multiple interfaces, spelling out which is internal and which is external may help the next person who sees your config grok it.
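Macros make that cheap. A minimal sketch (interface names and ports are examples, not from the ruleset above):

```
ext_if = "fxp0"
int_if = "fxp1"

block all
pass out on $ext_if proto tcp to any port { 22 80 443 } keep state
```

Rename the device once at the top and nothing else in the file changes.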
One unexpected thing I found when setting up an OpenBSD-based router recently: the web isn't riddled with low-quality and often wrong SEO and AI slop about OpenBSD like it is for Linux. I guess there just isn't enough money to be made producing it for such a niche audience.
If you search up a problem, you get real documentation, real technical blog posts, and real forum posts with actual useful conversations happening.
I've been using OpenBSD and PF for nearly 25 years (PF debuted December 2001). Over those years there have been syntax changes to pf.conf, but the most disruptive were early on, and I can't remember the last syntax change that affected my configs (mostly NAT, spamd, and connection rate limiting).
During that time the firewall tool du jour on Linux was ipchains, then iptables, and now nftables, and there have been at least some incompatible changes within the lifespan of each tool.
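For a concrete taste of the churn, here is the same trivial rule, allowing inbound SSH, in iptables and then nftables (the nftables line assumes an inet table "filter" with an "input" chain already exists):

```shell
# iptables
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# nftables equivalent
nft add rule inet filter input tcp dport 22 accept
```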
PF is also from 2001. But its roots go further back, I once used a very PF-like syntax on a Unix firewall from 1997. I forget which type of Unix it was, maybe Solaris.
Either way, I don't think there is any defense for the strange syntax of iptables, the chains, the tables. And that's coming from someone who transitioned fully from BSD to Linux 15 years ago and has designed commercial solutions using iptables and ipset.
El Al 1862 was another flight [1] that had an engine fall off, taking another engine out with it. The pilots managed to fly around for a few minutes and attempt a landing, but there was too much structural damage.
It doesn't seem aircraft are designed to survive these types of catastrophic failures.