Lego blocks are how I like to think about software components... They may not be the perfect shape you need, but you can iterate fast. In fact, my favorite software development model is just to iterate on your Lego blocks until the app you need is a trivial combination of them.
Ok, maybe someone here can clear this up for me. My understanding of B+trees is that they are good for implementing indexes on disk because the fanout reduces disk seeks... what I don't understand is in-memory B+trees... which most of the implementations I find are. What are the advantages of an in-memory B+tree?
You use either container when you want a sorted associative map type, which I have not found many use cases for in my work. I might have a handful of them versus many instances of vectors and unsorted associative maps, i.e. absl::flat_hash_map.
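For what it's worth, here is a rough sketch of that split (a hedged example: it assumes Abseil is available, and the keys and values are made up). absl::btree_map is an in-memory B-tree, useful in the rarer cases where sorted iteration or range queries matter; absl::flat_hash_map covers the far more common unsorted case. The usual argument for an in-memory B-tree over a node-per-element tree like std::map is that it packs many keys into each node, which is much friendlier to the CPU cache.

    // Hedged sketch (assumes Abseil; keys and values are made up).
    #include <absl/container/btree_map.h>
    #include <absl/container/flat_hash_map.h>
    #include <cstdio>
    #include <string>

    int main() {
        // Sorted associative map: iteration is in key order, range queries are cheap.
        absl::btree_map<int, std::string> events = {{30, "c"}, {10, "a"}, {20, "b"}};
        for (auto it = events.lower_bound(15); it != events.end(); ++it)
            std::printf("%d -> %s\n", it->first, it->second.c_str());

        // Unsorted associative map: the common case, fast point lookups, no ordering.
        absl::flat_hash_map<std::string, int> counts;
        ++counts["hello"];
        std::printf("hello seen %d time(s)\n", counts["hello"]);
        return 0;
    }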
Reverse mode differentiation? No, it can't be that natural, since it took until 1970 to be proposed. But it is also, in a sense, basic (which you could also guess, since it was introduced in an MSc thesis).
Most of us who are somewhat into the tech behind AI know that it's all based on simple matrix math... and anyone can do that... So "inevitabilism" is how we sound, because we see that if OpenAI doesn't do it, someone else will. Even if all the countries in the world agree to ban AI, it's not based on something with actual scarcity (like purified uranium or gold), so someone somewhere will keep moving this tech forward...
> Even if all the countries in the world agree to ban AI, it's not based on something with actual scarcity (like purified uranium or gold), so someone somewhere will keep moving this tech forward...
However, this is the crux of the matter! At issue is whether or not one believes people (individually and/or socially) have the ability to make large decisions about what should or should not be acceptable. Worse -- a culture with _assumed_ inevitability concerning some trend might well bring forth that trend _merely by the assumed inevitability and nothing else_.
It is obvious that the scales required to make LLM-style AI effective require extremely large capital investments and infrastructure, and that at the same time there is potentially a lot of money to be made. Both of those aspects -- to me -- point to a lot of "assumed inevitability," in particular when you look at who is making the most boisterous statements and for what reasons.
Integrating my time series database (https://github.com/dicroce/nanots) as the underlying storage engine in my video surveillance system, and the performance is glorious. Next up I'm trying to decide between a mobile app and AI... and if AI, local or in the cloud?
Holy shit, is this the squatting man? (Strangely similar stick-figure cave drawings dating to the same timeframe all over the world, and apparently reproduced with a high-energy plasma experiment.)
I am a tech founder who spends most of my day in my own startup deploying LLM-based tools into my own operations, and I'm maybe 1% of the way through the roadmap I'd like to build with what exists and is possible to do today.
The parent was contradicting the idea that the existing AI capabilities have already been "digested". I agree with them btw.
> And the progress seems to be in the benchmarks only
This seems to be mostly wrong given people's reactions to e.g. o3, which was released today. Either way, progress having stalled for the last year doesn't seem like that big a deal considering how much progress there has been over the previous 15-20 years.
> and I'm maybe 1% of the way through the roadmap I'd like to build with what exists and is possible to do today.
How do you know they are possible to do today? Errors get much worse at scale, especially when systems start to depend on each other, so it is hard to say what can be automated and what can't.
Like if you have a process A->B, automating A might be fine as long as a human does B, and vice versa, but automating both might not be.
Not even close. Software can now understand human language... this means computers can be in a lot more places than they ever could be. Furthermore, software can now understand the content of images... eventually this will have a wild impact on nearly everything.
It doesn't understand anything, there is no understanding going on in these models. It takes input and generates output based on the statistical math created from its training set. It's Bayesian statistics and vector/matrix math. There is no cogitation or actual understanding.
This is insanely reductionist and a mindless regurgitation of what we already know about how the models work. Understanding is a spectrum, not a binary. We can measurably show that there is, in fact, some kind of understanding.
If you explain a concept to a child, you check for understanding by seeing whether the output they produce checks out against your understanding of the concept. You don't peer into their brain to see if there are neurons and consciousness happening.
The method of verification has no bearing on the validity of the conclusion. I don't open a child's head because there are side effects on the functioning of the child post brain-opening. However I can look into the brain of an AI with no such side effects.
I'm reasonably sure ChatGPT doesn't have a MacBook and didn't really run the benchmarks. But it DID produce exactly what you would expect a human to say, which is what it is programmed to do. No understanding, just rote repetition.
I won't post more because there are a billion of them. LLMs are great, but they're not intelligent, they don't understand, and the output still needs to be validated before use. We have a long way to go, and that's ok.
Understand? It fails to understand a rephrasing of a math problem a five-year-old can solve...
The bigger they get, the better they get at answering the test from memory. Likewise, you can get some emergent properties out of them.
Really it does not understand a thing, sadly. It can barely analyze language and spew out a matching response chain.
To actually understand something, it must be capable of breaking it down into constituent parts, synthesizing a solution and then phrasing the solution correctly while explaining the steps it took.
And that's not even something that a huge 62B LLM with a notepad-style chain of thought (like o3, GPT-4.1, or Claude 3.7) can really do properly.
Further, it has to be able to operate at the sub-token level. Say, what happens if I run together truncated versions of words or sentences?
Even a chimpanzee can handle that. (in sign language)
It cannot do true multimodal IO either. You cannot ask it to respond with at least two matching syllables per word and two pictures of syllables per word, in addition to letters. This is a task a 4-year-old can do.
Prediction alone is not indicative of understanding. Pasting together answers like lego is also not indicative of understanding.
(Afterwards ask it how it felt about the task. And to spot and explain some patterns in a picture of clouds.)
To push this metaphor, I'm very curious to see what happens as new organic training material becomes increasingly rare, and AI is fed nothing but its own excrement. What happens as hallucinations become actual training data? Will Google start citing sources for their AI overviews that were in turn AI-generated? Is this already happening?
I figure this problem is why the billionaires are chasing social media dominance, but even on social media I don't know how they'll differentiate organic content from AI content.
I really disagree. I had a masseuse tell me how he uses ChatGPT: he told it a ton of info about himself, and now he uses it for personalized nutrition recommendations. I was in Atlanta over a recent weekend, at a random brunch spot, and overheard some _very_ not SV/tech folks talk about how they use it every day. Their user growth rate shows this -- you don't hit hundreds of millions of people and have them all be HN/SV info-bubble folks.
That doesn’t match what I hear from teachers, academics, or the librarians complaining that they are regularly getting requests for things which don’t exist. Everyone I know who’s been hiring has mentioned spammy applications with telltale LLM droppings, too.
I can see how students would be among the first users of this kind of tech; I'm not in those spheres, but I believe you.
As for spammy applications, hasn't this always been the case, now made worse by the cheapness of -generating- plausible data?
I think ghost applicants already existed before AI, where consulting companies would pool people to try to land a high-paying position and just do consultancy/outsourcing work underneath; there were many such cases before the advent of AI.
Yes, AI is effectively a very strong catalyst because it drives down the cost so much. Kids cheated before but it was more work and higher risk, people faked images before but most were too lazy to make high quality fakes, etc.
This is accurate, doubly so for the people who treat it like a religion and fear the coming of their machine god. This, when what we actually have are (admittedly sometimes impressive) next-token predictors that you MUST double-check because they routinely hallucinate.
Then again I remember when people here were convinced that crypto was going to change the world, democratize money, end fiat currency, and that was just the start! Programs of enormous complexity and freedom would run on the blockchain, games and hell even societies would be built on the chain.
A lot of people here are easily blinded by promises of big money coming their way, and there's money in loudly falling for successive hype storms.
I'm not mocking AI. The internet and smartphones fundamentally changed how societies operate, and AI will probably do so too, so why the doomerism? Isn't that how tech works? We invent new tech and use it, and so on?
What makes AI fundamentally different than smartphones or the internet? Will it change the world? Probably, already has.
Pretty much everyone in high school or college is using them. Also everyone whose job is to produce some kind of content or data analysis. That's already a lot of people.
Agreed. A hot take I have is that AI is over-hyped in its long-term capabilities but under-hyped in its short-term ones. We're at the point, today or within the next twelve months, where all the frontier labs could stop investing any money in research and they'd still see revenue growth from usage of what they've built, and humanity would still be significantly more productive year over year for quite a while because of it.
The real driver of productivity growth from AI systems over the next few years isn't going to be model advancements; it'll be the more traditional software engineering, electrical engineering, robotics, etc systems that get built around the models. Phrased another way: If you're an AI researcher thinking you're safe but the software engineers are going to lose their jobs, I'd bet every dollar on reality being the reverse of that.
There seems to be a fundamental mismatch between how sane people think about sandboxing and how Linux manages namespaces.
A Linux-naive developer would expect to spawn a new process from a payload with access to nothing. It can't see other processes, it has a read-only root with nothing in it, there are no network devices, no users, etc. Then they would expect to read documentation to learn how to add things to the sandbox. They want to pass in a directory, or a network interface, or some users. The effort goes into adding resources to the sandbox, not taking them away.
Instead there is this elaborate ceremony where the principal process basically spawns another version of itself endowed with all the same privileges and then gives them up, hopefully leaving itself with only the stuff it wants the sandboxed process to have. Make sure you don't forget to revoke anything.
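Roughly, that ceremony looks like the following (a hedged sketch in C++, Linux-specific, most error handling omitted; the prepared root /srv/sandbox-root and the payload binary inside it are hypothetical, and the outer uid is assumed to be 1000):

    // Hedged sketch: unshare new user/mount/PID/net namespaces, map ourselves
    // to uid 0 inside the new user namespace, fork (the child becomes PID 1 in
    // the new PID namespace), confine to a prepared minimal root, then exec.
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <fcntl.h>
    #include <sched.h>
    #include <sys/mount.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstring>

    static void write_file(const char* path, const char* data) {
        int fd = open(path, O_WRONLY);
        if (fd >= 0) { write(fd, data, strlen(data)); close(fd); }
    }

    int main() {
        // Give up what we can: new user ns (so no real root is needed), plus
        // new mount, PID, and network namespaces.
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWNET) != 0) {
            perror("unshare");
            return 1;
        }
        // Map our outer uid/gid (assumed 1000 here) to 0 inside the namespace.
        write_file("/proc/self/setgroups", "deny");
        write_file("/proc/self/uid_map", "0 1000 1\n");
        write_file("/proc/self/gid_map", "0 1000 1\n");

        // The *next* child is the first process of the new PID namespace.
        pid_t child = fork();
        if (child == 0) {
            // Stop mount changes from propagating back to the host...
            mount(nullptr, "/", nullptr, MS_REC | MS_PRIVATE, nullptr);
            // ...then confine ourselves to a root prepared in advance.
            chdir("/srv/sandbox-root");
            chroot(".");               // pivot_root is the stricter option
            chdir("/");
            execl("/payload", "payload", (char*)nullptr);  // hypothetical binary
            perror("execl");
            _exit(127);
        }
        int status;
        waitpid(child, &status, 0);
        return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
    }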
A lot of things break if there's no /proc/self. A lot more things break if the terminfo database is absent. More things break if there's no timezone database. Finally, almost everything breaks if the root file system has no libc.so.6.
When you write Dockerfiles, you can easily do it FROM scratch. You can then easily observe whether the thing you are sandboxing actually works.
> no users
Now you are breaking something as fundamental as getuid.
The modern statically linked languages (I'm thinking of Go and Zig specifically) increasingly need less and less of the cruft you mentioned. Hopefully, that trend continues.
> no users
I mean running as root. I think all processes on Linux have to have a user id. Anything inside a sandbox should start with all the permissions for that environment. If the sandbox process wants to muck around with the users/groups authorization model then it can create those resources inside the sandbox.
The things that break in C if /proc/self or the terminfo DB are missing will break in Go and Zig too.
What I think you might mean is something like: "in modern statically linked applications written in languages like Go and Zig, it is much less likely for them to call on OS services that require these sorts of resources".
Or capabilities. Additive security has been known for decades; Linux really dropped the ball here. Linux file descriptors (open file descriptions, whatever) are close to a genuine capability model, except there's plenty of leakage where you can get at the insecure base.
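As an illustration of the fd-as-capability idea (a sketch only; open_beneath and /srv/sandbox-root are made-up names), openat2 with RESOLVE_BENEATH treats a directory fd as the sole authority and refuses to resolve anything that escapes it:

    // Hedged sketch: a directory fd acts as the capability; RESOLVE_BENEATH
    // (openat2, Linux 5.6+) refuses absolute paths and any ".."/symlink escape.
    // glibc has no wrapper for openat2 here, so it goes through syscall().
    #include <fcntl.h>
    #include <linux/openat2.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <cstdio>

    // open_beneath is a made-up helper name, not a real API.
    static int open_beneath(int dirfd, const char* path) {
        struct open_how how {};
        how.flags = O_RDONLY;
        how.resolve = RESOLVE_BENEATH;   // stay strictly under dirfd
        return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
    }

    int main() {
        // The only authority we hand out: an fd for the sandbox directory.
        int dir = open("/srv/sandbox-root", O_DIRECTORY | O_PATH);
        int ok  = open_beneath(dir, "etc/config");        // resolved inside dir
        int bad = open_beneath(dir, "../../etc/passwd");  // rejected (EXDEV)
        std::printf("ok=%d bad=%d\n", ok, bad);
        return 0;
    }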
> Instead there is this elaborate ceremony where the principal process basically spawns another version of itself endowed with all the same privileges and then gives them up
The flags to unshare are copies of the clone3 args, so you're actually free to do this. There's some song and dance though, because it's not actually possible to exec an arbitrary binary with access to nothing.
But I think the big discrepancy is that "spawn a new process with a new executable" is inherently a two-step process. It doesn't work as a single call - you clone3/fork into a new child process, inheriting what you will from the parent based on the clone args/flags (which could be everything, or nothing), do some setup work, and then exec.
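A short sketch of that two-step pattern (Linux-specific; uid/gid mapping and error handling left out): one clone() call decides what the child keeps, the setup happens inside the child, and only then does it exec.

    // Hedged sketch of clone -> setup -> exec. The flags decide up front what
    // the child shares with the parent; everything else happens before exec.
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <csignal>
    #include <sched.h>
    #include <sys/mount.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    static char child_stack[1 << 20];  // stack for the cloned child

    static int child_main(void*) {
        // Setup phase: we are already inside the new namespaces.
        mount(nullptr, "/", nullptr, MS_REC | MS_PRIVATE, nullptr);
        // ... bind-mount whatever the payload is allowed to see ...
        execl("/bin/sh", "sh", (char*)nullptr);
        perror("execl");
        return 127;
    }

    int main() {
        int flags = CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWNET | SIGCHLD;
        pid_t pid = clone(child_main, child_stack + sizeof(child_stack), flags, nullptr);
        if (pid < 0) { perror("clone"); return 1; }
        int status;
        waitpid(pid, &status, 0);
        return 0;
    }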
> There seems to be a fundamental mismatch between how sane people think about sandboxing, and how linux manages namespaces.
What bothers me most about sandboxing with Linux namespaces is that edge cases keep turning up that let processes trick the kernel into granting more privileges than it should.
I wonder if Landlock can/will bring something more like FreeBSD jails to the table. (I haven't made time to read about it in detail yet.)
There is the later added posix_spawn, which could be implemented with a system call, even if on Linux it is emulated with clone + exec.
posix_spawn can do much, but not all, of what is possible with clone + exec. Presumably the standard's editors were wary of adding overly complex function parameters to its invocation, though that would not have been a problem if all parameters had reasonable default values.
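For illustration, a small posix_spawn sketch (portable POSIX; /tmp/out.txt is just an example path): the child's file descriptors are described declaratively through file actions, then the whole thing is spawned in one call.

    // Hedged sketch of posix_spawn with file actions (on Linux, glibc
    // implements this on top of clone + exec).
    #include <fcntl.h>
    #include <spawn.h>
    #include <sys/wait.h>
    #include <cstdio>

    extern char** environ;

    int main() {
        posix_spawn_file_actions_t fa;
        posix_spawn_file_actions_init(&fa);
        // Declare, rather than perform, the redirection of the child's stdout.
        posix_spawn_file_actions_addopen(&fa, 1, "/tmp/out.txt",
                                         O_WRONLY | O_CREAT | O_TRUNC, 0644);

        char* const argv[] = {(char*)"echo", (char*)"hello from the child", nullptr};
        pid_t pid;
        int rc = posix_spawnp(&pid, "echo", &fa, nullptr, argv, environ);
        if (rc != 0) {
            std::fprintf(stderr, "posix_spawnp failed: %d\n", rc);
            return 1;
        }
        int status;
        waitpid(pid, &status, 0);
        posix_spawn_file_actions_destroy(&fa);
        return 0;
    }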
yup! FreeBSD jails are essentially what OP wants with chroot++.
I was pretty puzzled when Docker and LXC came around as this whole new thing believed to have "never been done before"; FreeBSD had supported a very similar concept for years before security groups were added in Linux.
Jails and ezjail were stellar to make mini no-overhead containers when running various services on a server. Being able to archive them and expand them on a new machine was also pretty cool (as long as the BSD version was the same.)
> this whole new thing believed to have "never been done before"
Nobody with knowledge of sandboxing believed this; Virtuozzo and later OpenVZ had been on Linux for a long time, after all. Virtuozzo was even from a similar time frame as FreeBSD jails (2000-ish).
The key innovation of Docker was to provide a standardized way to build, distribute, and run container images.
Virsh had worked for a long time before docker came around, but yeah… you essentially had to build your own Docker-like infrastructure that only you were using
Plan9 had a proper solution for this. New processes don't get access to any files by default - you have to explicitly mount directories for them, capability style.
It has nothing to do with weirdness; Unix itself was plenty weird for its time. The relevant difference between Unix™ and Plan 9 is that Unix source code was given away (or cheaply licensed) to hardware companies which all wrote their own operating systems on top (SunOS, Ultrix, HP-UX, etc. etc.). This made Unix the common factor of very many commercial workstation environments. Plan 9? It was sold directly as a commercial product, for no hardware platform in particular. Nobody wanted to buy it.
People liked Unix because it was free – either really free, via BSD, or as a Unix derivative provided at no cost when people bought their workstations. A new revolutionary operating system had absolutely no reason for anybody to buy it: No commercial developers wanted to develop to a platform without users, and no users wanted a platform without software.
Plan 9 only changed their license many years later, when it was too late for anybody to care, and Unix had become the established standard.
As a number of comments have noted, there are a bunch of different axes that chroot could be 'better' on - e.g. security and sandboxing.
I wrote https://github.com/aidanhs/machroot (initially forked from bubblewrap) a while ago to lean into the pure "pretend I see another filesystem" aspect of chroot, with additional conveniences (so no security focus). For example, it allows setting up overlay filesystems and mounting squashfs filesystems with an overlay on top... and because it uses a mount namespace, you don't need to tear down the mount points - just exit the command and you're done.
The codebase is pretty small so I just tweaked it with whatever features I needed at the time, rather than try and make it a fully fledged tool.
(honestly you can probably replicate most of it with a shell script that invokes unshare and appropriate mount commands)
Ironically, Docker never gave you true network isolation because there's no way to make it user-friendly. Plus the many exploits on the all-powerful daemon.
But most of the professional world uses systemd to bootstrap isolated processes nowadays, which is kind of what you are hinting at. cgroups2 and namespaces are what you want.
We should get a better chroot, yes, one where LD_LIBRARY_PATH gets auto-updated with respect to the new root. A few flags here and there to set the permissions of the child process and we're off to the races.
Oh wow, wow. That all sounded so intensely complex, incomprehensible. What we are going to need to do is build a program to handle all that, highly formalized. Let's make it so formalized it's one of those things like taxes or AWS where people can just make a living from understanding the beast. It can be like systemd meets Multics meets Java, with its own various complicated commands, complicated file formats, and so on. chroot() is only understood by everyone for historical reasons, so let's steal a page from the Java playbook and just rename everything with our own terminology. The product will be so outstanding, wow, I call it "Shocker"
I'm not GP, but if I were to hazard a guess, they want something more than just mount space isolation. Something akin to BSD jails, without the bells and whistles of OCI containers like overlay filesystem, network virtualization, resource management, etc.
That requirement is pretty legitimate, since it's easier and suitable enough for many applications for which we currently use OCI containers: isolated builds, development environments, sandboxes, etc. (I have an isolated build tool for Gentoo.)
But Linux already has multiple solutions that fit the bill, like systemd-nspawn, LXC, bubblewrap, etc. Too bad they aren't as widely known as chroot.
None of those things do what chroot does but many of them involve chroot - so I'm still not grasping what "better chroot" is, other than "not chroot, but something completely different."
One annoying part of using chroot if you're creating them on the fly is teardown - you have to manually invoke umount, and also take care to get this right for partially created chroots (maybe you detected an error after mounting proc, in the process of getting other files in place).
This was my original motivation in creating machroot (mentioned elsewhere in this thread) and having it use namespaces.
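To make the namespace trick concrete, here is a tiny hedged sketch (C++, Linux-specific; /tmp/buildroot is a hypothetical, pre-created staging directory): any mounts made after unshare(CLONE_NEWNS) exist only inside the new mount namespace and vanish when its last process exits, so there is nothing to umount even if setup fails halfway through.

    // Hedged sketch: with a fresh mount namespace, teardown is just process exit.
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <sched.h>
    #include <sys/mount.h>
    #include <unistd.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        if (argc < 2) { std::fprintf(stderr, "usage: %s cmd [args...]\n", argv[0]); return 2; }
        // New user + mount namespaces so no real root is needed.
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS) != 0) { perror("unshare"); return 1; }
        // Keep our mounts from propagating back to the host.
        mount(nullptr, "/", nullptr, MS_REC | MS_PRIVATE, nullptr);
        // Mounts made from here on are private to this namespace; when the
        // exec'd command exits they disappear, with no umount needed even on failure.
        mount("tmpfs", "/tmp/buildroot", "tmpfs", 0, "size=64m");
        execvp(argv[1], argv + 1);
        perror("execvp");
        return 127;
    }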