To be honest, I actually really like the visual delivery here. It's especially helpful for understanding what's going on with computer vision problems. Please make more!
Sorry, but this is a lot of marketing for the same thing over and over again. I'm not against Aloha as an _affordable_ platform, but skimping on hardware is kind of a bug, not a feature. Moreover, it's not even _low-cost_: its BoM is still something like $20k, and collecting all the data is labor-intensive and not cheap.
And if we're focusing on the idea, it has existed since the 1950s and they were doing it relatively well then:
> skimping on hardware is kind of a bug not a feature.
I have to disagree here. Not for 20k, but if you could really build a robot arm out of basically a desk lamp, some servos, and a camera, and had some software to control it as precisely as this video claims it does, this would be a complete game changer. We'd probably see an explosion of attempts to automate all kinds of everyday household tasks that are infeasible to automate cost-effectively today (folding laundry, cleaning up the room, cooking, etc.).
Also, every self-respecting maker out there would probably try to build one :)
> And if we're focusing on the idea, it has existed since the 1950s and they were doing it relatively well then:
I don't quite understand how the video fits here. That's a manually operated robot arm. The point of Aloha is that it's fully controlled by software, right?
If you want a robot that can fold your laundry, clean your room and cook, you need a lot more than cheap hardware. You need an autonomous agent (i.e. "an AI") that can guide the hardware to accomplish the task.
We're still very far from that and you certainly can't do that with ALOHA, in practice, despite what the videos may seem to show. For each of the few, discrete, tasks that you see in the videos, the robot arms have to be trained by demonstration (via teleoperation) and the end result is a system that can only copy the operator's actions with very little variation.
You can check this in the Mobile ALOHA paper on arxiv (https://arxiv.org/abs/2401.02117) where page 6 shows the six tasks the system has been trained to perform, and the tolerances in the initial setup. So e.g. in the shrimp cooking task, the initial position of the robot can vary by 10cm and the position of the implements by 2cm. If everything is not set up just so, the task will fail.
What all this means is that if you could assemble this "cheap" system, you'd then have to train it with a few hundred demonstrations to fold your laundry, and maybe it could do it, probably not, and if you moved the washing machine or got a new one, you'd have to train it all over again.
As to robots cleaning up your room and cooking, those are currently in the realm of science fiction, unless you're a zen ascetic living in an empty room and happy to eat beans on toast every day. Beans from a can, that is. You'll have to initialise the task by opening the can yourself, obviously. You have a toaster, right?
> If you want a robot that can fold your laundry, clean your room and cook, you need a lot more than cheap hardware. You need an autonomous agent (i.e. "an AI") that can guide the hardware to accomplish the task.
Yes, that's my point. Cheap hardware is far harder to control than expensive hardware, so if Google actually developed some AI that can do high-precision tasks on "wobbly", off-the-shelf hardware, that would be the breakthrough.
I agree that extensive training for each single device would be prohibitive, but that feels like a problem that could be solved with more development: with many machine learning tasks, we started by training an individual model for each specific use case and environment. Today we're able to make generalized models which are trained once and can be deployed in a wide variety of environments. I don't see why this shouldn't be possible for a vision-based robot controller either.
Managing the actual high-level task is easy as soon as you're able to do all the low-level tasks: i.e., converting a recipe into a machine-readable format, dividing it into a tree of tasks and subtasks, etc. is easy. The hard parts are actually cutting the vegetables, de-boning the meat, etc. The amount of complex movement planning necessary for that doesn't exist yet. But this project looks as if it's a step in exactly that direction.
I can appreciate that, but they are also recording and replaying motor signals from specific teleoperation demonstrations, something that _was_ possible in the 1950s. You might say that it is challenging to replay demonstrations well on lower-quality hardware, and so there is academic value in trying to make it work on worse hardware, but it would not be my go-to solution for real industry problems. This is not a route I would fund for a startup, for example.
They do not replay recorded motor signals. They use recorded motor signals only to train neural policies, which then run autonomously on the robot and can generalize to new instances of a task (such as the above video generalizing to an adult size sweater when it was only ever trained on child size polo shirts).
Obviously some amount of generalization is required to fold a shirt, as no two shirts will ever be in precisely the same configuration after being dropped on a table by a human. Playback of recorded motor signals could never solve this task.
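In case it helps to see what "using recorded motor signals only to train neural policies" amounts to in code, here is a minimal behavior-cloning sketch in PyTorch. The dimensions and the MLP are placeholders of mine; the actual ALOHA/ACT policy is a transformer over camera images that predicts chunks of future actions, but the training objective is the same supervised imitation of the operator:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: obs = proprioception + image features, act = joint targets.
OBS_DIM, ACT_DIM = 512, 14

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Stand-in for teleoperation demos: (observation, operator action) pairs.
demo_obs = torch.randn(10_000, OBS_DIM)
demo_act = torch.randn(10_000, ACT_DIM)

for step in range(1_000):
    idx = torch.randint(0, demo_obs.shape[0], (256,))
    loss = nn.functional.mse_loss(policy(demo_obs[idx]), demo_act[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

# At run time nothing is replayed: the policy maps the *current* observation to
# an action, which is where the (limited) generalization comes from.
```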
> recorded motor signals only to train neural policies
It's interesting that they are using "Leader Arms" [0] to encode tasks instead of motion capture. Is it just a matter of reduced complexity to get off the ground? I suppose the task of mapping human arm motion to what a robot can do is tough.
I appreciate that going from polo shirts to sweaters is a form of "generalisation", but that's only interesting because of the extremely limited capability for generalisation that systems have when they're trained by imitation learning, like ALOHA.
Note for example that all the shirts in the videos are oriented in the same direction, with the neck facing to the top of the video. Even then, the system can only straighten a shirt that lands with one corner folded under it after many failed attempts, and if you turned a shirt so that the neck faced downwards, it wouldn't be able to straighten it and hang it no matter how many times it tried. Let's not even talk about getting a shirt tangled in the arms themselves (in the videos, a human intervenes to free the shirt and start again). It's trained to straighten a shirt on the table, with the neck facing one way [1].
So the OP is very right. We're no nearer to real-world autonomy than we were in the '50s. The behaviours of the systems you see in the videos are still hard-coded, only they're hard-coded by demonstration, with extremely low tolerance for variation in tasks or environments, and they still can't do anything they haven't been painstakingly and explicitly shown how to do. This is a severe limitation and without a clear solution to it there's no autonomy.
On the other hand, ιδού πεδίον δόξης λαμπρόν ("behold, a splendid field of glory"), as we say in Greek. This is a wide open field full of hills to plant one's flag on. There's so much that robotic autonomy can't yet do that you can get Google to fund you if you can show a robot tying half a knot.
__________________
[1] Note btw that straightening the shirt is pointless: it will straighten up when you hang it. That's just to show the robot can do some random moves and arrive at a result that maybe looks meaningful to a human, but there's no way to tell whether the robot is sensing that it achieved a goal, or not. The straightening part is just a gimmick.
We're building software for neuromorphic cameras specifically for robotics. If robots could actually understand motion in completely unconstrained situations, then both optimal control and modern ML techniques would easily see an uplift in capability (i.e. things work great in simulation, but you can't get good positions and velocities accurately and at a high enough rate in the real world).
Robots already have fast, accurate motors, but their vision systems are like seeing the world through a strobe light.
It is true that replay in the world frame will not handle initial position changes for the shirt. But if the commands are in the frame of the end-effector and the data is object-centric, replay will somewhat generalize. (Please also consider the fact that you are watching the videos that have survived the "should I upload this?" filter.)
The second thing is that large-scale behavior cloning (which is the technique used here) is essentially replay with a little smoothing. Not bad inherently, but just a fact.
My point is that there was an academic contribution made back when the first Aloha paper came out and they showed that doing BC on low-quality hardware could work, but this is like the 4th paper in a row of sort of the same stuff.
Since this is YC, I'll add - As an academic (physics) turned investor, I would like to see more focus on systems engineering and first-principles thinking. Less PR for the sake of PR. I love robotics and really want to see this stuff take off, but for the right reasons.
> large-scale behavior cloning (which is the technique used here), is essentially replay with a little smoothing
A definition of "replay" that involves extensive correction based on perception in the loop is really stretching it. But let me take your argument at face value. This is essentially the same argument that people use to dismiss GPT-4 as "just" a stochastic parrot. Two things about this:
One, like GPT-4, replay with generalization based on perception can be exceedingly useful by itself, far more so than strict replay, even if the generalization is limited.
Two, obviously this doesn't generalize as much as GPT-4. But the reason is that it doesn't have enough training data. With GPT-4 scale training data it would generalize amazingly well and be super useful. Collecting human demonstrations may not get us to GPT-4 scale, but it will be enough to bootstrap a robot useful enough to be deployed in the field. Once there is a commercially successful dextrous robot in the field we will be able to collect orders of magnitude more data, unsupervised data collection should start to work, and robotics will fall to the bitter lesson just as vision, ASR, TTS, translation, and NLP before.
"Limited generalisation" in the real world means you're dead in the water. Like the Greek philosopher Heraclitus pointed out 2000+ years go, the real world is never the same environment and any task you want to carry out is not the same task the second time you attempt it (I'm paraphrasing). The systems in the videos can't deal with that. They work very similar to industrial robots: everything has to be placed just so with only centimeters of tolerance in the initial placement of objects, and tiny variations in the initial setup throw the system out of whack. As the OP points out, you're only seeing the successful attempts in carefully selected videos.
That's not something that you can solve with learning from data, alone. A real-world autonomous system must be able to deal with situations that it has no experience with, it has to be able to deal with them as they unfold, and it has to learn from them general strategies that it can apply to more novel situations. That is a problem that, by definition, cannot be solved by any approach that must be trained offline on many examples of specific situations.
Thank you for your rebuttal. It is good to think about the "just a stochastic parrot" thing. In many ways this is true, but it might not be bad. I'm not against replay. I'm just pointing out that I would not start with an _affordable_ 20k robot with fairly undeveloped engineering fundamentals. It's kind of like trying to dig the foundation of your house with a plastic beach shovel. Could you do it? Maybe, if you tried hard enough. Is it the best bet for success? Doubtful.
The detail about the end-effector frame is pretty critical, as doing this BC with joint angles would not be tractable. You can tell there was a big shift from the RL approaches trying to build very general algorithms to more recent works that are heavily focused on these arms/manipulators, because end-effector control enables more flashy results.
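To make the frame point concrete, here is a toy sketch of mine (not taken from any of these papers): if actions are expressed as relative deltas in the end-effector frame, the same commanded motion survives a rigid shift of the whole scene, which is a large part of why "replay-like" policies tolerate moved objects at all:

```python
import numpy as np

def ee_delta(T_prev, T_next):
    """Action as the pose change expressed in the previous end-effector frame.
    T_prev, T_next: 4x4 homogeneous world-frame end-effector poses."""
    return np.linalg.inv(T_prev) @ T_next

# Toy demo step: the gripper moves 5 cm along its x axis.
T0 = np.eye(4)
T1 = np.eye(4); T1[0, 3] = 0.05

# Now shift the whole scene 10 cm along y (e.g. the object landed elsewhere).
shift = np.eye(4); shift[1, 3] = 0.10

# The end-effector-frame action is identical in both cases, so a policy (or even
# a replay) emitting such deltas is insensitive to where the motion happens.
assert np.allclose(ee_delta(T0, T1), ee_delta(shift @ T0, shift @ T1))
```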
Another limiting factor is that data collection is a big problem: not only will you never be sure you've collected enough data, but they're also collecting data of a human trying to do this work through a janky teleoperation rig. The behavior they're trying to clone is of a human working poorly, which isn't a great source of data! Furthermore, limiting the data collection to (typically) 10Hz means that the scene will always have to be quasi-static, and I'm not sure these huge models will speed up enough to actually understand velocity as a 'sufficient statistic' of the underlying dynamics.
Ultimately, it's been frustrating to see so much money dumped into the recent humanoid push using teleop / BC. It's going to hamper the folks actually pursuing first-principles thinking.
What do you mean by saying that they're replaying signals from teleoperation demonstrations? Like in https://twitter.com/DannyDriess/status/1780270239185588732, was someone demonstrating how to struggle to fold a shirt, then they put a shirt in the same orientation and had the robot repeat the same motor commands?
I follow this space closely and I had never seen the 1950s teleoperation video, which literally blows my mind: people had this working in 1950. Now you just need to connect that to a transformer / diffusion model and it will be able to perform that task autonomously, maybe 80% of the time with 200+ demonstrations and close to 100% of the time with 1000+ demonstrations.
Aloha was not new, but it’s still good work because robotics researchers were not focused on this form of data collection. The issue was most people went into the simulation rabbit hole where they had to solve sim-to-real.
Others went into the VR headset and hand-tracking idea, where you never got super precise manipulations, and so any robots trained on that always showed choppy movement.
Others including OpenAI decided to go full reinforcement learning foregoing human demonstrations which had some decent results but after 6 months of RL on an arm farm led by Google and Sergey Levine, the results were underwhelming to say the least.
So yes it’s not like Aloha invented teleoperation, they demonstrated that using this mode of teleoperation you could collect a lot of data that can train autonomous robot policies easily and beat other methods which I think is a great contribution!
I’m not sure you can say that imitation learning has been under-researched in the past. Imitation learning has been tried before alongside RL. But it did not generalize well until the advent of generative diffusion models.
To be honest, most researchers in applied ML in the bay say the opposite. If you are trying to be nimble and prototype, use pytorch. If you're trying to gain some optimizations as you near deployment, rewrite in Jax.
Interesting perspective about possible Jax optimizations. Assuming these models are trained and deployed on non-TPU hardware, are there any real advantages in using Jax for deployment on GPU? I’d have assumed that inference is largely a solved optimization for large transformer based models (with any low hanging fruits from custom CUDA code already written) and the details are shifting towards infrastructure tradeoffs and availability of efficient GPUs. But I may be out of the loop with the latest gossip. Or do you simply mean that maybe there exist cases where TPU inference makes sense financially and using jax makes a difference?
Tensorflow has been falling behind since they stopped caring about backward compatibility. PyTorch is the leading framework. Jax is getting some traction at Google and was used to train Gemini.
Let me say, he's a great teacher! I took a CV class with him. He should teach more, and take it seriously.
Being a popular AI influencer is not necessarily correlated with being a good researcher though. And I would argue there is a strong indication that it is negatively correlated with being a good business leader / founder.
Here's to hoping he chills out and goes back to the sorely needed lost art of explaining complicated things in elegant ways, and doesn't stray too far back into wasting time with all the top shysters of the valley.
Edit: the more I think about it, the more I realize that it probably screws with a person to have their tweets beelined to the front page of Hacker News. It makes you a target for offers and opportunities because of your name/influence, but not necessarily because of your underlying "best fit".
If only we compensated that knowledge properly. Youtube seems to come the closest, but Youtube educators also show how much time you have to spend attracting views instead of teaching expertise.
> It makes you a target for offers and opportunities because of your name/influence, but not necessarily because of your underlying "best fit"
That's unfortunately life in a nutshell. The best fits rarely end up getting any given position. May be overqualified, filtered out in the HR steps, or rejected for some ephemeral reason (making them RTO, not accepting their counteroffer, potentially illegal factors behind closed doors, etc).
it's a crappy game so I don't blame anyone for using whatever cards they are dealt.
> Youtube seems to come the closest, but Youtube educators also show how much time you have to spend attracting views instead of teaching expertise.
Actually for all the attention that the top Youtubers get (in terms of revenue), the reality is that it's going to be impossible to replace teaching income with popular Youtube videos alone.
Based on what I've seen, 1 million video views on Youtube gets you something like $5-10K. And that's with a primarily US audience that has the higher CPM / RPM. So your channel(s) would need to get to about 6 million views per year, primarily US driven, in order to get to earning a median US wage.
If you made a video a week and the average is 115k views, you'd replace your median salary [0]. But the logic on PPC ends up being a lot more complicated than you assume.
[0] To get 6M views you need to make one video a week that gets ~115k views: 6,000,000/52 = 115,384.6.
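Back-of-the-envelope with the figures quoted above (a quick sanity check, nothing more):

```python
# Figures quoted above: $5-10 per 1,000 views, 6M US-centric views per year.
rpm_low, rpm_high = 5.0, 10.0
views_per_year = 6_000_000

print(views_per_year / 1_000 * rpm_low)    # 30000.0 dollars/year at the low end
print(views_per_year / 1_000 * rpm_high)   # 60000.0 dollars/year at the high end
print(views_per_year / 52)                 # ~115384.6 views per weekly video
```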
Something I've been thinking a lot about is the transition into post scarcity and how we need to dramatically alter the incentive structures and payment allocations.
I've been asking this question for about a decade and still have no good solutions: "What do you do when x% of your workforce is unemployable?" (being that x% of jobs are removed without replacement. Imagine sophisticated and cheap robots. Or if needed, magic)
This is a thought experiment, so your answer can't be "there'll be new jobs." Even if you believe that's what'll happen in real life, it's not in bounds of the thought experiment. It is best to consider multiple values of x because it is likely to change and that would more reflect a post scarcity transition. It is not outside the realms of possibility that in the future you can obtain food, shelter, and medical care for free or at practically no cost. "Too cheap to meter" if you will.
I'll give you two answers that I've gotten that I find interesting. I do not think either is great and they each have issues. 1) Jobs programs: have people do unnecessary jobs simply so they create work wherein we can compensate them. 2) Entertainment: people are, on average, far more interested in watching people play chess against one another than against computers, despite the computer being better. So there are reasons that this _might_ not go away.
>The best fits rarely end up getting any given position.
This can be self-fulfilling.
In an organization beyond a certain size, there will be more almost-adequate fits than there are leadership positions. There could be something like a recognized baseline, which really needs to be scrutinized closely to see exactly who might be slightly above or below the line.
Or in a small company where there is not any almost-fit whatsoever, imagination can result in an ideal that is equally recognizable, but also might not be fully attainable.
Either way it could be OK but not exactly the best-fit.
If good fortune smiles and the rare more-than-adequate-fit appears anywhere on the horizon though, it's so unfamiliar they fly right over the radar.
Seconded! Another math youtuber who is an outrageously good educator is Adithya Chakravarthy a.k.a Aleph 0. He doesn't put out videos very often, but when he does you're probably going to learn something new even if you knew the topic he was speaking about.
He uses elegant hand-drawn notes rather than Manim - although 3blue1brown's open sourced visualization library is beautiful too, I think this makes it extra impressive.
3blue1brown runs the Summer of Math Exposition competitions to highlight other creative math videos. Many, but not all, use the same 3b1b 'manim' animation software, so they often have the same look'n'feel. Here are the results from 2022, and the huge YT playlist:
That’s kind of the point, you won’t be able to due to the algorithm.
I can give you something analogous though: I’m a big fan of old school east coast hip-hop. You have the established mainline artists from back then (“Nas”, “Jay-Z”, “Big L”, etc), then you have a the established underground artists (say, “Lord Finesse” or “Kool G Rap”), and then you have the really really underground guys like “Mr. Low Kash ‘n Da Shady Bunch”, “Superscientifiku”, “Punk Barbarians”, “Harlekinz”, etc.
A lot of those in that third “tier” are every bit as good as the second tier. And both tiers contain a lot of artists that could hit the quality point of the mainline artists, they just never had access to the producer and studio time that the mainline did.
I know these artists because I love going digging for the next hidden gem. Spotify recommended me perhaps one or two of all the super-underground guys.
Somewhat off-topic, but what do you feel like are the best techniques to find the artists in Tier 2 and 3? I face a similar conundrum just in a different genre.
(I realize now I dislike using the descriptor "tier", as it implies some sort of ranking. Perhaps "layer" would have been better, but I'll stick with it for now.)
For both tier 2 and tier 3 it's basically the same process. This is for Spotify btw, I have no idea how different the workflow would be for something like Apple Music.
Say the genre you want to dig around in is Hip-Hop. You are aware of Eminem and Mac Miller, and vaguely aware of a guy named Nas. By intuition you'd probably already be able to tell that Nas is more at the edge among the mainline artists.
You click on "Nas", and scroll down to Fans also like. Right now, for "Nas", it is showing "Mobb Deep", "Mos Def", "Rakim", "Big L", "Wu-Tang Clan", "Gang Starr", "Ghostface Killah", "Method Man" and "Common".
This is a mix of T1 and T2. "Wu-Tang"'s in there along with assorted members, but some of the other artists are much lesser-known quantities.
It's a bit hard for me to decide what a Hip-Hop layman would consider the most unknown name here, but I'd venture it'd be "Big L". We click on him, do the same thing. Now we're really getting somewhere, with guys like "Inspectah Deck" and "Smif-n-Wessun". Click, dig, we get a bunch of names amongst which "Lord Finesse" stands out. The "Show more" at the end of Fans also like is also invaluable.
In total the dig order for me to get to the very bottom of the undeground is "Nas" > "Big L" > "Smif-n-Wessun" > "Lord Finesse" > "Channel Live" > "Ed OG & Da Bulldogs" > "Trends of Culture" > "Brokin English Klik" (358 monthly listeners).
I wouldn't consider each of those going a tier (layer) deeper. As a guy who knows waaay too much about Hip-Hop, I'd separate them into:
- T1: "Nas", "Big L"
- T2 "Smif-n-Wessun", "Lord Finesse"
- T3 "Channel Live", "Ed OG & Da Bulldogs", "Trends of Culture", "Brokin English Klik"
Perhaps "Brokin English Klik" should be in its own T4 and 3 tiers lacks the fidelity to be necessarily accurate. Not sure.
A little shortcut would be using "The Edge of $Genre" playlists. They're the pair playlists to "The Sound of $Genre" (broad slice) and "The Pulse of $Genre" (most popular), generated via everynoise.com, although as that guy got fired from Spotify it's up in the air how long those will keep working.
Edit: oh, and if you run into a playlist that caters to that deep underground (in my case, that was "90's Tapes"*), that's worth its bytes in gold.
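If you'd rather script that digging than click through the UI, here's a rough sketch using spotipy. It assumes you have API credentials set up and that your app still has access to the related-artists endpoint; the "always jump to the least popular neighbor" heuristic is just my approximation of the manual process:

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Reads SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET from the environment.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

def dig(seed_name, hops=6):
    """Follow 'Fans also like' downhill in popularity, like clicking through the UI."""
    seed = sp.search(q=seed_name, type="artist", limit=1)["artists"]["items"][0]
    seen, current, path = {seed["id"]}, seed, [seed]
    for _ in range(hops):
        related = sp.artist_related_artists(current["id"])["artists"]
        candidates = [a for a in related if a["id"] not in seen]
        if not candidates:
            break
        current = min(candidates, key=lambda a: a["popularity"])  # least-known neighbor
        seen.add(current["id"])
        path.append(current)
    return path

for artist in dig("Nas"):
    print(f'{artist["name"]:30s} popularity={artist["popularity"]}')
```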
I hate the fact that there is no diversity in recommendation algos. We need to bring back Yahoo-style top-down directory recommendations, not just a black box. But you can find good channels on YouTube using tags like "#some3" and "#some2" and so on.
TikTok's recommendation algorithm is probably one of the best. It puts content first, giving what seems only a passing weight to follower count.
That doesn't mean that having a big follower count doesn't increase your chance to go viral and gain a lot of views, but it is much more likely for great content from a small creator to go viral than for mediocre content from someone with 500,000 followers.
You can also see this in that successful TikTok profiles often have a much higher view-to-follower ratio than something like YouTube.
3b1b's animations are certainly important but his main selling point is his thoughtful explanations of mathematics -- the topics, approaches, and words.
He's a great educator, but at the same time we must recognize that his videos are not a replacement for a traditional math course. They amplify the existing paradigm, not replace it.
MOOCs are great for access, but they are not, and definitely should not be treated as, replacements; that, I am certain, would have a net negative result. I'm in grad school and there's something I tell students on the first day:
> The main value in you paying (tuition) and attending is not just to hear me lecture, but to be able to stop, interrupt, and ask questions, or visit me in office hours. If you are just interested in lectures, I've linked several high-quality ones on our website, as well as several books, blogs, and other resources. Everyone should use these. But you can't talk to a video or a book; you can talk to me. You should use all of these resources to maximize your learning. I will not be taking attendance.
I'm sure many of you have had lectures with a hundred students if you went to a large school (I luckily did not). You're probably aware how different that is from a smaller course. It's great for access and certainly is monetarily efficient, but it's certainly not the most efficient way of educating an individual. MOOCs are great because they increase the ability of educators to share notes. We pull from one another all the time (with credit, of course), because if someone else teaches in a better way than I do, I should update the way I teach. MOOCs are more an extension of books. Youtube is the same, but at the end of the day you can't learn math without doing math. Even Grant states this explicitly.
This is disrupting education. You can get a better undergraduate education in STEM on youtube than my paid education from 20 years ago. I think those visualizations can even pull forward a bunch of stuff into high school.
Well, I get the point and find it appealing but I don't agree.
When my kiddo was a sophomore in HS he decided that he wanted to be an engineer, and I thought that it would be really good for him to learn calc; my feeling was that if he got out of HS without at least getting through Calculus he'd have a really hard time.
So _I_ learned calculus. I started with basic math on Khan Academy and moved to the end of the Calc AB syllabus. I have, like, 500K points there. And I've watched a whole lot of STEM on YT.
Yesterday I finished a lab with Moritz Klein's Voltage Controlled Oscillators, where I was able to successfully understand the function of all the sections in the circuit.
I've been trying to follow Aaron Lanterman's Georgia Tech lectures on analog electronics.
The issue is that I have other stuff going on in my life. Like, my son studies more than I work at my full time job.
And I don't really have the pressure on me to learn the more advanced math that he's using. In fact, in the couple of years since he graduated HS, I've not really found a use for calc in my day-to-day work on any of the technical things I've done (mostly programming) and so I've lost a lot of it.
So, by contrast, my son who will be graduating as a BS in ME in May, has a far better and deeper understanding of the engineering material than I do.
And it's not just a time issue: I quit my programming job last summer because I have just enough work as a musician to pay the rent, which leaves me plenty of time to do stuff. And it's not that I don't know how to learn at a college level: I taught in an English Dept for 8 years and quit a PhD in the humanities ABD.
That's all just my experience.
I love STEM (and trades education) material on Youtube, but I really think that it's missing something to think that you could get " a better undergraduate education in STEM on youtube".
1. With advanced math I feel I retain at the n-1 level. Unless I’m using it, it fades. That’s frustrating but I don’t think it’s the fault of the deliverer.
I do think working through problems has to be part of the practice, I’ve bought workbooks to have something to try to drive the knowledge into muscle memory. It still fades, but maybe not as much.
2. Calculus, in particular, seems super unimportant to real life. Stats and Linear Algebra, somewhat similar in math level, seem much more applicable. I'm very happy to see Stats being offered in high school now as an alternative to Calculus. For Calculus, you really only need to learn 3-4 rules, and someone to say "trust me, just memorize these, don't spend too much time on this," and you would be able to live a happy productive life.
I think it's important to separate the motivation pill from the content delivery. You can buy a motivation pill for cheaper than $160k or whatever a degree costs these days. And we get to compare the very best tryhard youtubers to the median lecturer who is grinding it out.
This was the point I made earlier. Consider the Richard Feynman lectures. Why didn't universities collectively take the decision to create pre-made ("pre-cooked") lecture videos for topics that don't change, and show these videos during the normal lecture, which otherwise would be the professor's job to revise/prepare the night before and deliver? The professor spends so much time doing the same thing again and again every year. This would have freed them up for more discussion, office hours, and so on.
Actually there is a tortoise and hare race going on. Entertainment is outpacing education. Education is getting better and better with modern technology but so also is distraction i.e. entertainment.
I think good teachers make great researchers, because they have to understand their field very well, they anticipate and ask themselves the questions that need to be asked, they manage to always see their field with fresh eyes, they are good collaborators, and most importantly, good communicators.
My question is this: great educators like Karpathy make things from 'scratch' and explain them in a way that I can understand. Is it a matter of the instructor's ability to do this, or is it a matter of the student (i.e. me) not having enough chops to understand the material from elsewhere?
A teacher can usually adapt the content depending on their audience. I would not teach the research in my field at the same level to professionals, PhDs, master's students, bachelor's students, amateurs, or even school students.
If what I'm teaching is fairly complex, it requires a lot of background that I could teach, but I would not have the time to do so, because it would be to the detriment of other students. So, while I usually teach 'from scratch', depending on my audience I will obfuscate some details (that I can answer separately if a question is asked), and I will usually change the speed of the lessons dramatically depending on the previous background, because I need to assume that the student has the prerequisite background to understand fairly complex material at that speed.
As an example, I gave some explanations to a student from zero to transformers; it took several hours with lots of questions. The same presentation to a teacher not in the field took me 1h30, and to a PhD in a related field, 25 minutes. The content was exactly the same, and it was from scratch, but the background of the audience was fairly different.
At the same time, if you can explain something by using analogies to real-world things, to systems most of us have an intuition for, then you can target many more people at the same time. It's true that this is harder, because you have to find patterns that are common between these systems and also make it clear where the analogy ends. But the benefit to finding these common patterns is that you also understand them deeper.
To give a relevant example, graph theory concepts can be found both in so many real-world systems but also in programming languages and computer systems.
The headline, the picture, the article --- it would be easier to take them seriously if they just made the tools work and stopped posing for band pictures.
I am happy to have the tools, but the hype, the valuation, the "we have solved everything" mentality. It's just so offputting.
I think the problem with stuff like this is that in many cases the authors have an idea like "LLMs + ???? = Operating System!"
which is not that hard of an idea to come up with. Generally, in each instance of LLM + ???? = "thing", the first papers that come out to fill in the ???? with an answer do so in a rush to get on arXiv, and so naturally the work is lackluster (since they have barely had time to think about an actually good solution for the ????).
Hey OP, thank you for the post. In this work, we are looking at one particular functionality of an OS: memory management. The project started off motivated by trying to extend context length for a project we were working on. The analogy to OSes came naturally as the project progressed and the similarity to caching and the memory hierarchy grew.
My initial reaction when reading the title was the same. Their abstract does a better job at explaining what they actually do, and where the connection with an OS comes in:
> To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory.
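A toy sketch of what that data movement can look like, purely my own illustration of the idea rather than the MemGPT implementation: keep a bounded in-context buffer as the "fast memory" and page evicted messages out to an external archive that can be searched back into the prompt:

```python
from collections import deque

class VirtualContext:
    """Toy analogue of OS paging: a small in-context window ("fast memory")
    backed by an unbounded external archive ("slow memory")."""

    def __init__(self, window_tokens=2048):
        self.window_tokens = window_tokens
        self.in_context = deque()   # what actually goes into the prompt
        self.archive = []           # paged-out messages, searchable on demand

    def add(self, message):
        self.in_context.append(message)
        # Evict oldest messages once the rough token budget is exceeded.
        while sum(len(m.split()) for m in self.in_context) > self.window_tokens:
            self.archive.append(self.in_context.popleft())

    def recall(self, query, k=3):
        # Stand-in for retrieval (real systems would use embeddings or function
        # calls issued by the model itself); page relevant archived text back in.
        hits = [m for m in self.archive if query.lower() in m.lower()]
        return hits[:k]

    def build_prompt(self, query):
        return "\n".join(self.recall(query) + list(self.in_context) + [query])
```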
I appreciate the conciseness of this article. Thank you for sharing. I agree. The kitchen sink data strategy appears to be fairly inefficient with current model architectures.
I deleted my account that I opened in 2007. I was a heavy user and used it to make friends and get jobs. I'm a little sad to see it thrown away like this, but not so much that I wanted to stick around to support it.