> they're doing something really simple -- take BLIP2's ViT-L+Q-former, connect it to Vicuna-13B with a linear layer, and train just the tiny layer on some datasets of image-text pairs
Oh yes. Simple! Jesus, this ML stuff makes a humble web dev like myself feel like a dog trying to read Tolstoy.
> This ML stuff makes a humble web dev like myself feel like a dog trying to read Tolstoy.
Just like any discussion between advanced web devs would make any humble woodworker feel?
And just like any discussion between advanced woodworkers would make a humble web dev feel?
"It's really simple, they're just using a No. 7 jointer plane with a high-angle frog and a PM-V11 blade to flatten those curly birch boards, then a No. 4 smoother plane with a Norris-type adjuster and a toothed blade for the final pass."
Whut?
"You could use Webpack to bundle your HTML, CSS and Babel-transpiled TypeScript 5 down to shim-included ECMAScript 5", "They're just using OAuth2 authentication with Passport.js and JWT tokens, which easily gets you CSRF protection", "Our e-learning platform uses LMS.js and xAPI.js, plus SCORM for course packaging and Moodle as the LMS backend.", ...
There was a time you didn't know what any of that meant.
Just because you don't know what the words mean shouldn't make it sound difficult. Not saying AI is easy, just that the jargon is not a good indication of difficulty and we should know better than to be so easily mystified.
Hey, guys. Hey. Ready to talk plate processing and residue transport plate funneling? Why don't we start with joust jambs? Hey, why not? Plates and jousts. Can we couple them? Hell, yeah, we can. Want to know how? Get this. Proprietary to McMillan. Only us. Ready? We fit Donnely nut spacing grip grids and splay-flexed brace columns against beam-fastened derrick husk nuts and girdle plate Jerries, while plate flex tandems press task apparati of ten vertipin-plated pan traps at every maiden clamp plate packet. Knuckle couplers plate alternating sprams from the t-nut to the SKN to the chim line. Yeah. That is the McMillan way. And it's just another day at the office.
I remember seeing someone link to that scene recently as a joke on Twitter (about Twitter trying to explain Twitter Blue). Within a few days I’d watched the entire series… absolutely phenomenal show.
Edit: ah I actually saw the prior scene where Leslie was explaining to John what he expected (which is the setup for the linked bit): https://www.youtube.com/watch?v=G7Do2tlYLhs
The thing is, machine learning sorta requires a few math prerequisites: linear algebra, differential equations, and to some degree vector calculus. Most web developers don’t have this background.
If you want to understand the theory, that's true. If you want to develop an intuitive understanding without having to understand all the nuts and bolts (and I understand that can be a big ask for how some people learn/understand), give this a try: https://karpathy.ai/zero-to-hero.html
The irony is that Karpathy presents the limit/epsilon definition of derivatives in the first half hour (quite well, IMO, and he never actually says "epsilon"), which is very much a nuts-and-bolts kind of explanation in calculus.
That said, when most people say differential equations they’re usually thinking of analytical solutions which is very much not necessary for practical ML.
I would say the limit/epsilon derivative is exactly the sort of thing the grandparent post is talking about. It's quite intuitive and requires hardly any mathematical foundation at all, other than basic geometry and algebra. You can understand topics that build on that simple concept without understanding the more formal derivative definitions.
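For what it's worth, the epsilon-style definition translates almost directly into code. A toy sketch (the function and step size here are just for illustration):

```python
def numeric_derivative(f, x, eps=1e-6):
    # The limit definition of the derivative, evaluated at a small finite epsilon:
    #   f'(x) ~= (f(x + eps) - f(x)) / eps
    return (f(x + eps) - f(x)) / eps

# The slope of x^2 at x = 3 is 2*3 = 6; the finite-step estimate comes out very close.
slope = numeric_derivative(lambda x: x ** 2, 3.0)
```

Shrinking eps makes the estimate approach the true slope, which is the whole intuition behind the formal limit.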
Great idea, actually. I do hope for a curriculum that enables kids on the trade school path to learn more about programming. Why not Master/Journeyman/Apprentice style learning for web dev??
That's kind of how I think about bootcamps pumping out web devs. They're like trade schools, teaching you just enough fundamentals to know how to use existing tools.
Mostly agree... though I don't think the bootcamps get enough fundamentals in. Not to mention that it takes the type of person who will go above and beyond what has been assigned to succeed as a productive employee in the space. I'm self-taught, and in the first years of my career I spent countless hours reading, practicing and solving problems. I still spend a good 10-15 hours a week reading and exploring software development, and I try to at least keep up with what's out there. In the end, the best you can do is be aware of what options are out there - or even that options are out there.
You make a good point.
Except that a number of these concepts and tooling in the ML world have been slingshotted into the forefront in a relatively short time and it has been hard to play catch up.
For example - someone said "frozen Vicuna" below - what does that mean?
> take BLIP2's ViT-L+Q-former

This thing takes an image and creates a representation matrix.
> connect it to Vicuna-13B with a linear layer
Vicuna is an open LLM, pretty good quality, not as good as GPT3.5 though.
This is the beautiful part - a mere multiplication is enough to convert the image tensor to text tensor. One freaking line of code, and a simple one.
> and train just the tiny layer on some datasets of image-text pairs
You then get a shitload of image-text pairs and train the model to describe the images in text. But keep both the image and text model frozen. Is that hard? No, just flip a flag. So this "linear projection layer" (a matrix multiplication) is the only learned part. That means it takes less time to train, needs fewer examples and requires less memory.
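The recipe above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the actual MiniGPT-4 code: the module sizes are made up, and plain `nn.Linear` modules stand in for the real ViT+Q-former and Vicuna.

```python
import torch
import torch.nn as nn

D_IMG, D_LLM = 768, 5120  # hypothetical hidden sizes of the two frozen models

image_encoder = nn.Linear(D_IMG, D_IMG)  # stand-in for the frozen ViT+Q-former
llm = nn.Linear(D_LLM, D_LLM)            # stand-in for frozen Vicuna

# "Just flip a flag": freezing really is one call per pretrained model.
image_encoder.requires_grad_(False)
llm.requires_grad_(False)

# The only learned part: a single linear projection between the two spaces.
projection = nn.Linear(D_IMG, D_LLM)

# Only the projection's parameters are handed to the optimizer,
# so training touches a tiny fraction of the total weights.
optimizer = torch.optim.AdamW(projection.parameters(), lr=1e-4)
```

Because gradients never flow into the frozen parts, memory and compute during training are dominated by forward passes, which is why this kind of glue training is so cheap.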
Training the image and text models was much more difficult. But here they don't train those models; they use them as ready-made parts. It's a hack on top of two unrelated models, so it is cheap.
In the end, the finishing touch: they label 3,500 high-quality image-text pairs and fine-tune on them. Now the model becomes truly amazing. It has broad visual intelligence, and it scooped OpenAI, which hasn't released GPT-4's image input in the API yet.
The important lesson to take away is that unrelated models can be composed with a bit of extra training for the glue layer. And open AI is sometimes just as powerful as "Open"AI - breathing down their necks, just one step behind. This model is also significant for applications - it can power many automations in a flexible way.
> This is the beautiful part - a mere multiplication is enough to convert the image tensor to text tensor. One freaking line of code, and a simple one.
I thought they were creating image tokens based on the queries during finetuning and appending them to the language model. They are not text tokens.
And just like webdev, each of those was built on a different platform and requires arcane incantations and 5h of doc perusing to make it work on your system.
Maybe it's because of how I use it, but the code ChatGPT gives me has always been super helpful and 99% correct. But we have a policy at work not to use it for work product, so I have to spend time changing enough of it that it's different, and I'm never copy/pasting anything: I make enough changes to the structure and variables that it can't be considered pasting company data into GPT, ask my question(s), see what comes back out, refactor/type manually into my IDE, test. I'd say one out of every 8-9 times I get something objectively wrong - a method that doesn't exist, something not compiling, etc. But it's faster than using Google/DDG, especially with some prompting so that it just spits back code and not 5th-grade-level explanatory paragraphs before and after. And well over half the time it does exactly what I need, or sufficiently close that my initial refactoring step gets me the rest of the way.
Would you say that this satisfies the spirit of the company policy? Or is it a bit of a hack to get around it?
I ask because we are about to produce a similar policy at work. We can see the advantages of it, but likewise, we can't have company data held in their systems.
The policy is to not send any "sensitive company data" into ChatGPT, which I 100% agree with. How we implement a given Vue component or a particular API isn't sensitive or particularly novel so if I strip the business logic out I do honestly believe I'm complying with the spirit of the policy.
At some point someone will make a service where you can let the AI take over your computer directly. Easier that way! Curling straight to shell, taken to the next level.
1. plant seed
2. ...wait a very long time...
3. observe completely unexpected but cool result
The unexpected part of step 3 is what makes this very different from any kind of engineering, even webdev.
Of course, there is a lot of engineering involved in good ML, but that is more comparable to agricultural engineering in the sense that it's just a lot of dumb plumbing that any engineer can do without knowledge of the actual application.
I mean, for me, the unexpected part of 3 is what got me into programming in general. The first time you type a mysterious incantation into an editor and a few more mysterious incantations into the console and the console prints "Hello, world" like it was supposed to, it's unexpected because it's hard to believe that any of this mysterious incantation stuff actually works at all.
As you get better at programming you have to take on harder problems to create the surprise of something working, because you gain confidence, and as you gain confidence, you start expecting your code to work. It's only when you've compiled the thing 6 times with small corrections and gotten segfaults each time and the 7th time you finally find the place you weren't updating the pointer and you correct it, but this is the 7th error you've corrected without the segfault going away, so you don't really expect it to fix the problem, but then you run it and it's fixed!
And then you get a job and the reality is that in most of the jobs you're just writing CRUD apps, and for a little while you can get some surprise out of learning the frameworks, but eventually you actually get really, really knowledgeable about the Postgres/Django/React stack and nothing surprises you any more. But because nothing surprises you any more, you're really effective and you start being able to bill the big bucks - but only for work on that stack, because it takes time to struggle enough to get surprised, and the time that takes means your time is worth less to your clients. Money ruins everything. And if you don't do anything non-billable, it's easy to forget what programming felt like when you didn't know how your tools all worked inside and out. Not everyone takes this path but it's certainly the easiest path to take.
I think for a lot of folks who have been doing this for a long time, the reason ML is so exciting is it's getting them back out of their comfort zone, and into a space where they can experience surprise again.
But that surprise has always been available if you continue to find areas of programming that push you out of your comfort zone. For me it's been writing compilers/interpreters for programming languages. Crafting Interpreters was awesome: for the first time I benchmarked a program written in my language against a Python program, and my program was faster: I never expected I'd be able to do that! More recently, I wrote a generational GC. It's... way too memory-intensive to be used in my language which uses one-GC-per-thread for potentially millions of threads, but it certainly was a surprise when that worked.
Personally, I'm keeping track of ML enough to know broad strokes of things but I'm not getting my hands dirty with code until there are some giants to stand on the shoulders of. Those may already exist but it's not clear who they are yet. And I've got very little interest in plugging together opaque API components; I know how to make an API call. I want to write the model code and train it myself.
I like how you've expressed this insight, and it is so true.
Becoming great at a particular technology stack means modelling it in great detail in your head, so you can move through it without external assistance. But that leaves an arena without discovery, where you just reinforce the same synapses, leading to rigidity and an absence of awe.
I've only been reading ML stuff for a few months and I kind of understand what it's saying. This stuff isn't as complex as it's made out to be.
It's just a bunch of black boxes AKA "pure functions".
BLIP2's ViT-L+Q-former AKA

    // If I give it a picture of a plate of lobster, it will say "A plate of lobster".
    getTextFromImage(image) -> Text

Vicuna-13B AKA

    // I give it a prompt and it returns a completion, ChatGPT style.
    getCompletionFromPrompt(text) -> Text
We want to take the output of the first one and then feed a prompt into the LLM (Vicuna) that will help answer a question about the image. However, the datatypes don't match. Let's add in a mapper.
This is the magic of ML. We can just "learn" this function from data. And they plugged in a "simple" layer and learned it from a few examples of (image , question) -> answer. This is what frameworks like Keras, Pytorch allow you to do. You can wire up these black boxes with some intermediate layers and pass in a bunch of data and voila you have a new model. This is called differentiable programming.
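To make "we can just learn this function from data" concrete, here is a toy version in plain NumPy (all dimensions and data are made up): we fit a glue matrix by gradient descent on pairs of vectors from two different spaces, which is the same shape of training the real projection layer gets, just without the frozen giants on either side.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend outputs of a frozen "image model" (4-d) and matching targets in a
# frozen "text model"'s space (3-d). In reality these come from real data.
W_true = rng.normal(size=(4, 3))   # the unknown mapping we hope to recover
X = rng.normal(size=(256, 4))      # image-space vectors
Y = X @ W_true                     # corresponding text-space vectors

# The "glue" matrix - the only thing we learn.
W = np.zeros((4, 3))
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ W - Y) / len(X)  # gradient of the mean squared error
    W -= lr * grad                     # plain gradient descent step
```

After the loop, W has been learned purely from input/output pairs. Frameworks like PyTorch do this gradient bookkeeping automatically for arbitrary compositions of layers, which is what "differentiable programming" refers to.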
The thing is, you don't need to convert to text and then map back into numbers to feed into the LLM. You skip that, take the numbers the image model outputs, and multiply them directly with an intermediate matrix.
More precisely -
The LLM gets the question, plus the image representation after it's been passed through a matrix that transforms it so the LLM can “understand” it.
It maps from the space of one ML model to the other.
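In terms of shapes, that space-to-space mapping is a single matrix multiply. A sketch (the sizes and token count below are hypothetical stand-ins for the real models'):

```python
import numpy as np

d_image, d_llm = 768, 5120  # made-up hidden sizes of the two models
num_tokens = 32             # how many "image tokens" the image side emits

image_hidden = np.random.randn(num_tokens, d_image)  # frozen image model output
W = np.random.randn(d_image, d_llm) * 0.02           # the learned projection

llm_input = image_hidden @ W  # now shaped like LLM token embeddings
```

Roughly speaking, these projected vectors get spliced in alongside the prompt's token embeddings, so the LLM treats the image as a few extra "words" expressed in its own space.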
Just get rid of all the abbreviations in your mind - they only seem intimidating. I really liked the explanation Stephen Wolfram gave of ChatGPT:
I picked that up in the above video, and also in the post above.
Definitely healthy for him. Just to be clear, I'm a huge Wolfram fan and the ego doesn't really bother me; it's just part of who he is. Still, I do find it nice that LLMs are making him self-reflect more than usual.
Not a big Wolfram fan myself. I gave him the benefit of the doubt and bought "A New Kind of Science" (freakin' expensive when it first came out), and read the whole 1280 pages cover to cover ... Would have been better presented as a short blog post.
I find it funny how despite being completely uninvolved in ChatGPT he felt the need to inject himself into the conversation and write a book about it. I guess it's the sort of important stuff that he felt an important person like himself should be educating the plebes on.
Predictably he had no insight into it and will have left the plebes thinking it's something related to MNIST and cat-detection.
I just happened to read this article of his, which I found easy to understand. I'm neither a huge proponent nor opponent of his work. Bluntly put: I don't know much else about his reputation in the community.
Seriously, ChatGPT was the thing that gave me a foothold into the AI/machine learning world... because it gave me hope that a mere mortal can achieve something reasonable with this tech without a crazy amount of work and educational background.
There are really great resources now from eli5 about all of this tech to books like ‘the little learner’ which any programmer can get into. Yes, it takes effort but it is a great time for it.
Regardless of what you want to learn, "small daily activities" is a bit hard. You can learn some stuff by osmosis, following the feeds of AI devs and AI channels, but the bulk of what I learn comes from starting projects, digging into code and reading papers.
If you can hold your attention on something over several days (I can't), work on a project bit by bit. Just make sure it uses modern AI stuff, and that you have smart people to talk it over with.
I was where you're at about ... oh wow, it's been almost ten years since I jumped into machine learning. Mind you, I've been learning on the side most of this time other than a theoretical class at the University of Minnesota. But, that aside, and depending on where you're at in your understanding, this is a great resource for catching up if you're really interested: https://karpathy.ai/zero-to-hero.html it was posted on HN a couple of weeks ago and I have to say it's a really good introduction and Andrej Karpathy is a passionate and excellent teacher. You may want to brush up on some intro Calculus, but it's very understandable.
Maybe you're just holding it wrong: You're not supposed to let your LLM rest or chat idly while you do the webdev stuff yourself, but to make your LLM do the webdev stuff for you ;P