> they're doing something really simple -- take BLIP2's ViT-L+Q-former, connect it to Vicuna-13B with a linear layer, and train just the tiny layer on some datasets of image-text pairs
Oh yes. Simple! Jesus, this ML stuff makes a humble web dev like myself feel like a dog trying to read Tolstoy.
> This ML stuff makes a humble web dev like myself feel like a dog trying to read Tolstoy.
Just like any discussion between advanced web devs would make any humble woodworker feel?
And just like any discussion between advanced woodworkers would make a humble web dev feel?
"It's really simple, they're just using a No. 7 jointer plane with a high-angle frog and a PM-V11 blade to flatten those curly birch boards, then a No. 4 smoother plane with a Norris-type adjuster and a toothed blade for the final pass."
Whut?
"You could use Webpack to bundle your HTML, CSS and Babel-transpiled TypeScript 5 down to shim-included ECMAScript 5", "They're just using OAuth2 authentication with Passport.js and JWT tokens, which easily gets you CSRF protection", "Our e-learning platform uses LMS.js and xAPI.js, plus SCORM for course packaging and Moodle as the LMS backend.", ...
There was a time you didn't know what any of that meant.
Just because you don't know what the words mean shouldn't make it sound difficult. Not saying AI is easy, just that the jargon is not a good indication of difficulty and we should know better than to be so easily mystified.
Hey, guys. Hey. Ready to talk plate processing and residue transport plate funneling? Why don't we start with joust jambs? Hey, why not? Plates and jousts. Can we couple them? Hell, yeah, we can. Want to know how? Get this. Proprietary to McMillan. Only us. Ready? We fit Donnely nut spacing grip grids and splay-flexed brace columns against beam-fastened derrick husk nuts and girdle plate Jerries, while plate flex tandems press task apparati of ten vertipin-plated pan traps at every maiden clamp plate packet. Knuckle couplers plate alternating sprams from the t-nut to the SKN to the chim line. Yeah. That is the McMillan way. And it's just another day at the office.
I remember seeing someone link to that scene recently as a joke on Twitter (about Twitter trying to explain Twitter Blue). Within a few days I’d watched the entire series… absolutely phenomenal show.
Edit: ah I actually saw the prior scene where Leslie was explaining to John what he expected (which is the setup for the linked bit): https://www.youtube.com/watch?v=G7Do2tlYLhs
The thing is, machine learning sorta requires a few math prerequisites: linear algebra, differential equations, and to some degree vector calculus. Most web developers don’t have this background.
If you want to understand the theory, that's true. If you want to develop an intuitive understanding without having to understand all the nuts and bolts (and I understand that can be a big ask for how some people learn/understand), give this a try: https://karpathy.ai/zero-to-hero.html
The irony is that Karpathy presents the limit/epsilon definition of derivatives in the first half hour (quite well, IMO, and he never actually says "epsilon"), which is very much a nuts-and-bolts kind of explanation in calculus.
That said, when most people say differential equations they’re usually thinking of analytical solutions which is very much not necessary for practical ML.
I would say the limit/epsilon derivative is exactly the sort of thing the grandparent post is talking about. It's quite intuitive and requires hardly any mathematical foundation at all, other than basic geometry and algebra. You can understand topics that build on that simple concept without understanding the more formal derivative definitions.
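For what it's worth, the epsilon-style definition translates almost directly into code. A toy sketch (the function and step size here are just for illustration):

```python
def numeric_derivative(f, x, eps=1e-6):
    # The limit definition of the derivative, evaluated at a small finite epsilon:
    #   f'(x) ~= (f(x + eps) - f(x)) / eps
    return (f(x + eps) - f(x)) / eps

# The slope of x^2 at x = 3 is 2*3 = 6; the finite-step estimate comes out very close.
slope = numeric_derivative(lambda x: x ** 2, 3.0)
```

Shrinking eps makes the estimate approach the true slope, which is the whole intuition behind the formal limit.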
Great idea, actually. I do hope for a curriculum that enables kids on the trade school path to learn more about programming. Why not Master/Journeyman/Apprentice style learning for web dev??
That's kind of how I think about bootcamps pumping out web devs. They're like trade schools, teaching you just enough fundamentals to know how to use existing tools.
Mostly agree... though I don't think the bootcamps get enough fundamentals in. Not to mention that it takes the type of person who will go above and beyond what has been assigned to succeed as a productive employee in the space. I'm self-taught, and in the first years of my career I spent countless hours reading, practicing and solving problems. I still spend a good 10-15 hours a week reading and exploring software development, and I try to at least keep up with what's out there. In the end, the best you can do is be aware of what options are out there - or even that options are out there.
You make a good point.
Except that a number of these concepts and tooling in the ML world have been slingshotted into the forefront in a relatively short time and it has been hard to play catch up.
For example - someone said "frozen Vicuna" below - what does that mean?
> take BLIP2's ViT-L+Q-former

This thing takes an image and creates a representation matrix.
> connect it to Vicuna-13B with a linear layer
Vicuna is an open LLM, pretty good quality, not as good as GPT3.5 though.
This is the beautiful part - a mere multiplication is enough to convert the image tensor to text tensor. One freaking line of code, and a simple one.
> and train just the tiny layer on some datasets of image-text pairs
You then get a shitload of image-text pairs and train the model to describe the images in text. But keep both the image and text model frozen. Is that hard? No, just flip a flag. So this "linear projection layer" (a matrix multiplication) is the only learned part. That means it takes less time to train, needs fewer examples and requires less memory.
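The recipe above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the actual MiniGPT-4 code: the module sizes are made up, and plain `nn.Linear` modules stand in for the real ViT+Q-former and Vicuna.

```python
import torch
import torch.nn as nn

D_IMG, D_LLM = 768, 5120  # hypothetical hidden sizes of the two frozen models

image_encoder = nn.Linear(D_IMG, D_IMG)  # stand-in for the frozen ViT+Q-former
llm = nn.Linear(D_LLM, D_LLM)            # stand-in for frozen Vicuna

# "Just flip a flag": freezing really is one call per pretrained model.
image_encoder.requires_grad_(False)
llm.requires_grad_(False)

# The only learned part: a single linear projection between the two spaces.
projection = nn.Linear(D_IMG, D_LLM)

# Only the projection's parameters are handed to the optimizer,
# so training touches a tiny fraction of the total weights.
optimizer = torch.optim.AdamW(projection.parameters(), lr=1e-4)
```

Because gradients never flow into the frozen parts, memory and compute during training are dominated by forward passes, which is why this kind of glue training is so cheap.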
Training the image and text models was much more difficult. But here they don't train those models; they use them as ready-made parts. It's a hack on top of two unrelated models, so it is cheap.
In the end, the finishing touch: they label 3,500 high-quality image-text pairs and fine-tune on them. Now the model becomes truly amazing. It has broad visual intelligence, and it scooped OpenAI, which hasn't released GPT-4's image input in the API yet.
The important lesson to take away is that unrelated models can be composed with a bit of extra training for the glue layer. And open AI is sometimes just as powerful as "Open"AI - breathing down their necks, just one step behind. This model is also significant for applications - it can power many automations in a flexible way.
> This is the beautiful part - a mere multiplication is enough to convert the image tensor to text tensor. One freaking line of code, and a simple one.
I thought they were creating image tokens based on the queries during finetuning and appending them to the language model. They are not text tokens.
And just like webdev, each of those was built on a different platform and requires arcane incantations and 5h of doc perusing to make it work on your system.
Maybe it's because of how I use it, but the code ChatGPT gives me has always been super helpful and 99% correct. But we have a policy at work not to use it for work product, so I have to spend time changing enough of it that it's different, and I'm never copy/pasting anything: I make enough changes to the structure and variables that it can't be considered pasting company data into GPT, ask my question(s), see what comes back out, refactor/type manually into my IDE, test. I'd say one out of every 8-9 times I get something objectively wrong - a method that doesn't exist, something not compiling, etc. But it's faster than using Google/DDG, especially with some prompting so that it just spits back code and not 5th-grade-level explanatory paragraphs before and after. And well over half the time it does exactly what I need, or sufficiently close that my initial refactoring step gets me the rest of the way.
Would you say that this satisfies the spirit of the company policy? Or is it a bit of a hack to get around it?
I ask because we are about to produce a similar policy at work. We can see the advantages of it, but likewise, we can't have company data held in their systems.
The policy is to not send any "sensitive company data" into ChatGPT, which I 100% agree with. How we implement a given Vue component or a particular API isn't sensitive or particularly novel so if I strip the business logic out I do honestly believe I'm complying with the spirit of the policy.
At some point someone will make a service where you can let the AI take over your computer directly. Easier that way! Curling straight to shell, taken to the next level.
1. plant seed
2. ...wait a very long time...
3. observe completely unexpected but cool result
The unexpected part of step 3 is what makes this very different from any kind of engineering, even webdev.
Of course, there is a lot of engineering involved in good ML, but that is more comparable to agricultural engineering in the sense that it's just a lot of dumb plumbing that any engineer can do without knowledge of the actual application.
I mean, for me, the unexpected part of 3 is what got me into programming in general. The first time you type a mysterious incantation into an editor and a few more mysterious incantations into the console and the console prints "Hello, world" like it was supposed to, it's unexpected because it's hard to believe that any of this mysterious incantation stuff actually works at all.
As you get better at programming you have to take on harder problems to create the surprise of something working, because you gain confidence, and as you gain confidence, you start expecting your code to work. It's only when you've compiled the thing 6 times with small corrections and gotten segfaults each time and the 7th time you finally find the place you weren't updating the pointer and you correct it, but this is the 7th error you've corrected without the segfault going away, so you don't really expect it to fix the problem, but then you run it and it's fixed!
And then you get a job and the reality is that in most of the jobs you're just writing CRUD apps, and for a little while you can get some surprise out of learning the frameworks, but eventually you actually get really, really knowledgeable about the Postgres/Django/React stack and nothing surprises you any more. But because nothing surprises you any more, you're really effective and you start being able to bill the big bucks - but only for work on that stack, because it takes time to struggle enough to get surprised, and the time that takes means your time is worth less to your clients. Money ruins everything. And if you don't do anything non-billable, it's easy to forget what programming felt like when you didn't know how your tools all worked inside and out. Not everyone takes this path but it's certainly the easiest path to take.
I think for a lot of folks who have been doing this for a long time, the reason ML is so exciting is it's getting them back out of their comfort zone, and into a space where they can experience surprise again.
But that surprise has always been available if you continue to find areas of programming that push you out of your comfort zone. For me it's been writing compilers/interpreters for programming languages. Crafting Interpreters was awesome: for the first time I benchmarked a program written in my language against a Python program, and my program was faster: I never expected I'd be able to do that! More recently, I wrote a generational GC. It's... way too memory-intensive to be used in my language which uses one-GC-per-thread for potentially millions of threads, but it certainly was a surprise when that worked.
Personally, I'm keeping track of ML enough to know broad strokes of things but I'm not getting my hands dirty with code until there are some giants to stand on the shoulders of. Those may already exist but it's not clear who they are yet. And I've got very little interest in plugging together opaque API components; I know how to make an API call. I want to write the model code and train it myself.
I like how you've expressed this insight, and it is so true.
Becoming great at a particular technology stack means modelling it in great detail in your head, so you can move through it without external assistance. But that leaves an arena without discovery, where you just reinforce the same synapses, leading to rigidity and an absence of awe.
I've only been reading ML stuff for a few months and I kind of understand what it's saying. This stuff isn't as complex as it's made out to be.
It's just a bunch of black boxes AKA "pure functions".
BLIP2's ViT-L+Q-former AKA

    // If I give it a picture of a plate of lobster, it will say "A plate of lobster".
    getTextFromImage(image) -> Text

Vicuna-13B AKA

    // I give it a prompt and it returns a completion, ChatGPT style.
    getCompletionFromPrompt(text) -> Text
We want to take the output of the first one and then feed a prompt into the LLM (Vicuna) that will help answer a question about the image. However, the datatypes don't match. Let's add in a mapper.
This is the magic of ML. We can just "learn" this function from data. And they plugged in a "simple" layer and learned it from a few examples of (image , question) -> answer. This is what frameworks like Keras, Pytorch allow you to do. You can wire up these black boxes with some intermediate layers and pass in a bunch of data and voila you have a new model. This is called differentiable programming.
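To make "we can just learn this function from data" concrete, here is a toy version in plain NumPy (all dimensions and data are made up): we fit a glue matrix by gradient descent on pairs of vectors from two different spaces, which is the same shape of training the real projection layer gets, just without the frozen giants on either side.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend outputs of a frozen "image model" (4-d) and matching targets in a
# frozen "text model"'s space (3-d). In reality these come from real data.
W_true = rng.normal(size=(4, 3))   # the unknown mapping we hope to recover
X = rng.normal(size=(256, 4))      # image-space vectors
Y = X @ W_true                     # corresponding text-space vectors

# The "glue" matrix - the only thing we learn.
W = np.zeros((4, 3))
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ W - Y) / len(X)  # gradient of the mean squared error
    W -= lr * grad                     # plain gradient descent step
```

After the loop, W has been learned purely from input/output pairs. Frameworks like PyTorch do this gradient bookkeeping automatically for arbitrary compositions of layers, which is what "differentiable programming" refers to.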
The thing is, you don't need to convert to text and then map back into numbers to feed into the LLM. You skip that, take the numbers the image model outputs, and multiply them directly with an intermediate matrix.
More precisely -
The LLM gets the question, plus the image representation after it's been passed through a matrix that transforms it so the LLM can “understand” it.
It maps from the space of one ML model to the other.
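In terms of shapes, that space-to-space mapping is a single matrix multiply. A sketch (the sizes and token count below are hypothetical stand-ins for the real models'):

```python
import numpy as np

d_image, d_llm = 768, 5120  # made-up hidden sizes of the two models
num_tokens = 32             # how many "image tokens" the image side emits

image_hidden = np.random.randn(num_tokens, d_image)  # frozen image model output
W = np.random.randn(d_image, d_llm) * 0.02           # the learned projection

llm_input = image_hidden @ W  # now shaped like LLM token embeddings
```

Roughly speaking, these projected vectors get spliced in alongside the prompt's token embeddings, so the LLM treats the image as a few extra "words" expressed in its own space.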
Just get rid of all the abbreviations in your mind - they only seem intimidating. I really liked the explanation Stephen Wolfram gave of ChatGPT:
I picked that up in the above video, and also in the post above.
Definitely healthy for him. Just to be clear, I'm a huge Wolfram fan and the ego doesn't really bother me; it's just part of who he is. Still, I do find it nice that LLMs are making him self-reflect more than usual.
Not a big Wolfram fan myself. I gave him the benefit of the doubt and bought "A New Kind of Science" (freakin' expensive when it first came out), and read the whole 1280 pages cover to cover ... Would have been better presented as a short blog post.
I find it funny how despite being completely uninvolved in ChatGPT he felt the need to inject himself into the conversation and write a book about it. I guess it's the sort of important stuff that he felt an important person like himself should be educating the plebes on.
Predictably he had no insight into it and will have left the plebes thinking it's something related to MNIST and cat-detection.
I just happened to read this article of his, which I found easy to understand. I'm neither a huge proponent nor opponent of his work. Bluntly put: I don't know much else about his reputation in the community.
Seriously, ChatGPT was the thing that gave me a foothold into the AI/machine learning world... because it gave me hope that a mere mortal can achieve something reasonable with this tech without a crazy amount of work and educational background.
There are really great resources now from eli5 about all of this tech to books like ‘the little learner’ which any programmer can get into. Yes, it takes effort but it is a great time for it.
Regardless of what you want to learn, "small daily activities" is a bit hard. You can learn some stuff by osmosis, following the feeds of AI devs and AI channels, but the bulk of what I learn comes from starting projects, digging into code and reading papers.
If you can hold your attention on something over several days (I can't), work on a project bit by bit. Just make sure it uses modern AI stuff, and that you have smart people to talk it over with.
I was where you're at about ... oh wow, it's been almost ten years since I jumped into machine learning. Mind you, I've been learning on the side most of this time other than a theoretical class at the University of Minnesota. But, that aside, and depending on where you're at in your understanding, this is a great resource for catching up if you're really interested: https://karpathy.ai/zero-to-hero.html it was posted on HN a couple of weeks ago and I have to say it's a really good introduction and Andrej Karpathy is a passionate and excellent teacher. You may want to brush up on some intro Calculus, but it's very understandable.
Maybe you're just holding it wrong: You're not supposed to let your LLM rest or chat idly while you do the webdev stuff yourself, but to make your LLM do the webdev stuff for you ;P