More

vnglst · 2025-03-11T17:14:06 1741713246

As I cannot change the title anymore I think that’s my best option now :-)

vnglst · 2025-03-11T16:55:46 1741712146

Wow this one is great!

vnglst · 2025-03-09T09:16:46 1741511806

Made with just HTML, CSS & JavaScript. Inspired by Trainspotting and Gina Trapani's Truisms.

vnglst · 2025-03-08T10:11:35 1741428695

Sorry dead link. New post is up here: https://news.ycombinator.com/item?id=43298945

vnglst · 2025-03-08T10:06:01 1741428361

Shepherd's Dog is a game I've wanted to create for a long time, but I never got the sheep flocking behaviour just right. The goal of the game is to herd all the sheep into the pen before nightfall. I've asked several models to create this game and I'm particularly impressed with what Claude 3.7 could do with a one-shot prompt.

- You can play the Claude game here (note: doesn't work on Safari for some reason): https://html-preview.github.io/?url=https://raw.githubuserco...

- o3-mini's version is here: https://html-preview.github.io/?url=https://raw.githubuserco...

Results of other models and a leaderboard is here: https://github.com/vnglst/when-ai-fails/blob/main/shepards-d...

Some videos: https://hachyderm.io/@vnglst/114125938185826311

Keyframe · 2025-03-11T11:04:46 1741691086

Shepherd's Dog is a game I've wanted to create for a long time

Not sure if you're aware, but there was a game like that for playstation and GBA, called Sheep! https://en.wikipedia.org/wiki/Sheep_(video_game) Here's some gameplay footage (player here didn't chose a dog to play with for some reason): https://www.youtube.com/watch?v=SP058CHQj20 Premise of the game is the same, you run the sheep to the designated area over obstacles.

vnglst · 2025-03-11T16:37:39 1741711059

Ah thanks for this. The game above is lovely and it’s really similar to what I had in mind (I was also thinking of lemmings!). I see in the other comments below that this idea of mine has been created as a game a lot of times already. Seems like I’m not as original as I thought haha

Keyframe · 2025-03-11T18:32:37 1741717957

Seems like I’m not as original as I thought haha

in creative work that's absolutely irrelevant. Don't even think about that. Everything has been done before; It's your take that counts, your vision!

AustinDev · 2025-03-11T03:16:29 1741662989

Just tried a 1-shot on Grok3 - Thinking and it couldn't get past the start button. Throws an error: | "<a class='gotoLine' href='#67:39'>67:39</a> Uncaught ReferenceError: startGame is not defined"

Scope issue.

No barking or dog player model but pretty similar in style to Claude's output.

What's interesting to me about playing with AI Codegen is each model has specific and sometimes overlapping output errors. Claude 3.7 really like to solve errors by returning dummy data as a 'fallback' when doing client or server calls. A little prompting can reduce this but not eliminate it. 'The tests always pass if you return dummy data'

https://jsfiddle.net/aL3ugtj1/

jchw · 2025-03-11T03:07:36 1741662456

Here is an attempt using Google Gemini 2.0 Pro Experimental.

https://gist.github.com/jchv/e8869a7cbe2d854a0ec93e946030d90...

It seems like it has some issues, but the result is interesting nonetheless. Just a one-shot like the others, needed a single "Keep going" but otherwise this is the vanilla output from the prompt.

Edit: Looks like you can share an HTML preview of a gist using html-preview.github.io, so here's that. https://html-preview.github.io/?url=https://gist.githubuserc... - It'll go to level 2 if you refresh the page and hit Restart, but I don't think it's possible to clear Level 2. The flock stays too far apart to fit enough sheep in the pen.

n4r9 · 2025-03-11T08:47:03 1741682823

I just played the Claude attempt and found that the "fence" in level 3 doesn't actually obstruct either the dog or the sheep. Otherwise pretty fun.

swyx · 2025-03-11T10:13:04 1741687984

great demos. one shotting isnt really fair imo, i feel like that might be hard even for a human to do (working without feedback). i'd be curious what deepseek would do with a bit more feedback.

breckenedge · 2025-03-11T01:51:48 1741657908

Since you’re releasing the code to GitHub, do you think you’ll eventually run into issues with the training data including prior versions of the game?

tdy_err · 2025-03-11T04:04:13 1741665853

The implied scenario being that the memory of its own output would result in the model producing degraded future output? Why is that a given?

mythrwy · 2025-03-11T10:57:48 1741690668

Probably the same reason that close relatives marrying each other for generations produces genetic problems.

Etherlord87 · 2025-03-11T11:27:38 1741692458

Not the same reason at all. In genetics the reason is that you're losing gene variety and eventually recessive genes aren't suppressed anymore. In case of LLM it's just error accumulation.

mythrwy · 2025-03-18T22:56:24 1742338584

It's a few days late but "losing gene variety" isn't the cause. What happens is genetic errors compound and are more likely to be expressed. I.E. "error accumulation".

Etherlord87 · 2025-03-23T08:13:31 1742717611

You're wrong. You clearly have the Internet, I don't understand why won't you just google it and learn about it instead of claiming stuff that is bs.

mythrwy · 2025-03-31T19:08:01 1743448081

How about a number of grad level genetics courses? Does that beat your google search? Because that is what I have. And what I am telling you is what happens.

This is really easily searched (as you said).

You might read up on it if interested. Check out why inbreeding can lead to expression of genetic defects. What is the mechanism? (hint: it's not "losing gene diversity" or "suppression").

Etherlord87 · 2025-04-01T11:54:26 1743508466

Very bad courses then.

https://biology.stackexchange.com/questions/58769/what-are-t...

mythrwy · 2025-04-01T14:18:46 1743517126

Without getting into the validity of the source, let's look at what it says:

Here is the first sentence from the top answer:

`You are right. Inbreeding strongly increases overall homozygosity which subjects inbred individuals to diseases caused by rare recessive alleles.`.

Let's see what homozygosity means shall we?

https://www.genome.gov/genetics-glossary/homozygous

`Homozygous, as related to genetics, refers to having inherited the same versions (alleles) of a genomic marker from each biological parent. Thus, an individual who is homozygous for a genomic marker has two identical versions of that marker. By contrast, an individual who is heterozygous for a marker has two different versions of that marker.`

In other words, errors can accumulate and are more likely to be expressed. Not "gene diversity" (this is a topic relating to evolutionary fitness, selection potential etc.), not "suppression". Error accumulation.

Which is the exact analogy I made initially.

Etherlord87 · 2025-04-01T21:48:16 1743544096

I had this conversation before. I point out how your interpretation is insane and doesn't follow logical reasoning, and you accuse me of gaslighting. I don't want to waste anyone else's time. We could just paste to an AI our both initial statements and ask who is more correct, but I'm sure you would either say AIs (all of them or 99% of them) are wrong, or you would interpret them saying I'm more correct, as you being right.

I have no problems being wrong on the Internet. Unfortunately, for some magical reason, in the overwhelming majority of my conversations, I either recognize it within a minute (or one reply when in writing), or never.

mythrwy · 2025-04-02T00:39:47 1743554387

Let me give you a simple example maybe you will understand better.

Let's say a person has a recessive faulty gene. The gene doesn't get expressed because there is only one copy (recessive). We can notate this Aa (small "a" being the faulty gene, large "A" being the good copy). The person has two copies because they get one from each parent.

So "Aa" has a partner we can notate as "AA" (two good copies of the gene). AA and Aa have a child. What is the chance the child has the recessive gene? 25% because we have 4 possibilities with 1 bad outcome. Can the child have two bad copies (i.e. "aa" where the gene gets expressed)? No, they cannot because there are not two copies available from the parents, only one. At most they get "Aa". 75% chance they get "AA".

Let's say AA and Aa have a bunch of kids, the kids intermarry. Then their kids intermarry. Now what is the chance of an individual having two bad copies (i.e "aa"). What is the chance they have 1 bad copy (Aa)?

It's just probability calculations, and the expression becomes more probable as there are more copies of the bad gene in the gene pool. I.E within a population, the errors accumulate, they build up, there is a larger chance of getting expression of the defect (aa) with continued inbreeding.

This works with desirable genes too which is why we have so many kinds of dogs for instance. We select for it and build up copies of gene expressions we want to see to the point there is a 100% (or close to) chance of expression.

Hopefully you get this now. If not, read up on Mendelian genetics and table calculations maybe that will help you see.

------------------------

So let me take this back to the original example of LLMs. Suppose there is 1% chance an LLM confidently claims Python library "Foo" exists and does XX when it's not true. This is analogous to a bad copy of the gene. If you train on that output (i.e. "inbreeding"), then use that as a reference (more inbreeding), soon many sources will say "Foo" exists and you'll have a larger chance of getting "Foobarred" information from the LLM.

Chaosvex · 2025-03-11T10:40:09 1741689609

Read about model collapse. The TL;DR is garbage in, garbage out.

https://en.wikipedia.org/wiki/Model_collapse

vnglst · 2025-03-08T09:15:29 1741425329

Shepherd's Dog is a game I've wanted to create for a long time, but I never got the sheep flocking behaviour just right. The goal of the game is to herd all the sheep into the pen before nightfall. I've asked several models to create this game and I'm particularly impressed with what Claude 3.7 could do with a one-shot prompt.

- You can play the Claude game here (note: doesn't work on Safari for some reason): https://html-preview.github.io/?url=https://raw.githubuserco...

- o3-mini's version is here: https://html-preview.github.io/?url=https://raw.githubuserco...

Results of other models and a leaderboard is here: https://github.com/vnglst/when-ai-fails/blob/main/shepards-d...

Some videos: https://hachyderm.io/@vnglst/114125938185826311

vnglst · 2025-02-19T18:39:51 1739990391

Oh no, thanks for pointing this out! I asked GTP-4o to convert the image to text for me and I only checked some of the cards, assuming the rest would be correct. That was a mistake.

I've now corrected the experiment to accurately take the image into account. This meant that Deepseek was no longer able to find all the sets, but o3-mini still did a good job.

vnglst · 2025-02-15T10:28:55 1739615335

Set is a card game where players have to identify sets of three cards from a layout of 12. Each card features a combination of four attributes: shape, color, number, and shading. A valid set consists of three cards where each attribute is either the same on all three cards or different on each. The goal is to find such sets quickly and accurately.

Though this game is a solved computer problem — easily tackled by algorithms or deep learning — I thought it would be interesting to see if Large Language Models (LLMs) could figure it out.

oidar · 2025-02-19T00:46:12 1739925972

If you think this is fun, try to see how it garbles predicate logic.

vnglst · 2024-06-07T06:59:11 1717743551

Thanks Joris!

According the the Wageningen University report there is! It's small road, highways but also rail road tracks. The actual report can be found here: https://research.wur.nl/en/publications/landelijk-grondgebru...

This also contains the definitions of land use categories. I've grouped all of the above (roads, rail, highways) under infrastructure.

jules · 2024-06-08T11:26:58 1717846018

From reading the description, perhaps "urban greenery" would be a better term than "urban grass".

vnglst · 2024-06-05T17:42:27 1717609347

Thanks! Yeah, getting the right data is challenging, but other than that it’s doable reusing a lot of the current rendering code.