
Let's hold our horses. Those are specifically crafted, hand-picked videos, where the only requirement was "write a generic prompt and pick something that looks good." That's very different from the actual process, where you have a very specific idea and want the machine to make it happen.

The DALL-E presentation also looked cool, and everyone was stoked about it. Now that we know its limitations and oddities? YMMV, but I'd say not so much - Stable Diffusion is still the go-to solution. I strongly suspect the same will happen with Sora.



The examples are most certainly cherry-picked. But the problem is there are 50 of them. And even if you gave me 24-hour full access to SVD 1.1/Pika/Runway (anything out there that I can use), I wouldn't be able to get 5 examples that match these in quality (temporal consistency/motion/prompt following) and, more importantly, in length. Maybe I am overly optimistic, but this seems too good.


Credit to OpenAI for including some videos with failures (extra limbs, etc.). I also wonder how closely any of these videos might match one from the training set. Maybe they chose prompts that lined up pretty closely with a few videos that were already in there.


https://twitter.com/sama/status/1758200420344955288

They're literally taking requests and doing them in 15 minutes.


Cool, but see the drastic difference in quality ;)


Lack of quality in the details, yes, but the fact that characters and scenes depict consistent, real movement and evolution, as opposed to the cinemagraph and frame-morphing stuff we have had so far, is still remarkable!


That particular example seems to have more of a "cheap 3D" style to it, but the actual synthesis seems on par with the examples. If the prompt had specified a different style, it'd have that style instead. This kind of generation isn't like actual animating; a "cheap 3D" style and a "realistic cinematic" style take roughly the same amount of work to look right.


Drastic difference in the quality of the prompts, too. The ones used in the OP are mostly quite detailed.


There are absolutely example videos on their website which have worse quality than that.


It has a comedy-like quality lol

But all that said, it is no less impressive after this new demo.


Depends on the quality of the prompts.


The output speed doesn't disprove possible cherry-picking, especially with batch generation.


Who cares? If it can be generated in 15 minutes then it's commercially useful.


Especially if you consider that afterwards you can get feedback and try again... 15 minutes later you have a new one... try again... etc.


What is your point? That they make multiple ones and pick out the best ones? Well duh? That’s literally how the model is going to be used.


Please make your substantive points without swipes. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.


OpenAI people running these prompts have access to way more resources than any of us will through the API.


Looks ready for _Wishbone_


The year is 2030.

Sarah is a video sorter; this was her life. She graduated top of her class in film, and all she could find was the monotonous job of selecting videos that looked just real enough.

Until one day, she couldn't believe it. It was her: a video of her in that very moment, sorting. She went to pause the video, but stopped when her doppelganger did the same.



I got reminded of an even older sci-fi story: https://qntm.org/responsibility


Man, I was looking for this story for a year or so... thanks for sharing


Seems like in about two years I’ll be able to stuff this saved comment into a model and generate this full episode of Black Mirror


> Stable Diffusion is still the go-to solution. I strongly suspect the same thing with Sora.

Sure, for people who want detailed control with AI-generated video, workflows built around SD + AnimateDiff, Stable Video Diffusion, MotionDiff, etc., are still going to beat Sora for the immediate future, and OpenAI's approach structurally isn't as friendly to developing a broad ecosystem adding power on top of the base models.

OTOH, the basic prompt-to-video capability of Sora is now good enough for some uses, and where detailed control is not essential that space is going to keep expanding. One question is how much their plans for safety checking (which they state will apply both to the prompt and to every frame of output) will cripple this versus alternatives, and how much the regulatory environment will or won't make it possible to compete with that.


I suspect given equal effort into prompting both, Sora probably provides superior results.


> I suspect given equal effort into prompting both, Sora probably provides superior results

Strictly on prompting, probably, just as is the case with DALL-E 3 vs., say, SDXL.

The thing is, there’s a lot more that you can do than just tweaking prompting with open models, compared to hosted models that offer limited interaction options.


Generate stock video bits I think.


It doesn't matter if they're cherry-picked when you can't match this quality with SD or Pika regardless of how much time you have.

And I still prefer DALL-E 3 to SD.


In the past the examples tweeted by OpenAI have been fairly representative of the actual capabilities of the model. i.e. maybe they do two or three generations and pick the best, but they aren't spending a huge amount of effort cherry-picking.


Stable Diffusion is not the go-to solution; it's still behind Midjourney and DALL-E.


Would love to see hand-picked videos from competitors that can hold their own against what Sora is capable of.


Look at Sam Altman's Twitter, where he made videos on demand from what people prompted him.


Wrong, this is the first time I've seen an astronaut with a knit cap.


They're not fantastic either if you pay close attention:

There are mini-people in the 2060s market, and in the cat one an extra paw comes out of nowhere.


The woman’s legs move all weirdly too


While Sora might be able to generate short 60-90 second videos, how well it would scale with a larger prompt or a longer video remains to be seen. And the general approach of having the model do 90% of the work for you and then editing what is required might be harder with video.


60 seconds at a time is much more than enough.

Most fictional long-form video (whether live-action movies, cartoons, etc.) is composed of many shots, most of them much shorter than 7 seconds, let alone 60.

I think the main factor in being able to generate a whole movie will be passing some reference images of the characters/places/objects so they remain congruent between generations.

You could already write a whole book with GPT-3 by running a series of one-short-chapter-at-a-time generations and passing in the summary/outline of what's happened so far, roughly as sketched below. (I know I did, in a time that feels like ages ago but was just early last year.)

Why would this be different?
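
For anyone curious, here is a minimal sketch of that chapter-at-a-time loop. It assumes today's OpenAI Python SDK rather than the original GPT-3 completions API, and the model name, outline, word counts, and chapter count are illustrative placeholders, not what the parent actually used:

    # Minimal chapter-at-a-time loop: each call sees only the outline plus a
    # rolling summary, so the context stays small however long the book gets.
    # Assumes the current OpenAI Python SDK; model name, outline, and chapter
    # count are placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    MODEL = "gpt-4o-mini"  # placeholder model name

    outline = "A detective story set in a city where every video is synthetic."
    summary = "Nothing has happened yet."
    chapters = []

    for i in range(1, 11):  # ten short chapters, one generation per chapter
        prompt = (
            f"Outline: {outline}\n"
            f"Story so far (summary): {summary}\n"
            f"Write chapter {i} (800-1200 words)."
        )
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system",
                 "content": "You are writing a novel one short chapter at a time."},
                {"role": "user", "content": prompt},
            ],
        )
        chapters.append(resp.choices[0].message.content)

        # Refresh the rolling summary so the next chapter stays consistent
        # with everything written so far; the same idea as passing reference
        # images to keep characters/places congruent between video shots.
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{
                "role": "user",
                "content": f"Summarize the story so far in under 300 words:\n"
                           f"{summary}\n\nLatest chapter:\n{chapters[-1]}",
            }],
        )
        summary = resp.choices[0].message.content

    print("\n\n".join(chapters))

The design point is the rolling summary: the prompt never has to carry the full text of everything generated so far, only a compressed state, which is exactly the slot reference images would fill in the video case.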


> I think the main factor in being able to generate a whole movie will be passing some reference images of the characters/places/objects so they remain congruent between generations.

I partly agree with this. The congruency, however, needs to extend to more than two generations. If a single scene is composed of multiple shots, then those shots need to be part of the same world the scene is being shot in. If you check the video with the title `A beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera.`, the surroundings do not seem to make sense: the view starts with a market, spirals around a point, and then ends with a bridge that does not fit into the market. Unless the different shots the model generates fit together seamlessly, trying to make them fit together is where the difficulty comes in. However, I do not have any experience in video editing, so this is just speculation.


The CGI industry is about to be turned upside down. They charge hundreds of thousands per minute, and it takes them forever to produce the finished product.


You do realize virtually all movies are made up of shots, often lasting no longer than 10 seconds, edited together. Right?


The best films have long takes. Children of Men or Stalker come to mind.


The Copacabana tracking shot in Goodfellas.



