So, these demos look outstanding, far better than Wan-2.1 which is the current open weights leader in the video generation community.
I’m particularly interested in their claim that they’re improving on real world physics and understanding, a sort of JEPA-reminiscent claim. There’s nothing about their model, training, or infra in the announcement.
I’ve never used Runway, but their pitch and partners (Lionsgate, for instance) tell me they’re working on making a tool for industry pros; this already looks to me like it will be useful if it integrates well with existing tools.
One of the most interesting shots they showed was an infill based on a sketch: they had an image of a misty farm area and a sketch of a farmhouse, then showed the farmhouse rendered in, the same house on fire, and what looked like a color-grading step. I’m curious to play with this and see how good it is.
*UPDATE* I just paid for a monthly standard subscription to try it out and... couldn’t figure out how to get access to it at all. I can say that Runway 3 ML Turbo produces real nightmare fuel on cel-shaded source images in I2V use cases, though. Waiting to see whether I get my refund or access.
> improving on real world physics and understanding
Can't help but flash back to when Sora was first previewed and the hype crew was claiming it was a world physics engine. When it finally shipped, months and months later, no one used it, and it didn't deliver on any of the hype at all.
No one uses it because there are better, cheaper, faster, and more open models out there. Had they released it right after the announcement, they could have surfed the hype wave. And of course these models simulate the world to some degree; they have to. It's not a universe simulator, to be sure, but things like gravity, momentum, etc. are inferred to some extent.
Kling, Hailuo Minimax, and Google Veo are the best tools. Kling is the leader by an extremely wide margin.
Wan video is a brand new open weights model that is developing a nice research and tooling ecosystem around it. Netflix has been playing with it. Further research and tooling will be built on top of it for sure.
Sora is garbage (currently [1]) and Runway, Luma, and Pika are mid. Pika kind of left the race to focus on social media content and memes.
There are a bunch of other stealth / research foundation models that will be rolling out soon. It's a hyper competitive market and consumers are going to win.
[1] Given the insane capabilities of the just released OpenAI 4o image gen, I expect the next Sora to be impressive.
I wonder if it would be possible to create an AI image/video editor much like Blender or Photoshop, where you get to drag objects around, perhaps based on a transformer(?) model where each token (or group of tokens) encodes an object similarly to how a 3D game engine encodes a game object: latent-vector matrix rows that correspond to position in world space, size, texture, etc.
The 'renderer' would be a neural net that takes this soup of tokens and resolves it to a 2D frame.
The underlying logic engine would be a human, or perhaps a traditional video game engine, that emits the tokens from which the context can be built up and decoded into an image.
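For what it's worth, here's a minimal sketch (Python/PyTorch) of what that token-per-object layout plus a learned renderer might look like. The `ObjectToken` fields, dimensions, and `NeuralRenderer` architecture are all illustrative assumptions, not any existing model's design.

```python
# Hypothetical sketch: object-level tokens (position / size / appearance) fed
# into a small learned "renderer" that decodes them into a 2D RGB frame.
# All names, shapes, and the architecture here are assumptions for illustration.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class ObjectToken:
    position: torch.Tensor    # (3,) world-space position
    size: torch.Tensor        # (3,) bounding-box extents
    appearance: torch.Tensor  # (D,) latent texture/material embedding

    def to_vector(self) -> torch.Tensor:
        # One object becomes one latent row, as described above.
        return torch.cat([self.position, self.size, self.appearance])


class NeuralRenderer(nn.Module):
    """Toy renderer: attends over the object tokens, then decodes a low-res frame."""

    def __init__(self, token_dim: int, image_size: int = 64):
        super().__init__()
        self.image_size = image_size
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=2, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(token_dim, 3 * image_size * image_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_objects, token_dim)
        h = self.encoder(tokens)
        pooled = h.mean(dim=1)  # aggregate the "soup of tokens"
        frame = self.decoder(pooled)
        return frame.view(-1, 3, self.image_size, self.image_size)


# The "logic engine" (a human dragging objects, or a game engine) would emit
# tokens like this; the network renders them into a frame.
obj = ObjectToken(torch.zeros(3), torch.ones(3), torch.randn(10))
tokens = obj.to_vector().unsqueeze(0).unsqueeze(0)   # (1, 1, 16)
frame = NeuralRenderer(token_dim=16)(tokens)         # (1, 3, 64, 64)
```

In a real system the per-object tokens would presumably be trained jointly with the renderer so that dragging an object (editing its position row) moves it coherently in the output; this toy version just shows the data flow.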
This is still nowhere near as good as Kling or Veo. And word is that Sora's next release is mind-blowing.
> far better than Wan-2.1 which is the current open weights leader
Open models still have tremendous value. You aren't censored or controlled with Wan-2.1, and censorship is more annoying than you might think: we've been making a zombie film, and even prompts that relate to blood and gore can get censored. Furthermore, you can plug Wan into open source tools and workflows for enhanced controllability. Wan literally just launched a month ago (Wan 2 was the first release), so it's not really a fair comparison.
I think Runway as a company is dead in the water. They raised way too much money, and there are a billion competitors with similarly shaped products. They'll probably wind up getting quietly acqui-hired below their valuation. The video models are hugely fungible.
> I’ve never used Runway, but their pitch and partners (Lionsgate, for instance) tell me they’re working on making a tool for industry pros
They don't have industry partnerships beyond Lionsgate. It's [redacted] that is selling to Disney and the tier-1 studios, not Runway. You'll hear more news relating to some of their live action films soon. I know those folks are on this forum.
In any case, the floodgates of major studios using AI are about to open. I'm more excited about independent professionals, though. That's where the hyper-local and niche creativity lies.
I paid for a month's worth of credits to try it out (not this version but the previous one), and I depleted the credits really fast; you need to keep iterating a lot to get a good result, tbh. You can see the generated clips here: https://www.youtube.com/watch?v=jpE-8scsOy0
Pretty impressive. I watched "The Retrieval". It gave me slight Scavengers Reign vibes, maybe inspired by it. There are still some minor/subtle artifacts that give it away as an AI generation, but still totally watchable and enjoyable.
I suspect we'll see some fully AI generated feature length films in the future. There will be all sorts of discourse about it, but IMO at the end of the day, it's just another tool for expression.
This looks really pretty, and the website hints at professional use, but I'd like to ask anyone who works in the media industry: is this model good enough, or convenient enough, to fit into the production pipeline of professional video editors, or will this be another novelty model used to generate memes and such?
Lots of quick cuts in the longer movies. Good quality, but I can't find any details on perhaps the most important question: what's the maximum duration for either T2V or I2V? I have yet to see any generator stay coherent beyond roughly 10 seconds.
Hopefully Gen-4 can improve.