
It finally feels like the professional tools have greatly outpaced the open source versions. While Wan and Hunyuan are solid free options, the latest from Google and Runway have started to feel like a league above. Interestingly, it feels like the biggest differentiator is editing tools (the ability to prompt motion, direction, cuts, or weave in audio) rather than just the pure ability to one-shot.

These larger companies are clearly going after the agency/Hollywood use cases. It'll be fascinating to see when they become the default rather than a niche option - that time seems to be drawing closer faster than anticipated. The results here are great, but they're still one or two generations off.



> While Wan and Hunyuan are solid free options, the latest from Google and Runway

The Tencent Hunyuan team is cooking.

Hunyuan Image 2.0 [1] was announced on Friday and it's pretty amazing. It's extremely high quality text-to-image and image-to-image with millisecond latency [2]. It's so fast that they've built a real-time 2D drawing canvas application with it that pretty much duplicates Krea's entire product offering.

Unfortunately, it looks like the team is keeping it closed source, unlike their previous releases.

Hunyuan 3D 2.0 was good, but they haven't released the stunning and remarkable Hunyuan 3D 2.5 [3].

Hunyuan Video hasn't seen any improvements over Wan, but Wan also recently had VACE [4], which is a multimodal control layer and editing layer. The Comfy folks are having a field day with VACE and Wan.

[1] https://wtai.cc/item/hunyuan-image-2-0

[2] https://www.youtube.com/watch?v=1jIfZKMOKME&t=1351s

[3] https://www.reddit.com/r/StableDiffusion/comments/1k8kj66/hu...

[4] https://github.com/ali-vilab/VACE


I think open source still has an important advantage in the pro environment despite being less convenient: the possibility of adding things in the middle of the generation process, like ControlNet and custom LoRAs with new concepts or characters.

Plus, with local generation you're not limited by platform moderation, which can be too strict and arbitrary and can produce false positives.

Yes, ComfyUI can be intimidating at first compared to an easy-to-use ChatGPT-like UI, but the lack of control makes me feel these tools still won't be used in professional productions in the short term, and will be used more by small YouTube channels and smaller productions.


I don't think this is just about convenience - you're not going to get these results with a 14B video model. I'd much prefer to have something I could hack on in ComfyUI, but the open weights models don't compete with this any more than a 32B LLM competes with Gemini 2.5 Pro for coding. And at least in coding you can easily edit the output from the LLM regardless...


> you're not going to get these results with a 14B video model

Foundation models are starting to outstrip any consumer hardware we have.

If Nvidia wants to stay ahead of Google's data center TPUs for running all of these advanced workloads, they should make edge GPU compute a priority.

There's a future where everything is a thin client to Google's data centers. Nvidia should do everything in its power to prevent that from happening.


Your post strangely makes it sound like Nvidia primarily makes graphics cards for consumers.

Last time I checked, they couldn't produce enough H100s/GB100s to satisfy demand from everyone and their mother running a data center. And their most recent consumer hardware offerings have been repeatedly called a "paper launch" - probably because consumer hardware isn't a priority, given the price (and profit) delta.


I read their comment as meaning that Nvidia should prioritise a specific kind of consumer/prosumer hardware.

Nobody is running H100s at home, nor are most video companies running them. So the choice for them is to "rent" them from Google, or... invest a lot in almost-impossible-to-obtain Nvidia hardware? One has a lower initial cost, and is available now.


Thanks for the (possible) clarification.

But as long as Google isn't their _only_ customer, why would Nvidia care?


>There's a future where everything is a thin client to Google's data centers. Nvidia should do everything in its power to prevent that from happening.

There always has been. The mainframe concept is not new, but it goes in and out of fashion.

>>>> mainframe

<<<< personal PC

>>>> web pages/social media

<<<< personal phones/edge

>>>> cloud ai

<<<< ???? personal robotics, chips and ai ???

>>>> ???? rented swarms ???


ControlNet etc. can be served via API; the intrinsic advantage of open source is the ability to train and run inference privately.


Someone out there might care about nudity, but unfortunately, nobody that matters.


> the agency/hollywood use cases.

It's for advertising.


IMO, this is a misconception. For example, in the case of social media display ads (i.e., not the typical Google text ad), most campaigns are "saturation" campaigns: they only work if the creatives are seen 100+ times by the intended audience and look more or less exactly the same. That is pretty much the opposite of a strategy that benefits from being able to create unlimited personalized creatives.


We have already seen that open source can compete, which is a lot more than people expected. After all, open source running huge models?

But what it means is that, with time, open source will be as good as what commercial offerings have now. Hardware will get cheaper, and research is published openly or opened up after a delay.


Has anyone cracked the nut of making videos longer than a few seconds though? No one seems to have made any progress on this. This is all nearly worthless until that is addressed.


I thought that for a while. Until it was pointed out to me that most long videos are made of 6 second shots.

Generating a long video one shot at a time kind of makes sense, as long as there's good consistency between shots.


It would narrow down what you could do by quite a bit if you are limited to 6 seconds per shot. While that is average, there are many shots that are longer. Also, you bring up a good point about consistency between shots. That doesn't seem like as hard of a problem, but it's still a big one.


I don’t care if the normies don’t do long shots. The best movies have very long shots: Children of Men or 1917. Until AI video gen can get past 5-10 second shots of slop, we won’t see any major critically acclaimed AI movies or anything similar.


I would think there’s a major difference in the inference-time compute for tools like this. And major providers can spend a lot more (at a loss) on the runtime compute. That’s just a guess though.


We will know GAI exists when there is no difference, because anything can be coded at any level of quality :)


Well no, humans have 'natural' general intelligence and there's an obvious gap between an expert and a novice at any task.



