They have pretty good documentation too[1]. And it looks like we have day-one support for all major inference stacks, plus so many size choices. Quants are also up, because they had already worked with many community quant makers.
Not even going into performance, need to test first. But what a stellar release just for the attention to all these peripheral details alone. This should be the standard for major releases, instead of whatever Meta was doing with Llama 4 (hope Meta can surprise us at LlamaCon tomorrow though).
Seconding this. They patched all the major LLM frameworks (llama.cpp, transformers, vLLM, SGLang, Ollama, etc.) weeks ahead for Qwen3 support and released model weights everywhere at around the same time. Like a global movie release. This level of detail and effort can't be overstated.
Alibaba, I have a huge favor to ask if you're listening. You guys very obviously care about the community.
We need an answer to gpt-image-1. Can you please pair Qwen with Wan? That would literally change the art world forever.
gpt-image-1 is an almost wholesale replacement of ComfyUI and SD/Flux ControlNets. I can't overstate how big of a deal it is. As such, OpenAI has leapt ahead and threatens to start capturing more of the market for AI images and video. The expense of designing and training a multimodal model presents challenges to the open source community, and it's unlikely that Black Forest Labs or an open effort can do it. It's really a place where only Alibaba can shine.
If we get an open weights multimodal image gen model that we can fine tune, then it's game over - open models will be 100% the future. If not, then the giants are going to start controlling media creation. It'll be the domain of OpenAI and Google alone. Firing a salvo here will keep media creation highly competitive.
So please, pretty please work on an LLM/Diffusion multimodal image gen model. It would change the world instantly.
And keep up the great work with Wan Video! It's easily going to surpass Kling and Veo. The controllability is already well worth the tradeoffs.
I don't know, AI image quality has gotten good, but it's still slop.
We are forgetting what makes art, well, art.
I am not even an artist, but I see people using AI for photos, and the results were so horrendous pre chatgpt-imagen that I literally told one person: if you are going to use AI images, you might as well use ChatGPT for it.
That said, I would also like to get chatgpt-image generation quality from an open source model. I think what we are really looking for is cheap or free labour from the Alibaba team.
We want them, or anyone, to create an open source tool that anyone can use, thus reducing OpenAI's monopoly. But that is not what most people are wishing for. They are wishing for it to bring prices down, so that they can run it on their own hardware at very little cost, or use providers on OpenRouter and the like for cheap image generation with good quality.
Earlier, people used to pay artists. Then people started using stock photos. Then AI image gen came, and now AI images have gotten pretty good with ChatGPT. And now people don't even want to pay ChatGPT that much money; they want to use it for literal cents.
Not sure how long this trend will continue. When DeepSeek R1 launched, I remember people being happy that it was open source, but 99% of people couldn't self-host it (I can't either, because of its requirements), so we were still using the API. Yet just because it was open source, it cut prices so much that others were forced to cut theirs as well, really making a cultural pricing shift in AI.
We are in this really weird spot as humans.
We want to earn a lot of money, yet we don't want to pay anybody and want free labour from open source. That just disincentivizes open source, because now people like to think of it as free labour, and they might be right.
It's pretty much expected that everything is "world shaking" in the modern-day tech world. Whether it's true or not is a different thing every time. I'm fairly certain even the 4o image gen model has shown weaknesses that other approaches didn't, but you know, newer means absolutely better and will change the world.
Oh boy, I had a smirk after reading this comment because it's partially true.
When DeepSeek R1 came, it lit the markets on fire (at least the American ones), and many thought it would be the best forever, or at least for a long time.
Then came Grok 3, then Claude 3.7, then Gemini 2.5 Pro.
Now people comment that Gemini 2.5 Pro is going to stay on top forever.
When deepseek came, there were articles like this on HN:
"Of course, open source is the future of AI"
When Gemini 2.5 Pro came there were articles like this:
"Of course, Google builds its own TPUs, and they had DeepMind, which specialized in reinforcement learning. Of course they were going to go to the top"
We as humans are just trying to justify why a certain company built something more powerful than other companies. But the fact is that AI is still a black box. People were literally saying about Llama 4:
"I think llama 4 is going to be the best open source model, Zuck doesn't like to lose"
Nothing is forever; it's all opinions and current benchmarks. We want the best thing in the benchmarks, then we want an even better thing, and then we justify why and how that better thing was built.
Every time I saw a new model rise, people said it would stay on top forever.
And every time, something new beat it, and people forgot the last time somebody had said "forever".
So yea, deepseek r1 -> grok 3 -> claude 3.7 -> gemini 2.5 pro (Current state of the art?), each transition was just some weeks IIRC.
Your comment is a literal fact that people in AI forget.
Yes, but there is a finite number of them, by default equal to the number of available cores. If your connection stays in C-land for too long, you might run into trouble when more than one connection is desired.
Love this game. It taught me to look for a move like g4.
Also it was probably not objectively the best move (and definitely not Spassky's best game) but Tim Krabbé made a list of 110 most fantastic moves ever played, and he put Spassky's 16...Nc6 against Averbakh as no. 1 (certainly an unorthodox choice):
Well if you don't do that, you would only have a lowest common denominator system with none of the advanced features, and the world has enough of those.
If I wanted to emulate a fraction of the org-mode features I use daily in markdown, I would have to use Obsidian with a bunch of plugins, and we would end up in the same boat.
I don't know the exact spec situation, but I know that comprehensive parsing libraries in modern non-elisp languages exist. An org-mode contributor has one in Julia[1], another in Dart[2] that powers a Flutter app, and there are many tree-sitter grammar based tools that are useable from neovim (e.g. [3]). The basic org2html or org2pdf CLI needs are already addressed by Pandoc.
The format/spec isn't really the problem. It's just that parsing and rendering Org Mode files is like parsing and rendering .PSD files - getting your app to open and write PSDs alone does not turn your app into Photoshop; you're still missing 99% of the features.
Right, but if you implement the rest of the features, you become an org-mode IDE anyway. I don't mean that's a bad thing; syntax is just the bare minimum, and most of the usefulness comes from the interactive developer experience anyway. I was just addressing the point the parent raised that marrying markup with an IDE is a bad idea, and noting that when it comes to syntax, org-mode is hardly underspecified compared to other markup languages.
Would it be neat to have an alternate complete implementation of org-mode elsewhere? Sure. But unlike say Photoshop, Emacs is already open and extremely hackable. That mitigates most of the reasons you would want to.
I use it to write long-running little scripts. It's better than shell, and imo more useful out of the box than Lua. They are basically for I/O (often using sqlite, which it has great integration with, or using expect, which is also great), so I don't care about performance, but I like that it requires much less memory than, say, Python for that kind of little thing (there is also jimtcl, which is also neat in this regard).
So far an API has been less of a priority than focusing on the user-facing product. But it seems there's a reasonable amount of demand for it, which we'll consider.
I consider AIs without API access as good as non-existent. Not everybody wants a web interface and to waste time on copy & paste all the time. APIs can hook the filesystem directly up to an AI, and make complicated prompt engineering and multi-file changes a non-issue. And they should also help you make more money (don't undersell the API access and you're fine).
Without an API the community can also not compare Phind-405B to other models easily.
Would be great to have access to your model in a LLM gateway like https://openrouter.ai/
You should also consider the ecosystem value that might be created for your product. There’s a prior example.
ChatGPT amazed people but its UI didn’t. A bunch of better UIs showed up, building on the OpenAI API and on ChatGPT itself. They helped people accomplish more, which further marketed the product.
You can get this benefit with few downsides if you make the API simple, with few promises about features. “This is provided AS IS for your convenience and experimentation.”
I think an API would be fantastic for use cases like Aider / SWE agents. The primary issue, besides fully understanding the code base, is having up-to-date knowledge of libraries and packages. Perplexity has "online" models. And Phind with Claude, GPT-4o, Phind 70 + search / RAG would be awesome.
Similar feeling here. I found that simple full-text search goes a long way and scales without needing to impose structure (or things like backlinks). It will probably not be enough for prolific note takers, but I think it's enough for people with, say, <1000 files.
(Although I use Xeft[1], which is spiritually the same, but I find the UI cleaner and snappier because it uses Xapian rather than elisp for full-text search.)
I actually switched from Deft -> Xeft -> Howm as my main UI for searching and creating notes :).
Xeft is indeed great, and I still keep it around for more involved searches (like “+key1 -key2” and “key1 NEAR key2”). But most of my searches now go through Howm, which supports fulltext as-you-type searching (can be invoked by pressing “g” for “grep” from its note list), and it can shell out to Ripgrep to speed it up.
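To make the "simple full-text search goes a long way" point concrete, here is a rough sketch of that kind of search in Python. It is not how Xeft or Howm work internally (Xeft uses Xapian, Howm can shell out to Ripgrep); it is just a plain substring scan over a notes directory. The function names, the file suffixes, and the simplified "+key1 -key2" handling are all assumptions made for illustration.

```python
from pathlib import Path


def parse_query(query):
    """Split a query into required and excluded lowercase terms.

    Simplified, hypothetical syntax: "-term" excludes, "+term" or a bare
    term requires. This is not Xapian's real query language.
    """
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            term = token.lstrip("+")
            if term:
                required.append(term)
    return required, excluded


def search_notes(root, query, suffixes=(".txt", ".md", ".org")):
    """Return note files under root that match the query, sorted by path."""
    required, excluded = parse_query(query)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Case-insensitive substring match over the whole file.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if all(t in text for t in required) and not any(t in text for t in excluded):
            hits.append(path)
    return hits
```

Something like `search_notes("notes", "+emacs -xeft")` rereads every file on each query, which is probably tolerable for a notes folder in the <1000 files range mentioned above. Past that, an index (Xapian) or a fast external grep (Ripgrep) is the obvious next step.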
[1] https://qwen.readthedocs.io/en/latest/