
I'm having a lot of fun experimenting with stuff like this. I'm trying to put together an Unreal Engine Blueprints-style graph editor to let people design workflows like this: the user prompt goes to one agent, which makes an initial attempt; that conversation history then gets passed to another "agent" with a different system prompt telling it to be a harsh critic, but also to give a pass/fail signal; it loops back until the critic judges pass, and the result is sent back to the user as output. Ideally as a little website that can call your own LLM endpoints and save/load/share workflow graphs.
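The loop itself is easy to prototype even before the graph editor exists. A minimal sketch, assuming an OpenAI-compatible local endpoint (the URL and model name are placeholders, not anything specific):

    # Generator/critic loop: one agent drafts, a second agent with a harsh-critic
    # system prompt grades it PASS/FAIL, and we loop until it passes or we give up.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # placeholder local endpoint
    MODEL = "mistral-small-3.1"  # placeholder model name

    def chat(system, messages):
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "system", "content": system}] + messages,
        )
        return resp.choices[0].message.content

    def run(user_prompt, max_rounds=4):
        history = [{"role": "user", "content": user_prompt}]
        draft = chat("You are a helpful assistant. Answer the user's request.", history)
        for _ in range(max_rounds):
            critique = chat(
                "You are a harsh critic. List every flaw, then end with the single word PASS or FAIL.",
                history + [{"role": "assistant", "content": draft}],
            )
            if critique.strip().endswith("PASS"):
                break
            history += [
                {"role": "assistant", "content": draft},
                {"role": "user", "content": "A reviewer said:\n" + critique + "\nRevise your answer."},
            ]
            draft = chat("You are a helpful assistant. Revise your answer per the critique.", history)
        return draft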

Mistral Small 3.1 and Gemma 3 feel like the first semi-competent models that can be run locally, but that competence is just a seed, and they still need to be guided by a framework that keeps them on track.

Try giving it Python execution in a loop and telling it to explore the world. It'll start trying to download and read news and stuff.
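For anyone wondering what "Python execution in a loop" looks like concretely, here's a rough sketch. The endpoint, model name, and the one-code-block-per-turn convention are my assumptions, and exec'ing model output is only sane inside a sandbox:

    # REPL loop: ask the model for a Python block, run it, feed stdout back as
    # the next user message, repeat.
    import io, re, contextlib
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # placeholder local endpoint

    messages = [
        {"role": "system", "content": "You can run Python. Reply with exactly one ```python block per turn."},
        {"role": "user", "content": "Explore the world. Start wherever you like."},
    ]
    for _ in range(10):  # cap the number of turns
        reply = client.chat.completions.create(model="gemma-3-27b", messages=messages).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        match = re.search(r"```python\n(.*?)```", reply, re.S)
        if not match:
            break
        out = io.StringIO()
        with contextlib.redirect_stdout(out):
            try:
                exec(match.group(1), {})  # throwaway namespace; sandbox this for anything real
            except Exception as e:
                print("error:", e)
        messages.append({"role": "user", "content": "Output:\n" + out.getvalue()})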




I am thinking the same thing! Multiple "personalities", in parallel, or in series. For example, I have approximated, in GPT, some of Gemini's ability to call out nonsense, sloppy thinking, by telling GPT to be mean! (The politeness seems to filter out much that is of great value!)

However, the result is not pleasant to read. Gemini solved this in their training, by doing it in two phases... and making the first phase private! ("Thinking.")

So I thought, what I need is a two-phase approach, where that "mean" output gets humanized a little bit. (It gets harsh to work in that way for more than short intervals.)

As a side note, I think there would be great value in a UI that allows a "group chat" of different LLM personalities. I don't know if such a thing exists, but I haven't seen it yet, although the message object format seems to have been designed with it in mind (e.g. every message has a name, to allow for multiple users and multiple AIs).

Even better if it supports multiple providers, since they have different strengths. (It's like getting a second opinion.)
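To make the message-format point concrete: the chat schema's optional "name" field is enough to represent a group chat in one transcript. A tiny sketch (the personas and dialogue are made up):

    # One shared transcript; the optional "name" field on each message is what lets
    # several AIs (and several humans) coexist in the same conversation.
    history = [
        {"role": "user",      "name": "alice",    "content": "Is this architecture over-engineered?"},
        {"role": "assistant", "name": "critic",   "content": "Yes. Three queues for one producer is absurd."},
        {"role": "assistant", "name": "diplomat", "content": "The critic has a point; one queue likely suffices."},
        {"role": "user",      "name": "alice",    "content": "OK, what's the simplest version that still scales?"},
    ]
    # On each persona's turn, send this history plus that persona's own system prompt,
    # then append its reply under its name. Different providers can back different personas.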


I disagree.

If anything, telling GPT to be blunt seems to downgrade its IQ; it hallucinates more and makes statements without considering priors or context. I jokingly call it Reddit mode.


why would that be a joke? there's a ton of Reddit comments in the training data, and the output is of similar quality. LLMs are literally outputting average Reddit comments.


I have heard similar things, but I think that's an exaggeration. When I tell o3 or o4-high to assume a professional air, it stops acting like the meat-based AIs on r/politics; specifically, it stops making inane assumptions about the situation and becomes useful again.

For example, I had a question from a colleague that made no sense, and I was trying to understand it. After feeding the question to o3, it aggressively told me that I had made a major mistake in a quote and had to make major changes. (That would have been fine if it were what the colleague had actually said, but it wasn't.) In reality the colleague had misunderstood something about the scope of the project, and GPT had picked up the other person's opinion as the "voice of reason" and just projected what it thought he was saying in a stronger form.

I changed its instructions to "Be direct; but polite, professional and helpful. Make an effort to understand the assumptions underlying your own points and the assumptions made by the user. Offer outside-of-the-box thinking as well if you are being too generic." The aggro was immediately gone, and instead it actually tried to clarify what my colleague was saying and was useful again.

I agree with those who say the vanilla version is sycophantic, but the plain-talk version has far too many bad habits from the wrong crowd. It's a bit like Monday: lots of aggro, little introspection of assumptions.


Reddit works hard to make comments accessible only to Google. However, MS + OAI might have grabbed something before the Reddit-Google contract.


See, he's not joking, he's "joking" ...


> As a side note, I think there would be great value in a UI that allows a "group chat" of different LLM personalities.

This is the basic idea behind AutoGen. They also have a web UI now in AutoGen Studio, and it's gotten a bit better. You can create "teams" of agents (with different prompts, themes, tools, etc.) and have them discuss / cooperate. I think they even added memory recently. Have a look at it, it might be what you need.
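A rough sketch of what a "team" looks like in code, assuming the classic pyautogen (0.2-style) API; AutoGen Studio wraps the same idea in a UI, and the model name and key here are placeholders:

    import autogen

    llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "sk-..."}]}  # placeholder credentials

    # Two personas plus a proxy for the human; the manager routes turns between them.
    writer = autogen.AssistantAgent("writer", system_message="Draft answers to the user's request.", llm_config=llm_config)
    critic = autogen.AssistantAgent("critic", system_message="Harshly critique the writer. Say APPROVED when satisfied.", llm_config=llm_config)
    user = autogen.UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)

    group = autogen.GroupChat(agents=[user, writer, critic], messages=[], max_round=8)
    manager = autogen.GroupChatManager(groupchat=group, llm_config=llm_config)
    user.initiate_chat(manager, message="Explain CRDTs to a junior dev.")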


MoE, but an abstraction deeper?


I think you can do most of this already with llm-consortium (it may need the llm-openrouter plugin with my PR merged).

A consortium sends the same prompt to multiple models in parallel, and the responses are all sent to one arbiter model, which judges them. The arbiter decides whether more iterations are required. It can also be forced to keep iterating until a confidence threshold or a minimum number of iterations is reached.

Now, using the PR I made to llm-openrouter, you can save an alias to a model that includes lots of model options. For example:

    llm openrouter save -m qwen3 -o online -o temperature 0 --system "research prompt" --name qwen-researcher

And now you can build a consortium where one member is an online research specialist. You could make another that uses JSON mode for entity extraction, and a third that writes a blind draft. The arbiter would then make use of all of that and synthesize a good answer.


Any links or names of example implementations of this?


https://github.com/irthomasthomas/llm-consortium

Also, you aren't limited to the CLI. When you save a consortium, it creates a model. You can then interact with the consortium as if it were a normal model (albeit slower and higher quality). You can also serve your custom models on an OpenAI-compatible endpoint and use them with any chat client that supports custom OpenAI endpoints.

The default behaviour is to output just the final synthesis, and this should conform to your user prompt. I recently added the ability to continue conversations with a consortium. In this case it only includes your user prompt and final synthesis in the conversation, so it mimics a normal chat, unlike running multiple iterations in the consortium, where full iteration history and arbiter responses are included.

    uv tool install llm
    llm install llm-consortium
    llm install llm-model-gateway
    llm consortium save qwen-gem-sonnet -m qwen3-32b -n 2 -m sonnet-3.7 -m gemini-2.5-pro --arbiter gemini-2.5-flash --confidence-threshold 95 --max-iterations 3
    llm serve qwen-gem-sonnet

In this example I used -n 2 on the Qwen model: since it's so cheap, we can include multiple instances of it in the consortium.

Gemini Flash works well as the arbiter for most prompts. However, if your prompt has complex formatting requirements, embedding those within an already complex consortium prompt often confuses it; in that case, use gemini-2.5-pro as the arbiter.
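Once it's served, any client that speaks the OpenAI API should be able to use the consortium like a normal model. Something along these lines (the port is an assumption; check what llm serve prints):

    from openai import OpenAI

    # Point a standard OpenAI-compatible client at the locally served consortium.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # port is an assumption
    resp = client.chat.completions.create(
        model="qwen-gem-sonnet",  # the consortium saved above, behaving like a single model
        messages=[{"role": "user", "content": "Summarize the tradeoffs of CRDTs vs OT."}],
    )
    print(resp.choices[0].message.content)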


Have you tried n8n? It allows you to build flows like that - you can run the community version in a Docker container within a few minutes and share the configurations for the flows you have built very easily.


The "_#_" numeronym scheme has to be one of the worst word-shortening schemes I've ever seen get widespread. It only works for a very small number of long-lived technologies, in which case they basically just get a nickname, like "k8s" or "i18n". It does not work at all in broader contexts: you're basically making someone solve a crossword clue (2 across, 10 letters with two filled in) just to parse your sentence.


I just googled it and it looks like “n8n” is the name of the service. The op wasn’t abbreviating anything so I don’t think it’s the same phenomenon as what you’re describing.


Well, the service is doing the same thing, though. The part I don't understand is that I assume n8n is short for "Nation", but literally every single person I've seen talk about it on YouTube (which is quite a lot) says "En Eight En" every time.


nation is too short for 8 - maybe navigation?


Looks like n8n is short for nodemation


Why do we do this to ourselves?


Techno-flagellation is the only way to atone


So the 8 stands for "odematio"? That sounds about right.



The app is actually called n8n - https://n8n.io/


It's just another form of any other jargon - unknown until you know it, and usually specific to the use case. I see k8s and i18n or a11y and I know exactly what they mean because at some point I learned it and it's part of the world I live in. Searching for stuff is how we learn, not solving crosswords.


I kind of get k8s and can live with i18n (at least it's a long word). But a11y just shouldn't exist. "Oh look, it looks like ally, what a cute play on words." Yeah, but for a dumb joke and 9 saved keystrokes you literally made the word "accessibility" less accessible. That's exactly the opposite of what accessibility is about.


Right, my complaint is that it only works like jargon, where you are just giving something a context-specific nickname. As a word shortening scheme, it's terrible. A world where many projects have names like s11g is a nightmare.


No, it's not just part of the world, some fatality we have to live with like gravity. Abbreviations can on rare occasions have a net benefit, but only in very narrow, unusual contexts do they bring any general benefit. More often than not they just obfuscate the message for newcomers, raising an artificial barrier to entry.


I had not, but that looks awesome. Microsoft put out something called "agent flows" that also fits this category.[1] I'm working on more of an "at home" version - no "talk to sales" button.

[1] https://www.microsoft.com/en-us/microsoft-copilot/blog/copil...



