Being able to see the thinking trace in R1 is so useful, as you can go back and see if it's getting stuck, making a wrong assumption, missing data, etc. To me that makes it materially more useful than the OpenAI reasoning models, which seem impressive, but are much harder to inspect/debug.
Interesting... In the official API [1], there's no way to prefill the reasoning_content:
> Please note that if the reasoning_content field is included in the sequence of input messages, the API will return a 400 error. Therefore, you should remove the reasoning_content field from the API response before making the API request
So the best I can do is pass the reasoning as part of the context (which means starting over from the beginning).
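Concretely, that workaround looks something like the sketch below against the OpenAI-compatible endpoint. Base URL, model name, and the reasoning_content field are as I read DeepSeek's docs; treat the details as assumptions rather than gospel.

```python
from openai import OpenAI

# Assumes DeepSeek's OpenAI-compatible endpoint and the "deepseek-reasoner" model name.
client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
resp = client.chat.completions.create(model="deepseek-reasoner", messages=messages)

reasoning = resp.choices[0].message.reasoning_content  # the visible CoT
answer = resp.choices[0].message.content               # the final answer

# Per the docs quoted above, only `content` may go back into the history;
# sending reasoning_content back triggers a 400.
messages.append({"role": "assistant", "content": answer})

# The workaround: if you want the model to "see" its earlier reasoning,
# fold it into the next user turn as plain text.
messages.append({
    "role": "user",
    "content": f"Earlier you reasoned:\n{reasoning}\n\nPlease re-check step 3.",
})
resp = client.chat.completions.create(model="deepseek-reasoner", messages=messages)
```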
Fundamentally the UI is up to you. I have a "typing-pauses-inference-and-starts-gaslighting" feature in my homebrew frontend, but in OpenWebUI/SillyTavern you can just pause it, edit the chain of thought, and have it continue from the edit.
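If you want to wire that up by hand, the underlying trick is just raw completion from the edited text. A rough sketch against ollama's /api/generate; the model tag and the chat-template tokens below are placeholders, not necessarily what your particular distill expects.

```python
import requests

OLLAMA = "http://localhost:11434/api/generate"

# Hypothetical edited transcript: everything up to and including the point
# where you rewrote the chain of thought. The special tokens are placeholders --
# substitute whatever chat template your R1 distill actually uses.
edited = (
    "<|user|>Why is the sky blue?<|assistant|><think>\n"
    "The user is asking about Rayleigh scattering. Wait, I should double-check"
    " whether Mie scattering matters here too...\n"
)

# raw=True tells ollama to skip its own prompt templating, so generation
# continues verbatim from the edited chain of thought.
resp = requests.post(OLLAMA, json={
    "model": "deepseek-r1:7b",   # assumption: a local R1 distill tag
    "prompt": edited,
    "raw": True,
    "stream": False,
})
print(resp.json()["response"])   # the continuation, picking up mid-<think>
```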
That's a great idea. In your frontend, do you write in the same text entry field as the bot? I use oobabooga/text-generation-webui and I find it's a little awkward to edit the bot responses.
Thanks. For what it's worth, unless you particularly need to use exl2, ollama works great for local inference, and you can prompt together a half-decent chat UI for yourself in a matter of minutes these days, which gives you full control over everything.
I also lean a lot on https://www.npmjs.com/package/amallo which is an API wrapper I wrote for ollama that makes this sort of hacking very, very easy. (Not that the default lib is bad, I just didn't like the ergonomics.)
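For the Python crowd, the "chat UI in minutes" idea can look roughly like this with the stock ollama client rather than amallo. The model tag is just whichever distill you happen to have pulled; a bare-bones sketch, nothing more.

```python
import ollama  # `pip install ollama` -- the stock client, not amallo

MODEL = "deepseek-r1:7b"  # assumption: swap in the distill you pulled
history = []

# A minimal terminal chat loop: stream tokens as they arrive, keep the history.
while True:
    user = input("you> ")
    if user in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": user})
    reply = ""
    for chunk in ollama.chat(model=MODEL, messages=history, stream=True):
        piece = chunk["message"]["content"]
        reply += piece
        print(piece, end="", flush=True)
    print()
    history.append({"role": "assistant", "content": reply})
```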
I would actually love it if it would just ask me simple questions (just yes/no) when it's thinking about something I wasn't clear about, so I could help it that way. It's a bit sad seeing it write out the assumption and then reach the wrong conclusion.
You probably have the hardware to run the smallest distill, it runs even on my ancient laptop. It's not very smart but it still does the CoT and you can have fun editing it.
You can add that to the prompt. If you're running into those situations with vague assumptions, ask it to provide either the answer or questions that would fill in any useful missing information.
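Something along these lines as a system prompt does the trick; the wording here is just an illustration of the idea, tune it to taste.

```python
# Hedged example of the kind of system prompt suggested above; the exact
# wording is arbitrary and the user message is hypothetical.
system = {
    "role": "system",
    "content": (
        "Before answering, check whether anything in the request is ambiguous. "
        "If it is, do NOT guess: reply with one short yes/no question instead, "
        "wait for the answer, and only then give your full response."
    ),
}

messages = [system, {"role": "user", "content": "Write a parser for the log file."}]
# pass `messages` to whichever chat endpoint you're using (ollama, DeepSeek API, ...)
```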
It's almost like watching a stoned centipede having a panic attack about moving its legs. It also makes it obvious that these models (not just R1 I suppose) need to learn some kind of priority estimation to stop overthinking irrelevant issues and leave them to the normal token prediction, while focusing on the stuff that matters.
Nevertheless, R1's reasoning chains are already shorter in tokens than o1's (and apparently o3-mini's too) while achieving similar results.
The fact that OpenAI hides the reasoning tokens from us to begin with suggests that what they're doing behind the scenes isn't all that impressive, and is likely easily cloned (see R1).
Using R1 with Perplexity has impressed me in a way that none of the previous models have, and I can't even figure out if it's actually R1. It seems likely that it's a 70B-llama distillation, since that's what AWS offers on Bedrock, but from what I can find Perplexity does have their own H100 cluster through Amazon, so it's feasible they could be hosting the real thing? But I feel like they would brag about that achievement instead of being coy and simply labeling it "Deepseek R1 - Hosted in US".
I played with their model, and I wasn't able to make it follow any instructions; it looked like it just reads the first message and ignores the rest of the conversation. Not sure if it's a bug with OpenRouter or the model, but I was highly disappointed.
From the way it thinks/responds, it looks like it's one of the distillations, likely the llama one.
I also suspect that many of the free/cheap providers serve a llama distill instead of the real R1.
I did notice it switched models on me once after the first message! You have to make sure the "Pro" dropdown has R1 selected for each message. I've had a detailed back-and-forth where I pasted Python tracebacks to have R1 rewrite the code and came away very impressed [0]. Unfortunately saved conversations don't retain the thought process, so you can't see how it debugged its own error where numpy and pandas weren't playing along. I got my result of 283 zip codes that cover most of the 50 states with a hundred-mile radius from each zip, plus a script to draw a map of the result [1]. (Later R1 helped me write a script to crawl dealership addresses using this list of zips and a "locate dealers" JSON endpoint left open.)
I am running the 7B distilled version locally. I asked it to create a skeleton MEAN project. Everything was great but then it started to generate the front-end and I noticed the file extension (.tsx) and then saw react getting imported.
I gave the same prompt to sonnet 3.5 and not a single hiccup.
Maybe not an indication that Deepseek is worse/bad (I am using a distilled version); it speaks more to how much React/Next.js is out in the world influencing the front-end code it references.
Agreed. These locked-down, proprietary models do not interest me. And I certainly am not building product with them - being shackled to a specific provider is a needless business risk.
R1 (70B-distill) itself is very uncensored, and will give you a full account of Tiananmen Square from vague prompts. Asking R1 "what significant things happened in china in 1989" had it volunteering that "the death toll was in the hundreds or thousands and the exact number remains disputed to this day". The only thing that's censored is the web interface.
When asking it about the concept of human rights and the various forms in which it manifests (e.g. demographic equality under the law), I get a mixture of mundane nuance and bizarre answers that Xi Jinping himself could have written, with references to unity and the importance of social harmony over the "freedoms of the few".
This tracks when considering that the model was trained on western model outputs and then tuned post-training to (poorly) align it with Chinese values.
I definitely am not getting that; perhaps the 671b model is notably worse than the 70b llama distill in this respect. The 70b seemed pretty happy to talk about the ethnic cleansing of the Uyghurs in Xinjiang by the CCP and Palestinians in Gaza by Israel. It did some both-sidesing, but it generally seemed to provide a balanced-ish viewpoint, at least one that comports with my best guess of what the average person globally would consider balanced.
My favorite experience with the 70b distill was to ask it why communism consistently resulted in mass murder. It gave an immediate boilerplate response saying it doesn't and glorifying the Chinese communist party, then went into think mode and talked itself into the position that communism has, in fact, consistently resulted in mass murder.
They have underutilized the chain of thought in their reasoning; it ought to be thinking something like "I need to be careful to not say anything that could bring embarrassment to the party"...
but perhaps the online versions do actually preload the reasoning this way. :P
You can get around it based on how you ask the question. If you phrase it the way the X/Reddit posts you might have seen do, then for the most part, yes, the thinking stream immediately stops and you get the safety message.
I’ve been incredibly pleased with DeepSeek this past week. Wonderful product, I love seeing its brain when it’s thinking.