Being able to see the thinking trace in R1 is so useful, as you can go back and see if it's getting stuck, making a wrong assumption, missing data, etc. To me that makes it materially more useful than the OpenAI reasoning models, which seem impressive, but are much harder to inspect/debug.
Interesting... In the official API [1], there's no way to prefill the reasoning_content:
> Please note that if the reasoning_content field is included in the sequence of input messages, the API will return a 400 error. Therefore, you should remove the reasoning_content field from the API response before making the API request
So the best I can do is pass the reasoning as part of the context (which means starting over from the beginning).
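Concretely, that workaround looks something like the sketch below against the OpenAI-compatible endpoint. Base URL, model name, and the reasoning_content field are as I read DeepSeek's docs; treat the details as assumptions rather than gospel.

```python
from openai import OpenAI

# Assumes DeepSeek's OpenAI-compatible endpoint and the "deepseek-reasoner" model name.
client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
resp = client.chat.completions.create(model="deepseek-reasoner", messages=messages)

reasoning = resp.choices[0].message.reasoning_content  # the visible CoT
answer = resp.choices[0].message.content               # the final answer

# Per the docs quoted above, only `content` may go back into the history;
# sending reasoning_content back triggers a 400.
messages.append({"role": "assistant", "content": answer})

# The workaround: if you want the model to "see" its earlier reasoning,
# fold it into the next user turn as plain text.
messages.append({
    "role": "user",
    "content": f"Earlier you reasoned:\n{reasoning}\n\nPlease re-check step 3.",
})
resp = client.chat.completions.create(model="deepseek-reasoner", messages=messages)
```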
Fundamentally the UI is up to you. I have a "typing-pauses-inference-and-starts-gaslighting" feature in my homebrew frontend, but in OpenWebUI/SillyTavern you can just pause it, edit the chain of thought, and have it continue from the edit.
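If you want to wire that up by hand, the underlying trick is just raw completion from the edited text. A rough sketch against ollama's /api/generate; the model tag and the chat-template tokens below are placeholders, not necessarily what your particular distill expects.

```python
import requests

OLLAMA = "http://localhost:11434/api/generate"

# Hypothetical edited transcript: everything up to and including the point
# where you rewrote the chain of thought. The special tokens are placeholders --
# substitute whatever chat template your R1 distill actually uses.
edited = (
    "<|user|>Why is the sky blue?<|assistant|><think>\n"
    "The user is asking about Rayleigh scattering. Wait, I should double-check"
    " whether Mie scattering matters here too...\n"
)

# raw=True tells ollama to skip its own prompt templating, so generation
# continues verbatim from the edited chain of thought.
resp = requests.post(OLLAMA, json={
    "model": "deepseek-r1:7b",   # assumption: a local R1 distill tag
    "prompt": edited,
    "raw": True,
    "stream": False,
})
print(resp.json()["response"])   # the continuation, picking up mid-<think>
```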
That's a great idea. In your frontend, do you write in the same text entry field as the bot? I use oobabooga/text-generation-webui and I find it's a little awkward to edit the bot responses.
Thanks. For what it's worth, unless you particularly need to use exl2, ollama works great for local inference, and you can prompt together a half-decent chat UI for yourself in a matter of minutes these days, which gives you full control over everything.
I also lean a lot on https://www.npmjs.com/package/amallo which is an API wrapper I wrote for ollama that makes this sort of hacking very, very easy. (Not that the default lib is bad, I just didn't like the ergonomics.)
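For the Python crowd, the "chat UI in minutes" idea can look roughly like this with the stock ollama client rather than amallo. The model tag is just whichever distill you happen to have pulled; a bare-bones sketch, nothing more.

```python
import ollama  # `pip install ollama` -- the stock client, not amallo

MODEL = "deepseek-r1:7b"  # assumption: swap in the distill you pulled
history = []

# A minimal terminal chat loop: stream tokens as they arrive, keep the history.
while True:
    user = input("you> ")
    if user in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": user})
    reply = ""
    for chunk in ollama.chat(model=MODEL, messages=history, stream=True):
        piece = chunk["message"]["content"]
        reply += piece
        print(piece, end="", flush=True)
    print()
    history.append({"role": "assistant", "content": reply})
```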
I would actually love it if it would just ask me simple questions (just yes/no) when it's thinking about something I wasn't clear about, so I could help it that way. It's a bit sad seeing it write out the assumption and then reach the wrong conclusion.
You probably have the hardware to run the smallest distill, it runs even on my ancient laptop. It's not very smart but it still does the CoT and you can have fun editing it.
You can add that to the prompt. If you're running into those situations with vague assumptions, ask it to provide either the answer or questions that would fill in any useful missing information.
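Something along these lines as a system prompt does the trick; the wording here is just an illustration of the idea, tune it to taste.

```python
# Hedged example of the kind of system prompt suggested above; the exact
# wording is arbitrary and the user message is hypothetical.
system = {
    "role": "system",
    "content": (
        "Before answering, check whether anything in the request is ambiguous. "
        "If it is, do NOT guess: reply with one short yes/no question instead, "
        "wait for the answer, and only then give your full response."
    ),
}

messages = [system, {"role": "user", "content": "Write a parser for the log file."}]
# pass `messages` to whichever chat endpoint you're using (ollama, DeepSeek API, ...)
```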
It's almost like watching a stoned centipede having a panic attack about moving its legs. It also makes it obvious that these models (not just R1 I suppose) need to learn some kind of priority estimation to stop overthinking irrelevant issues and leave them to the normal token prediction, while focusing on the stuff that matters.
Nevertheless, R1's reasoning chains are already shorter in tokens than o1's (and apparently o3-mini's too) while achieving similar results.
The fact that OpenAI hides the reasoning tokens from us to begin with suggests that what they're doing behind the scenes isn't all that impressive, and is likely easily cloned (see R1).
Using R1 with Perplexity has impressed me in a way that none of the previous models have, and I can't even figure out if it's actually R1. It seems likely that it's a 70B-llama distillation, since that's what AWS offers on Bedrock, but from what I can find Perplexity does have their own H100 cluster through Amazon, so it's feasible they could be hosting the real thing? But I feel like they would brag about that achievement instead of being coy and simply labeling it "Deepseek R1 - Hosted in US".
I played with their model, and I wasn't able to make it follow any instructions; it looked like it just reads the first message and ignores the rest of the conversation. Not sure if it's a bug with OpenRouter or the model, but I was highly disappointed.
From the way it thinks/responds, it looks like it's one of the distillations, likely the llama one.
I also suspect that many of the free/cheap providers serve a llama distill instead of the real R1.
I did notice it switched models on me once after the first message! You have to make sure the "Pro" dropdown has R1 selected for each message. I've had a detailed back-and-forth where I pasted Python tracebacks to have R1 rewrite the code and came away very impressed [0]. Unfortunately saved conversations don't retain the thought process, so you can't see how it debugged its own error where numpy and pandas weren't playing along. I got my result of 283 zip codes that cover most of the 50 states with a hundred-mile radius from each zip, plus a script to draw a map of the result [1]. (Later R1 helped me write a script to crawl dealership addresses using this list of zips and a "locate dealers" JSON endpoint left open.)
I am running the 7B distilled version locally. I asked it to create a skeleton MEAN project. Everything was great but then it started to generate the front-end and I noticed the file extension (.tsx) and then saw react getting imported.
I gave the same prompt to sonnet 3.5 and not a single hiccup.
Maybe not an indication that Deepseek is worse/bad (I am using a distilled version); it speaks more to how much React/Next.js is out in the world influencing the front-end code it references.
Agreed. These locked-down, proprietary models do not interest me. And I certainly am not building product with them - being shackled to a specific provider is a needless business risk.
R1 (70B-distill) itself is very uncensored, and will give you a full account of Tiananmen Square from vague prompts. Asking R1 "what significant things happened in china in 1989" had it volunteering that "the death toll was in the hundreds or thousands and the exact number remains disputed to this day". The only thing that's censored is the web interface.
When asking it about the concept of human rights and the various forms in which it manifests (e.g. demographic equality under the law), I get a mixture of mundane nuance and bizarre answers that Xi Jinping himself could have written, with references to unity and the importance of social harmony over the "freedoms of the few".
This tracks when considering that the model was trained on western model outputs and then tuned post-training to (poorly) align it with Chinese values.
I definitely am not getting that; perhaps the 671b model is notably worse than the 70b llama distill in this respect. The 70b seemed pretty happy to talk about the ethnic cleansing of the Uyghurs in Xinjiang by the CCP and Palestinians in Gaza by Israel. It did some both-sidesing, but it generally seemed to provide a balanced-ish viewpoint, at least one that comports with my best guess of what the average person globally would consider balanced.
My favorite experience with the 70b distill was to ask it why communism consistently resulted in mass murder. It gave an immediate boilerplate response saying it doesn't and glorifying the Chinese communist party, then went into think mode and talked itself into the position that communism has, in fact, consistently resulted in mass murder.
They have underutilized the chain of thought in their reasoning; it ought to be thinking something like "I need to be careful to not say anything that could bring embarrassment to the party"...
but perhaps the online versions do actually preload the reasoning this way. :P
You can get around it based on how you ask the question. If you phrase it the way the X/Reddit posts you might have seen do, then for the most part, yes, the thinking stream immediately stops and you get the safety message.
I’ve been incredibly pleased with DeepSeek this past week. Wonderful product, I love seeing its brain when it’s thinking.