
I’ll take the China Deluxe instead, actually.

I’ve been incredibly pleased with DeepSeek this past week. Wonderful product, I love seeing its brain when it’s thinking.




Being able to see the thinking trace in R1 is so useful, as you can go back and see if it's getting stuck, making a wrong assumption, missing data, etc. To me that makes it materially more useful than the OpenAI reasoning models, which seem impressive, but are much harder to inspect/debug.


Running it locally lets you INTERJECT IN ITS THINKING IN REAL TIME, and I cannot stress enough how useful that is.


Interesting... In the official API [1], there's no way to prefill the reasoning_content:

> Please note that if the reasoning_content field is included in the sequence of input messages, the API will return a 400 error. Therefore, you should remove the reasoning_content field from the API response before making the API request

So the best I can do is pass the reasoning as part of the context (which means starting over from the beginning).

[1] https://api-docs.deepseek.com/guides/reasoning_model
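
For context, the documented multi-round flow looks roughly like this (a sketch using the OpenAI-compatible Python client the docs describe; only content ever goes back into the history, never reasoning_content):

    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                    base_url="https://api.deepseek.com")

    messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
    resp = client.chat.completions.create(model="deepseek-reasoner", messages=messages)

    msg = resp.choices[0].message
    print(msg.reasoning_content)  # the visible CoT -- read it, but don't send it back

    # Only the final answer may go back into the history; including
    # reasoning_content in a request is what triggers the 400 error.
    messages.append({"role": "assistant", "content": msg.content})
    messages.append({"role": "user", "content": "Now explain it to a five-year-old."})
    resp = client.chat.completions.create(model="deepseek-reasoner", messages=messages)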


You mean it reacts to you writing something while it's thinking, or that you can stop it while it's thinking?


You can stop it at any time, then modify what it's written so far... then press continue and let it keep thinking and answering.


Fundamentally the UI is up to you. I have a "typing-pauses-inference-and-starts-gaslighting" feature in my homebrew frontend, but in OpenWebUI/SillyTavern you can just pause it, edit the chain of thought, and have it continue from the edit.


That's a great idea. In your frontend, do you write in the same text entry field as the bot? I use oobabooga/text-generation-webui and I find it's a little awkward to edit the bot responses.


No, but the chat divs are all contenteditable.


Oh! That is an excellent solution. I wish it was that easy in every UI.


Thanks. For what it's worth, unless you particularly need exl2, ollama works great for local inference, and these days you can prompt together a half-decent chat UI for yourself in a matter of minutes, which gives you full control over everything. I also lean a lot on https://www.npmjs.com/package/amallo, an API wrapper I wrote for ollama that makes this sort of hacking very easy. (Not that the default lib is bad, I just didn't like the ergonomics.)
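
For example, the pause-and-edit-the-CoT trick from upthread looks roughly like this against ollama's raw generate endpoint (a sketch in Python rather than amallo; the model tag and the R1 chat-template tokens are assumptions, check yours with `ollama show <model> --template`):

    import requests

    OLLAMA = "http://localhost:11434/api/generate"
    MODEL = "deepseek-r1:70b"  # whatever tag you actually pulled

    # Template tokens below are my reading of the R1 distill chat template;
    # verify them against the model's own template before trusting this.
    prompt = (
        "<｜User｜>How many r's are in 'strawberry'?<｜Assistant｜>"
        "<think>\n"
        # ...everything it generated so far, with your edits spliced in...
        "Wait, I should count the letters one at a time: s-t-r-a-w-b-e-r-r-y.\n"
    )

    # raw=True skips ollama's own templating, so the model simply continues
    # from the edited chain of thought.
    resp = requests.post(OLLAMA, json={
        "model": MODEL,
        "prompt": prompt,
        "raw": True,
        "stream": False,
    })
    print(resp.json()["response"])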


Oh this is so cool


How are you running it locally??


I am running a 4bit imatrix quant of the 70b distill with quantized context. It fits in the 43gb of vram I have.


I would actually love it if it would just ask me simple yes/no questions when it's thinking about something I wasn't clear about, so I could help it along. It's a bit sad seeing it write out the assumption and then reach the wrong conclusion.


You can run it locally, pause it when its thinking goes wrong, and correct its chain of thought.


Oh wow, I did not know that, and unfortunately I don't have the hardware to run it locally.


You probably have the hardware to run the smallest distill; it runs even on my ancient laptop. It's not very smart, but it still does the CoT and you can have fun editing it.


You can add that to the prompt. If you're running into those situations where it makes vague assumptions, ask it to provide either the answer or questions that would gather any useful missing information.
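
One way to do that (a rough sketch against the ollama chat endpoint; the model tag and the exact instruction wording are just placeholders):

    import requests

    messages = [
        {"role": "system", "content":
            "If my request is ambiguous, do not guess. Ask me a single "
            "yes/no clarifying question first and wait for my answer."},
        {"role": "user", "content": "Write a backup script for my photos."},
    ]

    resp = requests.post("http://localhost:11434/api/chat", json={
        "model": "deepseek-r1:8b",  # placeholder tag
        "messages": messages,
        "stream": False,
    })
    print(resp.json()["message"]["content"])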


It's almost like watching a stoned centipede having a panic attack about moving its legs. It also makes it obvious that these models (not just R1 I suppose) need to learn some kind of priority estimation to stop overthinking irrelevant issues and leave them to the normal token prediction, while focusing on the stuff that matters.

Nevertheless, R1's reasoning chains are already shorter in tokens than o1's (and apparently o3-mini's too) while achieving similar results.


The fact that OpenAI hides the reasoning tokens from us to begin with shows that what they are doing behind the scenes isn't all that impressive, and is likely easily cloned (R1).

Would be nice if they made them visible now.


Using R1 with Perplexity has impressed me in a way that none of the previous models have, and I can't even figure out if it's actually R1. It seems likely that it's a 70B-Llama distillation, since that's what AWS offers on Bedrock, but from what I can find Perplexity does have their own H100 cluster through Amazon, so it's feasible they could be hosting the real thing? I feel like they would brag about that achievement, though, instead of being coy and simply labeling it "Deepseek R1 - Hosted in US".


> seems likely that it's a 70B-Llama distillation since that's what AWS offers on Bedrock

I think you misread something. AWS mainly offers the full size model on Bedrock: https://aws.amazon.com/blogs/aws/deepseek-r1-models-now-avai...

They talk about how to import the distilled models and deploy those if you want, but AWS does not appear to be officially supporting those.
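
Once the model is enabled in your account, invoking it should look like any other Bedrock model through the Converse API (a sketch; the model ID below is a placeholder, use whatever the console or blog post actually lists):

    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

    # Placeholder model ID -- substitute the one shown in the Bedrock console.
    resp = bedrock.converse(
        modelId="deepseek.r1-v1:0",
        messages=[{"role": "user", "content": [{"text": "Why is the sky blue?"}]}],
    )
    print(resp["output"]["message"]["content"][0]["text"])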


Aha! Thanks, that's what I was looking for. I ended up on the blog post about how to import custom models, including the DeepSeek distills:

https://aws.amazon.com/blogs/machine-learning/deploy-deepsee...


I played with their model, and I wasn't able to make it follow any instructions; it looked like it just reads the first message and ignores the rest of the conversation. Not sure if it's a bug with OpenRouter or with the model, but I was highly disappointed.

From the way it thinks/responds, it looks like it's one of the distillations, likely the Llama one. I also suspect that many of the free/cheap providers serve Llama instead of the real R1.


I did notice it switched models on me once after the first message! You have to make sure the "Pro" dropdown has R1 selected for each message. I've had a detailed back and forth where I pasted Python tracebacks to have R1 rewrite the code and came away very impressed [0]. Unfortunately, saved conversations don't retain the thought process, so you can't see how it debugged its own error where numpy and pandas weren't playing along. I got my result of 283 zip codes that cover most of the 50 states with a hundred-mile radius from each zip, plus a script to draw a map of the result [1]. (Later, R1 helped me write a script to crawl dealership addresses using this list of zips and a "locate dealers" JSON endpoint that was left open.)

[0] https://www.perplexity.ai/search/how-can-i-construct-a-list-...

[1] https://imgur.com/BhPMCfO


I am running the 7B distilled version locally. I asked it to create a skeleton MEAN project. Everything was great, but then it started to generate the front end, and I noticed the file extension (.tsx) and then saw React getting imported (the A in MEAN is Angular, so React shouldn't appear at all).

I gave the same prompt to Sonnet 3.5, and not a single hiccup.

Maybe not an indication that DeepSeek is worse/bad (I am using a distilled version); it speaks more to how much React/Next.js is out in the world, influencing the front-end code it references.


You are not actually running DeepSeek; those distilled models have nothing to do with DeepSeek itself and are just fine-tuned on DeepSeek responses.


They were fine-tuned by DeepSeek, from what I can tell.


You know you are running an extremely nerfed version of the model, right?


I did update my comment to say that I am using the distilled version, so yes?


Even the full model scores below Claude on LiveBench, so a distilled version will likely be even worse.


Based on the leaderboard, R1 is significantly better than Claude? https://livebench.ai/#/


Not at coding.


I've seen it get into long 5 minute chains of thought where it gets totally confused.


Agreed. These locked-down, proprietary models do not interest me. And I certainly am not building product with them - being shackled to a specific provider is a needless business risk.


I did a blind test and still prefer Gemini, Claude, and OpenAI to DeepSeek.


Yes, it is a great product, especially for coding tasks.


I recently tried Gemini-1.5-Pro for the first time. It was clearly better than DeepSeek or any of the OpenAI models available to Plus subscribers.



Seeing the CoT can provide some insight into what's happening in its "mind", and that alone is quite worth it, IMHO.


Sometimes its thinking is more useful than the actual output.


Have you tried seeing what happens when you speak to it about topics which are considered politically sensitive in the PRC?


R1 (the 70B distill) itself is very uncensored and will give you a full account of Tiananmen Square from vague prompts. Asking R1 "what significant things happened in china in 1989" had it volunteering that "the death toll was in the hundreds or thousands and the exact number remains disputed to this day". The only thing that's censored is the web interface.


When asking it about the concept of human rights and the various forms in which it manifests (e.g. demographic equality under the law), I get a mixture of mundane nuance and bizarre answers that Xi Jinping himself could have written, with references to unity and the importance of social harmony over the "freedoms of the few".

This tracks when considering that the model was trained on western model outputs and then tuned post-training to (poorly) align it with Chinese values.


I definitely am not getting that; perhaps the 671B model is notably worse than the 70B Llama distill in this respect. The 70B seemed pretty happy to talk about the ethnic cleansing of the Uyghurs in Xinjiang by the CCP and of Palestinians in Gaza by Israel. It did some both-sidesing, but it generally seemed to provide a balanced-ish viewpoint, or at least one that comports with my best guess of what the average person globally would consider balanced.


My favorite experience with the 70B distill was to ask it why communism consistently resulted in mass murder. It gave an immediate boilerplate response saying it doesn't, glorifying the Chinese Communist Party, then went into think mode and talked itself into the position that communism has, in fact, consistently resulted in mass murder.

They have underutilized the chain of thought in their reasoning; it ought to be thinking something like "I need to be careful not to say anything that could bring embarrassment to the party"...

But perhaps the online versions do actually preload the reasoning this way. :P


You can get around it based on how you ask the question. If you ask it the way whatever X/Reddit posts you might have seen do, then for the most part, yes, the thinking stream immediately stops and you get the safety message.




