Hacker Newsnew | past | comments | ask | show | jobs | submit | pplonski86's commentslogin

I have dual boot on decent laptop, doing nothing, on windows fan is always on, computing something? On Linux it is just silent

What if AI starts to have sense of craft? we just miss the verify and critique models, that will tell other models what looks good

thank you for sharing, is there a new container for each code run, or it stays the same for whole conversation?

It’s maintained for the conversation. You can ask it for details like this.

There are so many models, is there any website with list of all of them and comparison of performance on different tasks?

The post actually has great benchmark tables inside of it. They might be outdated in a few months, but for now, it gives you a great summary. Seems like Gemini wins on image and video perf, Claude is the best at coding, ChatGPT is the best for general knowledge.

But ultimately, you need to try them yourself on the tasks you care about and just see. My personal experience is that right now, Gemini Pro performs the best at everything I throw at it. I think it's superior to Claude and all of the OSS models by a small margin, even for things like coding.


I like Gemini Pro's UI over Claude so much but honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.

> honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.

Me too!

> I like Gemini Pro's UI over Claude so much

This I don't understand. I mean, I don't see a lot of difference in both UIs. Quite the opposite, apart from some animations, round corners and color gradings, they seem to look very alike, no?


Y'know I ended up buying Kimi's moderato plan which is 19$ but they had this unique idea where you can talk to a bot and they could reduce the price

I made it reduce the price of first month to 1.49$ (It could go to 0.99$ and my frugal mind wanted it haha but I just couldn't have it do that lol)

Anyways, afterwards for privacy purposes/( I am a minor so don't have a card), ended up going to g2a to get a 10$ Visa gift card essentially and used it. (I had to pay a 1$ extra but sure)

Installed kimi code on my mac and trying it out. Honestly, I am kind of liking it.

My internal benchmark is creating pomodoro apps in golang web... Gemini 3 pro has nailed it, I just tried the kimi version and it does have some bugs but it feels like it added more features.

Gonna have to try it out for a month.

I mean I just wish it was this cheap for the whole year :< (As I could then move from, say using the completely free models)

Gonna have to try it out more!



There are many lists, but I find all of them outdated or containing wrong information or missing the actual benchmarks I'm looking for.

I was thinking, that maybe it's better to make my own benchmarks with the questions/things I'm interested in, and whenever a new model comes out run those tests with that model using open-router.


Thank you! Exactly what I was looking for

Maybe point 9, trust but verify, should be extended to AI coworkers as well. I would love to have tools to verify AI code by quantity.

Whatever human that is in charge of the chat bots is your coworker. That person that is responsible for the output of the bots is the one that you would trust but verify with.

Are you aware of any lightweight sandboxes for Python? not browser based

You mean for running unsafe Python code?

I'm on a multi-year quest to answer that question!

The best I've found is running Python code inside Pyodide in WASM in Node.js or Deno accessed from Python via a subprocess, which is a wildly convoluted way to go but does appear to work! https://til.simonwillison.net/deno/pyodide-sandbox

Here's a related recent experimental library which does something similar but with JavaScript rather than Python as the unsafe language, again via Deno in a subprocess: https://github.com/simonw/denobox

I've also experimented with using wasmtime instead of Deno: https://til.simonwillison.net/webassembly/python-in-a-wasm-s...


Stay tuned, we are about to release a new version of Wasmer with WASIX, that allows for things that can't currently be done with Pyodide:

  * Multithreaded support
  * Calling subprocesses
  * Signals
  * Full networking support
  * Support for greenlets (say hi to SQLAlchemy!) :)
It requires a small effort in wasmer-js, but it already works fully on the server! :)

Thank you! With WASM I can’t use all pypi packages and can’t connect to database, that’s why I’m looking for python based solution

In that case you'll need to look at general purpose sandboxes you can run Python in - stuff like Firecracker or Bubblewrap on Linux or sandbox-exec on macOS.

With Wasmer you should be able to use all pypi packages (even the native ones), although we are a bit light on the native packages we support now

very nice comparison! I'd like to see on what examples OCR engines fail


Is codex working well with python notebooks?


Wow! how do you make marketing for so many projects?


It's the most difficult part. In my experience paid ads do not work very well so I am not relying too much on those. I usually use social media with UGC videos created either by me or by content creators. I also reach out on Instagram, even dating apps, to users and pay them to use/promote a product.

Recently I started to use n8n automation to post on Twitter/LinkedIn, however I tend to keep those posts short since they are created with LLM's and do not seem authentic.

As for the SEO part, I usually upload search console extracts into Perplexity deep research and ask for actions on how to improve ranking for different keywords.


Lately I was trying ask LLMs to generate SVG pictures, do you have famous pelican on bike created by flash model?


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: