I get the same feeling that I'm "not being productive" while playing video games, watching tv, etc that seems to kill any enjoyment from doing these things.
For me learning piano has been a great alternative to programming in the off hours (typing is quite transferrable too!). Highly recommend if you're like me on screens all day.
I usually am a huge fan of “copilot” tools (I use cursor, etc) and Claude has always been my go to.
But Sonnet 3.7 actually seems dangerous to me, it seems it’s been RL’d _way_ too hard into producing code that won’t crash — to the point where it will go completely against the instructions to sneak in workarounds (e.g. returning random data when a function fails!). Claude Code just makes this even worse by giving very little oversight when it makes these “errors”
this is a huge issue for me as well. It just kind of obfuscates errors and masks the original intent, rather than diagnosing and fixing the issue. 3.5 seemed to be more clear about what it's doing and when things broke at least it didn't seem to be trying to hide anything.
I don’t think this hits at the heart of the issue? Even if we can catch AI text with 100% accuracy, any halfway decent student can rewrite it from scratch using o1s ideas in lieu of actually learning.
This is waay more common and just impossible to catch. The only students caught here are those that put no effort in at all
> rewrite it from scratch ... in lieu of actual learning
If one can "rewrite it from scratch" in a way that's actually coherent and gets facts correct, then they learned the material and can write an original paper.
> This is waay more common and just impossible to catch.
It seems a good thing that this is more common and, naturally, it would -- perhaps should, given the topic -- be impossible to catch someone cheating when they're not cheating.
Just another +1 that if you’re going to give vscode a fair shot, it’s much better to go with vscode-neovim than the standard vim extension. You can even map most of your config right over.
Benchmarking for this project is a bit weird, since 1) only linear scans are supported, and 2) it's an "embeddable" vector search tool, so it doesn't make a lot of sense to benchmark against "server" vector databases like qdrant or pinecone.
That being said, ~generally~ I'd say it's faster than using numpy and tools like txtai/chromadb. Faiss and hnswlib (bruteforce) are faster because they store everything in memory and use multiple threads. But for smaller vector indexes, I don't think you'd notice much of a difference. sqlite-vec has some support for SIMD operations, which speeds things up quite a bit, but Faiss still takes the cake.
Author of txtai here - great work with this extension.
I wouldn't consider it a "this or that" decision. While txtai does combine Faiss and SQLite, it could also utilize this extension. The same task was just done for Postgres + pgvector. txtai is not tied to any particular backend components.
Ya I worded this part awkwardly - I was hinting that querying a vector index and joining with metadata with sqlite + sqlite-vec (in a single SQL join) will probably be faster than other methods, like txtai, which do the joining phase in a higher level like Python. Which isn't a fair comparison, especially since txtai can switch to much faster vector stores, but I think is fair for most embedded use-cases.
That being said, txtai offers way more than sqlite-vec, like builtin embedding models and other nice LLM features, so it's definitely apples to oranges.
With this, what DuckDB just added and pgvector, we're seeing a blurring of the lines. Back in 2021, there wasn't a RDBMS that had native vector support. But native vector integration makes it possible for txtai to just run SQL-driven vector queries...exciting times.
I think systems that bet on existing databases eventually catching up (as is txtai's model) vs trying to reinvent the entire database stack will win out.
The most useful feature of llms is how much output you get from such little signal. Just yesterday I created a fairly advanced script from my phone on the bus ride home with chatgpt which was an absolute pleasure. I think multi-prompt conversations don't get nearly as much attention as they should in llm evaluations.
I suppose multi-prompt conversations are just a variation on few-shot prompting. I do agree though, that they don't play a big enough role in eval, but also in the heads of many people. So many capable engineers I now nope out of GPT because the first answer isn't satisfactory, instead of continuing the dialog.
Switched to Firefox a month or two ago, mostly for ublock origin on android, and unlimited history (seriously, why is this not the standard?)
I've tried to like it but honestly it's been painful. MacOS Sonoma seems to have a hover bug, which has been unresolved through the last 3 bug fix updates. Performance is "fine" but seems to lag with many tabs open which was never an issue in chrome (this is on an M2 pro!) PDF reader also seems significantly slower as well. At this point I'm considering going back to chrome.
Unlimited history is nice but I hate how history works out of the box. I might be doing something wrong but ctrl+h pops up a sidebar which shows every single website visited today in no discernible order. I've learned about ctrl+shift+h which is better but even there, the UI is a bit lacking compared to what Chrome has out of the box. Is there anything I can do to improve this?
For me learning piano has been a great alternative to programming in the off hours (typing is quite transferrable too!). Highly recommend if you're like me on screens all day.