
Just wanted to advertise that the EFF recently released an open source tool for detecting cell-site simulators. The hardware is like $20 and it's pretty easy to set up yourself. Worth having around to stay aware of what's out there, especially if you live in one of the places recently targeted by the administration.

https://github.com/EFForg/rayhunter/


All user-facing LLMs go through Reinforcement Learning. Contrary to popular belief, RL's _primary_ purpose isn't to "align" them to make them "safe." It's to make them actually usable.

LLMs that haven't gone through RL are useless to users. They are very unreliable, and will frequently go off the rails spewing garbage, going into repetition loops, etc.

RL involves training the models on entire responses, not token-by-token loss (1). This makes them orders of magnitude more reliable (2). It forces them to consider what they're going to write. The obvious conclusion is that they plan (3). That's why the myth that LLMs are strictly next-token prediction machines is so unhelpful and poisons the discussion.

The models still _generate_ responses token by token, but they don't pick whichever token maximizes probability at each individual step. Rather, they learn to pick tokens that maximize the reward for the _entire response_.

(1) Slight nuance: All RL schemes for LLMs have to break the reward down into token-by-token losses. But those losses are based on a "whole response reward" or some combination of rewards.

(2) Raw LLMs go haywire roughly 1 in 10 times, varying depending on context. Some tasks make them go haywire almost every time, other tasks are more reliable. RL'd LLMs fail on the order of 1 in 10,000 responses or better.

(3) It's _possible_ that they don't learn to plan through this scheme. There are alternative solutions that don't involve planning ahead. So Anthropic's research here is very important and useful.
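
To make footnote (1) concrete, here's a rough sketch of how a single whole-response reward can be spread across per-token losses (a simplified REINFORCE-style loss in PyTorch, not any lab's actual training code):

    import torch
    import torch.nn.functional as F

    def response_level_loss(logits, response_tokens, reward):
        # logits:          (seq_len, vocab_size) model outputs for the sampled response
        # response_tokens: (seq_len,) tokens the model actually sampled
        # reward:          one scalar score for the entire response
        log_probs = F.log_softmax(logits, dim=-1)
        token_log_probs = log_probs.gather(1, response_tokens.unsqueeze(1)).squeeze(1)
        # Every token's loss is weighted by the same whole-response reward, so the
        # gradient favors token choices that made the *full* response score well.
        return -(reward * token_log_probs).mean()

Real schemes (PPO, GRPO, etc.) add baselines, clipping, and KL penalties on top, but the core idea is the same: the per-token signal comes from how the whole response was judged.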

P.S. I should point out that many researchers get this wrong too, or at least haven't fully internalized it. Not truly understanding the purpose of RL is why models like Qwen, DeepSeek, Mistral, etc. are so much less reliable and less usable by real companies compared to OpenAI's, Google's, and Anthropic's models.

Once you understand that even the most basic RL takes LLMs from useless to useful, the obvious next question is: what if we used more complicated RL? And guess what, more complicated RL led to reasoning models. Hmm, I wonder what the next step is?


> LLMs that haven't gone through RL are useless to users. They are very unreliable, and will frequently go off the rails spewing garbage, going into repetition loops, etc... RL involves training the models on entire responses, not token-by-token loss (1).

Yes. For those who want a visual explanation, I have a video where I walk through this process including what some of the training examples look like: https://www.youtube.com/watch?v=DE6WpzsSvgU&t=320s


You might find this helpful for prioritizing which knobs to turn first https://github.com/google-research/tuning_playbook

If anyone's interested, I made Colab notebooks that run on free GPUs for both GRPO (the algo DeepSeek used) to train a reasoning model from scratch, and for general finetuning, which is what the Berkeley team employed!

GRPO notebook for Llama 3.1 8B: https://colab.research.google.com/github/unslothai/notebooks...

General finetuning notebook: https://colab.research.google.com/github/unslothai/notebooks...

The Berkeley team's 17K dataset: https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k Hugging Face also released a 220K dataset: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
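
For anyone wondering what GRPO actually changes, the core trick is group-relative advantages: sample several responses to the same prompt, grade them, and score each one against its siblings instead of training a separate value model. A tiny sketch of that step (my own simplification, not the notebook's code):

    import torch

    def grpo_advantages(rewards):
        # rewards: one score per sampled response to the SAME prompt
        r = torch.tensor(rewards, dtype=torch.float32)
        # each response is scored relative to its group; no value network needed
        return (r - r.mean()) / (r.std() + 1e-8)

    # e.g. 4 sampled answers to one math problem, graded 1.0 if correct else 0.0
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get positive advantage

Those per-response advantages then weight the per-token policy-gradient loss, alongside a KL penalty toward the reference model.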


"A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system."

– John Gall (1975) Systemantics: How Systems Really Work and How They Fail

https://en.wikipedia.org/wiki/John_Gall_(author)#Gall's_law


I've been a happy user of Frigate (https://frigate.video) with a fully local, isolated setup: multiple PoE cameras on a dedicated network, a Coral to supplement on-camera recognition, and HA+Prometheus for alerts and smarts.

Excited to see another project, especially in Rust (not for the memes; Python env management has bitten me a few times when hacking).

One major gripe I have with Frigate is the way it treats detection events as pointers to video files. This makes replicating events off-site a major pain.


I took a skim through it in the morning - I like the LoRA Learns Less and Forgets Less paper more https://openreview.net/forum?id=aloEru2qCG - it has much more signal in a few pages - also the original QLoRA paper from Dettmers https://arxiv.org/abs/2305.14314 has so many more important morsels.

But all in all, the review is a reasonable "manual" I guess. I would have liked maybe more instructive, comprehensive practical examples, and maybe more mention of other OSS packages for finetuning :))


I've been thinking recently about how things like Project Euler, LeetCode, and to a bit less of an extent, Advent of Code, are so heavily focused on making clever use of math, data structures, and algorithms that it makes them suboptimal as tools for getting familiar with a new programming language.

I know that critique isn't new to anyone, but it makes me think it would be cool if there were a code puzzler site specifically geared towards little self-contained tasks that force you to get familiar with the common everyday work of software development.

Some example puzzlers could be like:

- open an http server on port 80

- open a file and write this data to it

- write temporary files to a location, deleting them when process exits (see the sketch below)

- query a database

- deal with such and such error scenario

- find a way to test this component

- bundle this code as an executable

- sanitize user input here

- make this behavior configurable

- take the config from environment variables and/or a config file and/or arguments

- parse this data file

You do get a bit of parsing and file handling with Advent of Code, but imagine a series of dozens of small problems that grill you on every corner of the Python filesystem API. Would be a lot less dry than reading docs cover to cover.
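
For instance, the temp-file puzzler above might have a reference solution along these lines (one possible Python answer for a hypothetical site, nothing canonical):

    import atexit
    import shutil
    import tempfile

    # create a scratch directory and make sure it disappears when the process exits
    scratch_dir = tempfile.mkdtemp(prefix="puzzler-")
    atexit.register(shutil.rmtree, scratch_dir, ignore_errors=True)

    with tempfile.NamedTemporaryFile(mode="w", dir=scratch_dir, suffix=".txt",
                                     delete=False) as tmp:
        tmp.write("intermediate results go here\n")
        print("wrote", tmp.name)

Grading could be as simple as checking that the file exists during the run and is gone afterwards.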


Not sure about the website, but the guy was probably Edward Tufte from the description. He and Stephen Few were kind of the OGs of data visualization for the modern era.

https://www.edwardtufte.com/


Tangent on representing recipes - I've always loved the "Cooking for Engineers" guy's recipe notation (scroll down to just above the comments); it's so clever and concise:

https://www.cookingforengineers.com/recipe/168/Pecan-Coffee-...


I have a 4-quadrant way of thinking about this that's similar.

- Y-axis is "drive"

- X-axis is "aptitude"

- low drive + low aptitude: never hire

- low drive + high aptitude: hire for targeted use cases where you need expertise

- high drive + low aptitude: hire, train, and foster aptitude growth

- high drive + high aptitude: hire on the spot

There's an indirect way of testing for this which is to test for curiosity and lack of ego. My experience has been that candidates with high curiosity tend to have low ego (they know what they don't know and are curious to learn). These candidates make great hires because you can teach them anything.

Wrote about this a little bit here (with a handy diagram): https://charliedigital.com/2020/01/15/effective-hiring-for-s...


The Architecture of Open Source Applications has some great lessons learned. It's also free: https://aosabook.org/en/ .

I don't think there are lots of examples of the distributed systems discussion though, since most of the writing is about how to structure the source code of a single program. That's not unrelated, since most of what you're asking about is how different interfaces or modules should interact. In a distributed system, that happens over RPCs, but it can just as easily happen within a program with separate threads.

Note: they've added some other books, which show up at the top of the page, but I've only read select chapters from AOSA Volume 1.


Ehhhhh yeah kinda? Yes, you should spend most of your time on "API Semantics" (what does it look like using this code?), and you should spend a lot less time on "how are the tests?"

but, for example,

Writing good tests massively contributes to good implementation details and API semantics; and test code is also code that needs to be reviewed under more-or-less the same criteria as the rest.

Also - documentation (or legibility, if you're on board with self-documenting code) can be more important than implementation details, and even API semantics, as it can determine whether the entire work is usable or maintainable.

I might say instead:

- What will it be like reviewing this code? (style, test coverage, etc - do I have to worry about spotting typos, and other stupid stuff?)

- What will it be like debugging this code? (patterns, logging, etc - which might be handled by a framework)

- What will it be like altering this code? (documentation/legibility, implementation details, etc - when the business needs change or grow)

- What will it be like using this code? (API semantics, API docs - when I go to build something on top of this)

And then yeah; the top should be entirely automated, and you should (generally) spend most of your time on the bottom.


On the one hand, this looks really useful.

On the other hand:

> There are various forms of attention / self-attention, Transformer (Vaswani et al., 2017) relies on the scaled dot-product attention: given a query matrix Q, a key matrix K and a value matrix V, the output is a weighted sum of the value vectors, where the weight assigned to each value slot is determined by the dot-product of the query with the corresponding key

There HAS to be a better way of communicating this stuff. I'm honestly not even sure where to start decoding and explaining that paragraph.

We really need someone with the explanatory skills of https://jvns.ca/ to start helping people understand this space.
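
For what it's worth, that paragraph boils down to a few lines of code (a rough numpy sketch of scaled dot-product attention, ignoring batching, masking, and multiple heads):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
        scores = Q @ K.T / np.sqrt(K.shape[-1])           # how well each query matches each key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
        return weights @ V                                # weighted sum of the value vectors

    # tiny example: 2 queries attending over 3 key/value slots
    out = scaled_dot_product_attention(np.random.randn(2, 4),
                                       np.random.randn(3, 4),
                                       np.random.randn(3, 8))
    print(out.shape)  # (2, 8)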


https://yunohost.org is a much more mature project, with a larger app ecosystem. Give it a try, you won’t be disappointed.

If you're fuzzily looking up members in a fixed corpus, you can also employ the trick from https://github.com/wolfgarbe/SymSpell. Essentially, for each string in your corpus, enumerate all of the variants of that string with one letter removed, and put both the original string and each variant into a hashtable as a key, with the original string as the value. To do a lookup, you do the same enumeration (the original query plus each variant with one letter dropped) and look all of them up in your hashtable. The values you get out of that are a list of all of the strings in your corpus at edit distances 0 or 1, and potentially some at edit distance 2; you can do a subsequent Levenshtein calculation on each to weed out the distance-2 strings, but you only have to do it on this massively reduced set rather than on your whole corpus.

So like, to index a string "hello", you'd add all of {"hello":"hello","ello":"hello","hllo":"hello","helo":"hello","hell":"hello"} into your table, and for the query "gello", you'd look up all of ["gello", "ello", "gllo", "gelo", "gell"], and get a match on "ello"->"hello", then do a Levenshtein calculation dist("gello","hello") to confirm it's within ED=1 (it is), and be done. (Bonus: the same method works with Damerau-Levenshtein distance as well.)
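
A minimal sketch of that index-and-lookup scheme in Python (my own toy version, not the actual SymSpell library; the final Levenshtein verification is left out):

    def deletion_variants(word):
        # the word itself plus every variant with exactly one letter removed
        return {word} | {word[:i] + word[i + 1:] for i in range(len(word))}

    def build_index(corpus):
        # map every deletion variant back to the original corpus strings
        index = {}
        for word in corpus:
            for variant in deletion_variants(word):
                index.setdefault(variant, set()).add(word)
        return index

    def lookup(index, query):
        # candidates at edit distance <= 1, plus a few at distance 2 that a
        # real implementation would weed out with a Levenshtein check
        candidates = set()
        for variant in deletion_variants(query):
            candidates |= index.get(variant, set())
        return candidates

    index = build_index(["hello", "world"])
    print(lookup(index, "gello"))  # {'hello'}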


If you want to dig even deeper into diffeq, Professor Leonard has been publishing his series[1] on the topic over the past few months. He's up to lesson #31 so far... not sure how many lessons the series is supposed to have in total.

[1]: https://www.youtube.com/playlist?list=PLDesaqWTN6ESPaHy2QUKV...


I'm glad to see a classic article linked from AITopics. We have a big collection there, as well as articles from NeurIPS, AAAI conferences, AI Journal, and news sources, amounting to 200k+ items, all fully classified into AI topics. Have a look!

https://aitopics.org/


Great read. I have 30 years combined experience in symbolic AI and neural networks and my current day job is managing a deep learning team. I could not agree more with: “I think it is far more likely that the two — deep learning and symbol-manipulation — will co-exist, with deep learning handling many aspects of perceptual classification, but symbol-manipulation playing a vital role in reasoning about abstract knowledge.”

My personal projects mostly combine symbolic AI and deep learning, but I am still trying to find a non-Python solution. I have tried Haskell TensorFlow bindings, Armed Bear Common Lisp with DL4J, and exporting trained Keras models to a Racket environment - all plausible hacking environments, but none feel ‘just right.’ If you are working on the same ideas, please get in touch with me.


These guys have done what looks like an impressive amount of work... but I was disappointed to see there's no mention of generative language modeling at all and only a very brief mention of key advances in image generation over the past few years.

Examples of generative language modeling ignored by this paper (not even mentioned):

* Unsupervised Transformer - https://blog.openai.com/language-unsupervised/

* ELMo - https://arxiv.org/abs/1802.05365

* ULMFit - https://arxiv.org/abs/1801.06146

Examples of generative image modeling ignored by this paper (some are mentioned only in passing):

* Glow - https://blog.openai.com/glow/

* RealNVP - https://arxiv.org/abs/1605.08803

* NICE - https://arxiv.org/abs/1410.8516

* Pixel CNN - https://arxiv.org/abs/1606.05328 (and its cousin PixelRNN - https://arxiv.org/abs/1601.06759)

