
This was around the time I trained Transformer-XL (outside of OpenAI) with Ben Mann (https://yaroslavvb.medium.com/scaling-transformer-xl-to-128-...). Originally we wanted to train and release the weights as a kind of GPT-2.5, but our OpenAI friends pushed us to keep the weights closed.

> our OpenAI friends

I would carry a grudge against those "friends" to my grave.


I don't hold a grudge. GPT-2 wasn't that great of a model, so releasing it would have been more about the publicity value. But the blog post already served that purpose.

This project gave me the motivation to build the deep-learning next-token-prediction integration for JetBrains, since I was using PyCharm at the time. (It was eventually discontinued because it was kind of expensive to host.)

Well, you all got fooled, didn't you?

Much like how they claimed to be all about open source.


Same story with Connor Leahy and his GPT-2 clone, though his own public account of how OpenAI sat him down tends to get glossed over.

"

OpenAI reached out to me almost immediately to talk and they were nothing but respectful and understanding... After making it publicly known what I had done, I was quickly approached by a range of smart people with good arguments. Many of them helped me update my beliefs in light of new evidence...

The day after my announcement, I got to talk to Jack Clark, Alec Radford and Jeff Wu from OpenAI. We had a nice hour long discussion, where I explained where I was coming from, and they helped me to refine my beliefs. They didn’t come in accusing me in any way, they were very clear in saying they wanted to help me gain more important insight into the wider situation. For this open and respectful attitude I will always be grateful. Large entities like OpenAI often seem like behemoths to outsiders, but it was during this chat that it really hit me that they were people just like me, and curious hackers to boot as well.

I quickly began to understand nuances of the situation I wasn’t aware of. OpenAI had a lot more internal discussion than their blog post made it seem. And I found this reassuring. Jack in particular also gave me a lot of valuable information about the possible dangers of the model, and a bit of insight into the workings of governments and intelligence agencies.

After our discussion, I had a lot to think about. But I still wasn’t really convinced to not release. Even some people inside OpenAI were still discussing the not-release policy. So while I definitely had things to consider, I was still mostly set on releasing...

We shouldn’t be angry with OpenAI for what they did. We should applaud them for making a point before it becomes a true problem. Prophylaxis is much better than treatment. I still disagree with some of the things OpenAI did and how they communicated them, but I now understand that sending a message that it is ok, even celebrated, for a lone individual to unilaterally go against reasonable safety concerns of other researchers is not a good message to send. I want to support OpenAI’s message. So, while it might be a small, mostly symbolic gesture, I will not be releasing my model. Some day, someone like me may be in a situation just like mine, but it won’t be GPT2. It might be something much, much more dangerous. And that is the person I am trying to talk to here.

"

https://medium.com/@NPCollapse/the-hacker-learns-to-trust-62...

https://archive.md/1HoGz


Thanks for the share! Didn't realize EleutherAI launched around the same time.

Modern AGI discourse defies the voice of reason.


I used to look at all TensorFlow questions when I was on the TensorFlow team (https://stackoverflow.com/tags/tensorflow/info). It's unclear where people go to interact with their users now... Reddit? But the tone on Reddit is kind of negative/complainy.


Balancing protection against water bills - https://www.epa.gov/newsreleases/epa-announces-it-will-keep-...


Well shit, we can really lower water bills by getting rid of all clean-water regulations and simply stopping water treatment.

Think of the cost savings!


Stricter (but not looser) standards can be imposed at the state level. Canada has no binding national drinking-water law; it leaves it to the territories/provinces to decide how to implement the guidelines.


Watersheds don't follow political boundaries.


Sometimes they do.

The Alberta/British Columbia border is defined by which direction water drains off the mountains.


That sounds like the political boundary follows the watershed.

It also doesn't refute the point they were making.


What if instead we could all collectively agree that access to some amounts of fresh, running water is a fundamental human need? We figure a number, and the first N units are free. Additional units cost money, and perhaps you have two or three usage tiers where heavy users are disincentivized through additional cost.

You calculate the figures such that the higher usage tiers subsidize the costs of the basic needs users.

Or would that be socialism?
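
For concreteness, a minimal sketch of the tiered scheme described above -- every tier boundary and price here is made up for illustration:

    def water_bill(units):
        # hypothetical tiers: (upper cap in units, price per unit)
        tiers = [(10, 0.0),             # first 10 units free: basic need
                 (30, 2.0),             # next 20 units at normal cost
                 (float("inf"), 6.0)]   # heavy use subsidizes the free tier
        bill, prev_cap = 0.0, 0.0
        for cap, price in tiers:
            bill += max(0.0, min(units, cap) - prev_cap) * price
            prev_cap = cap
        return bill

    print(water_bill(8))    # 0.0   -- entirely within the free allotment
    print(water_bill(50))   # 160.0 -- 10*0 + 20*2 + 20*6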


I've used this feature before to make my chats discoverable through search engines. I had to manually click it each time I shared -- it didn't toggle on by default.


Researchers like to talk about and show off their work outside the company. If you don't let them, they get unhappy and leave.


The difference is that for some assets you can calculate value based on fundamentals. I.e., humans need shelter, so you can estimate the future value of shelter (real estate) based on migration trends and other factors. How do you estimate the future value of bitcoin? That lack of predictability is probably why serious investment funds don't go into crypto.
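
To make "value based on fundamentals" concrete, here is a minimal discounted-cash-flow sketch -- all the numbers are invented for illustration. Rent is a cash stream you can discount; bitcoin has no analogous stream to plug in:

    # value = sum of expected future cash flows, discounted to today
    annual_rent = 24_000   # hypothetical net rental income, $/year
    growth = 0.02          # assumed rent growth (migration trends etc.)
    discount = 0.07        # required rate of return
    value = sum(annual_rent * (1 + growth) ** t / (1 + discount) ** t
                for t in range(1, 31))   # 30 years of cash flows
    print("fundamental value: $%.0f" % value)   # -> about $373k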


Realistic simulation of neurons is expensive. Back in my grad school days we ran GENESIS and could afford at most 10k neurons -- each neuron needs a lot of work to model the corresponding differential equations. However, it's unclear how to translate this into requirements for artificial neural networks -- the type of computation is too different.
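
To give a feel for that per-neuron cost, here is a minimal sketch: plain forward-Euler integration of the textbook Hodgkin-Huxley equations for a single point neuron -- not GENESIS's multi-compartment models, but the same kind of arithmetic:

    import math

    C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3   # uF/cm^2, mS/cm^2
    ENa, EK, EL = 50.0, -77.0, -54.387       # reversal potentials, mV

    def rates(V):
        # standard Hodgkin-Huxley gating-variable rate functions
        am = 0.1 * (V + 40) / (1 - math.exp(-(V + 40) / 10))
        bm = 4.0 * math.exp(-(V + 65) / 18)
        ah = 0.07 * math.exp(-(V + 65) / 20)
        bh = 1.0 / (1 + math.exp(-(V + 35) / 10))
        an = 0.01 * (V + 55) / (1 - math.exp(-(V + 55) / 10))
        bn = 0.125 * math.exp(-(V + 65) / 80)
        return am, bm, ah, bh, an, bn

    V, m, h, n = -65.0, 0.05, 0.6, 0.32      # resting state
    dt, I_ext, spikes = 0.01, 10.0, 0        # ms step, uA/cm^2 input
    for _ in range(int(50 / dt)):            # 50 ms of simulated time
        am, bm, ah, bh, an, bn = rates(V)
        I_ion = (gNa * m**3 * h * (V - ENa) + gK * n**4 * (V - EK)
                 + gL * (V - EL))
        V_new = V + dt * (I_ext - I_ion) / C
        m += dt * (am * (1 - m) - bm * m)
        h += dt * (ah * (1 - h) - bh * h)
        n += dt * (an * (1 - n) - bn * n)
        if V < 0 <= V_new:                   # crude spike-count heuristic
            spikes += 1
        V = V_new
    print(spikes, "spikes in 50 ms")

Even this single-compartment toy takes 5,000 timesteps and dozens of floating-point ops per step per neuron; compartmental models multiply that cost further.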

A different metric is a more relevant goalpost -- the number of synapses. If each of the 125 trillion synapses in the brain can adjust its strength independently of the others, it loosely corresponds to a parameter in a neural network. So if we get 100-trillion-parameter networks training but still no human intelligence, we'll know conclusively that the bottleneck is something else. Currently, training 1T-parameter networks seems feasible.
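
The back-of-envelope comparison behind that last paragraph, using the same figures:

    brain_synapses = 125e12   # synapses in the human brain (estimate above)
    model_params = 1e12       # ~1T-parameter training runs, feasible today
    print("gap: %.0fx" % (brain_synapses / model_params))   # -> 125x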


If you collapse things to just synapses, you've lost all the complexity of dendritic arbors. The article doesn't mention gap junctions, but there are networks of those too, with different properties.

It seems to me that mean field models, which could be deep networks internally, are a much more parsimonious computational approach.


We already know that biological neural networks like that of the worm C. elegans are more intelligent than an artificial neural network of the same size.

Isn't that sufficient proof that the bottleneck is elsewhere?


It has a name too -- крякозяблики (kryakoziabliki), Russian slang for mojibake-style garbled characters.
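
For the curious, the classic way these appear is a text-encoding mismatch -- a quick Python sketch that manufactures them by encoding Russian text as UTF-8 and mis-decoding it as Windows-1251:

    s = "привет"   # "hello" in Russian
    print(s.encode("utf-8").decode("cp1251"))   # -> РїСЂРёРІРµС‚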


Haha, me too. I guess it's a good learning exercise. I've been using Mathematica for 20 years (my name still shows up on https://stackoverflow.com/tags/wolfram-mathematica/topusers), but in a time crunch I would do this in Python.

