Python is just a beautiful, well-designed language. In an era where LLMs generate code, it is kind of reassuring that they mostly generate beautiful code and that Python has risen to the top. If you look at the graph, Julia and Lua also do incredibly well, despite being a minuscule fraction of the training data.
But Python/Julia/Lua are by no means the most natural languages: what is natural is what people write before the LLM, the stuff the LLM translates into Python. It is hard to get a good look at these "raw prompts", as the LLM companies keep those datasets closely guarded, but from HumanEval, MBPP+, YouTube videos of people vibe coding, and such, it is clear that they are mostly English prose with occasional formulas and code snippets thrown in, and that the text is not "ugly" but generally pre-processed through an LLM. So from my perspective the next step is to switch from Python as the source language to prompts as the source language; integrating LLMs into the compilation pipeline is a logical step. But currently LLMs are too expensive to use consistently, so this is blocked by the economics of hardware development.
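To make that last point concrete, here is a toy sketch of what a prompt-as-source-language front end could look like. `call_llm` is a placeholder for whatever model API you have access to, not a real client, and the whole thing is an illustration rather than a proposal:

    def call_llm(prompt: str) -> str:
        """Placeholder for a real model API call; assumed to return Python source."""
        raise NotImplementedError("plug in your LLM client here")

    def compile_prompt(prompt: str) -> dict:
        """Treat an English prompt as the source language: ask the model for
        Python, then compile and load it like any other translation unit."""
        source = call_llm(
            "Translate the following specification into a self-contained "
            f"Python module:\n\n{prompt}"
        )
        namespace = {}
        exec(compile(source, "<llm-generated>", "exec"), namespace)
        return namespace

    # Usage (needs a working call_llm):
    # ns = compile_prompt("a function slugify(title) that lowercases and hyphenates")
    # print(ns["slugify"]("Hello World"))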
mhhm yes yes. There's a thread of discussion that I didn't quite choose to delve into in the post, but there is something interesting in the observation that languages close to natural language (Python was famous for being almost executable pseudo-code for a while) are easier for LLMs to generate.
Maybe designing new languages to be close to pseudo-code would lead to better results when asking LLMs to generate them? But there's also a fear that prose-like syntax might not be the most appropriate for some problem domains.
Wasn't there that thing about how large LLMs are essentially compression algorithms (https://arxiv.org/pdf/2309.10668)? Maybe that's where this article is coming from: the idea that finetuning "adds" data to the set of data that compresses well. But that indeed doesn't work unless you mix the finetuning data in with the original training corpus of the base model. I think the article is wrong, though, in saying it "replaces" the data: it's true that finetuning without keeping the original training corpus in the mix increases loss on the original data, but "large" in LLM really is large, and current models are not trained to saturation, so there is plenty of room to fit in finetuning if you do it right.
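For anyone who hasn't read that paper, the core observation is that a model's next-token probabilities can drive an arithmetic coder, so the compressed size of a text is essentially the model's cross-entropy on it. A toy illustration of the accounting, with made-up probabilities rather than real model output:

    import math

    # Hypothetical per-token probabilities assigned by a language model to a
    # short text, plus how many raw bytes each token spans (made-up numbers).
    token_probs = [0.42, 0.10, 0.87, 0.05, 0.63, 0.30]
    token_bytes = [4, 3, 5, 2, 6, 4]

    # Arithmetic coding spends about -log2(p) bits per token.
    model_bits = sum(-math.log2(p) for p in token_probs)
    raw_bits = 8 * sum(token_bytes)

    print(f"model cost: {model_bits:.1f} bits")
    print(f"raw size:   {raw_bits} bits")
    print(f"compression ratio ~ {raw_bits / model_bits:.2f}x")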
Not sure what you mean by “not trained to saturation”. Also, I agree with the article: in the literature, the phenomenon it refers to is known as “catastrophic forgetting”. Because no one has specific knowledge about which weights contribute to model performance, updating the weights via fine-tuning modifies the model such that future performance will change in ways that are not understood. Also, I may be showing my age a bit here, but I always thought “fine-tuning” meant performing additional training on the output network (traditionally a fully-connected net) while leaving the initial portion (the “encoder”) weights unchanged, allowing the model to capture features the way it always has but updating the way it generates outputs based on the discovered features.
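For what it's worth, that older sense of fine-tuning is easy to sketch. Here's a minimal PyTorch example that freezes a stand-in "encoder" and trains only a new output head (the layer sizes and data are placeholders, not any particular architecture):

    import torch
    import torch.nn as nn

    # Stand-in "pretrained" encoder plus a fresh task-specific head.
    encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
    head = nn.Linear(128, 10)

    # Freeze the encoder so its weights stay exactly as pretrained.
    for p in encoder.parameters():
        p.requires_grad = False

    # Only the head's parameters are given to the optimizer.
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # One illustrative training step on random stand-in data.
    x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))
    loss = loss_fn(head(encoder(x)), y)
    loss.backward()
    optimizer.step()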
OK, so this intuition is actually a bit hard to unpack; I got it from bits and pieces. There is this post: https://www.fast.ai/posts/2023-09-04-learning-jumps/. Essentially, a single pass over the training data is enough for the LLM to significantly "learn" the material. In fact, if you read the LLM training papers, for the large-large models they generally explicitly say that they only did one pass over the training corpus, and sometimes not even the full corpus, only like 80% of it or whatever. The other relevant information is the loss curves: models like Llama 3 are not trained until the loss on the training data is minimized, like typical ML models. Rather, they use approximate estimates of FLOPS / tokens vs. performance on benchmarks. But it is pretty much guaranteed that if you continued to train on the training data it would continue to improve its fit; one pass over the training data is by no means enough to adequately learn all of the patterns. So from a compression standpoint, the paper I linked previously says that an LLM is a great compressor, but it's not even fully tuned, hence "not trained to saturation".
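The FLOPS / tokens budgeting I mean is in the style of the compute-optimal scaling results (roughly 20 tokens per parameter from the Chinchilla paper, and the common approximation of about 6 * params * tokens training FLOPs). A back-of-the-envelope sketch, with the model size picked arbitrarily:

    # Back-of-the-envelope compute budgeting in the Chinchilla style.
    # Both constants are rough published heuristics, not exact values.
    TOKENS_PER_PARAM = 20      # ~compute-optimal tokens per parameter
    FLOPS_PER_PARAM_TOKEN = 6  # training FLOPs ~ 6 * params * tokens

    def training_budget(n_params: float) -> tuple[float, float]:
        """Return (compute-optimal token count, total training FLOPs)."""
        tokens = TOKENS_PER_PARAM * n_params
        flops = FLOPS_PER_PARAM_TOKEN * n_params * tokens
        return tokens, flops

    # Example: a 70B-parameter model (arbitrary illustrative size).
    tokens, flops = training_budget(70e9)
    print(f"~{tokens / 1e12:.1f}T tokens, ~{flops:.2e} training FLOPs")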
Now, as far as how fine-tuning affects model performance, it is pretty simple: it improves fit on the fine-tuning data and decreases fit on the original training corpus. Beyond that, yeah, it is hard to say whether fine-tuning will help you solve your problem. My experience has been that it always hurts generalization, so if you aren't getting reasonable results with a base or chat-tuned model, then fine-tuning further will not help; but if you are getting results, then fine-tuning will make them more consistent.
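And the "mix in the original training corpus" point from earlier (often called replay or rehearsal) can be as simple as sampling each batch from both datasets at some ratio. A toy sketch with placeholder data:

    import random

    def mixed_batches(finetune_data, pretrain_data, finetune_frac=0.3, batch_size=16):
        """Yield batches that interleave fine-tuning examples with samples from
        the original corpus, which helps soften catastrophic forgetting."""
        while True:
            batch = []
            for _ in range(batch_size):
                pool = finetune_data if random.random() < finetune_frac else pretrain_data
                batch.append(random.choice(pool))
            yield batch

    # Toy usage with placeholder "documents".
    finetune = [f"finetune example {i}" for i in range(100)]
    pretrain = [f"pretraining example {i}" for i in range(1000)]
    print(next(mixed_batches(finetune, pretrain))[:4])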
Always appreciated the work of Jeremy Howard. Also had a lot of fun using the Fast.ai framework. My experience is similar to your description: when using 2, 3, or more epochs, I felt that overfitting started to emerge. (And I was CERTAINLY not training models anywhere near the size of modern LLMs.) I suppose in this case by “saturation” you meant training “marginally before exhibiting over-fitting”, something akin to “the elbow method” w.r.t. clustering algorithms? I’ll have to chew on your description of overfitting results for a while. It jibes with mine, but in a way that really makes me question my own. Thanks for the thought-provoking response!
I was thinking this was about leaking the kernels or something, but no, they are "publishing" them in the sense of putting out the blog post: they just mean they are skipping the peer review process and not doing a formal paper.
I'd say about 90% of the pain in NixOS comes from its non-FHS (Filesystem Hierarchy Standard) layout. But that's also a fundamental part of Nix/NixOS's design; it was built that way from the start.
For complex packages like Steam, it's both possible and recommended to use FHS-compatible containers on NixOS. Still, I've seen people say things like, "All I do is set up containers, so why not just use Docker instead of NixOS?" The thing is, if you dig deeper, tools like Docker or Flatpak are actually less powerful than Nix when it comes to container management.
I've been toying with an idea: using filesystem access tracing to replace the current approach of using random hashes for isolation. This could allow an FHS-style layout while preserving many of the guarantees of the Nix model. It would dramatically improve compatibility out of the box, enable capabilities that aren't possible today, and reduce network and disk usage, since files could be modified in place instead of being remade or redownloaded.
It's on my backlog, though. Starting a new distro doesn't seem particularly rewarding at the moment.
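In case anyone wants to poke at the tracing half of that idea, here's a rough sketch of recording which paths a build command actually opens, using strace from Python (the gcc invocation at the end is just a placeholder):

    import os
    import re
    import subprocess
    import tempfile

    def trace_file_accesses(cmd):
        """Run a command under strace and return the set of paths it opened.
        A rough prototype of build-input discovery, not a hardened sandbox."""
        fd, log_path = tempfile.mkstemp(suffix=".trace")
        os.close(fd)
        try:
            subprocess.run(
                ["strace", "-f", "-e", "trace=open,openat", "-o", log_path] + list(cmd),
                check=True,
            )
            # strace lines look like: openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY) = 3
            with open(log_path) as log:
                return set(re.findall(r'open(?:at)?\([^"]*"([^"]+)"', log.read()))
        finally:
            os.remove(log_path)

    # Placeholder build command for illustration.
    print(sorted(trace_file_accesses(["gcc", "hello.c", "-o", "hello"]))[:10])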
1,1,1-trichloroethane doesn't seem particularly toxic: it's a "probable carcinogen" with some neurological and liver effects, but I'd say it's probably still safer than e.g. isopropyl alcohol, which definitely leads to neurological issues long-term. It's banned because of the ozone layer, not because it's unsafe to individual humans.
Hot take, but you can be a terrible designer and be completely unknown too. I've been getting into music, and there are a lot of wannabes and very few "gems hidden in the dirt" or whatever; if your music is good you'll at least be able to get some decent bookings.
There is a question of originality. If the variable names, comments, etc. are preserved, then yes, it is probably a derivative work. But here, where you are starting from the obfuscated code, there is an argument that the code is solely functional and hence doesn't have copyright protection. It's like how, if I take a news article and write a new article with the same facts, there's no infringement because the facts themselves aren't protected (witness: news gets re-reported all the time). There is a fine line between "this is just a prompt, not substantial enough to be copyrightable" and "this is a derivative work", and it is still being worked out in the legal system.
Sometimes the only way to know something is important is to shut it off and see if anyone complains. For example, lots of stories in https://news.ycombinator.com/item?id=9629714. Now certainly the Trump administration could have been more careful, but they only have 4 years so the Facebook motto of "move fast and break things" applies.
> the Facebook motto of "move fast and break things" applies.
That’s seriously begging the question of whether a website started to rate the attractiveness of Zuckerberg’s classmates has the same consequences for society when it fails as the government does. When you work on something that actually matters, there are virtues other than speed. What the Republicans are doing is like clearing your lawn by setting it on fire and saying they didn’t have time to do anything slower.
It’s estimated that the USAID cuts alone mean something on the order of hundreds of children being born HIV positive every day, not to mention the impact of food aid disappearing during a famine, or shutting down the last option for Afghan women to get educated.
The science funding has a lower death toll, of course, but it profoundly disrupts careers and pushes people out of the country. Someone educated in the United States who returns to their home country ends up competing with us and probably won’t come back. The grad student getting cut now will probably end up leaving science entirely (people need to make rent and student loan payments) so we’ll be missing out on their lifetime achievements and also the later-career guidance they would have given the next generation.
The federal government as a whole becomes less efficient because fewer top people will be willing to work for lower pay without job security and every contractor will be pricing in future disruption.
That's fine for a software startup because it fundamentally doesn't matter. Who cares if your silly website fails after you experiment; no one gets seriously hurt.
Shutting off the government means that things can be irreparably damaged. Losing a generation of scientists because of random cullings at the NSF will have effects for decades.
In the worst case, "moving fast and breaking things" with the government will kill people. For example, many patients were kicked off clinical trials during the NIH funding freeze. Abroad, the end of PEPFAR could kill untold numbers of people.
To be rather abrasive in my response: I believe your view is a waste of air. In case I'm correct, how about we cut you off from air for a week, and if there's a problem we'll restore it then.
That is how a large portion of the internet works, e.g. in most subreddits certain viewpoints will be instantly banned without any discussion. HN is kind of strange in that respect.
I figured it was a rather apt example of how "turn it off and wait until someone complains" doesn't work if the damage done while waiting for it to be restored isn't repairable. The abrasive personal example is because he's ignoring that this view puts many people's lives at risk when we talk about programs like USAID.
All of the important programs have temporary restraining orders. That's actually the standard the judge applies: "is there a possibility of irreparable harm?" (e.g. lives lost). It's not perfect, but no system is.
> but they only have 4 years so the Facebook motto of "move fast and break things" applies.
Except with the federal government, “things” in many instances means people’s lives. What’s the acceptable body count to you, as we go about reducing the deficit haphazardly and unconstitutionally?
> Sometimes the only way to know something is important is to shut it off and see if anyone complains.
These government programs aren't stray servers in a closet.
Even if you believe that these programs should be stopped, it's entirely wasteful to end them abruptly and let their work in progress just crash and burn.
But it's still a very bad idea to operate this way. There is no rapid feedback loop. The negative effects can be subtle and take years to ripple through the economy and science world.
Startups have nowhere to go but up. Large established companies have nowhere to go but down. Why do you think large organizations are so conservative? It's because getting new customers is much harder than losing existing customers. The US government has flaws, but it is phenomenal overall.
This is like taking over Apple and tearing apart its culture and management. Only bad will come out of it.
Have you been paying attention to Republicans over the last 40 years? They don't care if it's useful or important. They don't want government programs to exist.
There's certainly an argument that anything the government can do, the private sector can do better. That argument would conclude that the government should indeed not exist, and consequently have no programs. The reality is more complicated, something like the microkernel vs. monolithic kernel debate, but it is hard to say that the current distribution between private and public sectors is optimal.
> If scenarios with different mixes of CC/DAC and WWS were performed, it would not be possible to conclude whether one is an opportunity cost. Instead, using a mixture requires assuming that both CC/DAC and WWS should be used before determining whether one has any benefit relative to the other.
Seems stupid: they are both being used, so even the business-as-usual scenario is a mixture. If indeed the 100% WWS + 0% CC/DAC scenario is better than the 95% WWS + 5% CC/DAC scenario, then it is logical to conclude that CC/DAC is useless, but according to the tables and figures, they didn't even look at whether a 50/50 WWS + CC/DAC split would be better or worse. Yet their conclusion is still "policies promoting CC and SDACC should be abandoned". They have these really complex models, but at the end of the day it is garbage in, garbage out.