> That doesn't mean we _understand_ them, that just means we can put the blocks together to build one.
Perhaps this[0] will help in understanding them then:
> Foundations of Large Language Models
> This is a book about large language models. As indicated by the title, it primarily focuses on foundational concepts rather than comprehensive coverage of all cutting-edge technologies. The book is structured into five main chapters, each exploring a key area: pre-training, generative models, prompting, alignment, and inference. It is intended for college students, professionals, and practitioners in natural language processing and related fields, and can serve as a reference for anyone interested in large language models.
> I think the real issue here is understanding _you_.
My apologies for being unclear and/or insufficiently explaining my position. Thank you for bringing this to my attention and giving me an opportunity to clarify.
The original post stated:
> Since LLMs and in general deep models are poorly understood ...
To which I asserted:
This is demonstrably wrong.
And provided a link to what I thought to be an approachable tutorial regarding "How to Build Your Own Large Language Model", albeit a simple implementation, as it is, after all, a tutorial.
The person having the account name "__float" replied to my post thusly:
> That doesn't mean we _understand_ them, that just means we can put the blocks together to build one.
To which I interpreted the noun "them" to be the acronym "LLM's." I then inferred said acronym to be "Large Language Models." Furthermore, I took __float's sentence fragment:
> That doesn't mean we _understand_ them ...
As an opportunity to share a reputable resource which:
> .. can serve as a reference for anyone interested in large language models.
Is this a sufficient explanation regarding my previous posts such that you can now understand?
I'm telling you right now, man - keep talking like this to people and you're going to make zero friends. However good your intentions are, you come across as both condescending and overconfident.
And, for what it's worth - your position is clear, your evidence less so. Deep learning is filled with mystery, and if you don't realize that's what people are talking about when they say "we don't understand deep learning" - you're being deliberately obtuse.
edit to cindy (who was downvoted so much they can't be replied to):
Thanks, wasn't aware. FWIW, I appreciate the info but I'll probably go on misusing grammar in that fashion til I die, ha. In fact, I've probably already made some mistake you wouldn't be fond of _in this edit_.
In any case thanks for the facts. I perused your comment history a tad and will just say that hacker news is (so, so disappointingly) against women in so many ways. It really might be best to find a nicer community (and I hope that doesn't come across as me asking you to leave!)
============================================================
What I meant to say is that you were deliberately speaking cryptically and with a tone of confident superiority. I wasn't trying to imply you were stupid (w.r.t. "Ad Hominem").
Seems clear to me that neither of us is going to change the other's mind at this point, though. Take care.
edit edit to cindy:
==========================
fun trick. random password generate your new password. don't look at it. clear your clipboard. you'll no longer be able to log in and no one else will have to deal with you. ass hole
==========================
(for real though someone ban that account)
>>> Since LLMs and in general deep models are poorly understood ...
>> This is demonstrably wrong.
> That doesn't mean we _understand_ them ...
The previous reply discussed the LLM portion of the original sentence fragment, whereas this post addresses the "deep model" branch.
This article[0] gives a high-level description of "deep learning" as it relates to LLM's. Additionally, this post[1] provides a succinct definition of "DNN's" thusly:
> What Is a Deep Neural Network?
> A deep neural network is a type of artificial neural network (ANN) with multiple layers between its input and output layers. Each layer consists of multiple nodes that perform computations on input data. Another common name for a DNN is a deep net.
> The “deep” in deep nets refers to the presence of multiple hidden layers that enable the network to learn complex representations from input data. These hidden layers enable DNNs to solve complex ML tasks that more “shallow” artificial networks cannot handle.
Additionally, there are other resources discussing how "deep learning" (a.k.a. "deep models") works here[2], here[3], and here[4].
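The quoted definition is easy to make concrete. Here is a toy forward pass through a single hidden layer in Go, with hand-picked weights purely for illustration (nothing here is trained); a "deep" net simply stacks more hidden layers like this one:

```go
package main

import (
	"fmt"
	"math"
)

// relu is a common hidden-layer activation function.
func relu(x float64) float64 { return math.Max(0, x) }

// forward pushes an input through one hidden layer and a linear
// output layer. Deeper networks repeat the hidden-layer step.
func forward(x []float64, w1 [][]float64, w2 []float64) float64 {
	hidden := make([]float64, len(w1))
	for i, row := range w1 {
		for j, w := range row {
			hidden[i] += w * x[j]
		}
		hidden[i] = relu(hidden[i]) // nonlinearity is what "layers" buy you
	}
	var out float64
	for i, w := range w2 {
		out += w * hidden[i]
	}
	return out
}

func main() {
	// Toy weights chosen by hand; real nets learn these via training.
	w1 := [][]float64{{1, -1}, {-1, 1}}
	w2 := []float64{1, 1}
	fmt.Println(forward([]float64{2, 1}, w1, w2)) // prints 1
}
```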
A perusal of the source code of, say, Ollama -- or the agentic harnesses of Crush / OpenCode -- will convince you that yes, this should be an extremely simple feature (context management is part and parcel).
Also, these companies have the most advanced agentic coding systems on the planet. They should be able to fucking implement tree-like chat ...
If the client supports chat history such that you can resume a conversation, it has everything required; at that point it's literally just a chat-history organization problem.
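On the point that this is "just a chat history organization problem": a minimal sketch of what a tree-like history could look like, with hypothetical type and field names (not any client's actual schema). Branching falls out of letting a message have multiple children; resuming from any node creates a new branch:

```go
package main

import "fmt"

// Message is one turn in a conversation. A tree-like chat is just a
// linear chat where a message may have more than one child.
type Message struct {
	Role     string // "user" or "assistant"
	Text     string
	Children []*Message
}

// Path walks child indices from the root, reconstructing the linear
// context to send to the model for that particular branch.
func Path(root *Message, branch []int) []*Message {
	path := []*Message{root}
	node := root
	for _, i := range branch {
		node = node.Children[i]
		path = append(path, node)
	}
	return path
}

func main() {
	root := &Message{Role: "user", Text: "hello"}
	root.Children = []*Message{
		{Role: "assistant", Text: "hi"},
		{Role: "assistant", Text: "hey (regenerated)"},
	}
	// Two branches share the root but diverge at the reply.
	fmt.Println(len(Path(root, []int{0}))) // prints 2
	fmt.Println(Path(root, []int{1})[1].Text)
}
```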
This should be such an infrequent occurrence that the cost should be negligible. Surely their $10/month plan has enough margin that this can be covered?
There is likely a cost to the infrastructure necessary to enable calling 911 that scales with the number of users, not the number of 911 calls. Where I'm at, there is a 75-cent-per-month fee added to phone plans to cover the cost of access to 911. If most people are on the free plan, the margin from the few paying customers won't cover it.
Not everything needs 10k RPS, and in some sense there are benefits to a new process – how many security incidents have been caused by accidental cross-request state sharing?
And in a similar vein, Postgres (which is generally well liked!) uses a new backend process per connection. (Of course this has limitations, and sometimes necessitates pgbouncer, but not always.)
A few years ago I felt the same and created trusted-cgi.
However, through the years I learned:
- yes, forks and processes in general are fast
- yes, it saves memory and CPU on low-load sites
- yes, it’s a simple protocol and can be used even from a shell
However,
- splitting functions (mimicking serverless) into different binaries/scripts creates a mess of cross-script communication
- deployment is not that simple
- security-wise, you need to run the manager as root and use a unique user for each script, or use cgroups (or at least chroot). At that point the main question is why not just use containers as-is
Also, compute-wise, even a huge Go app with hundreds of endpoints can fit in just a few megabytes of RAM - there is not much sense in saving so little memory.
At worst, just create a single binary and run it on demand for different endpoints.
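That last idea, one binary dispatching by endpoint name, busybox-style, can be sketched as follows (all names here are hypothetical, not trusted-cgi's actual code). The binary is symlinked once per endpoint, and argv[0] selects the handler, so there is no cross-script communication to manage:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// handlers maps an endpoint name to its implementation. One binary,
// many endpoints: the invoked name picks which one runs.
var handlers = map[string]func() string{
	"hello":  func() string { return "hello, world" },
	"health": func() string { return "ok" },
}

// dispatch selects a handler by name, busybox-style.
func dispatch(name string) string {
	if h, ok := handlers[name]; ok {
		return h()
	}
	return "unknown endpoint: " + name
}

func main() {
	// Invoked via symlinks: ./hello, ./health, etc.
	fmt.Println(dispatch(filepath.Base(os.Args[0])))
}
```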
Was that the only reason? In our last round of testing (2021), on the same hardware and for our specific case (a database with billions of records, many tables, and specific workloads), MySQL consistently left Postgres in the dust performance-wise. Internal and external devs pointed out that Postgres (or rather, our table structures & indexes) could probably be tweaked, with quite a lot of work, to be faster, but MySQL performed well (for our purposes) even with some very naive options. I guess it depends on the case, but I cannot justify spending 1 cent (let alone far more, as we have 100k+ tables) on something when something else is fast enough (by quite a margin) to begin with...
It sounds like this may have been one of the pieces of software the author intentionally chose not to use:
> There are some clunky old Windows programs, niche scientific tools, and image analysis software that assumes you’re trying to count cells under a microscope...
Those downsides don't seem that bad to me. If something needed repair, they can bring a contractor on site, and certainly the Fed has cameras and police to monitor.
(In line with some other commenters, I'm more inclined to believe it's bills they've taken out of circulation than "almost finished" ones -- security features are built in throughout the process, not just an extra step at the end.)
Cameras still need to be monitored, which costs money. Plus sleight-of-hand and other tricks are a thing, so you'd probably still need to background-check any contractors, maintain a strict chain of custody over access to the cube, and then recount and re-check to make sure none of the currency has been substituted with lookalike fakes. Police still cost money.
If there is no reasonable way for the public to notice, why not make everyone's life easier by using fake money? That would be easier and cheaper.
Leaks can be dealt with through the legal system (pay people decently and make them sign an NDA agreeing not to disclose that the bills are fake), which is much easier than actually keeping track of 1M of currency.
I imagine Cloudflare and AWS were on a Chime bridge while this all went down, they both have a lot at stake here.