I think something with more uniform training and inference setups, and otherwise equally hardware friendly, just as easily trainable, and equally expressive could replace transformers.
I find a lot of Rust libraries "seem" dead, based on GitHub activity, but looking into it, they are actively used in many projects. I think Rust projects just tend to have fewer open issues, and don't need to be maintained as often. This is also the case internally at my company.
Very impressive! I guess this still wouldn't affect their original example
> For example, you might observe that asking ChatGPT the same question multiple times provides different results.
even with 0.0 temperature due to MOE models routing at a batch level, and you're very unlikely to get a deterministic batch.
> Not because we’re somehow leaking information across batches — instead, it’s because our forward pass lacks “batch invariance”, causing our request’s output to depend on the batch size of our forward pass.
The router also leaks batch-level information across sequences.
> even with 0.0 temperature due to MOE models routing at a batch level, and you're very unlikely to get a deterministic batch.
I don’t think this is correct: MoE routing happens on a per-token basis. It can be non-deterministic and batch-dependent if you try to balance your experts’ load within a batch, but that’s a performance optimization (just like everything in the blog post) and not the way models are trained to work.
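To make the distinction concrete, here's a toy sketch of both cases. Everything in it is made up for illustration (the scores, the two-expert setup, the capacity limit, the fall-back-to-next-best policy); it isn't any real model's router. Plain top-1 routing looks only at the token's own scores, while a per-expert capacity limit makes the result depend on what else is in the batch.

```python
def top1_route(scores):
    """Per-token routing: each token picks its highest-scoring expert."""
    return [max(range(len(s)), key=lambda e: s[e]) for s in scores]


def capacity_route(scores, capacity):
    """Top-1 routing with a per-expert capacity limit (assumed policy).

    Tokens are processed in batch order; when a token's preferred expert is
    full, it falls back to its next-best expert that still has room.
    """
    num_experts = len(scores[0])
    load = [0] * num_experts
    assignment = []
    for s in scores:
        ranked = sorted(range(num_experts), key=lambda e: s[e], reverse=True)
        chosen = next((e for e in ranked if load[e] < capacity), None)
        if chosen is not None:
            load[chosen] += 1
        assignment.append(chosen)  # None means the token was dropped
    return assignment


# Hypothetical router scores for "our" token over 2 experts.
ours = [0.9, 0.1]
batch_a = [[0.2, 0.8], ours]  # the neighbour prefers expert 1
batch_b = [[0.7, 0.3], ours]  # the neighbour prefers expert 0

# Same expert for our token in both batches.
print(top1_route(batch_a)[-1], top1_route(batch_b)[-1])
# Different experts for our token, depending on the neighbour.
print(capacity_route(batch_a, capacity=1)[-1], capacity_route(batch_b, capacity=1)[-1])
```

In `batch_b` the neighbour takes the only slot on expert 0, so our token ends up on expert 1 even though its own scores never changed.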
So what do you do if you have a better product and a "name brand" disadvantage? Advertising commodifies information flow instead of letting it pool with the people who already have access to it. Think of all the products that got big nowadays because they could convince VCs to fund ad spend, and saw a return for it.
I think advertising has a huge, positive, second-order effect on the world.
Please expound. Are we going to fill in yearly surveys explaining what we like and don't like? Where does the information come from? Who determines the algorithm for placement? Will there now be no way to opt out of ads at all, or is there now a national quota on how many ads we're exposed to in a year or something?
There's no algorithm. Just a query language. Nothing is pushed to the consumer. You want to buy something. You search. Companies can't pay anyone for any kind of publishing. Anyone is free to build tools and content that help with the search. However companies can publish information about their products and services only through the database.
> However companies can publish information about their products and services only through the database.
Since we're already dreaming, I'd modify this to say companies can publish information about their products and services only on their own website, and the database just links to it.
I wouldn't go that route, in case companies have an incentive to obfuscate the information. Forcing them to publish through the database makes them conform their information to one structure, so consumers have an easier time searching, comparing, and deciding. I'd also make publishing some information mandatory before you can sell to customers.
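A rough sketch of what "one structure" plus "mandatory information" plus "just a query language" could look like. The field names, the validation rules, and the `search` helper are all hypothetical, picked only to illustrate the idea; a real schema would be far richer and would vary by product category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    # Hypothetical mandatory fields, for illustration only.
    vendor: str
    name: str
    category: str
    price_usd: float
    warranty_months: int

    def __post_init__(self):
        # A listing that omits mandatory information cannot be published.
        if not (self.vendor and self.name and self.category):
            raise ValueError("vendor, name and category are mandatory")
        if self.price_usd < 0 or self.warranty_months < 0:
            raise ValueError("price and warranty must be non-negative")


def search(db, category, max_price):
    """Consumer-initiated query: filter, then sort by price. No paid ranking."""
    hits = [item for item in db if item.category == category and item.price_usd <= max_price]
    return sorted(hits, key=lambda item: item.price_usd)


db = [
    Listing("Acme", "Kettle A", "kettle", 30.0, 24),
    Listing("Bolt", "Kettle B", "kettle", 25.0, 12),
    Listing("Acme", "Toaster T", "toaster", 40.0, 24),
]
print([item.name for item in search(db, "kettle", 35.0)])
```

Because every listing has the same fields, the ordering comes entirely from the consumer's query, and nothing a vendor does can push a result up.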
I didn't mean my suggestion to imply that the DB would be barren of information and you'd have to go to multiple websites to compare products. However, I think the information in the DB, even if provided by the company, should be vetted and inputted by a neutral party, since the company has an incentive to manipulate or bias it.
It says rookie contract + endorsements, and I don't think it seems particularly small, just doing the math. Also I doubt that ex: Thrive LPs would care about this kind of thing. This whole comment just seems wrong.
It's a big number, but it's spread across a lot of different investments.
Some of those big-name VC funds wouldn't even return your call if you wanted to invest several million dollars. Founders Fund brings in multiple billions of dollars when they raise a new fund.
His fame and status unlocked his access.
> Also I doubt that ex: Thrive LPs would care about this kind of thing.
Thrive has raised over ten billion dollars. If a normal moderately wealthy person showed up with a couple million dollars to invest they would not be invited to be an LP.
Huh? Of course it's enough. Transformers immediately started destroying every single baseline out there. The authors definitely knew it was a very significant discovery beforehand.
Hmmmm, I don't think that would make sense. The closest analogy is working with humans. The easiest way to work with a human isn't a thin, limited API, but rather to give them context and work together (employment). I think the future of software will look more like Claude Code: it lets the model work in a space similar to ours, where it can intelligently seek out information and use tools as a human would.