From a black-box perspective, LLMs are pretty simple: you put text or images in, and (possibly structured) text comes out, maybe with some tool invocations.
If you use a good library for this, like Python's litellm, all it takes is changing one string in your code or config, since the library exposes most providers' APIs under a simple, uniform interface.
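For the basic case, that one-string swap looks something like the sketch below (the model identifiers are illustrative; check litellm's docs for the ones your providers expose):

    from litellm import completion

    MODEL = "gpt-4o-mini"  # change this one string, e.g. to "claude-3-5-sonnet-20240620"

    response = completion(
        model=MODEL,
        messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
    )
    print(response.choices[0].message.content)

litellm mirrors the OpenAI response shape regardless of which provider sits behind the string, which is what makes the swap a one-line change.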
You might need to tweak your prompt and rerun your evals on whatever task your app is solving, but even the largest providers regularly deprecate old models and introduce vastly better ones, so you should have a pipeline for that anyway.
These models have very little "stickiness" or lock-in. If your app is a Twitter client built around the Twitter API, turning it into a Mastodon client built around the Mastodon API would take a lot of work. If your app uses Grok and is designed properly, switching to a different model is so simple that it might be worth doing just to ride out a half-hour outage.
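To make the outage scenario concrete, here's a hedged sketch of a fallback loop (the model names are illustrative, not a recommendation):

    from litellm import completion

    # Try providers in order of preference; if one is down, fall through.
    CANDIDATES = ["xai/grok-beta", "gpt-4o", "claude-3-5-sonnet-20240620"]

    def robust_completion(messages):
        last_err = None
        for model in CANDIDATES:
            try:
                return completion(model=model, messages=messages)
            except Exception as err:  # a provider outage surfaces here
                last_err = err
        raise last_err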
Prompt-to-output quality varies by a large amount between models, IMO. The equivalent analogy would be "let's switch programming languages for this solved problem".
The models are still at a level where, for less common or less benchmarked tasks, there's often only one model that's very good at the job, and whichever is second best is markedly worse, possibly to the point of being unusable for anything serious.
I assume it'll be a paid API, so the "contract" is a lot clearer. Twitter never understood what to do with its API, so pulling that particular rug makes sense.
But I too wouldn't use this. X is playing fast and loose with ... everything, so having a business rely on their product seems risky.
The nice thing with LLMs is that the API is relatively simple - for the most basic case, it's string in, string out. While you may need to redesign your prompt a bit, I bet for many use cases, LLMs are reasonably interchangeable, and the integration work required for an API change should be minimal.
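One way to keep that integration surface small is to pin the whole dependency behind a single string-in, string-out function; a hypothetical sketch:

    from litellm import completion

    MODEL = "gpt-4o"  # illustrative; swap the identifier and rerun your evals

    def ask(prompt: str) -> str:
        # The rest of the app only ever calls ask(), so a provider or
        # API change is confined to this one function.
        resp = completion(model=MODEL, messages=[{"role": "user", "content": prompt}])
        return resp.choices[0].message.content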
Those who would build on top of the API might be weighing a couple of past changes that were significant, but that aren't necessarily a reason to think there'll be further pain in the future: the company's ownership changed, and those who train LLMs all of a sudden want all the human-created text on the internet.
Sometimes having a fast enough model at a low enough price makes you the obvious choice. For example, I know Claude is better than gpt-4o-mini, but I use the latter for a lot more of my data processing because it's significantly cheaper and faster, and the gains I'd get out of Claude seem marginal for my use case.
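A back-of-envelope comparison shows why (the figures are approximate mid-2024 list prices per million tokens and may have changed since):

    # Approximate mid-2024 list prices, $ per 1M tokens (input, output).
    GPT_4O_MINI = (0.15, 0.60)
    CLAUDE_35_SONNET = (3.00, 15.00)

    def job_cost(prices, in_tok=10_000_000, out_tok=2_000_000):
        cost_in, cost_out = prices
        return (in_tok * cost_in + out_tok * cost_out) / 1_000_000

    print(job_cost(GPT_4O_MINI))       # ~$2.70 for this example workload
    print(job_cost(CLAUDE_35_SONNET))  # ~$60.00, roughly 20x more

At that ratio, the cheaper model only has to be good enough to win a high-volume data-processing job.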
Best at product / market fit. And that space is very very wide. Does the GenAI serve as a feature in a larger product (like realtime “reasoning” on X or in Apple’s case in iOS)? Is it a standalone product that general public or enterprises use? Does it play in a niche area? Etc.
I wasn't really talking about the marginal differences we see right now in August 2024.
I'm talking about the next huge step forward that only one company will achieve, because it simply secures the most GPUs (which are in limited supply) plus the energy to run them first, and keeps that advantage.
At some point this becomes a runaway, self-amplifying differentiator, and it will make that company win regardless of all else.
My money is on xAI in 2025.
PS: the fact that prompts need to be optimized for each model is just a symptom of models not being that good yet. This need will vanish in the near future as you get way better models. A recent hint of what I mean: Midjourney needed very elaborate prompts (and even LoRAs) to get what you want; in Flux the prompt can be much shorter (without LoRAs) and it still gets closer to what you want. The same will happen with LLMs. Another example: with GPT-4 you need to literally beg the model to return only what you ask for (for example, JSON) or put it in a certain mode (JSON mode); Claude 3.5 Sonnet will simply listen to what you ask for. So again: that's not because every model needs model-specific fine-tuning, it's because previous models were simply not as good.
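The JSON example translates directly into code; a hedged sketch of the contrast (model identifiers illustrative):

    from litellm import completion

    messages = [{"role": "user",
                 "content": 'Classify the sentiment of "great phone". '
                            'Reply with JSON like {"sentiment": "positive"}.'}]

    # Older GPT-4-era models: you had to switch on JSON mode explicitly
    # (OpenAI's json_object mode even requires the word "JSON" in the prompt).
    strict = completion(model="gpt-4-turbo", messages=messages,
                        response_format={"type": "json_object"})

    # Claude 3.5 Sonnet: just asking tends to suffice.
    relaxed = completion(model="claude-3-5-sonnet-20240620", messages=messages)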
> People who worked at Twitter said it was bullshit
No, we have no idea from The Verge article whether the sources are even qualified to make such statements, or whether the statements are even true. In fact, on the basis of the "99 percent" speculative quote, we can disregard the sourced quotes altogether. I'll say this: I work on far less significant software than X, and we get DDoSed all the time.
> every other spaces event that was run at the same time was unaffected.
That's not true; I wasn't even able to load my feed during the initial part of the stream.
You seem to be invested in this topic in a weird and unhealthy way, and there is nothing of value in this comment.
You baselessly accuse journalists of straight-up making things up, and then go on to give anecdotal evidence that conveniently nobody can disprove.
- every other spaces event that was run at the same time was unaffected.
- no other part of the website was impacted in any way whatsoever.
Aren’t these last two an argument FOR a DDoS attack? It seems reasonable to assume that, were there a DDoS attack at that time, it would have been aimed at the Elon/Trump stream explicitly.
I’d like to see an explanation of how it's even possible to achieve that level of targeting without knowing the connection details of either Elon or Trump. The rest of the attack surface is surely infrastructure shared with the rest of the website.
So no, I think it was just a straight-up technical failure on their end.
"The Verge has no political bias". Okay, in the same way that wired has no political bias. They're so unbiased yet you know exactly the way an article is slanted towards given the topic and persons. Just like I know the slant given a reddit /r/all post or Fox News/msnbc article.
The Verge's editors most definitely are biased, as are all humans. Journalists are not neutral. In this case someone made a speculative "99 percent chance" statement, and the publication decided to print it as if it were fact rather than dismissing it as coming from someone who knew nothing.
We know nothing about the sources, and writers are not above making stuff up. I could just as easily spin it on them: there's a 99 percent chance they made up the sources.
Trump is both partisan and biased and doesn't claim to be neutral. Of course he was trashing things to do with his political opponents (he was running against DeSantis in the primary at the time).
The Register weighed in with a "yeah, right" skeptical attitude:
> The Register has found no evidence of a denial of service attack directed at X. Check Point Software's live cyber threat map does not record unusual levels of activity at the time of writing. NetScout's real-time DDoS map recorded only small attacks on the US.
> If a DDoS was indeed the reason for the delayed start of the event, it appears not to have impacted the rest of X's operations – there were plenty of posts commenting on the problems with the Space occupied by the interview. And Musk was tweeting from the very network said to be under attack.
> The interview commenced some 40 minutes after its advertised time. Live audience statistics reported 1.1 to 1.3 million attendees during the portions of the event The Register observed – although during the stream Trump claimed that the event had an audience of 60 million or more, exceeding targets of 25 million.
This is why we teach kids stories like "The Boy Who Cried Wolf": it's such a fundamental thing that when you lie about absolutely everything, all the time, people will just never trust you again, even when you happen to be telling the truth one particular time.
And unfortunately, both of these men are known for bullshitting more than anything else, and have been for a long time now.