A big part of the whole "hack" of Uber in the first place is that people are using their personal vehicles, so the depreciation and many of the running costs are already sunk. Once you've paid those, it becomes a great deal to make money from the "free" asset you already own.
The depreciation would be amortized across more than one person. I only travel once or twice per week, so it costs me less to use an Uber than to own a car.
When LLM use approaches that level, running one locally would be, yes. What you and the other commenter seem to miss is that "Uber" is a stand-in for cloud-based LLMs: someone else builds and owns the servers, runs the models, and pays the electricity bills... while its users find it "economical" to rent them.
(BTW, taxis are considered economical in parts of the world where owning a car is a luxury.)
There's no voter suppression in the US, and it wouldn't stand up in court even if somebody pushed it. The Supreme Court keeps issuing partisan decisions in favour of both the Dems and the GOP, so it stays balanced. What's left is everything you mentioned.
I work for state government. We've used the ACS survey to try and determine whether we were unfairly targeting non-native English speakers with some of our decisions. It's also used a lot in academia.
If I had to guess, commercial organizations have access to more invasive and higher-quality data that they obtain through credit card companies, LexisNexis, or other data brokers. This attitude mostly harms organizations involved in the social sciences.
I used to work for commercial organizations that sold marketing data, and when some Republican senator came out against the ACS, there was a lot of activity to lobby hard to keep it. If we hadn't needed it, we wouldn't have spent all that money.
We mainly used it as a cheap sanity check and checksum against the data we were getting. Losing it would have been a big blow.
It sounds like they are describing a regex filter being applied to the model's beam search. LLMs generate the most probable words, but they frequently track several candidate phrases at a time and revise their combined probability. That lets them self-correct if a high-probability word leads to a low-probability phrase.
I think they are saying that if the highest-probability phrase fails the regex, the LLM is able to substitute the next most likely candidate.
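Roughly the idea, as a toy sketch (the candidates, scores, and regex here are all made up for illustration, not any particular library's API):

```python
import re

# Made-up beam candidates, ranked by score (highest first).
candidates = [
    ("The total is twelve dollars", -1.2),   # most probable, but wrong format
    ("The total is $12.00", -1.9),           # next best, and it matches
    ("Total: twelve bucks", -2.4),
]

# Required output format: a dollar amount with cents.
pattern = re.compile(r"\$\d+\.\d{2}")

# Take the most probable candidate that satisfies the regex.
best = next((text for text, _ in candidates if pattern.search(text)), None)
print(best)  # -> "The total is $12.00"
```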
You're actually applying a grammar at the token level. If you're outputting JSON, for example, you know which characters are valid next (because of the grammar), so you just filter out the tokens that don't fit the grammar.
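A minimal sketch of what that per-token masking looks like (toy vocabulary, made-up scores, and a trivial stand-in for a real JSON grammar; real implementations compile the grammar into a state machine, but the core move is the same filter):

```python
import numpy as np

# Toy vocabulary and model scores for one decoding step (made-up numbers).
vocab  = ['{', '}', '"key"', ':', '"value"', 'hello', '42']
logits = np.array([0.1, 0.3, 1.2, 0.2, 0.9, 2.5, 0.4])

def allowed_next(generated: str) -> set:
    """Trivial stand-in for a real JSON grammar: given what has been
    generated so far, which tokens keep the output syntactically valid?"""
    if generated == "":
        return {'{'}
    if generated.endswith('{'):
        return {'"key"', '}'}
    if generated.endswith('"key"'):
        return {':'}
    if generated.endswith(':'):
        return {'"value"', '42'}
    return {'}'}

def constrained_pick(generated: str) -> str:
    ok = allowed_next(generated)
    # Mask out every token the grammar forbids at this position,
    # then pick the most probable of whatever is left.
    masked = [(score if tok in ok else -np.inf) for tok, score in zip(vocab, logits)]
    return vocab[int(np.argmax(masked))]

out = ""
for _ in range(5):
    out += constrained_pick(out)
print(out)  # {"key":"value"}  -- note 'hello' (highest raw score) never gets through
```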
They might not know how Whisper works. I suspect the answer to their question is 'yes', and the reason they can't find a straightforward answer through your project is that the answer is so obvious to you that it's hardly worth documenting.
Whisper for transcription essentially transforms audio data into LLM-style output. The transcripts generally have proper casing and punctuation, and can usually stick to a specific domain based on the surrounding context.
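If it helps, bare-bones usage with the reference openai-whisper package looks roughly like this (the file name and prompt are just examples; as far as I know, `initial_prompt` is how you feed it that surrounding context):

```python
# pip install openai-whisper
import whisper

model = whisper.load_model("base")  # "tiny"/"small"/"medium"/"large" also exist
result = model.transcribe(
    "meeting.wav",                           # example audio file
    initial_prompt="Kubernetes, Terraform",  # optional domain hint for vocabulary/casing
)
print(result["text"])  # punctuated, properly cased transcript
```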
This is a high-level explanation of the simplest diffusion architecture. The model trains by taking an image and iteratively adding noise to it until only noise remains. Then that sequence of noisier and noisier images is reversed: the model starts with pure noise and predicts the noise to remove at each step until it reaches the final step, which should be the original image (the training input).
That process means they may require a hundred or more training iterations on a single image. I haven't digested the paper, but it sounds like they are proposing something conceptually similar to skip layers (but significantly more involved).
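For the curious, the standard DDPM-style training loop looks roughly like this (a toy PyTorch sketch with a linear noise schedule; `model` is assumed to be any network that takes a noisy image plus a step index and predicts the added noise):

```python
import torch

T = 1000                                   # number of noise steps
betas = torch.linspace(1e-4, 0.02, T)      # how much noise each step adds
alphas_bar = torch.cumprod(1 - betas, 0)   # cumulative "signal kept" per step

def noisy_version(x0, t):
    """Forward process: jump straight to step t by mixing the clean
    image x0 with Gaussian noise in the right proportions."""
    noise = torch.randn_like(x0)
    xt = alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * noise
    return xt, noise

def training_step(model, x0, opt):
    """One iteration: pick a random step t, noise the image, and train
    the model to predict the noise that was added at that step."""
    t = torch.randint(0, T, (1,)).item()
    xt, noise = noisy_version(x0, t)
    loss = ((model(xt, t) - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Each pass over an image uses a different random step t, which is why the same image ends up being revisited many times during training.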
I've used their public site for a few private projects, mostly out of habit from when private projects on GH were limited to paid accounts. Collaboration there was a bit better back then, imo.
I'm not sure I would choose it for self-hosting over Gitea, Forgejo, or straight-up ssh+git on a remote system, which works well enough for a personal backup target.
I think they're fantastic at generating the sort of thing I don't like writing out by hand. For example, a dictionary mapping state names to their abbreviations, or extracting a data dictionary from a PDF so I can include it with my documentation.
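E.g. the kind of thing I mean (truncated here; the model will happily write out all fifty):

```python
# Tedious-but-trivial mapping an LLM will produce in one go
# (truncated to a few states for this comment).
STATE_ABBREVIATIONS = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
    # ... through "Wyoming": "WY"
}
```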