Show HN: Countless.dev – A website to compare every AI model: LLMs, TTSs, STTs

vunderba · 2024-12-07T16:01:45 1733587305

OP, were you inspired by this LLM comparison tool?

https://whatllm.vercel.app

The tables are very similar - though you've added a custom calculator which is a nice touch.

Also for the Versus Comparison, it might be nice to have a checkbox that when clicked highlights the superlative fields of each LLM at a glance.

Gcam · 2024-12-07T21:25:51 1733606751

Data in this tool is from https://artificialanalysis.ai/ on October 13 2024 and so is a little of out date.

This page has up to date information of all models and providers: https://artificialanalysis.ai/leaderboards/providers We also on other pages cover Speech to Text, Text to Speech, Text to Image, Text to Video.

Note I'm one of the creators of Artificial Analysis.

xnx · 2024-12-07T17:53:27 1733594007

Thanks for sharing. That's a better tool.

andrewmcwatters · 2024-12-07T19:50:53 1733601053

Both seem to have great value. Some information is missing from Vercel's tables.

ursaguild · 2024-12-07T12:38:58 1733575138

I like the idea of more comparisons of models. Are there plans to add independent analyses of these models or is it only an aggregation of input limits?

How do you see this differing from or adding to other analyses such as:

https://artificialanalysis.ai

https://huggingface.co/spaces/TTS-AGI/TTS-Arena

https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena

Great work on all the aggregation. The website is nice to navigate.

botro · 2024-12-07T17:40:22 1733593222

I made https://aimodelreview.com/ to compare the outputs of LLMs over a variety of prompts and categories, allowing a side by side comparison between them. I ran each prompt 4 times for different temperature values and that's available as a toggle.

I was going to add reviews on each model but ran out of steam. Some users have messaged me saying the comparisons are still helpful to them in getting a sense of how different models respond to the same prompt and how temperature affects the same models output on the same prompt.

adrianomartins · 2024-12-08T23:44:04 1733701444

Hey, this is pretty insightful! Wonder if, in the course of researching to build this website you reached any conclusions as to what’s the AI assistant currently ahead.

rtsil · 2024-12-08T06:44:09 1733640249

I can confirm, it's still very helful, thank you!

ahmetd · 2024-12-07T13:13:53 1733577233

the gradio ui looks ugly imo, that's why I used shadcn and next.js to make the website look good.

I'll try to make it as user-friendly as possible. Most of the websites are ugly + too technical.

refulgentis · 2024-12-08T00:06:25 1733616385

I want to point out you dodged the data question, and there's a reason for it.

I like your work visually on first glance, god knows you're right about gradio, even if its irrelevant.

But peddling extremely limited, out of date, versions of other people's data, trumps that, especially with this tagline. "A website to compare every AI model: LLMs, TTSs, STTs"

It is a handful of LLMs, then one TTS model, then one STT model, both with 0 data. And it's worth pointing out, since this endeavor is motivated by design trumping all: all the columns are for LLM data.

vivzkestrel · 2024-12-08T07:11:48 1733641908

now imagine going one step further and actually running a prompt across every AI model and showing you the best answer and the AI model that generated it

alternatex · 2024-12-08T09:20:27 1733649627

Who decides what the best answer is?

vivzkestrel · 2024-12-08T13:18:42 1733663922

the user who runs the prompt?

alternatex · 2024-12-12T10:45:09 1734000309

Those tools exist, they do not need to be imagined. Look into the related comments. Also they do little, but increase the labor of getting an answer. Not exactly an improvement of AI for the user to spend more time reviewing AI answers.

karpatic · 2024-12-07T15:37:36 1733585856

Great! I wish there was a "bang to buck" value. Some way to know the cheapest model I could use for creating structured data from unstructured text, reliably. Using gpt4o-mini which is cheap but wouldn't know if anything cheaper could do the job too.

jampa · 2024-12-07T15:52:50 1733586770

Take a look at Gemini Flash 1.5. I had videos I needed to turn into structured notes, and the result was satisfactory (even better than the Gemini 1.5 Pro, for some reason). https://jampauchoa.substack.com/i/151329856/ai-studio.

According to this website, the cost is half of the gpt4-o mini. 0.15 vs 0.07 per 1M token.

nostrebored · 2024-12-07T18:01:17 1733594477

Seconding Gemini flash for structured outputs. Have had some quite large jobs I’ve been happy with.

sdesol · 2024-12-07T17:58:56 1733594336

I haven't found a model at the price point of GPT-4o mini that is as capable. Based on the hype surrounding Llama 3.3 70B, it might be that one though. On Deepinfra, input tokens are more expensive, but the output token is cheaper so I would say they are probably equivalent in price.

Also, best bang for the buck is very subjective, since one person might need it to work for one use case vs somebody else, who needs it for more.

mcbuilder · 2024-12-07T16:21:49 1733588509

I always plug openrouter.ai for making cross-model comparisons. It's my general goto for random stuff. (I am not affiliated, just a user)

pickettd · 2024-12-07T20:56:30 1733604990

I love the idea of openrouter. I hadn't realized until recently though that you don't necessarily know what quantization a certain provider is running. And of course context size can vary widely from provider to provider for the same model. This blog post had great food for thought https://aider.chat/2024/11/21/quantization.html

avereveard · 2024-12-08T03:12:27 1733627547

To expand a little, some providers may apply more aggressive optimization in periods of high load.

wslh · 2024-12-07T14:06:21 1733580381

I'd like to share a personal perspective/rant on AI that might resonate with others: like many, I'm incredibly excited about this AI moment. The urge to dive headfirst into the field and contribute is natural after all, it's the frontier of innovation right now.

But I think this moment mirrors financial markets during times of frenzy. When markets are volatile, one common piece of advice is to “wait and see”. Similarly, in AI, so many brilliant minds and organizations are racing to create groundbreaking innovations. Often, what you're envisioning as your next big project might already be happening, or will soon be, somewhere else in the world.

Adopting a “wait and see” strategy could be surprisingly effective. Instead of rushing in, let the dust settle, observe trends, and focus on leveraging what emerges. In a way, the entire AI ecosystem is working for you: building the foundations for your next big idea.

That said, this doesn't mean you can't integrate the state of the art into your own (working) products and services.

whiplash451 · 2024-12-07T14:15:20 1733580920

Your proposal makes a lot of sense. I assume a number of companies are integrating sota models into their products.

That being said, there is no free lunch: when you're doing this, you're more reactive than proactive. You minimize risk, but you also lose any change to have a stake [1] in the few survivors that will remain and be extremely valuable.

Do this long enough and you'll have no idea what people are talking about in the field. Watch the latest Dwarkesh Patel episode to get a sense of what I am talking about.

[1] stake to be understood broadly as: shares in a company, knowledge as an AI researcher, etc.

wslh · 2024-12-07T14:23:47 1733581427

Thank you for your thoughtful response! I completely agree that there's a tradeoff between being proactive and reactive in this kind of strategy: minimizing risk by waiting can mean missing out on opportunities to gain a broader "stake".

That said, my perspective focuses more on strategic timing rather than complete passivity. It's about being engaged with understanding trends, staying informed, and preparing to act decisively when the right opportunity emerges. It's less about "waiting on the sidelines" and more about deliberate pacing, recognizing that it’s not always necessary to be at the bleeding edge to create value.

I'll definitely check out Dwarkesh Patel’s latest episode. I assume it is the Gwern one, right? Thanks!

gtirloni · 2024-12-07T15:42:56 1733586176

Tangent question: is there anything better on the desktop than ChatGPT's native client? I find it too simple to organize chats but I'm having a hard time evaluating the dozen or so apps (most are disguise for some company's API service). Any recommendations? macOS/Linux compatibility preferred.

refulgentis · 2024-12-08T06:46:06 1733640366

Telosnex: every platform, native. Also, has web. Anthropic, OpenAI, Mistral, Groq, Gemini, and any local LLM on literally every platform. and you can bring your own API keys, and the best search available. Pay as you go, with everything at cost if you pay $10/month. Otherwise, free. Everythings stored in simple JSON.

ralfhn · 2024-12-07T16:56:48 1733590608

There’s https://www.typingmind.com/ local-only (no server) and built by an indie dev

thelittleone · 2024-12-07T17:05:52 1733591152

Peesonally im a Typing Mind user but it got too slow and buggy with long cbaglts. Ended up with boltai which is a natice mac app and found it very good after months of heavy use. I think it could also improve navigation coloring or iconography to help distinguish chats better but its my favorite so far.

rubymamis · 2024-12-07T17:20:57 1733592057

I'm working on a native LLM client that is beautiful and fast[1], developed in Qt C++ and QML - so it can run on Windows, macOS, Linux (and mobile). Would love to get your feedback once it launches.

[1] https://rubymamistvalove.com/client.mp4

shepherdjerred · 2024-12-07T17:33:05 1733592785

I've liked Machato: https://untimelyunicorn.gumroad.com/l/machato

politelemon · 2024-12-07T13:13:55 1733577235

There are only two audio transcription models. Is this generally true, are there no open source ones like llama but for transcribing? Or just small dataset on that site

rhdunn · 2024-12-07T14:03:20 1733580200

It looks like the site is only listing hosted models from major providers, not all models available on huggingface, civit.ai, etc. -- Looking at the image generation and chat lists there are many more models that are on huggingface that are not listed.

See https://huggingface.co/models?pipeline_tag=automatic-speech-...

Note: Text to Speech and Audio Transcription/Automatic Speech Recognition models can be trained on the same data. They currently require training separately as the models are structured differently. One of the challenges is training time as the data can run into the hundreds of hours of audio.

politelemon · 2024-12-07T16:52:55 1733590375

Thank you both

woodson · 2024-12-07T15:37:58 1733585878

There are lots and lots of models, covering various use cases (e.g., on device, streaming/low-latency, specific languages). People somehow think OpenAI invented audio transcription with whisper in 2022 when other models exist and have been used in production for decades (whisper is the only one listed on that website).

ursaguild · 2024-12-07T13:02:18 1733576538

Just saw that this was built for a hackathon. Huge kudos and congratulations!

ahmetd · 2024-12-07T13:12:32 1733577152

thank you! although I wasn't able to win the hackathon it was still a fun experience :)

tonetegeatinst · 2024-12-07T23:46:16 1733615176

Love the UI and table layout. Have you though about showing the different VRAM requirements for models?

mcklaw · 2024-12-07T11:53:53 1733572433

It would be great if llmarena leadership information would also appear to compare performance vs cost.

ursaguild · 2024-12-07T12:48:23 1733575703

https://lmarena.ai

ahmetd · 2024-12-07T12:07:48 1733573268

yep, will add this :)

xnx · 2024-12-07T11:37:17 1733571437

Nice resource. Almost too comprehensive for someone who doesn't know all the sub-version names. Would be great to have a column of the score from lmarena leaderboard. Some prices are 0.00? Is there a page that each row could link to for more detail?

ahmetd · 2024-12-07T12:10:26 1733573426

thank you! some models either have N/A or 0.00, I found it is like that for the free models and ones that aren't available.

As per llmarena I'll definitely add it, a lot of other people recommended it as well.

over time will make the website more descriptive and detailed!

kmoser · 2024-12-07T17:23:42 1733592222

And a link to the company page where one can use/subscribe to the model

lolinder · 2024-12-08T00:59:14 1733619554

One thing that stands out playing with the sorting is that Google's Gemini claims to have a context window more than 10x that of most of its competition. Has anyone experimented with this to see if its useful context window is actually anything close to that?

In my own experiments with the chat models they seem to lose the plot after about 10 replies unless constantly "refreshed", which is a tiny fraction of the supposed 128000 token input length that 4o has. Does Gemini actually do something dramatically differently, or is their 3 million token context window pure marketing nonsense?

avereveard · 2024-12-08T03:09:44 1733627384

https://github.com/NVIDIA/RULER results in benchmark other than needle in haystack seem solid all the way to 128k

lolinder · 2024-12-08T03:46:30 1733629590

Thanks, this is exactly the kind of info I was hoping existed.

danpalmer · 2024-12-08T01:57:44 1733623064

When the released it they specifically focused on the accurate recall across the context window. There are a bunch of demos of things like giving it a whole movie as input (frame every N seconds plus script or something) and asking for highly specific facts).

Anecdotally, I use NotebookLM a bit, and while that’s probably RAG plus large contexts (to be clear, this is a guess not based on inside knowledge), it seems very accurate.

wtvanhest · 2024-12-08T01:09:18 1733620158

What tactics do you use to refresh while using them?

zoltrix303 · 2024-12-08T01:26:33 1733621193

I tend to use a sentence along these lines: "Give me a straightforward summary of what we discussed so far, someone who didn't read the above should understand the details. Don't be too verbose."

Then i just continue from there or simply use this as a seed in another fresh chat.

lolinder · 2024-12-08T01:30:57 1733621457

I don't have a strategy that I like—it just amounts to having to say "you forgot about requirement X, try again keeping that in mind".

nikvdp · 2024-12-08T04:17:37 1733631457

you guys might also like http://llmprices.dev, similar but it's automatically updated with the latest info every 24h

robbiemitchell · 2024-12-07T18:16:34 1733595394

One helpful addition would be Requests Per Minute (RPM), which varies wildly and is critical for streaming use cases -- especially with Bedrock where the quota is account wide.

alif_ibrahim · 2024-12-07T15:20:53 1733584853

thanks for the comparison table! would be great if the header is sticky so i don't get lost in identifying which column is which.

ProofHouse · 2024-12-07T20:52:15 1733604735

These are hard to keep updated. I find they usually fall off. It would be cool to have one, but honestly, this one already doesn't even have 4o and pro on it which if it was being maintained, it obviously would. Updating a table shouldn't take days. It's like a one minute event.

ahmetd · 2024-12-09T16:26:00 1733761560

the reason 4o and pro doesn't exist is because they aren't available via API yet!

the website is updated, don't worry :)

Bigie · 2024-12-08T01:06:46 1733620006

I feel like the number is still a bit lacking, especially since many models made by Chinese companies are not represented, like speech-to-text.

As far as I know, there's a volcano engine in China that has impressive text-to-speech capabilities. Many local companies are using this model.

moralestapia · 2024-12-07T15:56:28 1733586988

Hey this is great!

A small suggestion, a toggle to exclude between "free" and hosted models.

Reason is, I'm obv. interested in seeing the cheaper models first but am not interested in self-hosting which dominate the first chunk of results because they're "free".

dangoodmanUT · 2024-12-07T14:40:02 1733582402

This is missing... so many models... like most TTS and STT ones.

11labs, deepgram, etc.

tomp · 2024-12-07T19:53:37 1733601217

"every"

you're missing a lot

TTS: 11labs, PlayHT, Cartesia, iFLYTEK, AWS Polly, Deepgram Aura

STT: Deepgram (multiple models, including Whisper), Gladia Whisper, Soniox

just off the top of my head (it's my dayjob!)

wiradikusuma · 2024-12-07T17:13:54 1733591634

Suggestions:

1. Maybe explain what Chat Embedding Image generation Completion Audio transcription TTS (Text To Speech) means?

2. Put a running number on the left, or at least just show total?

mtkd · 2024-12-07T11:47:56 1733572076

Would poss be further useful to have a release date column, license type, whether EU restricted and also right-align / comma-delimit those numeric cells

ahmetd · 2024-12-07T12:06:42 1733573202

good idea, will look into adding this!

5563221177 · 2024-12-08T03:50:21 1733629821

Logs emitted during the build, or test results, or metrics captured during the build (such as how long it took)... these can all themselves be build outputs.

I've got one where "deploying" means updating a few version strings and image reverences in a different repo. The "build" clones that repo and makes the changes in the necessary spots and makes a commit. Yes, the side effect I want is that the commit gets pushed--which requires my ssh key which is not a build input--but I sort of prefer doing that bit by hand.

shahzaibmushtaq · 2024-12-07T14:47:23 1733582843

It's weird that OpenAI has lower prices for same models and Azure has higher prices. Anyone can explain?

BTW impressive idea and upvoted on PH as well.

xrendan · 2024-12-07T15:10:01 1733584201

Azure charges differently based on deployment zone/latency guarantees, OpenAI doesn't let you pick your zone so it's equivalent to the Global Standard deployment (which is the same cost).

[0] https://azure.microsoft.com/en-us/pricing/details/cognitive-...

ahmetd · 2024-12-07T15:07:07 1733584027

tysm for the support!

OpenAI and Azure should be the same, it's weird that it shows it as different. I'll look into fixing this.

currently #2 on PH, any help would be appreciated!

mentalgear · 2024-12-07T14:52:23 1733583143

This is interesting price-wise, but quality-wise if you do not provide benchmark results, it's not that helpful a comparision.

ikishorek · 2024-12-10T10:31:41 1733826701

Can you please consider adding sort options to the cost columns in the Pricing Calculator?

Its_Padar · 2024-12-07T12:38:01 1733575081

Would be great if it was possible to get to the page where the pricing was found to make it easier to use the model

victoriawu · 2024-12-09T08:34:02 1733733242

It’s indeed quite intuitive to see the details of each AI model, but it feels a bit overwhelming with too much information.

I wonder if adding a chatbot might be a good idea. Users could ask specific questions based on their needs, and the bot could recommend the most suitable model. Perhaps this would add more value.

SubiculumCode · 2024-12-07T21:50:25 1733608225

I was surprised: what is that the model that costs the most per token? Luminous-Supreme-Control

ahmetd · 2024-12-09T16:22:53 1733761373

some European safety AI company haha. It's exactly what you would expect from Europe :)

e-clinton · 2024-12-07T17:39:14 1733593154

DeepInfra prices are significantly better than what’s listed for OS models.

amelius · 2024-12-07T16:51:55 1733590315

I'm missing the "IQ" column.

methou · 2024-12-07T15:06:35 1733583995

Thank you on behalf of my waifu!

NoZZz · 2024-12-07T17:34:53 1733592893

Stop feeding their machine.