I wonder if 1-bit quantization is the main reason why pplx.ai is faster than any other RAG or chatbot. Gemini, for instance, is a turtle by comparison, though it is better at explanations, while pplx is concise.
Nope. The model on Perplexity's free tier is a finetuned GPT-3.5.
As for the paid versions, you can choose between GPT-4 (not Turbo), Gemini Pro, Claude, etc.
You can also choose their own model ("Experimental"), but it's not faster than the others.
All of these proprietary models are fast on Perplexity. My guess is they're using some insane caching system and better API infrastructure...
Absolutely not, 1-bit isn't even real yet. Perplexity does a ton of precaching. TL;DR: every novel query is an opportunity to cache each web page response, that response turned into embeddings, and the LLM response. That's also why I hate it: it's just a rushed version of RAG with roughly the same privacy guarantees any incumbent would have given you in the last 15 years (read: none, and they'll gleefully exploit yours while saying "whoops!").
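To make the precaching idea concrete, here's a minimal sketch of the three cache layers described above: fetched pages, page embeddings, and final LLM answers. All the names (`PrecachingRAG`, the `fetch`/`embed`/`generate` callables) are invented for illustration; this is not Perplexity's actual architecture, just one plausible shape of it.

```python
import hashlib

def _key(text: str) -> str:
    # Stable cache key for arbitrary text.
    return hashlib.sha256(text.encode()).hexdigest()

class PrecachingRAG:
    """Hypothetical RAG pipeline where every novel query warms three caches."""

    def __init__(self, fetch, embed, generate):
        # fetch(url) -> page text, embed(page) -> vector,
        # generate(query, pages, vecs) -> answer. All injected stubs here.
        self.fetch, self.embed, self.generate = fetch, embed, generate
        self.page_cache = {}       # url -> raw page text
        self.embedding_cache = {}  # page hash -> embedding
        self.answer_cache = {}     # query hash -> final LLM answer

    def answer(self, query, urls):
        qk = _key(query)
        if qk in self.answer_cache:
            # Repeat query: skip fetching, embedding, and the LLM entirely.
            return self.answer_cache[qk]
        pages = []
        for url in urls:
            if url not in self.page_cache:
                # Novel page: fetch it once, reuse it for future queries.
                self.page_cache[url] = self.fetch(url)
            pages.append(self.page_cache[url])
        vecs = []
        for page in pages:
            pk = _key(page)
            if pk not in self.embedding_cache:
                # Embed each distinct page only once.
                self.embedding_cache[pk] = self.embed(page)
            vecs.append(self.embedding_cache[pk])
        result = self.generate(query, pages, vecs)
        self.answer_cache[qk] = result  # the answer itself is cached too
        return result
```

With stub `fetch`/`embed`/`generate` functions you can verify that a repeated query never touches the network, the embedder, or the LLM again, which is where the apparent speed would come from.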