I wonder if 1-bit quantization is the main reason why pplx.ai is faster than any other RAG or chatbot. Gemini, for instance, is a turtle by comparison, though it is better at explanations, while pplx is concise.
Nope. The model on Perplexity's free tier is a finetuned GPT-3.5.
As for the paid versions, you can choose between GPT-4 (not Turbo), Gemini Pro, Claude, etc.
You can also choose their own model ("Experimental"), but it's not faster than the others.
All of these proprietary models are fast on Perplexity. My guess is they're using some insane caching system and better API infrastructure...
Absolutely not, 1-bit isn't even real yet. Perplexity does a ton of precaching. TL;DR: every novel query is an opportunity to cache each web page response, that response turned into embeddings, and the LLM response. That's also why I hate it: it's just a rushed version of RAG with roughly the same privacy guarantees any incumbent would have given you in the last 15 years (read: none, and they'll gleefully exploit yours while saying "whoops!").
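To make the precaching idea concrete, here's a minimal sketch of the three cache layers described above: fetched pages, page embeddings, and final LLM answers. All the names (`PrecachingRAG`, the `fetch`/`embed`/`generate` callables) are invented for illustration; this is not Perplexity's actual architecture, just one plausible shape of it.

```python
import hashlib

def _key(text: str) -> str:
    # Stable cache key for arbitrary text.
    return hashlib.sha256(text.encode()).hexdigest()

class PrecachingRAG:
    """Hypothetical RAG pipeline where every novel query warms three caches."""

    def __init__(self, fetch, embed, generate):
        # fetch(url) -> page text, embed(page) -> vector,
        # generate(query, pages, vecs) -> answer. All injected stubs here.
        self.fetch, self.embed, self.generate = fetch, embed, generate
        self.page_cache = {}       # url -> raw page text
        self.embedding_cache = {}  # page hash -> embedding
        self.answer_cache = {}     # query hash -> final LLM answer

    def answer(self, query, urls):
        qk = _key(query)
        if qk in self.answer_cache:
            # Repeat query: skip fetching, embedding, and the LLM entirely.
            return self.answer_cache[qk]
        pages = []
        for url in urls:
            if url not in self.page_cache:
                # Novel page: fetch it once, reuse it for future queries.
                self.page_cache[url] = self.fetch(url)
            pages.append(self.page_cache[url])
        vecs = []
        for page in pages:
            pk = _key(page)
            if pk not in self.embedding_cache:
                # Embed each distinct page only once.
                self.embedding_cache[pk] = self.embed(page)
            vecs.append(self.embedding_cache[pk])
        result = self.generate(query, pages, vecs)
        self.answer_cache[qk] = result  # the answer itself is cached too
        return result
```

With stub `fetch`/`embed`/`generate` functions you can verify that a repeated query never touches the network, the embedder, or the LLM again, which is where the apparent speed would come from.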