Awesome-totally-open-ChatGPT: A list of open alternatives to ChatGPT (github.com/nichtdax)
343 points by pallas_athena on March 21, 2023 | hide | past | favorite | 48 comments


Comparing search engines has been my passion for a while now. I built various tools around it (https://www.gnod.com/search/)

With the advent of LLMs, I also started doing it for AI engines recently:

https://www.gnod.com/search/ai

Will keep it updated with new LLMs as they come on the scene.

If anybody runs one of these open ChatGPT alternatives as a service, let me know. I would love to include your service in the AI comparison.


Good idea. A couple more non-AI engines: https://search.marginalia.nu/ and my own https://highbrow.se/


Something in Marginalia's query processing is almost comically broken right now. Only really works well for single-term queries. Got a fix in the pipe, but it'll be a week or so.


Do you have any suggestions on how to feed different search engines into each other, along with random prompt injections, to watch their dialog?


Prompt:

What would happen if the Eiffel tower was submerged in a massive tub of liquid nitrogen?

    ----
YouChat:

Based on the provided search results, there does not appear to be any reliable information on what would happen if the Eiffel Tower was submerged in a massive tub of liquid nitrogen. It is important to note that submerging the Eiffel Tower in such extreme conditions could potentially cause damage or destruction to the tower, and it is not recommended to attempt such an experiment. If you have any other questions or concerns, please let me know and I will do my best to assist you.

    ----
Phind:

Liquid nitrogen is a colorless, odorless, and non-flammable substance that is used in many industrial processes. It has a boiling point of −196 °C (−321 °F), which is much lower than the freezing point of water. If the Eiffel Tower were submerged in a massive tub of liquid nitrogen, several things would happen [4].

The liquid nitrogen would quickly evaporate into gas since the Eiffel Tower is much warmer than −196 °C. This would create a cloud of nitrogen gas around the tower, which could obscure its visibility.

The tower's metal structure would initially contract due to the extreme cold, which could cause it to make cracking or popping sounds. However, the contraction would be minimal since the tower is made of wrought iron, which has a low coefficient of thermal expansion [5].

The tower's paint would become brittle and could crack due to the sudden change in temperature. However, the paint layer is not thick enough to affect the overall structural integrity of the tower [1].

The tower's micro-movements due to wind and sun exposure would be reduced since the metal structure would be much colder and less prone to expansion and contraction. However, this effect would be temporary since the tower would eventually warm up again [1].

The tower's lights would still be visible since they are powered by electricity, which is not affected by the temperature of the tower [3].

In summary, submerging the Eiffel Tower in a massive tub of liquid nitrogen would not cause any significant damage to the tower's structure. However, it could create a temporary cloud of nitrogen gas and cause the tower's paint to become brittle.

[1]: https://www.toureiffel.paris/en/news/130-years/how-long-can-...


[3]: https://www.toureiffel.paris/en/news/history-and-culture/15-...

[4]: https://quizlet.com/501418121/assignment-4-eiffel-flash-card...

[5]: https://www.snexplores.org/article/how-science-saved-eiffel-...

    ----
Perplexity:

Submerging the Eiffel Tower in a massive tub of liquid nitrogen would not be possible as it would require an enormous amount of liquid nitrogen, which is not readily available. However, if such a scenario were to occur, the tower would become extremely cold and brittle due to the low temperature of liquid nitrogen (-196°C). The metal structure of the tower would contract and shrink, causing it to become more fragile and prone to breaking. Additionally, any water or moisture on the tower would freeze instantly, leading to potential damage or destruction of the structure[1][2].

[1]: https://youtube.com/watch?v=u4gobiTvFg0

[2]: https://insh.world/science/what-if-we-poured-a-pool-of-liqui...
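
A quick sanity check on the contraction claims above: linear thermal contraction is roughly ΔL = α·L·ΔT. Plugging in my own rough numbers (α ≈ 12×10⁻⁶/K for wrought iron, a ~330 m tower, cooling from ~15 °C to −196 °C; none of these come from the answers above):

```python
# Back-of-envelope: linear thermal contraction dL = alpha * L * dT
ALPHA_WROUGHT_IRON = 12e-6   # per kelvin, approximate linear expansion coefficient
HEIGHT_M = 330.0             # rough height of the Eiffel Tower in metres
DELTA_T = 15.0 - (-196.0)    # cooling from ~15 C to liquid-nitrogen temperature

contraction_m = ALPHA_WROUGHT_IRON * HEIGHT_M * DELTA_T
print(f"Estimated contraction: {contraction_m:.2f} m")  # -> Estimated contraction: 0.84 m
```

So roughly 0.8 m of shrink over the full height; whether that counts as "minimal" depends on your standards.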


Phind is pretty impressive. I wonder what proprietary models it uses. GPT?


Nice, I've found engines that our corporate firewall doesn't block, woot =)


It's absolutely fantastic that we have so many runtimes, so quickly, to the point where we have an awesome list.

However, given that the usefulness of chatbots depends more on the model being used, what I would find a lot more useful is a ranking of the various models that are available. Currently I'm having to rely on comments on the internet to find out whether Alpaca 7B or LLaMA 65B is genuinely productive to use. As new models come out, I'd love to know how well each one tells jokes, answers complicated questions, or generates code.
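
The kind of ranking I have in mind could be sketched like this (everything here is a placeholder: the prompts, the scorer, and the model callables are stubs, not real APIs; the hard part in practice is the scoring itself):

```python
# Hypothetical sketch of a tiny model-ranking harness.
from statistics import mean

PROMPTS = {
    "jokes": "Tell me a joke about compilers.",
    "reasoning": "If Alice is older than Bob and Bob is older than Carol, who is youngest?",
    "code": "Write a Python function that reverses a string.",
}

def score_reply(category, reply):
    # Placeholder scorer: in practice this needs human rating,
    # unit tests for code tasks, etc.
    return 1.0 if reply else 0.0

def rank_models(models):
    """models: mapping of name -> callable(prompt) -> reply (stubs here)."""
    return sorted(
        ((name, mean(score_reply(cat, ask(p)) for cat, p in PROMPTS.items()))
         for name, ask in models.items()),
        key=lambda kv: kv[1], reverse=True)

# Stub "models" so the sketch runs end to end.
models = {"alpaca-7b": lambda p: "stub reply", "llama-65b": lambda p: ""}
print(rank_models(models))
```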


We have a whole team of folks just watching for these to come out and then go evaluate them.

Short answer: none of them do as well as the OG Davinci-003. Not even close. Even the 3.5 Turbo models from OpenAI don’t do as well.

We throw some sophisticated prompts at them to attempt chain of thought reasoning.
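
For a concrete flavor (a generic example, not our actual eval set): one widely used zero-shot chain-of-thought probe just appends a step-by-step cue to the question and checks whether intermediate steps appear before the answer.

```python
# Generic zero-shot chain-of-thought prompt construction.
question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")
cot_prompt = question + "\nLet's think step by step."
print(cot_prompt)
```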


That's quite a confusing comment. `davinci-003` is from OpenAI, whereas ChatGPT is some sort of variant more "optimized" for chatting. Said differently, ChatGPT is a customized version of GPT-3/3.5 (`davinci-003`), made for chatting. Please don't ask me about the details, I don't know, but `davinci-003` is not an alternative to ChatGPT.


>but `davinci-003` is not an alternative to ChatGPT

What makes you believe that? In my testing davinci does better than gpt-3.5-turbo for most tasks.


I think people, and this article, are about suggesting alternatives (competitors) to ChatGPT. `davinci` is obviously not an alternative; ChatGPT is `davinci` made for chatting. As to whether davinci produces better responses than ChatGPT ... maybe? But that's a different question.


It is an alternative. It’s just more expensive.


Do you have a citation for that?


would be interested in that as well


What kind of things have you seen davinci-003 do better than 3.5 turbo?


We need open benchmarks, clearly. Know any projects in that space?


Could you expand on this a bit more? What types of prompts? What are your evaluation criteria?

This actually sounds fascinating. Not unlike birdwatching! ))


That’s interesting - what about 4?


No! We can’t have open alternatives! That would be “unsafe”. We need regulatory capt— I mean, responsible government regulation now! As a MSFT shareholder I demand it.


OpenAI: This technology is too powerful to be put in the hands of mere mortals!

Also OpenAI: ... unless they pay us $20 per month!


Sam Altman: "We definitely need more regulation on AI"

Regulation could prevent indie AI projects.


Just like regulation stopped piracy, 3D-printed weaponry, and prostitution, and the war on drugs and Prohibition stopped the drug and alcohol trade in America.


Regulation can do bad things / regulation can do good things? Is this what you meant?


I suspect that gp meant that, for a lot of things, regulation is ineffective and costly.


ChatGLM is the only one on the list whose performance feels similar to GPT-3. I tried out Alpaca; it wasn't as good yet, but I've heard on Twitter that someone has a better dataset and has finetuned it. Let's see if they release the model weights.


I posted this yesterday too but I've found OpenAssistant to be the best of the OSS ones so far (Alpaca and LLaMA are non-commercial so not truly open): https://huggingface.co/spaces/olivierdehaene/chat-llm-stream...


What's nice about Open Assistant is that it's continuously improving and the community can contribute to the training data. It is potentially MUCH better than ChatGPT.


I've tried most of these and I gotta say, ChatGLM is actually pretty good. The Chinese responses are sometimes indistinguishable from those of ChatGPT, and it's only a 6B model.

Sadly, whenever I use harder questions or logical ones, it performs terribly. It also occasionally uses Chinese adjectives (albeit correctly) when responding in English.


I wonder if their censorship means that the Chinese corpus has less spam than the rest of the internet? Would be interesting if that turns out to be a huge advantage for making AI.


Chinese censorship is both subtractive and additive (“flood the zone with shit”). There will be plenty of spam in the corpus.


I feel like we need to invest in open alternatives otherwise we are gonna create something worse than IBM used to be.


What we need is a crowdsourced project where we dedicate GPU resources to build a model from scratch. There are thousands of us in here, and thousands more in the wider community.


Open-assistant.io is what you want. They've collected more than 100k RLHF training samples and are actively tuning up a chat model based on GPT-NeoX (?) from what I've heard.


What would be great: A list of performance benchmarks: What's the model good at?


Those exist - the trouble is that they're not for the average mortal...


Cool, but a pity there are no benchmarks so we can compare them.


If OpenAI would release more than masturbation papers, yes, we could compare them. Right now, we only know GPT-4 does things, and looks smart when you ask it stuff.


What do you mean? The paper [1] is full of benchmarks?

[1] GPT-4 Technical Report, https://cdn.openai.com/papers/gpt-4.pdf


Problem being it's not a research paper, which is what they were doing previously. This is a very bad state, as they're not detailing anything that external parties can recreate or verify via the scientific method.

They can claim the model says 40% fewer "xbox live gamer words", which people outside the company couldn't validate.

tl;dr: OpenAI is now a business.

Worth watching Yannic talk about the problem and other cool ML topics too: https://www.youtube.com/watch?v=2zW33LfffPc


It's not like this is a closed model only available to scientists that you can't benchmark yourself. Benchmarking should also be done by a 3rd party, otherwise we have a conflict of interest.


If this were a CPU or a graphics card, sure, let's benchmark it; worst case you're getting fewer frames.

Here we'd need to see more about its design and safety, else you may be getting recipes for veggie dishes when what you really wanted was fried chicken.


How would knowing the architecture or safety mechanisms help you decide if it’s going to give incorrect results more than actual testing would?

I’m no LLM expert, but I don’t think you can eyeball the arch and say “that’s going to confuse veggies for fried chicken”.


https://aisnakeoil.substack.com/p/gpt-4-and-professional-ben...

"GPT-4 and professional benchmarks: the wrong answer to the wrong question OpenAI may have tested on the training data. Besides, human benchmarks are meaningless for bots."


Just looking at the pictures and graphs in that paper is enough to be amazed by what they're achieving. The example where they show 3 pictures of an old monitor plug being connected to an iPhone to recharge it, and GPT-4 is asked what's funny about it and answers incredibly accurately, is amazing.


Since we don't have access to this feature, let's be skeptical. It feels like "leading the witness" if you ask what's funny here. Also, if the image is from a forum or sub full of funny images, does that give it away?

Running multiple tests would be stronger, say with example prompts: "what's going on in this picture", "what would a person think seeing this image", etc.

GPT-4 is cool as a numbers box, but this isn't reasoning logic, and without papers it hasn't been proven either.


There are plenty of people with access to GPT-4 through the API. They can benchmark it however they want.


Your suggestion made me wonder: are the OpenAI API responses deterministic? I would expect you get a different reply to the same prompt each time (given the probabilistic nature of LLMs), which makes "benchmarking" kind of weird. Or maybe the solution is just to run the prompt 100 times and take the mean across whatever metric is being used.
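
For what it's worth, the API samples with a nonzero temperature by default, so replies to the same prompt do vary (setting temperature to 0 makes output mostly deterministic). The repeat-and-average idea would look something like this sketch, where `ask_model` is a stand-in stub rather than a real API call:

```python
# Sketch: average a score over repeated runs of the same prompt.
import random
from statistics import mean, stdev

def ask_model(prompt, temperature=0.7):
    # Placeholder standing in for a real chat-completion call; with nonzero
    # sampling temperature, each call may return a different reply.
    return random.choice(["right answer", "wrong answer"])

def score(reply):
    return 1.0 if reply == "right answer" else 0.0

runs = [score(ask_model("What is 2+2?")) for _ in range(100)]
print(f"mean={mean(runs):.2f} stdev={stdev(runs):.2f}")
```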



