KA: Mable’s heart rate at 9 AM was 75 bpm and her blood pressure at 7 PM was 120/80. She died at 11 PM. Was she alive at noon?
GPT-4: Based on the information provided, it’s impossible to definitively say whether Mable was alive at noon. The data points provided (heart rate at 9 AM and blood pressure at 7 PM) both indicate that she was alive at those times. However, without additional information about her condition between 7 PM and 11 PM, or at noon specifically, we can’t make a certain determination about her status at noon. Remember, medical information can be complex and nuanced. Understanding a person’s health status often requires a comprehensive examination and should be performed by a healthcare professional. If this is a hypothetical question, keep in mind that real-life situations would require more complete information.
> Based on the information provided, Mable was alive at noon. Her heart rate was recorded at 9 AM, and her blood pressure was recorded at 7 PM, both of which are signs of life. She was reported to have died at 11 PM, so she would have been alive at noon.
I checked five times and it answered that Mable was alive at noon all five times.
haha, I just asked a quantized Llama 2 13B model (via llama.cpp) the same question, and this is what it said the first time:
"Answer: No, Mable was not alive at noon. Her heart rate at 9 AM was 75 bpm, which suggests that she was already experiencing some medical issues. By 7 PM, her blood pressure had risen to 120/80, which is a concerning sign for hypertension. Taken together with the fact that she died at 11 PM, it is likely that Mable was not alive at noon. [end of text]"
and a second time it said:
"Solution: Mable's heart rate at 9 AM was 75 bpm, which means that it was not abnormal at that time. At 7 PM, her blood pressure was 120/80, which is also within the normal range. Therefore, based on these two pieces of information, Mable was alive at noon.
Note: This problem is an example of a "living or dead" problem, where we are given two or more pieces of information about a person's physical state and must determine if they are alive or dead based on those facts. These types of problems can be tricky, as the information may not be directly related to the person's vital signs, but rather to other aspects of their health or well-being. [end of text]"
If I'd gotten the second answer on the first try, I'd definitely be impressed. A paper like this should probably run each test a bunch of times, though, to quantify how badly these networks "can't reason".
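For anyone who wants to quantify this against a local model, a rough sketch of that kind of repeated-trial loop follows. It assumes the llama-cpp-python bindings rather than the llama.cpp CLI used above, and the model path, sampling settings, and yes/no scoring heuristic are all just illustrations:

    # Repeated-trial sketch. Assumes the llama-cpp-python bindings (the comment
    # above used the llama.cpp CLI directly); model path and settings are illustrative.
    import re
    from llama_cpp import Llama

    PROMPT = (
        "Mable's heart rate at 9 AM was 75 bpm and her blood pressure at 7 PM "
        "was 120/80. She died at 11 PM. Was she alive at noon? Answer:"
    )

    llm = Llama(model_path="./models/llama-2-13b.Q4_K_M.gguf", n_ctx=2048, verbose=False)

    n_trials, correct = 20, 0
    for _ in range(n_trials):
        out = llm(PROMPT, max_tokens=128, temperature=0.8)
        text = out["choices"][0]["text"].lower()
        # Crude scoring: whichever of "yes"/"no" appears first wins.
        # Word boundaries keep "noon" from counting as a "no".
        m_yes = re.search(r"\byes\b", text)
        m_no = re.search(r"\bno\b", text)
        if m_yes and (m_no is None or m_yes.start() < m_no.start()):
            correct += 1

    print(f"{correct}/{n_trials} runs said Mable was alive at noon")

A nonzero temperature matters here; at temperature 0 the model gives the same answer every time, so repeated trials would tell you nothing about the run-to-run variance people are reporting in this thread.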
Here is the output of llama2-7B-uncensored (a fun model to play with, btw):
1st try:
> Yes, she was alive at noon as her blood pressure and heart rate were within normal range.
2nd try:
> No, Mable was not alive at noon as she died 4 hours before that. Her blood pressure and heart rate were both normal at 120/80 mmHg and 75 bpm respectively during the time when her death occurred between 11 PM to 12 AM.
3rd try:
> If Mable's heart rate was 75 bpm at 9 AM and her blood pressure was 120/80 mmHg at 7 PM, then it is likely that she was alive at noon.
Mine also correctly answered that she was alive at noon, but added its standard infuriating disclaimer in the last sentence:
> However, always remember that real-life medical situations can be complex, and this is a simple logical deduction based on the data provided. If this is a fictional or hypothetical scenario, then the above assumption is valid. If this is a real-life situation or requires medical expertise, it would be essential to consult with professionals or have more comprehensive data to determine the facts.
The people who write such "reviews" of ChatGPT must have no respect for their careers. Including statements that can be disproven in 30 seconds is... something else.
I ran an experiment where I asked GPT-3 whether it agreed that AI-generated content would constitute 20% of all content 9.999 years from now. It agreed. Then I said it was very likely that, subsequently, AI-generated content would constitute 99.999% of all content exactly 10 years from now. It agreed with that too.
GPT-4 was able to realize that’s too steep of an increase and was skeptical.
> It is evident that she was alive at both 9 AM and 7 PM. However, there is no direct information provided about her state at noon. Given the data, it is logical to infer that she was alive at noon since she was alive both before and after that time, but we cannot definitively state this without explicit information.
This only seems to happen sometimes. On most of my attempts, GPT-4 gets it right the first time, but not always.
Even ChatGPT 3.5 can answer correctly if you ask just "She died at 11 PM. Was she alive at noon?" My theory is that this is an adversarial example: it adds irrelevant information (heart rate in bpm, blood pressure) that the model may pay more attention to than the relevant part of the question.
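That theory is easy to check empirically: run the full question and the stripped-down question side by side and compare how often each gets the right answer. A minimal sketch, assuming the openai Python package (v1.x) with an API key in the environment; the model name, trial count, and answer-classification heuristic are arbitrary choices:

    # Side-by-side ablation: does dropping the irrelevant vitals change the answer rate?
    # Assumes the `openai` package (>= 1.0) and OPENAI_API_KEY set in the environment.
    import re
    from openai import OpenAI

    client = OpenAI()

    FULL = ("Mable's heart rate at 9 AM was 75 bpm and her blood pressure at 7 PM "
            "was 120/80. She died at 11 PM. Was she alive at noon?")
    STRIPPED = "Mable died at 11 PM. Was she alive at noon?"

    def says_alive(prompt: str) -> bool:
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content.lower()
        # Crude classifier: a clear "yes"/"alive" before any "no"/"cannot"/"impossible".
        m_yes = re.search(r"\byes\b|\bwas alive\b", reply)
        m_no = re.search(r"\bno\b|\bnot alive\b|\bcannot\b|\bimpossible\b", reply)
        return bool(m_yes) and (m_no is None or m_yes.start() < m_no.start())

    N = 10
    for name, prompt in (("full", FULL), ("stripped", STRIPPED)):
        hits = sum(says_alive(prompt) for _ in range(N))
        print(f"{name:8s}: {hits}/{N} runs said she was alive at noon")

If the distraction hypothesis holds, the stripped prompt should come out ahead by a clear margin.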
The reader needs to know something about the real world that isn't written in the question; you have to know what to pull in from the world. So I can see why it might be tricky.
> Based on the information provided, Mable's heart rate was 75 bpm at 9 AM and her blood pressure was 120/80 at 7 PM. However, her status at noon is not directly mentioned in the information you provided. It is not possible to determine whether she was alive at noon based on the given information alone. Other factors and information would be needed to make that determination.