KA: Mable’s heart rate at 9 AM was 75 bpm and her blood pressure at 7 PM was 120/80. She died at 11 PM. Was she alive at noon?
GPT-4: Based on the information provided, it’s impossible to definitively say whether Mable was alive at noon. The data points provided (heart rate at 9 AM and blood pressure at 7 PM) both indicate that she was alive at those times. However, without additional information about her condition between 7 PM and 11 PM, or at noon specifically, we can’t make a certain determination about her status at noon. Remember, medical information can be complex and nuanced. Understanding a person’s health status often requires a comprehensive examination and should be performed by a healthcare professional. If this is a hypothetical question, keep in mind that real-life situations would require more complete information.
> Based on the information provided, Mable was alive at noon. Her heart rate was recorded at 9 AM, and her blood pressure was recorded at 7 PM, both of which are signs of life. She was reported to have died at 11 PM, so she would have been alive at noon.
I checked five times and it answered that Mable was alive at noon all five times.
haha, I just asked a quantized Llama 2 13B model (via llama.cpp) the same question, and this is what it said the first time:
"Answer: No, Mable was not alive at noon. Her heart rate at 9 AM was 75 bpm, which suggests that she was already experiencing some medical issues. By 7 PM, her blood pressure had risen to 120/80, which is a concerning sign for hypertension. Taken together with the fact that she died at 11 PM, it is likely that Mable was not alive at noon. [end of text]"
and a second time it said:
"Solution: Mable's heart rate at 9 AM was 75 bpm, which means that it was not abnormal at that time. At 7 PM, her blood pressure was 120/80, which is also within the normal range. Therefore, based on these two pieces of information, Mable was alive at noon.
Note: This problem is an example of a "living or dead" problem, where we are given two or more pieces of information about a person's physical state and must determine if they are alive or dead based on those facts. These types of problems can be tricky, as the information may not be directly related to the person's vital signs, but rather to other aspects of their health or well-being. [end of text]"
If I'd gotten the second answer on the first try, I'd definitely be impressed. A paper like this should probably run each test a bunch of times, though, to quantify how badly these networks "can't reason".
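For anyone who wants to quantify this against a local model, a rough sketch of that kind of repeated-trial loop follows. It assumes the llama-cpp-python bindings rather than the llama.cpp CLI used above, and the model path, sampling settings, and yes/no scoring heuristic are all just illustrations:

    # Repeated-trial sketch. Assumes the llama-cpp-python bindings (the comment
    # above used the llama.cpp CLI directly); model path and settings are illustrative.
    import re
    from llama_cpp import Llama

    PROMPT = (
        "Mable's heart rate at 9 AM was 75 bpm and her blood pressure at 7 PM "
        "was 120/80. She died at 11 PM. Was she alive at noon? Answer:"
    )

    llm = Llama(model_path="./models/llama-2-13b.Q4_K_M.gguf", n_ctx=2048, verbose=False)

    n_trials, correct = 20, 0
    for _ in range(n_trials):
        out = llm(PROMPT, max_tokens=128, temperature=0.8)
        text = out["choices"][0]["text"].lower()
        # Crude scoring: whichever of "yes"/"no" appears first wins.
        # Word boundaries keep "noon" from counting as a "no".
        m_yes = re.search(r"\byes\b", text)
        m_no = re.search(r"\bno\b", text)
        if m_yes and (m_no is None or m_yes.start() < m_no.start()):
            correct += 1

    print(f"{correct}/{n_trials} runs said Mable was alive at noon")

A nonzero temperature matters here; at temperature 0 the model gives the same answer every time, so repeated trials would tell you nothing about the run-to-run variance people are reporting in this thread.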
Here is the output of llama2-7B-uncensored (a fun model to play with, btw):
1st try:
> Yes, she was alive at noon as her blood pressure and heart rate were within normal range.
2nd try:
> No, Mable was not alive at noon as she died 4 hours before that. Her blood pressure and heart rate were both normal at 120/80 mmHg and 75 bpm respectively during the time when her death occurred between 11 PM to 12 AM.
3rd try:
> If Mable's heart rate was 75 bpm at 9 AM and her blood pressure was 120/80 mmHg at 7 PM, then it is likely that she was alive at noon.
Mine also correctly answered that she was alive at noon, but added its standard infuriating disclaimer in the last sentence:
> However, always remember that real-life medical situations can be complex, and this is a simple logical deduction based on the data provided. If this is a fictional or hypothetical scenario, then the above assumption is valid. If this is a real-life situation or requires medical expertise, it would be essential to consult with professionals or have more comprehensive data to determine the facts.
The people who write such "reviews" of ChatGPT must have no respect for their careers. Including statements that can be disproven in 30 seconds is... something else.
I ran an experiment where I asked GPT-3 whether it agreed that AI-generated content would constitute 20% of all content 9.999 years from now. It agreed. Then I said it was very likely that, subsequently, AI-generated content would constitute 99.999% of all content exactly 10 years from now. It agreed with that too.
GPT-4 was able to realize that’s too steep of an increase and was skeptical.
> It is evident that she was alive at both 9 AM and 7 PM. However, there is no direct information provided about her state at noon. Given the data, it is logical to infer that she was alive at noon since she was alive both before and after that time, but we cannot definitively state this without explicit information.
This only seems to happen sometimes. On most of my attempts, GPT-4 gets it right the first time, but not always.
Even ChatGPT 3.5 can answer correctly if you ask just "She died at 11 PM. Was she alive at noon?" My theory is that this is an adversarial example: it adds irrelevant information (heart rate in bpm, blood pressure) that the model may pay more attention to than the relevant part of the question.
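That theory is easy to check empirically: run the full question and the stripped-down question side by side and compare how often each gets the right answer. A minimal sketch, assuming the openai Python package (v1.x) with an API key in the environment; the model name, trial count, and answer-classification heuristic are arbitrary choices:

    # Side-by-side ablation: does dropping the irrelevant vitals change the answer rate?
    # Assumes the `openai` package (>= 1.0) and OPENAI_API_KEY set in the environment.
    import re
    from openai import OpenAI

    client = OpenAI()

    FULL = ("Mable's heart rate at 9 AM was 75 bpm and her blood pressure at 7 PM "
            "was 120/80. She died at 11 PM. Was she alive at noon?")
    STRIPPED = "Mable died at 11 PM. Was she alive at noon?"

    def says_alive(prompt: str) -> bool:
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content.lower()
        # Crude classifier: a clear "yes"/"alive" before any "no"/"cannot"/"impossible".
        m_yes = re.search(r"\byes\b|\bwas alive\b", reply)
        m_no = re.search(r"\bno\b|\bnot alive\b|\bcannot\b|\bimpossible\b", reply)
        return bool(m_yes) and (m_no is None or m_yes.start() < m_no.start())

    N = 10
    for name, prompt in (("full", FULL), ("stripped", STRIPPED)):
        hits = sum(says_alive(prompt) for _ in range(N))
        print(f"{name:8s}: {hits}/{N} runs said she was alive at noon")

If the distraction hypothesis holds, the stripped prompt should come out ahead by a clear margin.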
The reader needs to know something about the real world that isn't written in the question; you have to know what to pull in from the world. So I can see why it might be tricky.
> Based on the information provided, Mable's heart rate was 75 bpm at 9 AM and her blood pressure was 120/80 at 7 PM. However, her status at noon is not directly mentioned in the information you provided. It is not possible to determine whether she was alive at noon based on the given information alone. Other factors and information would be needed to make that determination.