I would not expect any LLM to get this right. I think people have too high expectations for it.
Now if you asked it to write a Python program that lists them in order, with all the names, birthdays, and years elected entered in a list so the program can run - that's more reasonable.
That some models get it right is irrelevant. In general, if your instructions require computation, it's safer to assume the model won't get it right and will hallucinate.
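A minimal sketch of the kind of program I mean, using a hypothetical three-president sample - the point is that the sorting is done by Python, not by the model:

```python
from datetime import date

# Hypothetical sample data: (name, birthday, year first elected).
presidents = [
    ("Abraham Lincoln", date(1809, 2, 12), 1860),
    ("George Washington", date(1732, 2, 22), 1789),
    ("Barack Obama", date(1961, 8, 4), 2008),
]

# Sort chronologically by birthday; change the key to p[2] to sort
# by year elected instead.
for name, birthday, elected in sorted(presidents, key=lambda p: p[1]):
    print(f"{name}: born {birthday}, first elected {elected}")
```

The LLM only has to fill in the list; the ordering itself is deterministic, so there's nothing for it to hallucinate once the data is in place.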