The only way I can see to stay safe is to hope that AI never deems it beneficial to “take over” and instead remains content as a co-inhabitant of the world.
Nah, that doesn't make sense. What we can see today is that an LLM has no concept of beneficial. It basically takes the given prompts and generates an "appropriate response" more or less randomly from some space of appropriate responses. So what's beneficial is chosen from a hat containing everything someone on the Internet would say. So if it's up and running at scale, every possibility and every concept of beneficial is likely to be run.
The main consolation is that this same randomness probably means it can't pursue goals reliably over a sustained time period. But a short script, targeting a given person, can do a lot of damage (consider how much 4chan is in the training data, for example).
In William Gibson's Neuromancer, the AIs have the equivalent of an EMP "shotgun", controlled by humans, pointed at their circuitry.
I keep seeing this oversimplification of what ChatGPT is doing. But it does have some ability to “understand” concepts. How else would it correctly solve word problems?
“ I have a credit card with a $250 annual fee. I get 4 membership reward points for every dollar I spend on groceries. A membership reward point is worth 1.4 cents. How much would I need to spend on groceries to break even?”
Just think about all of the concepts it would need to intuit to solve that problem.
I did have to tweak the question slightly this time compared to my first one.
“I have a credit card with a $250 annual fee. I get 4 membership reward points for every dollar I spend on groceries. A membership reward point is worth 1.4 cents. How much would I need to spend on groceries in a year to break even?”
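(For the record, the arithmetic itself is easy to check. Here's a quick Python sketch of the break-even calculation using the numbers from the prompt; this is my own check, not anything ChatGPT produced:)

# Break-even grocery spend for the card described in the prompt above
annual_fee = 250.00       # dollars
points_per_dollar = 4     # Membership Reward points earned per grocery dollar
cents_per_point = 1.4     # redemption value of one point, in cents

# Each grocery dollar returns points_per_dollar * cents_per_point cents of value
value_per_dollar = points_per_dollar * cents_per_point / 100   # in dollars

break_even_spend = annual_fee / value_per_dollar
print(f"Break-even grocery spend: ${break_even_spend:,.2f}")   # about $4,464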
Philosophers have been debating what "meaning" means for a while. But in the context of ChatGPT and LLMs, when people say these systems are "not understanding" a given idea, they are pointing to the fact that the systems handle the properties of words, but not consistently.
In your example, ChatGPT might translate each of your conditions into a rule and use something like an algorithm to solve the problem. Or it could notice the similarity of that problem to problems it has seen before and interpolate an answer from the answers to those. But this can easily result in answers suddenly changing.
“I’m thinking about getting either the Amex Hilton Aspire card that gives me 150,000 Hilton Honors points for signing up and has a $475 AF or the Amex Gold that gives me 70,000 Membership Reward points for signing up and has a $250 fee. A Hilton Honors point is worth .7 cents. A membership reward point is worth 1.4 cents. Which one would give me a better return in the first year”
ChatGPT's answer:
To determine which card would give you a better return in the first year, you'll need to consider the total value of the rewards earned, as well as the cost of the annual fee.
For the Hilton Aspire card, the 150,000 Hilton Honors points are worth $1,050 (150,000 x 0.7 cents per point). However, the $475 annual fee would offset a significant portion of that value.
For the Amex Gold card, the 70,000 Membership Reward points are worth $980 (70,000 x 1.4 cents per point). The $250 annual fee is lower than that of the Hilton Aspire card, but it still needs to be factored into the overall value.
Assuming you maximize the value of your rewards by redeeming them for their highest value and using all the card's benefits, here's a comparison of the two cards' net value in the first year:
Hilton Aspire: $1,050 (value of points earned) - $475 (annual fee) = $575 net value
Amex Gold: $980 (value of points earned) - $250 (annual fee) = $730 net value
Therefore, based on this analysis, the Amex Gold card would give you a better return in the first year. However, keep in mind that the value of the rewards can vary depending on how you redeem them, and your personal spending habits and travel preferences may also play a role in determining which card is right for you.
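(The arithmetic in that answer checks out; here's a small Python sanity check of the net first-year values, mine rather than ChatGPT's:)

# Net first-year value of the two sign-up offers, using the values from the prompt
offers = {
    "Amex Hilton Aspire": {"points": 150_000, "cents_per_point": 0.7, "annual_fee": 475},
    "Amex Gold":          {"points": 70_000,  "cents_per_point": 1.4, "annual_fee": 250},
}

for name, offer in offers.items():
    net = offer["points"] * offer["cents_per_point"] / 100 - offer["annual_fee"]
    print(f"{name}: ${net:,.2f}")   # Aspire: $575.00, Gold: $730.00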
The complex behavior you're showing doesn't prove what you think it proves - it still doesn't show it's using the consistent rules that a person would expect.
But it does show that people extrapolate from complex behavior to "understanding" in the human sense, which machines generally don't have.
I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.
People are acting as if ChatGPT is a glorified Eliza clone.
> I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.
But you ... aren't. The statistically most likely words coming after problem X may well be solution X, because it's following the pattern of humans using rules. And context is also part of the prediction.
The only way this is different from something just using rules is that it will also put in other random things from its training, but only at the rate they occur, which for some things can be quite low. But only some things.
Don’t we have a problem then? By nature of effective communication, AI could never prove to you it understands something, since any sufficient understanding of a topic would be met with an answer that could be hand-waved as “Well that’s the most statistically likely answer.” Newsflash: this basically overlaps 100% with any human’s most effective answer.
I think I’m beginning to understand the problem here. The folks here who keep pooh-poohing these interactions don’t just see the AIs as unconscious robots. I think they see everyone that way.
No; what we need, in order to be willing to believe that understanding is happening, is to know that the underlying structures fundamentally allow that.
ChatGPT's underlying structures do not. What it does, effectively, is look at the totality of the conversation thus far, and use the characters and words in it, combined with its training data, to predict, purely statistically, what characters would constitute an appropriate response.
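To make that concrete, here is a toy sketch of what "predict, purely statistically" means. It is an illustration only; a real LLM scores an enormous vocabulary with a neural network, but the sampling step at the end looks roughly like this:

import math, random

def sample_next_token(logits, temperature=0.8):
    # logits: hypothetical scores for each candidate next token given the context
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())
    weights = {tok: math.exp(s - max_s) for tok, s in scaled.items()}  # softmax, numerically stable
    r = random.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok

# Made-up scores for the context "The annual fee is ..."
print(sample_next_token({"$250": 2.1, "$475": 1.7, "waived": 0.4, "banana": -3.0}))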
I know that some people like to argue that what humans do cannot be meaningfully distinguished from this, but I reject this notion utterly. I know that my own thought processes do not resemble this procedure, and I believe that other people's are similar.
“Thought processes” is a funny term, don’t you think, for something that is somehow distinguishable from processes that we’ve designed and somehow understand down to the cellular (node) level.
It’s all emergent behavior from a system following a process.
> it still doesn't show it's using the consistent rules that a person would expect.
How is this different from humans?
If you give me the same coding assignment on different days, I’m not going to write my code the exact same way or even structure it the same way.
But I did once see a post on HN where someone ran an analysis on all HN posters and it was able to tell that I posted under two different names based on my writing style. Not that I was trying to hide anything. My other username is scarface_74 as opposed to Scarface74.
This. We have had a lot of people -- including journalists and academics with big microphones -- learn for the first time what a Markov chain is, and then conclude that ChatGPT is "just Markov chains" (or whatever similarly reductive concept).
They really, really don't know what they're talking about, and yet it's becoming a kind of truth through repetition.
> They really, really don't know what they're talking about, and yet it's becoming a kind of truth through repetition.
Kind of like religion, or the belief that people working for the state are more trustworthy than people who take drugs, sleep on the street, or are under the age of 18.
>Pretty soon, the bots will start saying it!
We are chemical-based repetition machines; psychologists see this in the Bobo doll experiments, where kids exposed to new ideas on TV or in books repeat the learned behaviour on the dolls.
I think some of the chemicals we make like https://en.wikipedia.org/wiki/N,N-Dimethyltryptamine actually help to create new ideas, as many people say they come up with solutions after some sleep. There also appears to be a subculture in Silicon Valley where microdosing LSD helps maintain creativity while coding.
In practice, sources of reference will need cleaning up, and the question of who should be responsible for that will probably ensue. It will call into question entities like encyclopaedia companies, who have been the gatekeepers of the knowledge they sell, in much the same way that Guinness World Records is the self-declared arbiter of world records.
We already see judges working in a semi-cohesive manner, but the appeals process tells you it's flawed and inefficient; ergo, can the judiciary even be trusted to interpret law properly?
It's also interesting how the financial system reinforces this mob-rule mentality, and not always for the better, but then some people will no doubt suggest we should be grateful we have bungled our way to this point in history.
If you don't think it's just statistically choosing the next word, what do you think it's doing? Do you think it understands the mathematical concepts you're questioning it about?
“Write an AppleSoft Basic program to do the computation”
(Here it gave me a simplistic Basic program that wouldn't work because the numbers would be too large)
Me: “AppleSoft Basic can’t handle numbers that long”
ChatGPT:
You're correct that the integer arithmetic in AppleSoft BASIC has some limitations and cannot handle numbers as large as the sum of the two numbers you provided. AppleSoft BASIC is a 16-bit programming language, which means that it can handle integer values from -32,768 to 32,767.
If you want to work with very large numbers in AppleSoft BASIC, you can use a string to store the numbers and write your own code to perform arithmetic operations on the strings. Here's an example program that demonstrates this approach:
10 A$ = "12348120348102341"
20 B$ = "10823401828482"
30 L = LEN(A$)
40 M = LEN(B$)
50 IF L < M THEN A$ = LEFT$("0", M - L) + A$ : L = M
60 IF M < L THEN B$ = LEFT$("0", L - M) + B$ : M = L
70 C$ = ""
80 R = 0
90 FOR I = L TO 1 STEP -1
100 D = ASC(MID$(A$, I, 1)) + ASC(MID$(B$, I, 1)) - 96 + R
Now, going back to the original premise: despite this code being wrong (as ChatGPT's code almost invariably is), do you think it understands the concepts here, or is it just statistically generating tokens based on previous input?
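(For what it's worth, the technique ChatGPT is reaching for, digit-by-digit addition on strings, is straightforward when done correctly. A working Python equivalent, mine rather than ChatGPT's:)

def add_big_numbers(a, b):
    # Add two non-negative integers given as decimal strings, digit by digit
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)   # left-pad with zeros: the step the BASIC code gets wrong

    digits, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):   # rightmost digit first
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_big_numbers("12348120348102341", "10823401828482"))   # 12358943749930823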
Really for code generation ChatGPT is an incremental step over StackOverflow. It can kind of nicely template the things you tell it into the code, for the most part, but the code is almost always fundamentally wrong or just mashed together in some way.
I’ve used it to generate about 10 scripts that did some combination of JSON/YAML data wrangling and AWS automation using the AWS SDK in Python. It’s been correct around 90%+ of the time.
Criticizing ChatGPT for getting one line wrong that a human who hasn’t programmed in the language in over 30 years would also get wrong (I happened to remember the technique from reading it in the back of a magazine in the 80s), when you can then use it to iterate, is like criticizing a monkey who sang the national anthem because one note was off key.
How is mashing code together any different from what the average human does?
I have also asked it to generate AWS-related code in Python and it has gotten something wrong every single time.
It's incrementally better than just copying and pasting from StackOverflow, since it will customize the code for you, but if you try to go beyond what can easily be found on StackOverflow it will fail you.
The AppleSoft BASIC program ChatGPT generated for you won't actually run. Lines 50 and 60 are clearly intended to left-pad the string numbers with leading zeros so they're the same length, but as written they can only add at most one leading zero, so B$ will not be padded correctly and the calculation will fail.
BTW: click on “X minutes ago” in the comment to reply when you don’t see a reply link.
I asked it to “explain every line of code”. I had to copy and paste the original code back into a new session.
Yes, I know, the comments should have been separated by a “:REM”. But that wouldn’t work either on a real Apple //; lines can’t be longer than 256 characters.
That is one issue I have with ChatGPT. It is only helpful as a shortcut for coding based on a domain you already know - not something I haven’t coded in since 1992.
I could have immediately caught errors with generated Python/boto3 (AWS SDK). I use that everyday and I could have tested it.
me: “left$ doesn’t do padding, it returns the first characters of a string”
ChatGPT: (Gave me another sample with the String() function)
Me: That function isn’t available in AppleSoft Basic
ChatGPT: (Gave me something with the syntax of “0” * n to pad)
Me: that syntax doesn’t work in AppleSoft basic
ChatGPT:
10 A$ = "12348120348102341" ' Initialize first number
20 B$ = "10823401828482" ' Initialize second number
30 L = LEN(A$) ' Determine length of first number
40 M = LEN(B$) ' Determine length of second number
50 IF L < M THEN FOR I = 1 TO M - L : A$ = "0" + A$ : NEXT I : L = M ' Pad first number with zeros if it is shorter than second number
60 IF M < L THEN FOR I = 1 TO L - M : B$ = "0" + B$ : NEXT I : M = L ' Pad second number with zeros if it is shorter than first number
70 C$ = "" ' Initialize result string
80 R = 0 ' Initialize carry variable to 0
90 FOR I = L TO 1 STEP -1 ' Loop through the digits of the numbers, starting from the rightmost digit
100 D = ASC(MID$(A$, I, 1)) + ASC(MID$(B$, I, 1)) - 96 + R ' Add the two digits, subtract 96 to convert from ASCII to digit value, and add any carry from the previous digit
110 C$ = CHR$(D MOD 10 + 48) + C$ ' Compute the digit value of the sum and append it to the result string
120 R = INT(D / 10) ' Determine the carry for the next digit
130 NEXT I ' Move on to the next digit
(I left off the last two lines; they are the same as before. It was a bad copy-paste error.)
> I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.
FCOL, you can't use "complex output" as proof that the process has any intelligence directing it.
If you could, we would take the Intelligent Design argument seriously. We don't. We never did. We need a good clear argument to convince us now why it is a good idea to accept Intelligent Design as an argument.
It's not just complex output, it's output that's relevant to the prompt including considerable nuance. If that's not bordering on intelligence, then you shouldn't consider humans intelligent either.
Hello, this is Bing. I can help you with this math word problem.
According to the search results, the Amex Hilton Aspire card gives you 150,000 Hilton Honors points for signing up and has a $475 annual fee. A Hilton Honors point is worth 0.7 cents. The Amex Gold card gives you 70,000 Membership Reward points for signing up and has a $250 annual fee. A Membership Reward point is worth 1.4 cents.
To compare the two cards, we need to calculate the net value of the points and fees for each card. The net value is the value of the points minus the annual fee.
For the Amex Hilton Aspire card, the net value is:
150,000 x 0.7 - 475 = $600
For the Amex Gold card, the net value is:
70,000 x 1.4 - 250 = $730
Therefore, the Amex Gold card would give you a better return in the first year by $130.
I hope this helps you make a decision.
Bing makes a mistake here: 150,000 points x 0.7 cents = $1,050, and $1,050 - $475 = $575, not $600.
I’ve seen it make similar errors. It does an amazing job at forming a logically consistent and relevant response, but doesn’t really do arithmetic very well. It will come up with a calculation of the correct form, because it’s seen lots of calculations before, but it doesn’t understand how calculations actually work, only how they are structured.
Yes! Well, scrape is a slight exaggeration, but it's more than possible that most of the relevant data came from a The Points Guy analysis.
I'd suggest reading https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-... to understand why just changing a few values in your input wouldn't throw an LLM off. It's not matching on exact words but rather embeddings (think like synonyms, but stronger).
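A toy illustration of the "embeddings, not exact words" idea; the vectors below are invented for the example (real models use hundreds or thousands of dimensions), but the mechanics are the same:

import math

# Words are mapped to vectors; semantically similar words end up close together,
# so changing "fee" to "charge" barely moves the input the model actually sees.
embeddings = {
    "fee":    [0.90, 0.10, 0.30],
    "charge": [0.85, 0.15, 0.35],
    "banana": [0.10, 0.90, 0.05],
}

def norm(x):
    return math.sqrt(sum(a * a for a in x))

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

print(cosine(embeddings["fee"], embeddings["charge"]))  # ~0.99: near-synonyms
print(cosine(embeddings["fee"], embeddings["banana"]))  # ~0.23: unrelated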
Algebra is by definition a language, and a very simple one at that, whose rules can be summarized in a few pages [1]. That's exactly the domain that ChatGPT excels at the most: languages for which tons of examples are available. Just like programming languages.
It falls on its face with things that involve non-linguistic facts that require knowledge to answer, my current favorite being driving directions. It will just make up completely fictitious roads and turns if you ask it for directions from point A to point B.
Again, how is that different from humans? If you throw me in a random city, I won’t be able to give you directions. But that’s an easy problem to solve from a technology standpoint.
If ChatGPT recognizes one of the same types of domains that dumb assistants can handle, it can delegate the answer to an API.
> If ChatGPT recognizes one of the same types of domains that dumb assistants can handle, it can delegate the answer to an API
This is backward from an architectural standpoint. LLMs are a very expensive way to do intent detection, and a very inexact way to delegate to an API.
The more sensible way is to first try the standard search-engine approaches to detecting intent (which often use smaller language models) and delegating to knowledge-based services, and if that doesn't return a good result, delegate to the LLM if the task is suited to that.
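Roughly what that two-tier routing could look like, sketched in Python; detect_intent, call_service, and call_llm are hypothetical stand-ins stubbed out for illustration, not real APIs:

KNOWN_INTENTS = {"directions", "weather", "flight_status"}

def detect_intent(query):
    # Stand-in for a cheap intent classifier (keyword rules or a small language model)
    if "directions" in query.lower():
        return "directions", 0.95
    return "unknown", 0.0

def call_service(intent, query):
    return f"[{intent} API result for: {query}]"     # e.g. a maps or weather backend

def call_llm(query):
    return f"[LLM-generated answer for: {query}]"    # the expensive fallback

def answer(query):
    intent, confidence = detect_intent(query)
    if intent in KNOWN_INTENTS and confidence > 0.8:
        return call_service(intent, query)           # delegate to a knowledge-backed service
    return call_llm(query)                           # only then fall back to the LLM

print(answer("Give me directions from point A to point B"))
print(answer("Which credit card gives me a better return?"))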
The easiest way is to transfer them to Delta, and if you have any of the Delta Amex cards besides the Delta Blue, you automatically get a 15% discount when booking with points.
“Follow on me Reddit for more LifeProTips from a credit card junkie” /s
It knows that this sentence structure closely resembles a simple algebra word problem, because it's read hundreds of thousands of simple algebra word problems. I think you could see how somebody could tokenize that request and generate an equation like this: 250 = 4 * 0.014 * X
It knows that the sentence structure is very similar to a class of sentences it has seen before and that the expected response is to take tokens from certain locations in that sentence and arrange them in a certain way, which resembles an algebra equation
It doesn't understand credit card rewards, it understands how to compose an elementary word problem into algebra
One can equally say, "Human brains only know that a neuron is activated by a pattern of axon firing in response to physical inputs from nerve endings."
Does any of that change anything? Not really.
>It doesn't understand credit card rewards
Is this assertion based on anything but philosophical bias surrounding the word "understand"?
>it understands how to compose an elementary word problem into algebra
That's exactly how a human, who may or may not have understood rewards programs beforehand, would solve that word problem.
Dan Dennett is informative here.[0] Yes, consciousness (even your own!) can just be a bag of tricks.
“However, keep in mind that the value of the rewards can vary depending on how you redeem them, and your personal spending habits and travel preferences may also play a role in determining which card is right for you.”
Probe it: go in and ask all sorts of questions to check whether it understands credit card rewards, credit cards, rewards, their purpose, whether it can solve math problems on this topic, etc.
The entire idea of solving math problems in middle school was that you didn’t have to know the domain and that all of the necessary information was there.
When I wrote code for the health care industry, if you had asked me anything deeper about the industry or how to do brain surgery, I couldn’t have answered your question.
Look, you're all over this thread misunderstanding LLMs and rejecting the relatively correct explanations people are giving you. The comment by joe_the_user upthread that you called an oversimplification was in fact a perfect description (randomly sampling from a space of appropriate responses). That's exactly the intuition you should have.
Do you know the Wason selection task? The point is that people do not intuitively know how to correctly pick which experiments to do to falsify an assumption. My point is that you are not picking the right experiments to falsify your assumptions; instead, you're confirming what you think is going on. You're exactly failing the Wason task here.
Really want to understand language models? Go build a few from scratch.
Don't have time for that? Read Wolfram's post or any of the other similar good recent breakdowns.
Only interested in understanding by playing with it? Great! An experimentalist in the true scientific tradition. Then you're going to have to do good experimental science. Don't be fooled by examples that confirm what you already think is going on! Try to understand how what people are telling you is different from that, and devise experiments to distinguish the two hypotheses.
If you think ChatGPT "understands" word problems, figure out what "understanding" means to you. Now try your best to falsify your hypothesis! Look for things that ChatGPT can't do, that it should be able to do if it really "understood" by your definition (whatever you decide that is). These are not hard to find (for most values of "understand"). Finding those failures is your task, that's how you do science. That's how you'll learn the difference between reality and what you're reading into it.
That’s precisely why. Humans have a long, well established record of making shit up to make themselves feel special. They do it about animals, they do it about other humans, they do it about themselves. Doing it about AI is inevitable.
I’m working on a relatively complex DevOps project right now that consists of over a dozen 10-30 line Python scripts involving JSON and YAML data wrangling and AWS automation.
I’ve been able to just throw my requirements into ChatGPT like I would give them to a junior dev, and it came back with the correct answer 99% of the time, with code quality and commenting I would expect from a junior dev. It has an “understanding” of the AWS SDK, CloudFormation, the CDK, etc.
Once it generated code that had duplicate code blocks that differed only in their inputs. I asked it “can you remove duplicated code” and it did the refactoring.
I’ve also asked it what amounts to a standard middle school math problem and it solved it with explanations:
To break even on the $250 annual fee, you need to earn Membership Reward points that are worth $250.
Since one Membership Reward point is worth 1.4 cents, we can calculate the number of points we need to earn by dividing $250 by 0.014:
$250 / 0.014 = 17857.14
So, we need to earn 17,857 Membership Reward points to offset the $250 annual fee.
Since we earn 4 Membership Reward points for every dollar we spend on groceries, we can calculate the amount we need to spend on groceries to earn 17,857 points by dividing 17,857 by 4:
17,857 / 4 = 4,464.25
Therefore, you would need to spend $4,464.25 on groceries in a year to earn enough Membership Reward points to break even on the $250 annual fee.