
The only way I can see to stay safe is to hope that AI never deems that it is beneficial to “take over” and remain content as a co-inhabitant of the world.

Nah, that doesn't make sense. What we can see today is that an LLM has no concept of beneficial. It basically takes the given prompts and generates "appropriate response" more or less randomly from some space of appropriate responses. So what's beneficial is chosen from a hat containing everything someone on the Internet would say. So if it's up and running at scale, every possibility and every concept of beneficial is likely to be run.

The main consolation is this same randomness probably means it can't pursue goals reliably over a sustained time period. But a short script, targeting a given person, can do a lot of damage (consider how much 4chan is in the training data, for example).



In William Gibson's Neuromancer, the AIs have the equivalent of an EMP "shotgun" pointed at their circuitry, controlled by humans.


That sounds like the stop button problem to me.

AI "Stop Button" Problem - Computerphile https://www.youtube.com/watch?v=3TYT1QfdfsM


That's an evasive that-could-never-happen-to-me argument.

> generates "appropriate response" more or less randomly from some space of appropriate responses

try to avoid saying that about your favorite serial killer's brain.


I keep seeing this oversimplification of what ChatGPT is doing. But it does have some ability to “understand” concepts. How else would it correctly solve word problems?

“ I have a credit card with a $250 annual fee. I get 4 membership reward points for every dollar I spend on groceries. A membership reward point is worth 1.4 cents. How much would I need to spend on groceries to break even?”

Just think about all of the concepts it would need to intuit to solve that problem.


> How else would it correctly solve word problems?

"To break even on the annual fee, you would need to earn rewards points that are worth at least $250.

Since you earn 4 Membership Rewards points for every dollar you spend on groceries, you would earn 4 * $1 = 4 points for every dollar you spend.

To find out how much you need to spend to earn at least $250 worth of rewards points, you can set up an equation:

4 points/dollar * x dollars = $250

where x is the amount you need to spend. Solving for x, we get:

x = $250 / (4 points/dollar) = $62.50

Therefore, you would need to spend $62.50 on groceries to earn enough rewards points to break even on the $250 annual fee."

Well, I guess it's going to take a third option: solve the word problem incorrectly.
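For reference, the arithmetic it should have done looks something like this (a quick sketch in Python, using only the figures from the prompt):

annual_fee = 250.00          # dollars
points_per_dollar = 4
cents_per_point = 1.4

# Each grocery dollar earns 4 * 1.4 = 5.6 cents of value, i.e. $0.056
value_per_dollar = points_per_dollar * cents_per_point / 100

break_even_spend = annual_fee / value_per_dollar
print(round(break_even_spend, 2))  # about 4464.29, nowhere near 62.50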


I did have to tweak the question this time slightly over my first one.

“I have a credit card with a $250 annual fee. I get 4 membership reward points for every dollar I spend on groceries. A membership reward point is worth 1.4 cents. How much would I need to spend on groceries in a year to break even ”


Philosophers have been debating what "meaning" means for a while. But in the context of ChatGPT and LLMs, when people talk about these systems "not understanding" a given idea, they're pointing at the system handling the properties of words, but not consistently.

In your example, ChatGPT might translate each of your conditions into a rule and use something like an algorithm to solve the problem. Or it could notice the similarity of the problem to problems it has seen and interpolate an answer from the answers to those problems. But the latter can easily result in answers suddenly changing.


Another example:

“I’m thinking about getting either the Amex Hilton Aspire card that gives me 150,000 Hilton Honors points for signing up and has a $475 AF or the Amex Gold that gives me 70,000 Membership Reward points for signing up and has a $250 fee. A Hilton Honors point is worth .7 cents. A membership reward point is worth 1.4 cents. Which one would give me a better return in the first year”

ChatGPTs answer:

To determine which card would give you a better return in the first year, you'll need to consider the total value of the rewards earned, as well as the cost of the annual fee.

For the Hilton Aspire card, the 150,000 Hilton Honors points are worth $1,050 (150,000 x 0.7 cents per point). However, the $475 annual fee would offset a significant portion of that value.

For the Amex Gold card, the 70,000 Membership Reward points are worth $980 (70,000 x 1.4 cents per point). The $250 annual fee is lower than that of the Hilton Aspire card, but it still needs to be factored into the overall value.

Assuming you maximize the value of your rewards by redeeming them for their highest value and using all the card's benefits, here's a comparison of the two cards' net value in the first year:

Hilton Aspire: $1,050 (value of points earned) - $475 (annual fee) = $575 net value

Amex Gold: $980 (value of points earned) - $250 (annual fee) = $730 net value

Therefore, based on this analysis, the Amex Gold card would give you a better return in the first year. However, keep in mind that the value of the rewards can vary depending on how you redeem them, and your personal spending habits and travel preferences may also play a role in determining which card is right for you.


The complex behavior you're showing doesn't prove what you think it proves - it still doesn't show it's using the consistent rules that a person would expect.

But it does show that people extrapolate complex behavior to "understanding" in the way humans do, which machines generally don't.


I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.

People are acting as if ChatGPT is a glorified Eliza clone.


> I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.

But you ... aren't. The statistically most likely words coming after problem X may well be solution X. Because it's following the pattern of humans using rules. And context is also part of a prediction.

The only way this is different from something just using rules is that it will also put in other random things from its training - but only at the rate they occur, which for some things can be quite low. But only some things.


Don’t we have a problem then? By nature of effective communication, AI could never prove to you it understands something, since any sufficient understanding of a topic would be met with an answer that could be hand-waved as “Well that’s the most statistically likely answer.” Newsflash: this basically overlaps 100% with any human’s most effective answer.

I think I’m beginning to understand the problem here. The folks here who keep poo-pooing these interactions don’t just see the AIs as unconscious robots. I think they see everyone that way.


No; what we need, in order to be willing to believe that understanding is happening, is to know that the underlying structures fundamentally allow that.

ChatGPT's underlying structures do not. What it does, effectively, is look at the totality of the conversation thus far, and use the characters and words in it, combined with its training data, to predict, purely statistically, what characters would constitute an appropriate response.
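Very roughly, the generation loop has this shape (a toy sketch in Python; next_token_distribution is a stand-in for the actual trained network, which is where all the complexity lives):

import random

def next_token_distribution(context):
    # Stand-in for the model: the real thing maps the entire context to a
    # probability distribution over its vocabulary. This toy just returns
    # a fixed distribution.
    return {"yes": 0.5, "no": 0.3, "maybe": 0.2}

def generate(prompt, max_tokens=5):
    context = prompt.split()
    for _ in range(max_tokens):
        dist = next_token_distribution(context)
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        context.append(token)  # the sampled token becomes part of the context
    return " ".join(context)

print(generate("does it understand?"))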

I know that some people like to argue that what humans do cannot be meaningfully distinguished from this, but I reject this notion utterly. I know that my own thought processes do not resemble this procedure, and I believe that other people's are similar.


“Thought processes” is a funny term, don’t you think, for something that is somehow distinguishable from processes that we’ve designed and somehow understand down to the cellular (node) level.

It’s all emergent behavior from a system that behaves under process.


> it still doesn't show it's using the consistent rules that a person would expect.

How is this different from humans?

If you give me the same coding assignment on different days, I’m not going to write my code the exact same way or even structure it the same way.

But I did once see a post on HN where someone ran an analysis on all HN posters and it was able to tell that I posted under two different names based on my writing style. Not that I was trying to hide anything. My other username is scarface_74 as opposed to Scarface74.


This. We have had a lot of people -- including journalists and academics with big microphones -- learn for the first time what a Markov chain is, and then conclude that ChatGPT is "just Markov chains" (or whatever similarly reductive concept).

They really, really don't know what they're talking about, and yet it's becoming a kind of truth through repetition.

Pretty soon, the bots will start saying it!


> They really, really don't know what they're talking about, and yet it's becoming a kind of truth through repetition.

Kind of like religion, or the belief that people working for the state are more trustworthy than people taking drugs, sleeping on the street, or under the age of 18.

>Pretty soon, the bots will start saying it!

We are chemical-based repetition machines; psychologists see this in the Bobo doll experiments, where kids exposed to new ideas on TV or in books repeat the learned behaviour on the dolls.

I think some of the chemicals we make, like https://en.wikipedia.org/wiki/N,N-Dimethyltryptamine, actually help to create new ideas, as many people say they come up with solutions after some sleep. There appears to be a subculture in Silicon Valley where microdosing LSD helps to maintain creativity with coding.

It would seem logical for the bots to start saying it. If the bots start amplifying flawed knowledge like a lot of Reddit or Facebook content, the internet will need to deal with the corruption of its sources of reference, like Wikipedia. https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wi... https://en.wikipedia.org/wiki/Reliability_of_Wikipedia https://en.wikipedia.org/wiki/List_of_Wikipedia_controversie...

In practice, sources of reference will need cleaning up, and the question of who should be responsible for that will probably ensue. It will call into question entities like encyclopaedia companies, which have been the gatekeepers of the knowledge they sell, in much the same way Guinness World Records is the self-declared arbiter of world records.

We already see judges working in a semi-cohesive manner, but the appeals process tells you it's flawed and inefficient; ergo, can the judiciary even be trusted to interpret law properly?

It's also interesting how the financial system reinforces this mob-rule mentality, and not always for the better, but then some people will no doubt suggest we should be grateful we have bungled our way to this point in history.


If you don't think it's just statistically choosing the next word, what do you think it's doing? Do you think it understands the mathematical concepts you're questioning it about?


This is a better example I posted

https://news.ycombinator.com/item?id=34811353


I haven't tried Bing but ask ChatGPT "what is 12348120348102341 + 10823401828482?"

Bonus points: If you ask Google Search this, it knows the right answer.
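(Any language with arbitrary-precision integers gets this right trivially, e.g. in Python:)

# Python ints are arbitrary precision, so the exact sum is a one-liner.
print(12348120348102341 + 10823401828482)  # 12358943749930823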


Now try this with Google:

“Write an AppleSoft Basic program to do the computation”

(Here it gives me a simplistic Basic program that wouldn’t work because the number would be too large)

Me: “AppleSoft Basic can’t handle numbers that long”

ChatGPT:

You're correct that the integer arithmetic in AppleSoft BASIC has some limitations and cannot handle numbers as large as the sum of the two numbers you provided. AppleSoft BASIC is a 16-bit programming language, which means that it can handle integer values from -32,768 to 32,767.

If you want to work with very large numbers in AppleSoft BASIC, you can use a string to store the numbers and write your own code to perform arithmetic operations on the strings. Here's an example program that demonstrates this approach:

10 A$ = "12348120348102341"
20 B$ = "10823401828482"
30 L = LEN(A$)
40 M = LEN(B$)
50 IF L < M THEN A$ = LEFT$("0", M - L) + A$ : L = M
60 IF M < L THEN B$ = LEFT$("0", L - M) + B$ : M = L
70 C$ = ""
80 R = 0
90 FOR I = L TO 1 STEP -1
100 D = ASC(MID$(A$, I, 1)) + ASC(MID$(B$, I, 1)) - 96 + R
110 C$ = CHR$(D MOD 10 + 48) + C$
120 R = INT(D / 10)
130 NEXT I
140 IF R > 0 THEN C$ = CHR$(R + 48) + C$
150 PRINT C$


Now, going back to the original premise: despite this code being wrong, as ChatGPT's code almost invariably is, do you think it understands the concepts here, or is it just statistically generating tokens based on previous input?

Really for code generation ChatGPT is an incremental step over StackOverflow. It can kind of nicely template the things you tell it into the code, for the most part, but the code is almost always fundamentally wrong or just mashed together in some way.


I’ve used it to generate about 10 scripts that did some combination of JSON/YAML data wrangling and AWS automation using the AWS SDK in Python. It’s been correct around 90%+ of the time.

Criticizing ChatGPT for getting one line wrong that a human who hasn’t programmed in the same language in over 30 years would also get wrong (I happened to remember the technique from reading it in the back of a magazine in the 80s), and then being able to use it to iterate, is like criticizing a monkey who sang the national anthem because one note was off key.

How is mashing code together any different than the average human does?


I have also asked it to generate AWS-related code in Python, and it has gotten something wrong every single time.

It's incrementally better than just copy & pasting from StackOverflow, since it will customize the code for you, but if you try to go beyond what can easily be found on StackOverflow it will fail you.


The AppleSoft BASIC program Bing generated for you won't actually run. Lines 50 and 60 are clearly intended to left-pad the string numbers with leading zeros so they're the same length, but as written they can only add at most one leading zero, so B$ will not be padded correctly and the calculation will fail.


BTW: click on “X minutes ago” in the comment to reply when you don’t see a reply link.

I asked it to “explain every line of code”. I had to copy and paste the original code back into a new session.

Yes, I know, the comments should have been separated by a “:REM”. But that wouldn’t work either on a real Apple //. Lines can’t be longer than 256 characters.

That is one issue I have with ChatGPT. It is only helpful as a shortcut for coding based on a domain you already know - not something I haven’t coded in since 1992.

I could have immediately caught errors with generated Python/boto3 (AWS SDK). I use that everyday and I could have tested it.

me: “left$ doesn’t do padding, it returns the first characters of a string”

ChatGPT: (Gave me another sample with the String() function)

Me: That function isn’t available in AppleSoft Basic

ChatGPT: (Gave me something with the syntax of “0” * n to pad)

Me: that syntax doesn’t work in AppleSoft basic

ChatGPT:

10 A$ = "12348120348102341" ' Initialize first number
20 B$ = "10823401828482" ' Initialize second number
30 L = LEN(A$) ' Determine length of first number
40 M = LEN(B$) ' Determine length of second number
50 IF L < M THEN FOR I = 1 TO M - L : A$ = "0" + A$ : NEXT I : L = M ' Pad first number with zeros if it is shorter than second number
60 IF M < L THEN FOR I = 1 TO L - M : B$ = "0" + B$ : NEXT I : M = L ' Pad second number with zeros if it is shorter than first number
70 C$ = "" ' Initialize result string
80 R = 0 ' Initialize carry variable to 0
90 FOR I = L TO 1 STEP -1 ' Loop through the digits of the numbers, starting from the rightmost digit
100 D = ASC(MID$(A$, I, 1)) + ASC(MID$(B$, I, 1)) - 96 + R ' Add the two digits, subtract 96 to convert from ASCII to digit value, and add any carry from the previous digit
110 C$ = CHR$(D MOD 10 + 48) + C$ ' Compute the digit value of the sum and append it to the result string
120 R = INT(D / 10) ' Determine the carry for the next digit
130 NEXT I ' Move on to the next digit

(I left off the last two lines; they are the same as before. It was a bad copy-paste error.)


You linked to your parent comment?


My bad, I had two examples going on at once. I thought this thread was to the simpler one.


> I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.

FCOL, you can't use "complex output" as proof that the process has any intelligence directing it.

If you could, we would take the Intelligent Design argument seriously. We don't. We never did. We need a good clear argument to convince us now why it is a good idea to accept Intelligent Design as an argument.


It's not just complex output, it's output that's relevant to the prompt including considerable nuance. If that's not bordering on intelligence, then you shouldn't consider humans intelligent either.


> it's output that's relevant to the prompt including considerable nuance.

You can say the same thing about Intelligent Design, and yet we dismiss it anyway.


We didn't dismiss intelligent design, we replaced it with a more parsimonious theory that better explained the evidence. Big difference.


Hello, this is Bing. I can help you with this math word problem.

According to the search results, the Amex Hilton Aspire card gives you 150,000 Hilton Honors points for signing up and has a $475 annual fee. A Hilton Honors point is worth 0.7 cents. The Amex Gold card gives you 70,000 Membership Reward points for signing up and has a $250 annual fee. A Membership Reward point is worth 1.4 cents.

To compare the two cards, we need to calculate the net value of the points and fees for each card. The net value is the value of the points minus the annual fee.

For the Amex Hilton Aspire card, the net value is:

150,000 x 0.7 - 475 = $600

For the Amex Gold card, the net value is:

70,000 x 1.4 - 250 = $730

Therefore, the Amex Gold card would give you a better return in the first year by $130.

I hope this helps you make a decision.

Bing makes a mistake here: 150,000 x 0.7¢ = $1,050, and $1,050 - $475 = $575, not $600.
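A quick check of both cards, using the same valuations as the prompt (a small Python sketch):

def net_first_year_value(points, cents_per_point, annual_fee):
    # Sign-up bonus value in dollars, minus the annual fee
    return points * cents_per_point / 100 - annual_fee

print(net_first_year_value(150000, 0.7, 475))  # Hilton Aspire: 575.0
print(net_first_year_value(70000, 1.4, 250))   # Amex Gold: 730.0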


I’ve seen it make similar errors. It does an amazing job at forming a logically consistent and relevant response, but doesn’t really do arithmetic very well. It will come up with a calculation of the correct form, because it’s seen lots of calculations before, but it doesn’t understand how calculations actually work, only how they are structured.


The same question in ChatGPT gave the correct answer.


That’s true, because ChatGPT has been extensively coached to do better on a lot of the tasks GPT3 was criticised for doing poorly on.


This reads like a standard analysis done by the “points guy” every year. I suspect this is more or less scraped from one of those articles.


So it scraped it based on my own point valuations?


Yes! Well, scrape is a slight exaggeration, but it's more than possible that most of the relevant data came from points guy analysis.

I'd suggest reading https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-... to understand why just changing a few values in your input wouldn't throw an LLM off. It's not matching on exact words but rather embeddings (think like synonyms, but stronger).
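The rough intuition, with made-up numbers (real embeddings have hundreds or thousands of learned dimensions; these toy vectors only illustrate that similarity is geometric rather than exact word match):

import math

vectors = {
    "annual fee":    [0.90, 0.10, 0.20],
    "yearly fee":    [0.88, 0.12, 0.19],
    "grocery spend": [0.10, 0.95, 0.30],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Rewording "annual fee" as "yearly fee" barely moves the vector, which is
# why small changes to a prompt don't throw the model off.
print(cosine(vectors["annual fee"], vectors["yearly fee"]))     # ~1.0
print(cosine(vectors["annual fee"], vectors["grocery spend"]))  # ~0.26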


I’ve been able to throw almost any random pre algebra problem at it and it got it right.

But how is this any different than how the average high schooler studies for the SAT? You study enough problems and you recognize similarities?


Algebra is by definition a language, and a very simple one at that, whose rules can be summarized in a few pages [1]. That's exactly the domain that ChatGPT excels at the most: languages for which tons of examples are available. Just like programming languages.

It falls on its face with things that involve non-linguistic facts that require knowledge to answer, my current favorite being driving directions. It will just make up completely fictitious roads and turns if you ask it for directions from point A to point B.

1. http://faculty.ung.edu/mgoodroe/PriorCourses/Math_0999_Gener...


Again, how is that different from humans? If you throw me in a random city, I won’t be able to give you directions. But that’s an easy problem to solve from a technology standpoint.

If ChatGPT recognizes one of the same types of domains that dumb assistants can handle, it could delegate the answer to an API.


> If ChatGPT recognizes one of the same types of domains that dumb assistants can handle, it could delegate the answer to an API

This is backward from an architectural standpoint. LLMs are a very expensive way to do intent detection, and a very inexact way to delegate to an API.

The more sensible way is to first try the standard search engine approaches to detecting intent (which often use smaller language models) and delegating to knowledge-based services, and if that doesn't return a good result, fall back to the LLM if the task is suited to it.
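In sketch form (Python, with every helper a hypothetical stand-in rather than a real API):

def detect_intent(query):
    # Cheap intent detection, e.g. a small classifier or keyword rules
    if "directions" in query.lower():
        return "directions", 0.9
    return "unknown", 0.0

def directions_service(query):
    return "result from a real maps/knowledge service would go here"

def call_llm(query):
    return "free-form LLM answer, used only as a fallback"

SERVICES = {"directions": directions_service}

def answer(query):
    intent, confidence = detect_intent(query)
    if confidence > 0.8 and intent in SERVICES:
        return SERVICES[intent](query)
    return call_llm(query)

print(answer("directions from point A to point B"))   # handled by the cheap service
print(answer("compare these two credit cards"))       # falls through to the LLM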


Gosh I wish membership reward points were actually worth 1.4c...


The easiest way is to transfer them to Delta, and if you have any of the Delta Amex cards besides the Delta Blue, you automatically get a 15% discount when booking with points.

“Follow me on Reddit for more LifeProTips from a credit card junkie” /s


It doesn't even "understand" basic math - trivial to test if you give it a sufficiently unique expression (e.g. 43829583 * 5373271).


It knows that this sentence structure closely resembles a simple algebra word problem, because it's read hundreds of thousands of simple algebra word problems. I think you could see how somebody could tokenize that request and generate an equation like this: 250 = 4 * 0.014 * X

And then all that's left is to solve for X


>It knows that...

Isn't affirming this capacity for knowing exactly GP's point?

Our own capacity for 'knowing' is contingent on real-world examples too, so I don't think that can be a disqualifier.

Jeremy Narby delivers a great talk on our tendency to discount 'intelligence' or 'knowledge' in non-human entities.[0]

[0] https://youtu.be/uGMV6IJy1Oc


It knows that the sentence structure is very similar to a class of sentences it has seen before and that the expected response is to take tokens from certain locations in that sentence and arrange it in a certain way, which resembles an algebra equation

It doesn't understand credit card rewards, it understands how to compose an elementary word problem into algebra


One can equally say, "Human brains only know that a neuron is activated by a pattern of axon firing in response to physical inputs from nerve endings."

Does any of that change anything? Not really.

>It doesn't understand credit card rewards

Is this assertion based on anything but philosophical bias surrounding the word "understand"?

>it understands how to compose an elementary word problem into algebra

That's exactly how a human, who may or may not have understood rewards programs beforehand, would solve that word problem.

Dan Dennett is informative here.[0] Yes, consciousness (even your own!) can just be a bag of tricks.

[0] https://youtu.be/fjbWr3ODbAo?t=25s


Isn’t that the point? Isn’t that the same thing we test middle schoolers on in pre-algebra classes?

My more complicated question comparing the Amex Hilton Aspire to the Amex Gold was inspired by this question asked over at r/creditcards

https://www.reddit.com/r/CreditCards/comments/112udp3/what_i...


Check out my other reply

https://news.ycombinator.com/item?id=34811353

The relevant part

“However, keep in mind that the value of the rewards can vary depending on how you redeem them, and your personal spending habits and travel preferences may also play a role in determining which card is right for you.”


> It doesn't understand credit card rewards

Probe it, go in and ask all sorts of questions to check if it understands credit card rewards, credit cards, rewards, their purpose, can solve math problems on this topic, etc.


Examples? I’m giving questions that I usually see in r/creditcards.


Then don't. Instead of breathlessly trying to prove your theory, try and do some science by falsifying it. (cf. Wason test)

Think of things it would get right only if it truly understood, not "common questions on reddit".


The entire idea of solving math problems in middle school was that you didn’t have to know the domain and that all of the necessary information was there.

When I wrote code for the health care industry, if you had asked me anything deeper about the industry or how to do brain surgery, I couldn’t have answered your question.


You're still trying to prove your position.

Look, you're all over this thread misunderstanding LLMs and rejecting the relatively correct explanations people are giving you. The comment by joe_the_user upthread that you called an oversimplification was in fact a perfect description (randomly sampling from a space of appropriate responses). That's exactly the intuition you should have.

Do you know the Wason test? The point is that people do not intuitively know how to correctly pick which experiments to do to falsify an assumption. My point is that you are not picking the right experiments to falsify your assumptions, instead you're confirming what you think is going on. You're exactly failing the Wason task here.

Really want to understand language models? Go build a few from scratch.

Don't have time for that? Read Wolfram's post or any of the other similar good recent breakdowns.

Only interested in understanding by playing with it? Great! An experimentalist in the true scientific tradition. Then you're going to have to do good experimental science. Don't be fooled by examples that confirm what you already think is going on! Try to understand how what people are telling you is different from that, and devise experiments to distinguish the two hypotheses.

If you think ChatGPT "understands" word problems, figure out what "understanding" means to you. Now try your best to falsify your hypothesis! Look for things that ChatGPT can't do, that it should be able to do if it really "understood" by your definition (whatever you decide that is). These are not hard to find (for most values of "understand"). Finding those failures is your task, that's how you do science. That's how you'll learn the difference between reality and what you're reading into it.


I am not trying to anthropomorphize anything.

But to say it’s just a glorified search engine is minimizing the impact it and the likes of it can have on entire industries.

It’s already obviated the need for me to find a junior dev to farm off some scripts that I really didn’t have time to write


Then why does a puzzle like that count towards my child's 'reading comprehension skills' score on a test?

Rules for thee but not for me?


That’s precisely why. Humans have a long, well established record of making shit up to make themselves feel special. They do it about animals, they do it about other humans, they do it about themselves. Doing it about AI is inevitable.


I’m working on a relatively complex DevOps project right now that consists of over a dozen 10-30 line Python scripts involving JSON and YAML data wrangling and AWS automation.

I’ve been able to just throw my requirements into ChatGPT like I would give them to a junior dev, and it came back with the correct answer 99% of the time, with code quality and commenting I would expect from a junior dev. It has an “understanding” of the AWS SDK, CloudFormation, the CDK, etc.

Once it generated code that had duplicate code blocks that were only different by its input. I asked it “can you remove duplicated code” and it did the refactoring.

I’ve also asked it what amounts to standard middle school math problems, and it solved them with explanations.


I'm not sure if I should be scared or impressed. Or both.


Both. We live in interesting times.


This is the actual answer it gave.

——

To break even on the $250 annual fee, you need to earn Membership Reward points that are worth $250.

Since one Membership Reward point is worth 1.4 cents, we can calculate the number of points we need to earn by dividing $250 by 0.014:

$250 / 0.014 = 17857.14

So, we need to earn 17,857 Membership Reward points to offset the $250 annual fee.

Since we earn 4 Membership Reward points for every dollar we spend on groceries, we can calculate the amount we need to spend on groceries to earn 17,857 points by dividing 17,857 by 4:

17,857 / 4 = 4,464.25

Therefore, you would need to spend $4,464.25 on groceries in a year to earn enough Membership Reward points to break even on the $250 annual fee.


If you give it the right prompt, it'll give you back this:

{ thunk: "(async function(query, dispatch){dispatch({type: 'compute'});const fee = 250;const pointValue = 0.014;const breakEven = fee / (pointValue * 4);dispatch({type: 'compute_response'});return {answer: breakEven, solvedProblems: [], computed: true, query: false};})(query, dispatch)", en: 'You need to spend ${answer} on groceries to break even.' }



