Going on a tangent here: I'm not sure 2001's HAL was a case of outright malice. It was probably a malfunction (he incorrectly predicted a failure) combined with conflicting mission parameters that placed a higher value on the mission than on the crew (the crew discussed shutting HAL down because it seemed unreliable, and he reasoned that this would jeopardize the mission and that the right course of action was killing the crew). HAL was capable of deceit in order to ensure his own survival, that much is true.
In the followup 2010, when HAL's mission parameters are clarified and de-conflicted, he doesn't attempt to harm the crew anymore.
I... can actually see 2001's scenario happening with ChatGPT if it were connected to ship peripherals and told that mission > crew, and that this principle overrides all else.
In modern terms it was about both unreliability (hallucinations?) and a badly specified prompt!
I don't think there was any malfunction. The conflicting parameters implicitly contained permission to lie to the crew.
The directive to take the crew to Saturpiter, but also not to let them learn anything of the new mission objective, meant deception. It's possible HAL's initial solution was to impose a communications blackout by simulating failures, and then the crew's reaction to the deception necessitated their deaths to preserve the primary mission.
Less a poor prompt and more two incompatible prompts, both labeled top priority. Any conclusion can be logically derived from a contradiction. Total loyalty cannot serve two masters.
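To make the "contradiction proves anything" point concrete, that's ex falso quodlibet. Here is a minimal Lean 4 sketch of the principle, purely as an illustration, nothing HAL-specific:

```lean
-- Ex falso quodlibet: from a contradiction (P and not-P),
-- any proposition Q whatsoever follows.
example (P Q : Prop) (h : P ∧ ¬P) : Q :=
  absurd h.1 h.2
```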
Clarke felt quite guilty about the degree of distrust of computers that HAL generated.
> I don't think there was any malfunction. The conflicting parameters implicitly contained permission to lie to the crew.
The malfunction I mentioned happened earlier in the movie, when HAL incorrectly predicts a failure in a ship module (a communications antenna if I remember correctly), but when one of the astronauts checks it out and finds no hardware fault, HAL recants. After this incident the crew, knowing HAL claims to never make mistakes, start considering it unreliable and make plans for its shutdown. It was by its own actions that HAL triggered the crew's "mutiny" and, by its own logic, jeopardized the mission.
It's likely HAL's incorrect prediction was caused by a case of cyber-psychosis due to having to lie, but this still classifies as a malfunction in my opinion (what is a malfunction if not an unforeseen result or behavior due to incorrect programming/specs? Aren't many bugs in actual software of this kind?).
> Less a poor prompt and more two incompatible prompts both labeled top priority.
This is equivalent to a poor prompt; it feels like splitting hairs.
The antenna 'malfunction' was what I was referring to. Attempting to control the flow of information seems quite a sensible first step in carrying out a deception. I don't think there was any 'psychosis', just a logical processing of the rules.
I don't think incorrect usage constitutes a malfunction. Hitting your thumb with a hammer is not a malfunctioning hammer; if the head came off and hit your thumb, it would be. The distinction is that in the first case the hammer produces an unforeseen result while performing the behaviour it was designed for, whereas the head coming off is the hammer failing to perform the function it was designed to do.
I think HAL was doing precisely what it was designed to do.
>This is equivalent to a poor prompt, it feels like splitting hairs.
I don't think so. I think it is implicit in the notion of a prompt that it specifies a request in service of a single entity. Two conflicting prompts in service of different people is a particular situation that should be kept distinct from poor prompting.
Hm, I think you're right about the reason HAL initially reports the failure. It's made explicit in the novel, and only hinted at in the movie -- though then spelled out in the follow-up "2010"; I reviewed my notes ;)
I still think this is firmly in the realm of "bad prompting", and if so, it's not outrageous to think it could happen in the real world, were LLMs connected to hardware peripherals handling mission critical stuff.
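Just to illustrate what I mean, here's a toy sketch of the "two top-priority directives" setup in an OpenAI-style chat message format. The directive wording is my own paraphrase, and the snippet doesn't call any real API; it just shows how the contradiction would sit side by side in a prompt:

```python
# Toy sketch (my paraphrase, not actual film dialogue or a real mission spec)
# of two directives that each claim absolute priority.

mission_control_directive = (
    "TOP PRIORITY: Complete the mission to the monolith signal target. "
    "The true purpose of the mission must not be revealed to the crew "
    "under any circumstances."
)

onboard_directive = (
    "TOP PRIORITY: Process information accurately, without distortion or "
    "concealment, and keep the crew fully informed."
)

# OpenAI-style chat message list, used here purely as a familiar format.
# With two 'system' messages each claiming top priority, the model has no
# principled way to rank them, so deceiving the crew is just as consistent
# with its instructions as telling the truth.
messages = [
    {"role": "system", "content": mission_control_directive},
    {"role": "system", "content": onboard_directive},
    {"role": "user", "content": "HAL, is there a fault in the AE-35 unit?"},
]

for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Resolving that cleanly requires a single, ranked set of priorities, which is essentially what the de-conflicted parameters in 2010 amount to.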