
It's not just the ability to correctly answer this, but the consistency.

I asked this exact question to the `oasst-sft-6-llama-30b` model and it was able to consistently get the correct answer. I then tried the smaller `vicuna-7b` model, and while it usually gave the correct answer, there was the occasional miss.

Interestingly, `oasst-sft-6-llama-30b`'s ability to answer correctly seems to be fairly stable across multiple configurations. I tried various temperature settings from 0.2 up to 1.2, different topP configs, and they all answered correctly.
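For the curious, here's a rough sketch of how that kind of consistency check can be scripted, assuming a local copy of the model usable through the Hugging Face `transformers` library; the model path, prompt, and answer check below are placeholders rather than my exact setup:

```python
# Sketch: ask the same question repeatedly across sampling settings and count
# how often the expected answer shows up. Model path, prompt, and the expected
# substring are placeholders, not the exact experiment described above.
from transformers import pipeline

generator = pipeline("text-generation", model="path/to/oasst-sft-6-llama-30b")

PROMPT = "Your reasoning question here"
EXPECTED = "expected answer substring"

for temperature in (0.2, 0.7, 1.2):
    hits = 0
    for _ in range(10):
        out = generator(PROMPT, do_sample=True, temperature=temperature,
                        top_p=0.9, max_new_tokens=64)[0]["generated_text"]
        hits += EXPECTED in out.lower()
    print(f"temperature={temperature}: {hits}/10 correct")
```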



reminds me of voice recognition

On the one hand, the problem was nearly "solved" in the early 2000s, getting to 95% accuracy. But the daily experience of using something that makes mistakes at that rate is infuriating, and well and truly outside of consideration for putting into any kind of critical pathway. So it's a combination of how difficult it is to close the last few percentage points of accuracy with how important they are for most of the high-value use cases.

For the foreseeable future I see most use of this tech coming from applications where it aids humans and/or checks their decisions rather than running solo.


> Computer, earl grey, hot.

I think we're getting closer to something like this, out of Star Trek. Even in Star Trek, AI did not take over critical functions - but rather assisted the crew in manning the starship.


I’ve never understood why voice recognition has always attempted complete understanding of arbitrary input, rather than following a simple command language, e.g. <subject> <parameters> <action>. It could be made completely reliable with current tech (even a decade ago, really) by just minimizing the possibility space… and I’m pretty sure consumers would trivially be able to learn it, as long as they don’t try to go full pseudo-programming-language mode.

And “Computer, execute program alpha beta seven” would be the power user version of it

We should already be at “computer, earl gray, hot” today
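To make the idea concrete, here's a minimal sketch of what a fixed <subject> <parameters> <action> grammar could look like once the audio has been transcribed; the subjects, actions, and example phrases are invented for illustration, not taken from any real assistant:

```python
# Rough sketch of a fixed <subject> <parameters> <action> command grammar.
# Subjects and actions are made-up examples; a real device would plug in
# whatever it actually supports.
SUBJECTS = {"lights", "thermostat", "kettle"}
ACTIONS = {"on", "off", "set", "start"}

def parse_command(transcript: str):
    words = transcript.lower().split()
    if not words or words[0] not in SUBJECTS:
        return None  # reject early: unknown subject, so the error can be precise
    subject, action = words[0], words[-1]
    if action not in ACTIONS:
        return None
    return {"subject": subject, "parameters": words[1:-1], "action": action}

print(parse_command("thermostat 70 degrees set"))
# {'subject': 'thermostat', 'parameters': ['70', 'degrees'], 'action': 'set'}
print(parse_command("earl grey hot"))  # None: not in the grammar
```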


Years ago I used a program with that approach for a space sim. Basically it would only recognize voice commands that you define beforehand, which made it very reliable at recognizing the right one because it just had to find the closest match within a limited set of options, and would then simulate associated key inputs.

Meanwhile when I tried Android's voice-based text input it was a catastrophe as my accent completely threw it off. Felt like it was exclusively trained on English native speakers. Not to mention the difficulty such systems have when you mix languages, as it tends to happen.
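That "closest match within a limited set" approach is simple enough to sketch. The commands and key bindings below are invented, and difflib stands in for whatever matcher the actual program used:

```python
# Sketch: match a (possibly misheard) transcription against a fixed set of
# predefined voice commands, then emit the associated key input.
import difflib

COMMANDS = {
    "deploy landing gear": "g",
    "request docking": "ctrl+d",
    "full throttle": "w",
    "fire missiles": "m",
}

def match_command(transcript: str):
    hits = difflib.get_close_matches(transcript.lower(), COMMANDS.keys(),
                                     n=1, cutoff=0.6)
    return (hits[0], COMMANDS[hits[0]]) if hits else None

print(match_command("deploy landing here"))  # still resolves to "deploy landing gear"
```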


This is an annoyance that Linus from LTT constantly brings up. The voice assistants split the recognition from the mapping to commands, which results in lots of mistakes that should never happen. If you say "call XYZ", the result would be so much better if the phone first tried to figure out whether any of the existing contacts sounds like XYZ.

Limiting the options rather than making the system super generic would help in so many cases.


> I’ve never understood why voice recognition has always attempted to be complete understanding of arbitrary input, rather than follow a simple command language

Because the UI affordances (in this case the control language) wouldn’t be discoverable or memorable across a large range of devices or apps. Moreover, speaking is an activity that allows for an arbitrary range of symbol patterns, and a feedback loop between two parties in dialog lets them resolve complex matters even though they start from different positions.


I mean, right now the current state is effectively an undiscoverable control language with some flexibility, but it generally fails or is unreliable unless you restrict yourself to very specific language: language that differs based on the task executed, often with similar but subtly different formats required to do similar actions.

I’d argue that if the current state is at all acceptable, then a consistent, teachable, and specific language format would be an improvement in every way, and you could have an “actual” feedback loop because there’s a much more limited set of valid inputs, so your errors can be much more precise (and made human-friendly, not, I think, merely programmer-friendly).

As it stands, I’ve never managed a dialogue with Siri/Alexa; it either ingests my input correctly, rejects it as an invalid action, does something completely wrong, or produces a “could not understand.. did you mean <gibberish>?”.

Having the smart-AI dialogue would be great if I could have it, but for the last decade that simply isn’t a thing that occurs. Perhaps with GPT and its peers, but AFAIK GPT doesn’t have a response->object model that could be actioned on, so the conversation would sound smoother but be just as incompetent at actually understanding whatever you’re looking to do. I think this is basically the “sufficiently smart compiler” problem that never comes to fruition in practice.
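One way to get that response->object step, at least in principle, is to force the model’s reply into a small set of structured actions and validate it before acting on it. This is only a sketch with made-up intents, not how Siri/Alexa or GPT actually work:

```python
# Sketch: validate a model's free-form reply against a small action schema
# before doing anything with it. The intents and fields are invented; the point
# is that a limited set of valid outputs allows precise, human-friendly errors.
import json

VALID_INTENTS = {"set_timer": {"minutes"}, "call_contact": {"name"}}

def to_action(model_reply: str):
    try:
        action = json.loads(model_reply)
    except json.JSONDecodeError:
        return None, "Reply was not valid JSON."
    if not isinstance(action, dict):
        return None, "Reply must be a JSON object."
    intent = action.get("intent")
    if intent not in VALID_INTENTS:
        return None, f"Unknown intent {intent!r}; expected one of {sorted(VALID_INTENTS)}."
    missing = VALID_INTENTS[intent] - action.keys()
    if missing:
        return None, f"Intent {intent!r} is missing fields: {sorted(missing)}."
    return action, None

print(to_action('{"intent": "set_timer", "minutes": 5}'))
print(to_action('{"intent": "order_tea"}'))
```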


It's like using a CLI where the argument structure is inconsistent and there is no way to list commands and their arguments in a practical way.


Close your eyes and imagine that CLI system is instead voice / dialog based. The tedium. For bonus points, imagine you’re in a space shared with others. Doesn’t work that well…


What? No, I think it'd be great! I'd love to be able to say out loud "kube get pods pipe grep service" and the output to be printed on the terminal. I _don't_ want to say "Hey Google, list the pods in kubernetes and look for customer service".

The transfer between my language and what I can type is great. It starts becoming more complex once you need to add countless flags, but again, a structured approach can fix this.


Voice recognition often works with a grammar of words you specify to improve the chance of correct detection.

It's just that there is no consumer application and I think the reception of voice commands from the public was fairly cold.

I, for one, don't want to do stuff with my voice. I'd rather click or press a button.


Most voice assistants can work with simple phrases like that. Alexa, lights on. Hey Google, thermostat 70 degrees.


Not Siri, which thinks I'm talking to her all the time when I'm speaking to a family member whose name contains neither an "s" nor an "r".


That's because letters aren't sounds.


Way to jump to unjustified conclusions. The name also doesn’t contain either sound.


The problem is that’s not the only format they work on, and because input format is largely unconstrained, when they misunderstand, they catastrophically misunderstand.

It’s just like the image recognition ML issue, where it can correctly predict a cat, but change three specific pixels and it has 99% confidence it’s an ostrich.

Or JavaScript equality. If you do it right, it’s right, but otherwise anything goes.

Or Perl, in its entirety


Probably the divide between technical users and non-technical. You and I find that structure completely logical. But less structured natural language with a million ways to ask a certain thing puts it practically in reach of the remainder of the population.


Nerd point of order here: Star Trek TNG had a ship in which a key member of the bridge crew was an android. They routinely relied on Data for all kinds of critical things. And although the ship was manned by people, it couldn't function properly without its computer. Several episodes revolve around computer malfunctions.

Finally, their nemesis was the Borg, a race that explored the question of what happens if a society fully embraces AI and cybernetics instead of keeping it at a distance like the Federation does. The Borg are depicted as more powerful than the Federation exactly because they allowed AI to take over critical functions.


> Nerd point of order here: Star Trek TNG had a ship in which a key member of the bridge crew was an android. They routinely relied on Data for all kinds of critical things.

Data was created by technology not available to the Federation. As far as the order of society is concerned, he's magic and not technology. An immediate implication is that his ship was the only one in the Federation with an android crew member.

> And although the ship was manned by people, it couldn't function properly without its computer. Several episodes revolve around computer malfunctions.

This is true, though. The computer did take over many critical functions.


> This is true, though. The computer did take over many critical functions.

But the Star Trek computer was just a fairly normal computer with an AI-ish voice UI. And there have been present-day ships which couldn't function properly without their computer... I distinctly remember a story about a new (~20 years ago) US Navy warship not being able to go on its maiden voyage because Windows blue-screened.


My boring car can't function without its computer.


Whoa, hold on.

Data was an android, but one that is meant to mimic an individual being. He may have been a form of AI, but he is no more than just an advanced human.

And yes, the ship couldn't function without computers - but they were traditional (but futuristic) computers manned by people, with AI guided by people - not AI that controlled every aspect of their lives.

I think when people think of AI, and the fear that comes with it - they imagine the type of AI that takes over human function and renders them unimportant.

Also, the Borg didn't fully embrace AI. They were a collective, linked together by technology. You can view them as more or less a singular entity with many moving parts that communicated over subspace, striving to achieve perfection (in their own eyes). As a consequence, they seek to assimilate (thereby taking other species' technological advancements for their own) or eradicate them in an attempt to improve the Hive.



Tangential nerd point: Lt. Barclay, who became superior to both the onboard computer and Data after he was scanned/influenced by an alien probe.

https://memory-alpha.fandom.com/wiki/The_Nth_Degree_(episode...


Star Trek was a fiction series that heavily focused on human experiences and relationships. Picard et al famously do a lot of things that actual navy commanders would absolutely never do, like commanding away teams in hostile and/or dangerous territory.

Having an AI to pilot the ship, target the weapons, and control away teams of robots/holograms would take away from the core purpose of the show, which is to tell a gripping story. It's not meant as an exploration of how to use AI in space exploration.


ST isn't military fiction. It's science fiction. Thinking about technology is absolutely one of its core aims.

They had a whole episode about the legal and moral aspect of AI human rights.


It definitely seeks to explore the impact of many technologies, but the impact of AI was not really one of them. They spent one whole episode out of 178 on AI, and there was a _very_ small plotline near the start of TNG about Data wishing to be more human.

EDIT: There was also the episode where a holodeck character gains true sentience, but then the crew proceeds to imprison it forever into a virtual world and this is treated by the show as the ethical thing to do. Trapping any human in a simulation is (correctly IMO) treated as a horrible thing, but doing it to an evidently sentient AI is apparently no problem.


It’s a good example of what people would like out of AI though - perfect recall and solid reasoning/calculation capabilities; an assistant that covers our weaknesses.


Obligatory counter quote from Douglas Adams:-

> He had found a Nutri-Matic machine which had provided him with a plastic cup filled with a liquid that was almost, but not quite, entirely unlike tea.

Getting LLMs to give you an answer is easy. Getting them to give you the answer you're actually looking for is much harder.

LLMs are a very useful search tool but they can't be relied on as a source of truth ...yet. Therein lies their main problem.


> how difficult it is to close the last few percentage points of accuracy

Like, after getting to 99% you are about halfway; the last 1% is the hard part.


Because randomly casting dust on a table sometimes says intelligent things, therefore there is a continuous function between dust and ChatGPT?

While “dust” might be flippant, their approach does seem to suggest that even hierarchical Markov models would be able to demonstrate abilities on their continuous metrics.


By adding dense vector search, the accuracy of smaller models can be improved, because the reference material has more hints in it than the frozen model…
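A minimal sketch of that idea: retrieve the closest reference passage by dense embedding and prepend it to the prompt before it reaches the smaller model. The embedding model name and the passages here are just example choices, not a claim about any particular system:

```python
# Minimal sketch of dense vector search feeding a smaller model: embed a few
# reference passages, retrieve the one closest to the question, and prepend it
# to the prompt. Embedding model and passages are example choices.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "Earl Grey is a black tea flavoured with bergamot oil.",
    "The replicator on the Enterprise-D synthesises food and drink on demand.",
]
passage_vecs = embedder.encode(passages, normalize_embeddings=True)

def retrieve(question: str) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = passage_vecs @ q_vec  # cosine similarity, since vectors are normalised
    return passages[int(np.argmax(scores))]

question = "What flavouring is used in Earl Grey tea?"
prompt = f"Context: {retrieve(question)}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # this augmented prompt would then go to the smaller model
```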



