First, I am also frustrated by companies trying to prevent unauthorised use.
But second, the reasons are:
(1) For an AI company, someone publishes: "I asked the model a question about crime, and it talked shit about black people! Look! [damning quote that you can also get the model to say/do]." Stability took the "let people do what they will" tack, and now Forbes and every other major media mouthpiece slams them at every opportunity about how they are ethically challenged.
(2) For Replika, someone chatting with their online girlfriend: "I love you more than my wife and children." Then someone hacks Replika, exposing these conversations, and now Replika is in hot water because of all the divorces. Replace the example with 100 other similarly awful situations, like talking about mental health problems, crimes, petty squabbles with coworkers, or political problems.
Forbes can write that crap, but the problem is with people who make decisions based on it. I wonder who these people are who care about all this nonsense.
Try searching for SuperHOT and RoPE together. 8k-32k context lengths on regular old Llama models that were originally intended to have only a 2k context length.
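The core of the trick is surprisingly small. Here's a rough sketch of RoPE position interpolation (the function name and shapes are mine, not from any particular repo):

    import torch

    def rope_angles(positions, head_dim, base=10000.0, scale=1.0):
        # Standard RoPE frequencies for one attention head. scale > 1 is the
        # SuperHOT-style position-interpolation trick: positions get squeezed
        # so that an 8k-token sequence lands inside the 2k range the base
        # Llama model was actually trained on.
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        return torch.outer(positions.float() / scale, inv_freq)

    # Llama trained on 2048 positions; scale=4.0 stretches usable context to 8192.
    angles = rope_angles(torch.arange(8192), head_dim=128, scale=4.0)
    sin, cos = angles.sin(), angles.cos()  # applied to query/key pairs as usual

As I understand it, a short fine-tune on long sequences with the scaled positions is what makes this actually work well in practice.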
Any trick that skips full quadratic attention cripples a model's ability to reason "in the middle" even more than it is already crippled. Good long-context models are currently a mirage. This is why no one is seriously using GPT-4-32k or Claude-100k in production right now.
Edit: even if it's doing full attention like the commentator says, turns out that's not good enough! https://arxiv.org/abs/2307.03172
> While many software problems can cause you to lose money, engineering problems can cause you to lose time.
I'm guessing you were trying to say something else here. I literally cannot think of a single software engineering problem I've ever encountered that didn't cost time. By your definition, then, software engineering is engineering. Your claim and your definitions are at odds with one another.
Also, you don't directly claim it, but you seem to imply that software engineering can't have real-world consequences... or something? As another reply points out, sometimes software is in the critical path for things like rockets and airplanes, where mistakes cost lives.
And some people making software for less life-altering systems take their craft just as seriously. Some people think that losing $10M every single second while their software is failing is a big deal.
Are you claiming people who write HFT code, ad arbitrage code, code that powers the front page of Apple, Amazon, Microsoft, and Google are just cowboying it through the day, doing nothing special?
Overall I just find this comment very confused. Maybe you could put some thought into what you're trying to say, and say it better?
This is a smart, motivated audio engineer talking about Atmos. Quick summary: it's awful when it works perfectly, and it almost never works perfectly, in fact, it rarely works at all. Good luck trying to get Atmos working, and even if this was a completely open standard, would you really bother?
I've had an Atmos system for a few years and it works great for movies, both on 4k disc and streaming services. Almost all of the Apple TV+ content is Atmos encoded. The $500 system in that video is entry level Atmos, my system is 5.2.4 and cost a substantial amount of money.
I agree that game support could be better, but that's mostly up to publishers. As for music, I've tried it and it's fine, but it's mostly a gimmick for selling remasters of established artists IMHO. Tidal has a lot of Atmos music content. It seems he was trying to use Amazon Music, and Amazon is notorious for not supporting Dolby standards because of the cost.
I've worked in audio professionally and software I created is used in lots of movies and music.
Perhaps you’re right that his $700 unit is not set up correctly or is insufficient, but the point is that the speaker (and the headphones he tried) are basically what you might end up with if you don’t have a high budget but do a fair bit of research: the end result is disappointing and would turn people off the technology.
Feels like, if you can't get a person who understands the tech to buy the equipment and get it running correctly, there's something very wrong with this system. I get that even experienced people can make mistakes (I sure made many in my field), but if it's specifically researched content for publication... you tend to be careful with that.
I think Dolby's desire to market the Atmos brand is unfortunately beyond what the tech can deliver. Atmos works great for people willing to invest in a high end system, especially for the right content, but it's diminishing returns on entry level systems and headphones.
I'm an audio engineer who dabbled in Atmos when it started getting popular, and I completely agree with most of what he says. On a personal level, I have never preferred any spatial mix of any song or recording over the original mix.
I enjoyed the quad mix of Pink Floyd's Dark Side of the Moon. I found that separating some of the sounds over 4 speakers helped clarify them. This works because there's so much going on in that album, so many sampled sounds throughout the album like the snippets of conversations. The clocks at the start of Time were great. I listened to the whole thing sitting in the middle of my living room with my eyes closed. It was extremely absorbing and enjoyable.
99% of the time I listen to the regular mix though. I love music but I don't want to sit motionless in the dead center of an array of speakers. Music is a soundtrack to chopping onions, relaxing with a book, fixing my bike etc.
summary of that, from 1:22: "For watching a movie, in a movie theater/in an appropriate speaker setup, shit is awesome. Love it for movies, it's so cool. But for music, it has not, and will not, ever take off."
>> This is a smart, motivated audio engineer talking about Atmos. Quick summary: it's awful when it works perfectly, and it almost never works perfectly, in fact, it rarely works at all. Good luck trying to get Atmos working, and even if this was a completely open standard, would you really bother?
I hope he had more to say than that, because that’s nonsense. Getting it “working” is as simple as playing it on Apple Music via my AirPods or Sonos. There are definitely bad mixes available, but there are some incredible ones too. Listening to “Let It Be” on the Sonos is magical.
I’ve also done some mixing in Atmos and it’s pretty straightforward.
At the end of the day it’s largely subjective, but I’m pretty certain it’s the future - especially now Sonos has started releasing single devices that can play Atmos to a pretty good standard, and most new major label releases are mixed in Atmos.
Yes, he did. A lot. I suggest you watch the video. He goes into how Atmos isn't anything new, it's not unique, how it's a basic money grab, how it would be better as an open standard, how Atmos is fine if you like it (as music listening is purely subjective), how remixes of music in Atmos are generally objectively terrible (and, in most cases, not what the artist would have wanted) and a lot more.
Benn Jordan tries very hard to be unbiased and well researched in almost everything he presents. For him to come to the conclusion that Atmos isn't really worth the cost barrier to entry is something I (and many others) will take into account. I'd say Dolby Labs would also do well to take it into account, but they don't have a very good track record with listening to valuable criticism. When it comes to criticism, their noise reduction is perhaps too effective.
IMHO Atmos is the 3D TV of audio, except it could be good if it was an open standard.
FYI, as far as I can tell, Apple (and most others) are delivering Atmos in lossy formats. These may be good, but they aren't by any means state of the art for multi-channel. Server side de-muxing of spatial audio to the required number of channels would mean less overall processing (channel combos could be cached) and higher quality delivery using open standards (multi-channel FLAC supports up to 8 channels, and it's an open format which would allow easy extension). This would be better for the consumer (bandwidth is not an argument in these days of 4K+ video streaming), arguably better for the artists and publishers (better quality audio to the consumer), but it wouldn't make anywhere near as much money in licensing for Dolby.
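To make the de-muxing idea concrete, here's a sketch of what the server side could do (the pan coefficients are illustrative, not an official downmix spec, and the cache layout is made up):

    import subprocess
    from pathlib import Path

    CACHE = Path("/tmp/downmix-cache")

    def stereo_flac(master: str) -> Path:
        # Derive (and cache) a stereo FLAC from a multi-channel master by
        # folding the centre and rears into L/R. Each channel combo gets
        # rendered once and then served to every client that requests it.
        out = CACHE / (Path(master).stem + ".stereo.flac")
        if not out.exists():
            CACHE.mkdir(parents=True, exist_ok=True)
            subprocess.run([
                "ffmpeg", "-i", master,
                "-af", "pan=stereo|FL=FL+0.707*FC+0.5*BL|FR=FR+0.707*FC+0.5*BR",
                "-c:a", "flac", str(out),
            ], check=True)
        return out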
I have a hard enough time keeping the sweet spot locked in with my stereo/studio setups. The day I get bored with 2 channels, I'll reach for more.
A proper soundstage with a high-end stereo loudspeaker setup will typically make the best multichannel kits sound like shit by comparison. Achieving this is about physical location of speakers and compensation for any time-delay in the signal chain. Clearly, getting 2 things positioned well in space/time is much simpler than 5+.
2.1 or 2.2 is sufficient for nearly all music, but for almost no modern cinema.
The primary mix for movies is basically mono, with sound effects sprinkled around you. If you want a chance to make sense of the dialogue, you really need a good center channel aligned with the center of the screen. The stereo downmixes almost universally suck because they don't boost the center channel enough before splitting it to the left and right speakers.
I've watched this one several times on my stereo setup. I don't think the multichannel mix would provide a meaningfully-different experience.
The most important part of the BR2049 audio experience for me lives below 100Hz. I don't need that .1 to feel what you are feeling. I have a DSP engine that siphons everything <60Hz off my stereo channels and feeds it into a quarter ton worth of subwoofer. Once you have this movie running flat down to 12Hz, your "multichannel effects" will be produced by the structure you are watching it in.
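The split is roughly this, in scipy terms (a toy 4th-order Butterworth crossover; a real DSP engine would do more, and the parameters here are just illustrative):

    import numpy as np
    from scipy.signal import butter, sosfilt

    def split_sub(stereo, fs, xover_hz=60.0):
        # stereo is an (n_samples, 2) float array. Everything below xover_hz
        # is summed into a mono subwoofer feed; the mains get the remainder.
        lp = butter(4, xover_hz, btype="lowpass", fs=fs, output="sos")
        hp = butter(4, xover_hz, btype="highpass", fs=fs, output="sos")
        sub = sosfilt(lp, stereo, axis=0).sum(axis=1)   # mono sub feed
        mains = sosfilt(hp, stereo, axis=0)             # L/R without the lows
        return mains, sub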
Nah, you can't easily dynamically downmix surround sound audio to stereo. In the main mix, the dialog mostly comes through the center channel, and the left and right surround sound speakers are used for sound effects. If you simply split the center channel to left and right without massively boosting the volume of the center channel first, you're going to have stupidly loud sound effects (from the L/R surround sound speakers) and the dialogue will sound like whispers. If you're playing the movie from a PC, you can boost the center channel audio with software, but sometimes (e.g. when a character is off-screen) dialogue will come from the left or right channels and you won't be able to clearly hear them.
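Concretely, a naive fold-down looks something like this (the gains are illustrative, not any official coefficient set):

    import numpy as np

    def fold_down_51(fl, fr, fc, lfe, bl, br, center_gain=1.0):
        # Naive 5.1 -> stereo fold-down; lfe is typically dropped in stereo.
        # With center_gain=1.0 the dialogue (mostly in fc) gets buried under
        # the much hotter L/R effects; something like center_gain=2.0 is the
        # boost described above.
        left = fl + 0.707 * center_gain * fc + 0.707 * bl
        right = fr + 0.707 * center_gain * fc + 0.707 * br
        peak = max(np.abs(left).max(), np.abs(right).max(), 1.0)
        return left / peak, right / peak  # normalize to stay out of clipping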
If you've got a stereo speaker setup, there's no good solution unless you can get your hands on the actual official stereo mix (some streaming services let you select this), but even the official stereo mix is an afterthought cobbled together by some overworked and underpaid audio engineer: the primary mix is absolutely the theatrical mix with all the surround sound channels, then everything is downmixed progressively from there.
I've had no problems with downmixed surround on stereo speakers. Never had a problem with dialogue, not once. And I've watched basically every notable English-language film that's come out in my lifetime.
I didn't say anything about a centre speaker, though. A centre speaker is useful when you have a large screen and people will be sitting outside of the sweet spot. You don't have to be far outside the sweet spot before you lose the ghost centre channel. But you can still get really far without a centre channel, and it's much easier, because the ideal position for a centre speaker is behind the screen, which is hard to achieve (it requires an acoustically transparent projector screen). The ghost centre channel, on the other hand, is always behind the screen. Some AVRs even include an option to "lift" the centre channel by mixing it into the L/R to account for the common suboptimal below-screen positioning of the centre speaker. I've never tried it, though.
Nobody has ever heard my stereo setups and commented on the lack of surround speakers. And when I did have surround speakers, nobody ever commented on them. Nobody ever couldn't hear the dialogue. They were always impressed with the bass and dynamic range.
For me, the diminishing returns are something like this (specifically for film soundtracks):
- 60%: Good stereo speakers (preferably full-range, down to 60Hz),
- 80%: Quiet environment and ability to listen with full dynamic range at close to reference SPLs (ie. no kids, no neighbours annoying you or you annoying them),
- 90%: Good subwoofer to add the actual sub-bass material (down to 20Hz),
- 99%: Good LCR setup (ie. centre matched to the L/R speakers),
Leaving 1% for anything extra like surrounds etc. It's just really silly to add surrounds before getting the huge gains above.
> Leaving 1% for anything extra like surrounds etc. It's just really silly to add surrounds before getting the huge gains above.
This advice would save so much headache for so many nerds. The juice is simply not worth the squeeze unless that whole checklist is already satisfied and you are still not blown away.
The stupid thing is people listen to Atmos stuff on terrible speakers. Like soundbars with upfiring speakers. Tiny, wimpy little drivers that are tuned to make it sound like you have "bass". It's a shame, because if people just set up a decent stereo system it would blow those systems away.
Even if you have a decent system (that's going to cost thousands, plus a good room and modification of said room), the gains are tiny outside of a few gimmicky demos. In a home environment you really don't need anything more than front speakers. Surrounds do not add anything. Save your money and buy a bigger screen.
In my opinion, the problem is not Atmos, it's the lack of head-tracking. That's why it can sound awesome on a calibrated surround speaker setup but usually fails to deliver on headphones.
I once built a Wwise plugin that allows you to play Atmos and the like on an Oculus with proper Headphone Surround 3D, and everyone agreed that it was fantastic. But those consumer $0.01 ASICs integrated into mainboards obviously can't compete with a solution that has more sensors and GHz of compute available.
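Stripped of the HRTFs and the actual Wwise plumbing, the head-tracking part reduces to rendering each source relative to the head rather than the device. A toy sketch:

    import numpy as np

    def equal_power_pan(mono, azimuth_deg):
        # Equal-power pan law: -45 deg = hard left, +45 deg = hard right.
        theta = np.clip(np.radians(azimuth_deg), -np.pi / 4, np.pi / 4)
        return mono * np.cos(theta + np.pi / 4), mono * np.sin(theta + np.pi / 4)

    def render_tracked(mono, source_az_deg, head_yaw_deg):
        # The source stays fixed in the room, so its angle relative to the
        # listener is source azimuth minus head yaw. Without this subtraction
        # the whole soundstage turns with your head, which is exactly what
        # untracked "spatial" headphone audio does.
        return equal_power_pan(mono, source_az_deg - head_yaw_deg)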
Yes, and some people love it. But the AirPods don't have enough processing power, so they need to send the measurements to the host via Bluetooth, and then the adjusted audio is sent back. The result is 200+ ms of latency on the head tracking, which is far above the 30ms perception threshold.
I don’t notice any latency issue generally. But, if you change your head’s “default position” (i.e. instead of looking forward most of the time you’re now looking out the window next to you most of the time) it does take quite a few seconds to readjust and find the middle.
I've got a couple of the original Apple HomePods set up in stereo config, and I have to say it sounds pretty good. I'm not an audiophile, but I have friends who are and they are impressed with the sound.
> Atmos through stereo headphones is a non-sensical premise
If you watch Atmos content through an Apple TV, it has head tracking for AirPods. It's not going to change your life, but the audio being centered on the actual screen, a la a proper surround sound setup, is a nice touch.
> Both of those are systems which had to work right. Large language models are not even close to being safe to use in such applications. Until LLMs report "don't know" instead of hallucinating, they're limited to very low risk applications such as advertising and search.
Are humans limited to low-risk applications like that?
Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.
I can't count the number of times I've had conversations with extremely experienced, smart techies who just spout off the most ignorant stuff.
And I don't want to count the number of times I've personally done that, but I'm sure it's >0. And I hate to tell you, but I've spent the last 20 years in positions of authority that could have caused massive amounts of damage not only to the companies I've been employed by, but a large cross-section of society as well. And those fools I referenced in the last paragraph? Same.
I think people are too hasty to discount LLMs, or LLM-backed agents, or other LLM-based applications because of their limitations.
(Related: I think people are too hasty to discount the catastrophic potential of self-modifying AGI as well)
Can people please stop making this comment in reply to EVERY criticism of LLMs? "Humans are flawed too".
We do not normally hallucinate. We are sometimes wrong, and sometimes are wrong about the confidence they should attach to their knowledge. But we do not simply hallucinate and spout fully confidence nonsense constantly. That is what LLMs.
You remember a few isolated incidents because they're salient. That does not mean that it's representative of your average personal interactions.
Oh yes we do lol. Many experiments show our perception of reality and of cognition is entirely divorced from the reality of what's really going on.
Your brain is making stuff up all the time. Sense data you perceive is partly fabricated. Your memories are partly fabricated. Your decision rationales are post hoc rationalizations more often than not. That is, you don't genuinely know why you make certain decisions or what preferences actually inform them. You just think you do. You can't recreate previous mental states. You are not usually aware. But it is happening.
We don’t hallucinate in such a way / to the extent that it compromises our ability to do our jobs.
Currently no one will trust an LLM to even run a helpline - that’s just a lawsuit waiting to happen should the AI hallucinate a “solution” that results in loss of property, injury or death.
>Currently no one will trust an LLM to even run a helpline - that’s just a lawsuit waiting to happen should the AI hallucinate a “solution” that results in loss of property, injury or death.
I'm not quite sure exactly what you mean by helpline here (general customer service or something more specific?) but assuming the former...
How much power do you think most helplines actually have? Most are running off pre-written scripts/guidelines with very little in the way of decisional power. There's a reason for that.
Injury or death? LLM hallucinations are relational. Unless you're speaking to Dr GPT or something to that effect, a response resulting in injury or death isn't happening.
Having worked in the helpline business, I can tell you that many corporations would and do use LLMs for their helpline, and used worse options before.
> We do not normally hallucinate. We are sometimes wrong, and sometimes are wrong about the confidence they should attach to their knowledge. But we do not simply hallucinate and spout fully confidence nonsense constantly. That is what LLMs.
In my average interaction with GPT-4 there are far fewer errors than in this paragraph. I would say that here you in fact "spout fully confidence nonsense" (sic).
Some humans are better than others at saying things that are correct, and at saying things with appropriately calibrated confidence. Some LLMs are better than some humans in some situations at doing these things.
You seem to be hung up on the word "hallucinate". It is, indeed, not a great word and many researchers are a bit annoyed that's the term that's stuck. It simply means for an LLM to state something that's incorrect as if it's true.
The times that LLMs do this do stand out, because "You remember a few isolated incidents because they're salient".
> Some humans are better than others at saying things that are correct, and at saying things with appropriately calibrated confidence.
That's true - which is why we have constructed a society with endless selection processes. Starting from kindergarten, we are constantly assessing people's abilities - so that by the time someone is interviewing for a safety critical job they've been through a huge number of gates.
> Are humans limited to low-risk applications like that?
No, but arguably civilization consists of mechanisms to manage human fallibility (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc). We might not fully understand why, but we've found methods that sorta kinda "work".
> No, but arguably civilization consists of mechanisms to manage human fallibility
Exactly. Civilization is, arguably, one big exercise in reducing variance in individuals, as low variance and high predictability is what lets us work together and trust each other, instead of seeing each other as threats and hiding from each other (or trying to preemptively attack). The more something or someone is unpredictable, the more we see it or them as a threat.
> (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc).
And on the more individual scale: culture, social customs, and the public school system are all forces that shape humans from the youngest age, reducing variance in thoughts and behaviors. Exams of all kinds, including psychological ones, prevent high-variance individuals from being able to do large amounts of harm to others. The higher the danger, the higher the bar.
There are tests you need to pass to be able to own and drive a car. There are tests you may need to pass to own a firearm. There are more tests still before you'll be allowed to fly an aircraft. Those tests are not there just to ensure your skills - they also filter high-variance individuals, people who cannot be safely given responsibility to operate dangerous tools.
Further still, society has mechanisms to eliminate high-variance outliers. Lighter cases may get some kind of medical or spiritual treatment, and (with gates in place to keep them away from guns and planes) it works out OK. More difficult cases eventually get locked up in prisons or mental hospitals. While there are a lot of specific things to discuss about the prison and mental care systems, their general, high-level function is simple: they keep both predictably dangerous and high-variance (i.e. unpredictably dangerous) people stashed safely away, where they can't disrupt or harm others at scale.
> We might not fully understand why, but we've found methods that sorta kinda "work".
Yes, we've found many such methods at every level - individual, familial, tribal, national - and we stack them all on top of each other. This creates the conditions that let us live in larger groups, with fewer conflicts, and safely use increasingly powerful (i.e. potentially destructive) technologies.
I think you’re weighting the contribution of authority a bit too highly. The bad actors to be concerned about are a very small percentage of the population, and we do need institutions with authority to keep those people at bay, but it’s not like there’s this huge pool of “high variance” people that need to be screened out. The vast majority of people are extremely close in both opinion and ability; any semblance of society would be impossible otherwise.
> it’s not like there’s this huge pool of “high variance” people that need to be screened out. The vast majority of people are extremely close in both opinion and ability, any semblance of society would be impossible otherwise.
Yes, but I'm saying it's not an accident - I've mentioned mechanisms like culture, social customs, and education, which we've been using in some form for all of recorded history. I should've probably added violent conflicts within and between tribes/groups, too, which also acted to reduce variance by culling the more volatile and less agreeable people. People today are "extremely close in both opinion and ability" because for the past couple thousand years, generation by generation, we've been busy reducing the variance of individuals.
EDIT: keeping high-variance individuals locked up safely away is just one of the methods we use, specifically to deal with outliers. It too traces back to the dawn of recorded history - shunning, expelling individuals from the tribe (which often meant certain death), sending them to faraway lands, or forcing them into war, were other common means past societies used to eliminate high-variance outliers.
As for authority, it's a separate topic - I argue that hierarchical governance is an artifact of scale: it's necessary to coordinate groups past a certain size (~Dunbar's number), when our basic social intuitions are no longer up to the task. But the first level of hierarchy can handle only so many people, and if you want to coordinate multiple such groups, you need to add another layer... and that's how, over time, human societies scaled from tribes of a couple dozen people to nation states of hundreds of millions.
Even as the focus is usually on the national governments, the entire hierarchy is still there - you have states and lands/voivodeships/counties with their own governance, then another level for a major city and its surrounding villages, then yet another level in each individual village, and one or two levels in the city itself, etc. We don't often pay attention to it, but the hierarchy of governance does reach down, in some form, all the way to groups of a couple hundred people or less.
>Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.
> I can't count the number of times I've had conversations with extremely experienced, smart techies who just spout off the most ignorant stuff.
Spouting off the most ignorant stuff is one of the lowest-risk things you can do in general. We're talking about running code where a bug can do a ton of damage, financial or otherwise, not water-cooler conversations.
In the train example, the UI is in place to prevent a person from making a dangerous route. I think the idea here is that an LLM cannot take the place of such a UI as they are inherently unreliable.
To your point, humans are augmented by checklists and custom processes in critical situations. And there are certainly applications which mimic such safety checklists. We don't NEED to start from an LLM perspective if our goal is different and doesn't benefit from an LLM. Not every UI or architecture is fit for every purpose.
> Are humans limited to low-risk applications like that?
Yes, of course. That's why the systems the parent mentioned designed humans out of the safety-critical loop.
> Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.
> I can't count the number of times I've had conversations with extremely experienced, smart techies who just spout off the most ignorant stuff.
The key difference is that when the human you're having a conversation with states something, you're able to ascertain the likelihood of it being true based on available context: How well do you know them? How knowledgeable are they about the subject matter? Does their body language indicate uncertainty? Have they historically been a reliable source of information?
No such introspection is possible with LLMs. Any part of anything they say could be wrong and to any degree!