Firstly, compliments to Apple for all these incredible accessibility features.
I think there is an important little nod to the future in this announcement. "Personal Voice" is training (likely fine-tuning) and then running a local AI model to generate the user's voice. This is a sneak peek of the future of Apple.
They are in the unique position to enable local AI tools, such as assistants or text generation, without the difficulties and privacy concerns of the cloud. Apple silicon is primed for local AI models with its GPU, Neural Engine cores, and unified memory architecture.
I suspect that Apple is about to surprise everyone with what they do next. I'm very excited to see where they go with the M3, and what they release for both users and developers looking to harness the progress made in AI but locally.
Just yesterday I started using a new maxed-out Mac mini and everything about it is snappy. I have no doubt that it is ready for an enormous amount of background processing. Heavy background work is the only way to use all the processing power in that little computer.
Think Siri+ChatGPT trained on all your email, documents, browser history, messages, movements, everything. All local, no cloud, complete privacy.
"Hey Siri, I had a meeting last summer in New York about project X, could you bring up all relevant documents and give me a brief summary of what we discussed and decisions we made. Oh and while you're at it, we ate at an awesome restaurant that evening, can you book a table for me for our meeting next week."
All of my current experience with Siri tells me there is a 50-50 chance of the result coming back as “Sorry, I'm having trouble connecting to the network” or playing a random song from Apple Music.
Just last night, we were entertaining our toddler with animal sounds. It worked with “Hey Siri, what does a goat sound like?”, then we were able to do horse, cow, sheep, boar, and it somehow got tripped up on pig, for which it responded with the Wikipedia entry and told us to look at the phone for more info.
You’ve touched on what is probably the biggest reason I don't use Siri more: Apple does not limit it to what’s important to me as a user.
I have thousands of contacts, lots of photos, videos, and emails, all in Apple’s first-party apps and yet Siri is more likely to respond with a popular song or listing of news articles that’s only tangentially connected to my request.
This becomes more complicated when Siri is the interface on a HomePod in a shared area. Whose data and preferences should be used? Ideally it would recognise different voices and give that person's data priority, but how much can/should be shared between users? And where does that data live? It shouldn't be on the HomePod, so it would have to task the phone with finding the answer. I'm sure something good could be done here, but it wouldn't be easy.
>All of my current experience with Siri tells me there is a 50-50 chance of the result coming back as “Sorry, I'm having trouble connecting to the network” or playing a random song from Apple Music.
Well, this is about adding ChatGPT-level smartness to Siri, not just the semi-dumb assistant of yore.
> I’m feeling nostalgic. Make me a playlist with 25 mellow indie rock songs released between 2000 and 2010 and sort them by release year, from oldest to most recent.
This doesn't just return a list of songs, it will create the playlist for you in Music.
> Check the paragraphs of text in my clipboard for grammar mistakes. Provide a list of mistakes, annotate them, and offer suggestions for fixes.
> Summarize the text in my clipboard
> Go back to the original text and translate it into Italian
I haven't tried it myself, but it has other integrations like "live text" where your phone can pull text out of an image and then could send that to GPT to be summarized.
Version 1.0.2 makes improvements for using it via Siri, including on HomePod.
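On the Live Text piece: the "pull text out of an image" half is a public API (Apple's Vision framework). A minimal sketch of that half, with the LLM summarization call deliberately left out, since that part is whatever remote service or local model you choose and isn't an Apple API:

```swift
import Vision
import CoreGraphics

// Minimal sketch: recognize text in an image with Vision. Only this
// recognition half is real Apple API; the "send it to GPT" step is up to you.
func extractText(from cgImage: CGImage) throws -> String {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate        // favor quality over speed
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    // Keep the top candidate from each detected region of text.
    return (request.results ?? [])
        .compactMap { $0.topCandidates(1).first?.string }
        .joined(separator: "\n")
}
```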
Today I asked Siri for the weather this week. She said daytime ranges from 31C to 23C, so I then asked "on what day is the temperature 31 celsius?". And, of course, what I got back was "it's currently twenty seven degrees".
The weather ones are so annoying: "Is it going to rain today?". "It looks like it's going to rain today". "What time is it going to rain today?". "It looks like it's going to rain today".
It seems ironic, then, that this specific thing failed spectacularly for me today. Siri put the text "set a timer for 15 minutes" into the text field of a reminder. I have no clue why, and no timer was set.
But you know what? Still better than Alexa for managing my smart home stuff. By miles and miles, IMO.
And god help you if you give up halfway through a command with a prompt. “Cancel”, “stop” and “nevermind” don’t work for half of that for some reason, so you have to walk up and tap the HomePod to cancel.
> All of my current experience with Siri tells me there is a 50-50 chance of the result coming back as “Sorry, I'm having trouble connecting to the network” or playing a random song from Apple Music.
Meanwhile, Google and Amazon have decided that the data center costs of their approach just aren't worth it.
>Google Assistant has never made money. The hardware is sold at cost, it doesn't have ads, and nobody pays a monthly fee to use the Assistant. There's also the significant server cost to process all those voice commands, though some newer devices have moved to on-device processing in a stealthy cost-cutting move. The Assistant's biggest competitor, Amazon Alexa, is in the same boat and loses $10 billion a year.
Yes. I don't understand the criticism of the current Siri in this context; the point of a language model on the device would be to derive intent and convert a colloquial command into a computer instruction.
Siri was so good before iOS 13. I'm not sure what they did in that release, but it went from around 90-95% accuracy and 80-90% contextual understanding down to around 70% and 75% respectively.
As someone who dictates more than half of their messages and is an incredibly heavy user of Siri for performing basic tasks, I really noticed this sudden decline in quality, and it has never recovered; in fact, iOS 16 really struggles with many basic words. Before iOS 13, I would have been able to dictate these two paragraphs likely without any errors; as it is, I've just had to edit them in five places.
I thought the lack of ability to execute on current “easy” queries would indicate something about ability to execute something as complicated as figuring out the restaurant you ate at and making a reservation. At least anytime in the next few years.
I don’t think it does. This isn’t a hypothetical Siri v2 with some upgrades; it’s a hypothetical LLM chatbot speaking with Siri’s voice. I recall one of the first demonstrations of Bing’s ability was someone asking it to book him a concert where he wouldn’t need a jacket. It searched the web for concert locations, searched the web for weather information, picked a location that fit the constraint, and gave the booking link for that specific ticket. If you imagine an Apple LLM that has local rather than web search, it seems obvious that this exact ability LLMs have to follow complicated requests and “figure things out” would be perfectly suited to reading your emails and figuring out which restaurant you mean. With Apple Pay integration it could also go ahead and book for you.
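To make the "figure things out" part concrete, the loop is conceptually something like the sketch below. Everything in it is hypothetical: the tool names and `askModel` stand in for local search over mail/calendar and an on-device model deciding which tool to call next until it can answer.

```swift
import Foundation

// Hypothetical sketch, not an Apple API: a model repeatedly picks a local
// "tool" to call, sees the result, and stops when it has an answer.
enum Tool: String, Codable { case searchMail, searchCalendar, bookTable, done }

struct ModelStep: Codable {
    let tool: Tool
    let query: String      // what to search for, or what to book
    let answer: String?    // filled in when tool == .done
}

func handle(request: String,
            askModel: (String) -> ModelStep,
            runTool: (Tool, String) -> String) -> String {
    var transcript = "User request: \(request)\n"
    for _ in 0..<8 {                            // cap the number of steps
        let step = askModel(transcript)         // the model picks the next action
        if step.tool == .done { return step.answer ?? "" }
        let result = runTool(step.tool, step.query)
        transcript += "\(step.tool.rawValue)(\(step.query)) -> \(result)\n"
    }
    return "Sorry, I couldn't finish that."
}
```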
Certainly not the only place, but you’re very right that it does house a large population of commenters like me who enjoy the “sport” of “being correct on the internet”.
And yet the parent makes a very specific (and correct) point: that this won't be Siri with some upgrades, but Siri in name only, with a totally different architecture.
Whereas yours and your sibling comment are just irrelevant meta-comments.
Siri today is built on what’s essentially completely different concepts from something like ChatGPT.
There are demos of using ChatGPT to turn normal English into Alexa commands and it’s pretty flawless. If you assume Apple can pretty easily leverage LLM tech on Siri and do it locally via silicon in the M3 or M4, it’s only a matter of chip lead time before Siri has multiple orders of magnitude improvement.
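The trick in those demos is that the model's only job is to emit something structured that existing device code can already execute. A rough sketch of that shape; the JSON schema and prompt here are invented, and `callModel` stands in for whatever LLM you use, local or remote:

```swift
import Foundation

// A made-up command schema; the point is that the model emits structured
// JSON and the device code just decodes and executes it.
struct SmartHomeCommand: Codable {
    let action: String      // e.g. "setBrightness"
    let target: String      // e.g. "living room lamp"
    let value: Double?      // e.g. 0.4
}

func parseCommand(_ utterance: String,
                  callModel: (String) -> String) -> SmartHomeCommand? {
    let prompt = """
    Convert the user's request into JSON with keys "action", "target", "value".
    Respond with JSON only.
    Request: \(utterance)
    """
    let reply = callModel(prompt)
    return try? JSONDecoder().decode(SmartHomeCommand.self, from: Data(reply.utf8))
}
```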
That experience likely isn’t transferable to Siri, which has deeper problems. People, me included, are reporting their problems with Siri, e.g. setting it to transcribe what they and Siri say as text on the screen, and then being able to show that the input “Please add milk to the shopping list” results in Siri responding, in writing, “I do not understand what speaker you refer to.”
Likely problems like these could be overcome, but preparing better input would probably not address the root cause of the problems with Siri.
Microsoft's voice assistant was just as dumb as Siri, but ChatGPT is another thing entirely. Most likely it won't even be the same team at all.
So nothing about their prior ability, or lack thereof, to make Siri smart means anything about their ability to execute if they add a large LLM in there.
I love Steve Jobs' "bicycle for the mind" metaphor, and what you describe is the best possible example of this concept. A computer that does that would enable us to do so much more.
This is the sort of AI I want; a true personal assistant, not a bullshit generator.
It appears that we are tantalizingly close to having the perfect voice assistant. But for some inexplicable reason, it does not exist yet. Siri was introduced over a decade ago, and it seems that its development has not progressed as anticipated. Meanwhile, language models have made significant advancements. I am uncertain as to what is preventing Apple, a company with boundless resources, from enhancing Siri. Perhaps it is the absence of competition and the duopoly maintained by Apple and Google, both of whom seem reluctant to engage in a competitive battle within this domain.
It is probably a people problem. The people who really understood Siri have probably left, the managers left running it are scored primarily on not making any mistakes and staying off the headlines. Any engineers who understand what it would take to upgrade it aren't given the resources and spend their days on maintenance tasks that nobody really sees.
It's more likely a perverse incentive problem. Voice activated "assistants" weren't viewed as assistance for end users. They were universally viewed as one of two things: A way of treating the consumer as a product, or a feature check-box.
That Siri went from useful to far less useful had more to do with the aim to push products at you rather than actually accomplishing the task you set for Siri. If Apple actually delivers an assistant that works locally, doesn't make me the product, and generally makes it easier to accomplish my tasks, then that's a product worth paying for.
When anyone asks "who benefits from 'AI'?" the answer is almost invariably "the people running the AI." Microsoft and OpenAI get more user data, and subscriptions. Google gets another vehicle for attention-injection. But if I run Vicuna or Alpaca (or some eventual equivalent) on my hardware, I can ensure I get what I need, and that there's much less hijacking of my intentions.
So Microsoft, if you're listening: I don't want Bing Chat search, I want Cortana Local.
When was Siri ever useful? I have yet to encounter a voice "assistant" that can do more than search Google and set timers reliably, and Siri itself can't even do those very well.
I use it around 50 - 100 times per day. Mostly playing music, sending messages, controlling lights in the home, weather, timers, and turning on/off/opening apps on the TV
There are definite frustrations, mostly around playing music. Around 5% of the time, Siri will play the wrong album or artist because the artist name sounds like some other album name, or vice versa. I wish, here, that it used my Music playback history to figure out which one I meant
Doing what Siri is doing is not rocket science. It’s a simple intent-based system where you give it patterns to understand intents and trigger some API based on them.
Once you have the intent parsing, it should just be a matter of throwing manpower at it and giving it better intents.
Yes, I have experience with building on top of such a system.
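For the curious, the core of such a system is roughly pattern → intent → handler. A toy sketch of that shape below; real systems use trained classifiers and slot-filling rather than regexes, and none of this is Apple's actual SiriKit code.

```swift
import Foundation

// Toy version of the pattern -> intent -> handler pipeline described above.
struct Intent {
    let name: String
    let pattern: NSRegularExpression        // crude stand-in for an NLU model
    let handler: ([String]) -> String       // called with the captured slots
}

let intents: [Intent] = [
    Intent(name: "SetTimer",
           pattern: try! NSRegularExpression(pattern: #"set a timer for (\d+) minutes?"#,
                                             options: [.caseInsensitive]),
           handler: { slots in "Starting a \(slots[0])-minute timer." })
]

func respond(to utterance: String) -> String {
    for intent in intents {
        let range = NSRange(utterance.startIndex..., in: utterance)
        guard let match = intent.pattern.firstMatch(in: utterance, options: [], range: range)
        else { continue }
        // Pull out the captured groups ("slots") and hand them to the handler.
        let slots = (1..<match.numberOfRanges).compactMap { i -> String? in
            Range(match.range(at: i), in: utterance).map { String(utterance[$0]) }
        }
        return intent.handler(slots)
    }
    return "Sorry, I didn't get that."
}

print(respond(to: "Set a timer for 15 minutes"))   // Starting a 15-minute timer.
```

As the replies note, the hard part isn't this core; it's the thousands of intents, locales, and integrations layered on top of it.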
But the group managing Siri has probably been gutted in the past 10 years, and while the core is always simple the integrations and the QA testing to make sure it all keeps working is probably brittle and time consuming, and the core code is likely highly-patched spaghetti at this point.
It would be easy to write Siri again and make it a hundred times better, if you could start all over and only write the core features, and not have to validate against the whole product/feature matrix.
The problem with the rewrite of course would be that you won't be able to deliver that minimal viable product any more and you will have 10 years worth of product requirements and user expectations that you MUST hit for the 1.0 release (which must be a 1.0 and not an 0.1).
I've worked on lots of "simple" and "not rocket science" systems that were 10-years old, and it is always incredibly difficult due to the state of the code, the lack of resources, and the organizational inertia.
This is already being felt with Stable Diffusion, where an M2 is fully capable of running it offline.
Anything that can be done to reduce the need to “dial out” for processing protects the individual.
It erodes the ability of business and governmental organizations to use knowledge of otherwise private matters to target and influence.
The potential of moving a HQ LLM like GPT to the edge to answer everyday questions reminds me of my move from Google to DDG as my default search engine.
Except it’s even a bigger deal than that. It reduces private data exhaust from search to zero, making going to the net a backup plan instead of a necessity.
Apple delivering this on device is a major threat to OpenAI, which will have to provide some LLM model with training that Apple can’t or won’t.
Savvy users will begin to be leery of having to send queries over the wire, feeding someone else valuable data (as ShareGPT has proven).
Even then, Apple will likely choose to, or be forced to, open up on-device AI to allow user-contributed additions like LoRAs, which would raise the question: why does OpenAI need to exist?
Also fascinating is the potential to do this at the server level for enterprise. If Apple produced a stack for enterprise training, it could replace generalized data-compute needs, shifting IT back to local or intranet infrastructure.
Apparently, you are not an actual user of Siri, because I get jack shit out of her. Speech-to-text is infinitely worse than it was the first week Siri was released.
Yes and we should also have EU regulators at every design meeting for every company. They did such a good job with the GDPR making the user experience better on the web
Yes, alas they didn't leave room for a 'cookie preferences' cookie, so that whenever I choose the option 'reject all', it's of course going to ask me again, every time I visit the website.
That said, their intentions were good. I'm always horrifically amazed at the number of cookies used whenever I see the preferences popup. I honestly had no idea how many tracking cookies were used by the average website.
>Think Siri+ChatGPT trained on all your email, documents, browser history, messages, movements, everything. All local, no cloud, complete privacy.
That sounds absolutely horrifying if you remove the "all local" part. And that part's a pipe dream anyway. Plus, when using a model you'd basically become subservient to, and limited by, the type of data in the model, which would necessarily abide by Apple's TOS, so a couple of hundred million people would effectively be the Apple TOS in human form. I don't understand why Apple fanboys don't get this. Apple is pretty shoddy where privacy is concerned. Are these Apple employees making these posts?
Fat chance Apple will allow us to do this locally. More likely: upgrade to Apple Cloud Plus to get these features. But yeah, I've also dreamt of what my Apple hardware could do.
> Just yesterday I started using a new maxed out Mac mini and everything about it is snappy.
Really?! I didn't think anyone here would fall for that.
Mac Mini 12-core M2, 19-core GPU, 32GB, 10Gbit, 8TB storage? $4500
Mac Studio 20-core M1, 48-core GPU, 64GB, 10Gbit, 1TB storage is $4000. 128GB of RAM is $800 more
but either Studio RAM configuration obviously spanks the M2 mini. It's sacrificing Apple's expensive storage, but with Thunderbolt 3 it's pretty academic to find 8TB or more of NVMe storage, probably 32GB of NVMe RAID[1], for less than Apple's charge of $2200 above cost of 1TB.
I specced the smallest SSD. I use network homes. The mini is a stopgap while waiting for the Pro. I don't really consider drive size a performance item anymore.
I spent just over $2,000.
Mac mini
With the following configuration:
Apple M2 Pro with 12‑core CPU, 19-core GPU, 16‑core Neural Engine
32GB unified memory
512GB SSD storage
Four Thunderbolt 4 ports, HDMI port, two USB‑A ports, headphone jack
10 Gigabit Ethernet
Not awful, but for $2K you could have had a 16-core CPU, 20-core GPU, 32-core Neural Engine, 48GB unified memory, 512GB SSD storage, four Thunderbolt 4 ports, two HDMI ports, four USB-A ports, two headphone jacks, and two Gigabit Ethernet ports.
Yes. I wanted the 10Gb Ethernet. My purchasing question is when is the right time to buy a great monitor. In the CRT days the monitor lasted the longest, and buying the best I could afford worked for me.
I just went back to compare the Mini with the Studio again. Despite your advice I would buy the Mini again for these reasons:
I'm on a newer generation chip that has a lower power draw. Meets my network speed minimum. All for the price of the entry level Studio. This box is basically an experiment to see how much processing power I need. I have a very specific project that will require the benchmarking of Apple's machine learning frameworks. I want to see how much of a machine learning load this Mini can handle. Once I have benchmarks maybe the Pro will exist and I will be in good shape to shop and understand what I'm buying.
I think a Mini of any spec is a great value. The studio has a place but I'm hoping the Pro ends up being like an old Sun E450.
This Mini experiment is to help me frame the hardware power vs. the software loads.
My second suggestion for 16-core was also M2: $100 less with 1Gb Ethernet, and with 10Gb it would be $100 more than you paid. I.e., two of the 8-core M2 Minis with 24GB RAM each would do about twice as much work as the high-end M2 Pro Mini alone, sometimes less than twice the work, sometimes more. The same is true of two M1 Max Studios vs one M1 Ultra Studio for the same price. Two less powerful machines spank one more powerful machine every single time, and one M1 Ultra Studio is definitely NOT worth two M1 Max Studios, same as one 12-core M2 Pro Mini is definitely NOT worth two 8-core M2 Minis.
Everyone is drawn to "the best," and that's where Apple fleeces and makes its money. Pretty consistently forever, the best buys from Apple are never the high end configurations. We may feel secure in what our choices were, doubling down on affirming them, but we definitely pay for it.
I don't see a 16-core M2 or any Studios with an M2. I was drawn to the latest chip Apple has produced. They put that chip in a small headless form factor. I shopped for a Macintosh computer and judged whether I wanted the motherboard bandwidth of the Mac Studio or the latest chip in the Mac mini.
I'm sorry I disappointed you. I have retroactively looked over everything you have said and doubt I would do it differently. If this machine turns out to be such a dog, I can get another one to pair with it, as you have suggested doing with the 8-core. Finally, are you speaking from first-hand experience or from benchmarks?
I think the disconnect is that you are trying to get as much processing power as possible and I'm trying to understand how much processing power currently exists.
Sounds plausible. It also fits the news yesterday that Apple has taken 90% of TSMC's 3nm capacity for 2023 [1]. While everyone is talking about a recession, Apple seems to see opportunities. Or maybe they just had too much cash on hand. Also possible.
Density doesn't always matter. I'm reminded of Apple's 5nm M1 Ultra struggling to keep up with Nvidia's 10nm RTX 3080 in standard use. Having such a minor node advantage won't necessarily save them here, especially since Nvidia's currently occupying the TSMC 4nm supply.
You're comparing a pickup truck with a Main Battle Tank. An RTX 3080 is an electricity hog and produces heat like a monster. No wonder it performs better than an M1 Ultra with a worse node tech.
The RTX 3080 consumes ~300w at load, the M1 Ultra consumes ~200w. If you extrapolate the M1 Ultra's performance to match the 3080, it would also consume roughly the same amount of power.
Is this not a battle-tank-to-battle-tank comparison?
You can run an RTX 3080 off anything with enough PCI bandwidth to handle it. Presumably the same goes for Apple's GPU. We could adjust for CPU wattage, but at-load it amounts to +/-40w on either side and when we're only testing the GPU it's like +/-10w maximum.
The larger point is that Apple's lead doesn't extrapolate very far here, even with a generous comparison to a last-gen GPU. It will be great at inferencing, but so are most machines with AVX2 and 8 gigs of DRAM. If you're convinced Apple hardware is the apex of inferencing performance, you should Runpod a 40-series card and prove yourself wrong real quick. It's less than $1 and well worth the reality check.
My point was mostly that the 200W TDP you quote is for the whole package (CPU, GPU, RAM, plus the Neural network thingy and the whole IO stuff). A 120W figure for the GPU is more realistic.
I'm not pretending the Apple chips are the be-all-end-all of performance. They certainly have limitations and are not able to compete with proper high-end chips. However, I can confidently say that on mobile devices and laptops, the competition is largely behind. Sure, a $1000+ standalone GPU will be faster, but it doesn't fit in my jeans. It's the same as comparing a Hasselblad camera with the iPhone 14 Pro...
The competition is all fine, though. They have enough memory to run the models, they have hardware acceleration (ARMnn, SNPE, etc.) and both OSes can run it fine. Apple's difference is... their own set of APIs and hardware options?
How can you justify your claim that they're "largely behind"? It sounds to me like the competition is neck-and-neck in the consumer market, and blowing them out at-scale. It's simply hard to entertain that argument for a platform without CUDA, much less the performance crown or performance-per-watt crown.
Nvidia is somewhat encumbered by their need to optimize for raster performance. Ideally, all those transistors should be going toward tensor cores. Apple has never really taken the gaming market seriously. If they wanted to, they could ship their next M3 chip with identical GPU performance and use all that new 3nm die space for AI accelerators.
Is that a minor advantage? I would think that, the smaller the nodes get, the larger the impact of a 1nm difference. Because transistors have area, the math in approximate transistor count would be 3nm : 4nm = (1/3)² : (1/4)² = 16 : 9 ≈ 1.78, so a 3nm node could have roughly 78% more transistors on a given die area than a 4nm one.
4nm -> 3nm no longer means size goes down directly ratiometrically. You have to look at what TSMC is claiming for their improvements. They're claiming 5nm -> 3nm is a 70% density improvement (I can't find any 4nm -> 3nm claims)... so 4 -> 3 must be much less.
Also, most folks seem to have gone directly from 5nm to 3nm, and skipped 4nm altogether.
It will be quite the showdown, then. The M1 struggled to compete with current-gen Nvidia cards at release, we'll have to see if the same holds true for M3.
A lot of people buy Android. But very few people buy Pixel:
> In a world dominated by iPhones and Samsung phones, Google isn't a contender. Since the first Pixel launched in 2016, the entire series has sold 27.6 million units, according to data by analyst firm IDC -- a number that's one-tenth of the 272 million phones Samsung shipped in 2021 alone. Apple's no slouch, having shipped 235 million phones in the same period. [1]
I've wanted to buy a Pixel for years but Google doesn't distribute it here. It's not like I'm living in some remote area, I live in Mexico, right next door.
The first couple of years I assumed Google was just testing the waters, but after so many Pixel models I suspect it's really just more of a marketing thing for Android. They don't seem to have any interest in distributing the Pixel worldwide, ramping up production, etc.
Because jayd16 was responding to samwillis's comment about Apple being in a unique position.
Part of that unique position is already being a popular product. Google adding a bunch of local ML features isn't going to move the needle for Google if people aren't buying Pixels in the first place for reasons that have nothing to do with ML.
If Google's trying to roll out local ML features but 90% of Android phones can't support them, it's not benefiting Google that much. Hence, Apple's unique position to benefit in a way that Google won't.
> number of phones Google has sold is completely irrelevant to the fact that they too do local ai
How will they make money? For Apple, device purchases make local processing worth it. For Google, who distribute software to varied hardware, subscription is the only way. For reasons from updating to piracy, subscription software tends to be SaaS.
Does Google do on-device processing? Or do they have to pander to the lowest common denominator, which happens to be their biggest market share?
If the answer is no, then does it make sense for them to allocate those resources for such a small segment, and potentially alienate its users that choose non-Pixel devices?
Also, if the answer is no, this is where Apple would have the upper-hand, given that ALL iOS devices run on hardware created by Apple, giving some guarantees.
Pixel is just an example of Google owning the stack end to end but the Qualcomm chips in the Samsung phones have Tensor accelerator hardware and all mobile hardware is shared memory. I think samwillis was referring to the uniqueness of their PC hardware and my comment was that they're simply using the very common mobile architecture in their PCs instead of being in a completely unique place.
Google doesn’t want to run local AI. It channels everything through the Googleplex on purpose.
So while pixel phones may be possible, they don’t want to.
Take image processing for example. iPhones will tag faces and create theme sets all locally. Google could too, but they don’t. They send every picture to their cloud to tag and annotate.
If anyone were to write a chronological history of regulations imposed by different authorities throughout history, I think it is a fair assumption that regulations related to making bread would show up in the first chapters of the book.
Depends on who you ask. I wouldn't trust them too much. I think their security reputation is mostly hype and marketing, which some on this thread seem to have bought hook, line and sinker.
Google has the absolute worst ARM silicon money can buy (Tensor G2), go look at the benchmarks it's comical they would charge $1800 for a phone with it.
Even with something as simple as dictation, when iOS did it over the cloud, it was limited to 30 seconds at a time, and could have very noticeable lag.
Now that dictation is on-device, there's no time limit (you can dictate continuously) and it's very responsive. Instead of dictating short messages, you can dictate an entire journal entry.
Obviously it will vary on a feature-by-feature basis whether on-device is even possible or beneficial, but for anything you want to do in "real time" it's very much ideal to do locally.
Edit in response to your edit: nope, on privacy specifically I don't think most users care at all. I think it's all about speed and the existence of features in the first place.
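On the dictation point: the on-device mode is exposed directly in Apple's Speech framework, and a minimal sketch of opting into it looks roughly like the following. Whether on-device support is actually available depends on the device and locale.

```swift
import Foundation
import Speech

// Minimal sketch of requesting on-device speech recognition for an audio file.
func transcribe(fileURL: URL, completion: @escaping (String) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized,
              let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
              recognizer.supportsOnDeviceRecognition else {
            completion("")
            return
        }

        let request = SFSpeechURLRecognitionRequest(url: fileURL)
        request.requiresOnDeviceRecognition = true   // audio never leaves the device

        // In real code, keep a reference to the task so it isn't deallocated early.
        _ = recognizer.recognitionTask(with: request) { result, _ in
            guard let result = result, result.isFinal else { return }
            completion(result.bestTranscription.formattedString)
        }
    }
}
```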
Apple has positioned itself as big on privacy, turning privacy into a premium product (because no other big tech company has taken that stance or seems willing to), further entrenching Apple as the premium option. In that respect I think users will "care" about privacy.
Yes. The number of times I ask Siri on my HomePod "What time is it?" and it replies "One moment..." [5 seconds] "Sorry, this is taking longer than expected..." [5 seconds] "Sorry, I didn't get that".
I have to assume this is due to connectivity issues; there is no other logical reason why it would take so long to figure out what I said, or why it wouldn't have the current time available locally.
A lot of end users do not and they have no interest in spending the time figuring it out. That's why it's very important that the companies behind the technology we use make ethical choices that are good for their users and when that doesn't happen, legislators need to step in.
Apple has been on both sides of that coin and what is ethical isn't always clear.
Local also solves any spotty-connection issues. Your super amazing knows-everything-about-you assistant that stops working when you’re on a plane or subway or driving through the mountains is a lot less amazing. If they can solve it, local will end up being a way, way smoother daily experience.
> Do users actually care whether something is local or not?
I think most don’t, but they do care about latency, and that’s lower for local hardware.
Of course, latency is also higher for slower hardware, and mobile local hardware has a speed disadvantage, but even on a modern phone, local can beat the cloud on latency.
Some workloads on M1 absolutely smash other ARM processors in part because of M1's special-purpose hardware. In particular, the undocumented AMX chip is really nice for distance matrix calculations, vector search, embeddings, etc.
Non-scientific example: for inference, whisper.cpp links with Accelerate.framework to do fast matrix multiplies. On M1, one configuration gets ~6x realtime speed, but on a very beefy AWS Graviton processor, the same configuration only achieves 0.5x realtime, even after choosing an optimal thread count and linking with a NEON-optimized BLAS. (Maybe I'm doing something wrong though.)
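If you want to poke at this on your own machine, a quick way to see what Accelerate does with a big matrix multiply (the layer whisper.cpp links against, and the path that is believed to exercise the AMX units on Apple silicon) is something like the toy probe below. It is not a proper benchmark, just a way to compare machines.

```swift
import Accelerate
import Foundation

// A big single-precision matrix multiply through Accelerate's vDSP.
let n = 1024
let a = [Float](repeating: 1.0, count: n * n)
let b = [Float](repeating: 2.0, count: n * n)
var c = [Float](repeating: 0.0, count: n * n)

let start = CFAbsoluteTimeGetCurrent()
vDSP_mmul(a, 1, b, 1, &c, 1, vDSP_Length(n), vDSP_Length(n), vDSP_Length(n))
let elapsed = CFAbsoluteTimeGetCurrent() - start

// An n x n multiply is roughly 2 * n^3 floating-point operations.
let gflops = 2.0 * pow(Double(n), 3) / elapsed / 1e9
print("A \(n)x\(n) multiply took \(elapsed) s, roughly \(Int(gflops)) GFLOP/s")
```

Numbers from a loop like this won't match whisper.cpp's end-to-end realtime factors, but the gap between machines shows up quickly.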
>They are in the unique position to enable local AI tools
The only unique Apple thing here is how bad their AI products are and how far behind they are in AI. This is the only thing that matters here; performance is adequate or better for the other processors out there, but you can't get anywhere without the appropriate software. Maybe they'll get smart enough to buy some AI startups/companies to get the missing talent.
Which AI products that they have actually implemented are bad? I think Siri is pretty poor, to be fair, and improves at a glacial pace. Pretty much everything else I'd say is state of the art: things like text selection from images, cutting out image subjects, their computational photography; even Maps directions have come a long way.
When people talk about AI they mean the new tech like LLMs or diffusion models, and the only relevant Apple offering (Siri) is way behind, and there's no evidence they have anything to replace it.
(Aside: their image manipulation and Maps are worse, though with Maps I don't know what the underlying issue is, and OCR was already mostly solved. I'm far from a photography expert so can't compare there.)
True that. But I won't give Apple credit for products we can't see or assume good performance without proof. As they say in the movies: "Show me the money".
I'll say though that no multibillion company is under existential threat. Not Apple, not Google and not even Intel. At worst they will lose a couple tens of billions and some marketshare. Even IBM still exists and took a long long time to fall to where it is still today.
What most people think of as AI can be better described as generative AI. Things like LLM and image making programs like Stable Diffusion. Apple has yet to implement anything like that.
They have done a ton with ML though. Some of these accessibility features, the Apple Pencil, FaceID, image cataloging, Live Text, etc. showcase how Apple can not only do ML well but also make good use of it. All of it is done on device. LLMs and image generation are other examples of ML processes that Apple could include in the OS and run locally. With all of the issues surrounding LLMs and the like, I am perfectly happy that Apple has been taking its time implementing them. It does feel like they could flip a switch when the time is right, and that is why people say they are in a great position.
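The "flip a switch" part is plausible partly because the plumbing for running arbitrary models on-device already ships: Core ML will schedule a compiled model across CPU, GPU, and the Neural Engine. A minimal sketch; the model file here is a placeholder, not anything Apple ships.

```swift
import Foundation
import CoreML

// Load an already-compiled Core ML model and let the framework pick the
// best mix of CPU, GPU, and Neural Engine for it.
func loadModel(at compiledModelURL: URL) throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = .all          // CPU + GPU + Neural Engine
    return try MLModel(contentsOf: compiledModelURL, configuration: config)
}
```

Apple's own Core ML Stable Diffusion port works along these lines, which is part of why people keep pointing at it as evidence the hardware side is ready.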
The question is whether there will be models that can’t fit into an iPhone that apple will miss out on because they find cloud based personalization so abhorrent.
Agree these are tremendously good features and having them run locally will provide the best possible experience.
> I'm not even sure what "cloud based personalization" means to the user, other than "Hoover up all of your personal information."
It means having actually good ML.
I see so many posts around here saying Apple is absolutely well positioned to dominate in ML. It's just not true.
Nobody who is a top AI player wants to work at Apple where they have few if any AI products, no data, don't pay particularly well, not a big research culture, etc. etc.
The only thing they have going for them in this space is a good ARM architecture for low power matrix multiplication.
> I think there is an important little nod to the future in this announcement. "Personal Voice" is training (likely fine tuning) and then running a local AI model to generate the user's voice.
UMA may turn out to be visionary. I really wonder if they saw the AI/ML trend or just lucked out. Either way, the apple silicon arch is looking very strong for local AI. It’s a lot easier to beef up the NPU than to redo memory arch.
I think pretty much any multicore ARM CPU with a post ARMv8 ISA is looking pretty strong for local AI right now. Same goes for x86 chips with AVX2 support.
All of them are pretty weak for local training. But having reasonably powerful inferencing hardware isn't very hard at all, UMA doesn't seem very visionary to me in an era of MMAPed AI models.
> I think pretty much any multicore ARM CPU with a post ARMv8 ISA is looking pretty strong for local AI right now. Same goes for x86 chips with AVX2 support.
Apple Silicon AMX units provide the matrix multiplication performance of many core CPUs or faster at a fraction of the wattage. See eg.
Plus, the benchmark you've linked to is comparing hardware accelerated inferencing to the notoriously crippled MKL execution. A more appropriate comparison would test Apple's AMX units against the Ryzen's AVX-optimized inferencing.
The visionary part is having a computer with 64GB RAM that can be used either for ML or for traditional desktop purposes. It means fewer HW SKUs, which improve scale economy. And it means the same HW can be repurposed for different users, versus PCs where you have to replace CPU and/or GPU.
For raw ML performance in a hyper-optimized system, UMA is not a big deal. For a company that needs to ship millions of units and estimate demand quarters in advance, it seems like a pretty big deal.
Very different. Intel Macs had separate system RAM and video RAM, like PCs.
Apple Silicon doesn't just share address space with memory mapping, it's literally all the same RAM, and it can be allocated to CPU or GPU. If you get a 96GB M2 Mac, it can be an 8GB system with 88GB high speed GPU memory, or a 95.5GB CPU system with a tiny bit of GPU memory.
Apple's GPUs are slow today (compared to state of the art nvidia/etc), but if Apple upped the GPU horsepower, the system arch puts them far ahead of PC-based systems.
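What the unified memory actually means in code, as a rough sketch: a single Metal buffer in shared storage that the CPU writes and the GPU reads, with no copy in between.

```swift
import Metal

// One allocation, visible to both CPU and GPU. On Apple silicon a
// .storageModeShared buffer is literally the same DRAM; there is no
// staging copy like a discrete-GPU system would need.
guard let device = MTLCreateSystemDefaultDevice(),
      let buffer = device.makeBuffer(length: 256 * 1024 * 1024,
                                     options: .storageModeShared) else {
    fatalError("No Metal device available")
}

// The CPU writes straight into the allocation...
let floats = buffer.contents().bindMemory(to: Float.self,
                                          capacity: buffer.length / MemoryLayout<Float>.stride)
floats[0] = 42.0

// ...and the same MTLBuffer is then bound to a compute encoder for the GPU.
```

On a discrete-GPU PC the equivalent path involves copying over PCIe into VRAM, which is the cost UMA removes.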
That doesn't have any relevance to the efficiency and cost improvements of having the same very fast RAM connected to both CPU and GPU cores.
I can't believe anyone is arguing that bifurcated memory systems are no big deal. Are you like an x86 arch enthusiast? I'm sure Intel is frantically working on UMA for x86/x64, if that makes it more palatable. Though they'll need on-die GPU, which might get interesting.
I'm a computer enthusiast. I've got my M1 in a drawer in my kitchen, it's just not very useful for much unless I'm being paid to fix something on it. MacOS is a miserable mockery of itself nowadays and Apple Silicon is more trouble than it's worth, at least in my experience.
As I'm working on AI stuff right now, I have to be a realist. I'm not going to go dig up my Mac Mini so my AI inferencing can run slower and take longer to set up. Nothing I do feels that much faster on my M1 Mini. It feels faster than my 2018 Macbook Pro, but so did my 2014 MBP... and my 2009 x201. Being told to install Colima for Docker with reasonable system temps was the last straw. It's just not worth the hoop-jumping, at least from where I stand.
So... when a day comes where I need UMA for something, please let me know. As is, I'm not missing out on any performance uplift though.
> I'm sure Intel is frantically working on UMA for x86/x64
Everyone has been working on it. AMD was heavily considering it in the original Ryzen spec iirc. x86 does have an impetus to put more of the system on a chip - there's no good reason for UMA to be forced on it yet. Especially at scale, the idea of consolidating address space does not work out. It works for home users, but so does PCI (as it has for the past... 2 decades).
It's just marketing. It's a cool feature (they even gave it a Proper Apple Name) but I'm not hearing anybody clamor for unified memory to hit the datacenter or upend the gaming industry. It's another T2 Security Chip feature, a nicely-worded marketing blurb they can toss in a gradient bubble for their next WWDC keynote.
> They are in the unique position to enable local AI tools, such as assistance or text generation, without the difficulties and privacy concerns with the cloud.
I don't see why client-side processing mitigates the privacy concerns. That doesn't stop Apple from "locally" spying on you then later sending that data to their servers.
Ok, sure, but surely you see how it is that much harder to do?
Also since Apple is built around selling expensive devices and services you could also see why they’d have much less incentive to spy and collect data than, say, Google or Facebook?
The cynicism of “everything is equally bad so why care” is destructive.
For now. It was just two decades ago that Apple was on life support. That could happen again, and the temptation would be much stronger to start monetizing their users' data.
I'm not the target audience for these features, and don't want to speculate on behalf of others, so I'll just focus on my own needs...
Live Speech: I actually answer unknown phone numbers (usually) and would like text-to-speech on my calls because I've started to get concerned about what can be done, fraud-wise, with even small samples of my voice. So in this case, using another's voice is fine, even preferred. (Edit: I suppose I'd actually prefer a voice-changer here, which is less related to this accessibility feature. But I think Apple is unlikely to do that.)
Personal Voice: When my girlfriend texts me when I'm out with my AirPods, I think we'd both like me to hear her message in her actual voice rather than Siri's. This feature doesn't allow for that yet, but the pieces are all there.
Finally, Apple needs to detect and tell me when I'm listening to a synthetic voice, including when it's not being generated by Apple. There's fraud potential here. I'm clearly excited about this tech, but I want to know more on this front.
> When my girlfriend texts me when I'm out with my AirPods, I think we'd both like me to hear her message in her actual voice rather than Siri's
And then you can use voice to text to text her back, and she can hear it in your voice! It's just like a phone call from 30 years ago, but one that requires infinitely more processing power!
Genuinely funny reply. But (a) whether I prefer typing or speaking, and (b) whether I prefer reading or hearing, is very context-dependent and might not be the same for the person at the other end -- and so yeah I think having flexibility there is good! If I'm on AirPods and not on my phone, I'd like to hear the message. CarPlay, too. When actively on my phone, I prefer to type, even if on AirPods. CarPlay, I shouldn't probably be typing ever. So yeah, generating text and speech simultaneously and having the end result be situational is in fact a good thing.
It's worth noting this is already how iPhones work and people already love it. What I'm suggesting additionally is substituting Siri's voice for a DIFFERENT customized synthetic voice in a very specific circumstance. I'm not advocating for using synthetic voices where there currently aren't any here.
> It's just like a phone call from 30 years ago, but one that requires infinitely more processing power
There are folks who couldn’t, for a variety of reasons, do that thirty years ago. This feature is for them. The rest of us get to e.g. more naturally text a response to a call we’re listening into on a flight.
> Can't you already do that with iMessage's voice delivery feature?
Voice memos? No. I would have to, at some point, speak it. If you’re referring to text-to-speech, there is a difference between having your speech read in a different voice and your own.
On the plus side it would use less bandwidth. That phone call from 30 years ago probably used (ballpark) 64kilobits/second. This could use a lot less and have higher audio quality.
> Personal Voice: When my girlfriend texts me when I'm out with my AirPods, I think we'd both like me to hear her message in her actual voice rather than Siri's. (This feature doesn't allow for that yet, but the pieces are all there.)
Neat idea, same with CarPlay: having it reliably imitate the voice of the person who sent the message would make it a lot nicer.
Though they would need to get all the TTS and intonation right first, which IME is not the case, I think having the right voice but the wrong intonation entirely would be one hell of an uncanny valley.
Surely Shatner has sold his voice already like Bruce Willis did? I'd pay to hear my morning information read in the voice of the shat, in the format of a captain's log.
A voice changer for unknown or blocked caller IDs is actually a great idea. I just looked in F-Droid and the Play Store, and there seems to be no such app. Preventing callers from sampling my voice is a concern I did not know I had.
A couple weeks ago, I looked into building it for iPhone, but there was no way in iOS to integrate it. So now I'm just hoping for Apple to do it. This article was what prompted the thought: https://www.washingtonpost.com/technology/2023/03/05/ai-voic...
>because I've started to get concerned about what can be done, fraud-wise, with even small samples of my voice.
This vector was recently highlighted as a weakness of the Australian "Voiceprint" system(1)
It's also an excellent point to make that these tools can be useful for everyone. I find myself using a number of the accessibility tools simply to speed up some of my common interactions with the phone and watch.
I've also noticed that these technologies end up in other products. For example, Live Text is now a standard feature on macOS/iOS, yet it's an accessibility feature originating in the screen reader to deal with text flattened into images. This technology sharing also gives us a bit of a preview of what they're working on (e.g. AR).
I was thinking it would be fun just to have Siri use a Personal Voice, but your idea is better! And it could just offer to make your Personal Voice data available along with your Contact picture.
While that sounds like it would unlock some very cool experiences it also scares me to think about the potential abuses of making personalized voice models fairly easily available. It seems like the sort of thing that would need to stay secure on your own device. It would be great to see some kind of middle ground where a text to speech mechanism would generate audio output and send that, rather than make the model itself available.
> Personal Voice: When my girlfriend texts me when I'm out with my AirPods, I think we'd both like me to hear her message in her actual voice rather than Siri's.
It's also hard to do privacy-wise unless every text message she sends pre-generates the audio message using her on-device voice and then attaches it. That would make every message use 10x as much bandwidth, storage, and battery power. (10x is a random number but you get the point). Seems cute but really impractical.
I would think that your phone could request audio messages from the sender only when necessary. They already sync things like your DND status to show others so this would just be another flag. Messages could also then alert the sender that their message may be read aloud in their Personal Voice. Or maybe allow turning this on per conversation.
10x compared to what though? FaceTime (and similar) is already full-duplex video and audio, which I have to imagine is at least another 10x on top of what you’re describing. Are we really budgeting our computer resources so strictly that this would even show up as more than a rounding error?
I think I'd rather just eliminate the ringing portion of a phone call (when I have airpods in) and instead just let a trusted list of contacts talk directly into my ear (1-way) until I "answer" and open up a 2-way channel.
The "answer" button seems to serve to also say "I'm ready to listen" as much as it does to say "I'm ready to say". IMO this kind of goal is better covered by voice messages, or, at least, starting with a voice message. This allows the receiver to pick when they are ready to hear it (including immediately), replay it as needed, and choose when they respond (if at all). Many of these are benefits for the sender as much as the receiver.
Yeah I want “Apple Intercom” or something where me and my girlfriend can have linked airpods and it’s like a spy movie. Perfect for communicating in loud places or crowds without requiring data.
Not sure if you ever had a Nextel phone, but they had a cellular walkie talkie feature for awhile. It was pretty popular in my high school circa 2003, but I remember kinda hating it.
> Finally, Apple needs to detect and tell me when I'm listening to a synthetic voice.
There's already some version of this in "Hey Siri" detection -- if I record myself with a prompt and play it back, my HomePod briefly wakes up at the "Hey Siri" but turns off mid-prompt. I guessed it was some loudspeaker detection using the microphone array, but it could be a mix of both?
I was thinking this exact thought. I want Samantha reading my notifications. I desperately want an AI like her to be able to talk to. ChatGPT is getting there.
I wish they'd have a year of fixing accessibility bugs instead of making feature after feature. They had one release of iOS 16 where, if you opened the notification center with a Braille display connected (which is crucial for Deaf-Blind people), the display would disconnect. This was brought up during the betas, but it still made it into production. Now there's a bug where, if you're reading a book with your Braille display, after a page or two you can't scroll down a line to continue reading. Also, they've been working on a new speech platform, and it's pretty buggy.
I'm not saying Android is any better. We got a new Braille HID standard in 2018, and the Android OS still doesn't support it. So what does the TalkBack team have to do? Graft drivers for each Braille HID display into TalkBack (Android's screen reader, like VoiceOver), with another driver coming in TalkBack 14, because of course they can't update the Android Accessibility Suite the way they do Google Maps, Google Drive, Google Opinion Rewards, and even Google Voice, which gets an update every few weeks. I mean, the Accessibility Suite is not a system app. If it were, it could just grab Bluetooth access and do whatever. But it's not, so it should be able to be updated far more frequently than it is. It's sad that Microsoft, of all these big companies that'll talk and talk and talk (which doesn't always include Apple), is the one that has HID Braille support in the OS. Apple has HID Braille support too. Google doesn't, neither in Android nor ChromeOS; they just piggyback off of BRLTTY for all their stuff.
It’s not just accessibility, everything about their operating systems is crawling with bugs piling up on each other. They have bugs in Passwords, bugs in Shortcuts, bugs in permissions, bugs in Clock… I can no longer even trust that setting an alarm will be done for the correct time.
I'm guessing this is the "I want alarms to be pinned to a specific moment in time and not the time the alarm is set to in the current timezone" thing.
It would be nice if this was an option but I can't really fault them for not including it since it's kinda niche.
As I wrote, I can no longer trust the alarm. I’m talking about a supported use case which broke, not a niche and unsupported situation.
This seems to have been fixed in 16.4, but before that I would:
1. Pull down to show Spotlight search.
2. Type “alarm” and tap “Create Alarm”.
3. Set a time and tap “Done”.
The feedback message would tell me the alarm was set to a different time, with multiple hours of difference.
This is using only first-party features to do a basic task and even that didn’t work right. I could reproduce it reliably. Luckily the message was correctly showing the wrong time that was set and I double-checked it to notice.
All the other issues I mentioned still have open feedbacks about it. All but one are regressions, the other is a security flaw which has always been there.
"if you opened the notification center with a Braille display connected"
This sounds obvious, but imagine all the combinations of all the features that can interact, and then imagine having to test them all manually, because there's no automated model for specific Braille displays.
For decades, integration testing teams have outnumbered developers. It's just a hard problem, particularly at Apple's scale. It's not unlikely that there are only tens of users experiencing a given bug (though this one likely has thousands), and that it would take doubling or tripling the size of teams to find all these bugs before release.
"Assistive Access distills experiences across the Camera, Photos, Music, Calls, and Messages apps on iPhone to their essential features in order to lighten their cognitive load for users."
I might use this myself even though I'm not disabled.
Not all disabilities are permanent! Sometimes you're just situationally disabled, i.e. you're in the car, or carrying a child, or sleep-deprived, or just stressed out.
Agreed. Presenting this feature solely as a tool for users with cognitive disabilities might undersell its potential. There's a significant number of smartphone users who only utilize a small fraction of the available features and would prefer a simpler interface focusing on the 10% of features they actually use. Interestingly, this demographic includes both less tech-literate users, who might feel overwhelmed by the complexity, and extremely tech-literate users who know exactly what they need and prefer clean, distraction-free tools.
My 84 year old mother does remarkably well with her suite of Apple products. However, it would be nice to simplify some things. She only calls or chats with a limited number of people. Reworking the phone and messenger apps to use large pictures of the people would be a benefit. Something similar for the camera / photos app would be great as well.
You can pin messages in iMessage, which turns it into a big photo. Also when you hit the search button in Photos it’ll show you big images of people’s faces if you want to see photos of a particular person (if you’ve trained your phone on those faces).
I do wish the “favorites” view in the phone app made the headshots big like iMessage, though.
Finally! Grandma mode! With how hard it is to find uncomplicated dumbphones that work on AT&T, this will be an alternative if we need to get my grandma another phone.
I find it interesting that they are not waiting to talk about this at WWDC since I assume these features are tied to iOS 17. I wonder if that means that the simplified apps won't be available to developers (I don't see an App Store so I guess that makes sense).
All of these features do seem really awesome for those that need it. Particularly the voice synthesis. I honestly just want to play with that myself and see how good it is and I am curious if they would use that tech for other things as well.
The whole new simple UI I can really see being a major point. Especially if it includes an Apple Watch component and can still be synced with one for the safety features a watch has. Particularly fall detection.
Edit:
Maybe I missed it, but this brings up an interesting problem: is the synthesized voice only stored on the device and never backed up or synced to a new device? Can you imagine your phone breaking, or needing to upgrade, after you lost your voice (but had previously set it up), and you can no longer use it?!?
I fully understand the privacy concerns of something like this being lost. But this isn't like FaceID that could just be easily re-created in some situations. So I really hope they thought about that.
Apple has a way to encrypt sensitive data locally and store it in iCloud, then decrypt it locally later. They are kind of hit-and-miss about when they do this. But they certainly can do it, so it should be possible to securely back up a voice profile.
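To be clear about the shape of that: this is not Apple's actual mechanism (that would be CloudKit's encrypted fields and the iCloud Keychain), just a sketch of the general idea of sealing data on-device so the server only ever sees ciphertext.

```swift
import CryptoKit
import Foundation

// Seal the voice profile locally; only the sealed blob would ever be uploaded.
func sealForUpload(_ voiceProfile: Data, with key: SymmetricKey) throws -> Data {
    let box = try AES.GCM.seal(voiceProfile, using: key)
    return box.combined!   // nonce + ciphertext + tag; non-nil with the default nonce
}

// Pull the blob back down and decrypt it locally on the new device.
func openAfterDownload(_ blob: Data, with key: SymmetricKey) throws -> Data {
    try AES.GCM.open(try AES.GCM.SealedBox(combined: blob), using: key)
}

// In real use the key would live in the keychain and sync (or not) per policy;
// here it's just generated in memory for the sketch.
let key = SymmetricKey(size: .bits256)
```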
FaceID is not backed up to iCloud. That is in part because it is a local-only feature. And it is in part because people’s faces change over time, so requiring them to re-enroll their face with each new phone ensures accuracy over time.
It may also be sensor-dependent; the model produced and stored by an iPhone 14 might not “make sense” to an iPhone 16, if the hardware is different.
True, but they have made choices of where they will and will not do this in the past.
I can see the privacy reasons not to have this data sync, but given the reasons you would be creating it in the first place, losing those recordings or the voice data would be a huge blow.
I don't doubt that Apple could sync this data, I just hope that they are. I don't see anything about that happening on this document so I worry that they won't for privacy concerns.
If they didn't save Personal Voice for WWDC, just think what they may have ready to announce there. On its own Personal Voice would have been a headline grabbing announcement, but they dropped it now. That suggests to me exciting things.
I have been starting to wonder if this really is going to be a very packed WWDC. Between this announcement and the Final Cut Pro (and the other tool I don't remember now) announcement.
This is a pattern they have gotten into in the last few years: announce accessibility features a few weeks before WWDC. I think it's so these features, which matter a great deal to their target demographic, get their own news cycle instead of being swamped by all the major OS features announced at WWDC.
For ease of reading for other users, here is a quote from the article about the Personal Voice feature:
"For users at risk of losing their ability to speak — such as those with a recent diagnosis of ALS (amyotrophic lateral sclerosis) or other conditions that can progressively impact speaking ability — Personal Voice is a simple and secure way to create a voice that sounds like them.
"Users can create a Personal Voice by reading along with a randomized set of text prompts to record 15 minutes of audio on iPhone or iPad. This speech accessibility feature uses on-device machine learning to keep users’ information private and secure, and integrates seamlessly with Live Speech so users can speak with their Personal Voice when connecting with loved ones.
“At the end of the day, the most important thing is being able to communicate with friends and family,” said Philip Green, board member and ALS advocate at the Team Gleason nonprofit, who has experienced significant changes to his voice since receiving his ALS diagnosis in 2018. “If you can tell them you love them, in a voice that sounds like you, it makes all the difference in the world — and being able to create your synthetic voice on your iPhone in just 15 minutes is extraordinary.”
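For context on where this plugs in: the speech-synthesis API that something like Live Speech sits on top of is already public, and a Personal Voice would presumably just surface as another voice it can use. A minimal sketch of that existing API; the voice chosen here is a stock one, not a Personal Voice.

```swift
import AVFoundation

// Today's public text-to-speech plumbing, for reference.
let synthesizer = AVSpeechSynthesizer()

func speak(_ text: String) {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")   // stand-in voice
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    synthesizer.speak(utterance)
}
```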
You could also just record some conversations with them? I think hearing them actually speak about their experiences and memories would be much more impactful than hearing a simulation of them, no matter how accurate.
Honestly, I wish this had been around 10 years ago to save my late mother's voice. I saved all the voicemails I have from her and wish I could hear more from her.
These features that will be built into the operating system are things that otherwise would have cost someone hundreds to thousands of dollars before... if they even existed at all.
I love that Apple chooses to invest in these areas of accessibility that are extremely difficult to make sustainable businesses out of without charging users exorbitant sums.
This is amazing. I have a non-speaking person in my life, and the idea of them being able to create their own unique voice is so powerful. Stephen Hawking kept his voice synth for years after better tech was available because he wanted to keep his voice.
If anyone here is interested in or working on open-source tech for the non-speaking, please reach out. I'm building tech that uses LLMs and Whisper.cpp to greatly reduce the amount of typing needed. What Apple has here is great, but it still requires typing in real time to communicate. Many of the diseases that take your voice also impact fine motor control, so tools for being expressive with minimal typing are super important. Details (and links to GitHub projects) here: https://scosman.net/blog/introducing_voicebox
Marketing matters. Apple's "narrative" breaks their "privacy is a human right" motto whenever they kowtow to China to keep their manufacturing margins. That's not very reassuring to people who want truly private AI hardware, but Apple can fix that with marketing.
If you trust them to provide a secure model, I'd wager you haven't fully explored the world of iCloud exploitation.
I trust them more than I should but less than I could.
In theory, researchers and hackers can and do watch network traffic while training an on-device model. Apple doesn't want their position of "on-device AI is better for the consumer" to be tarnished, so I trust them not to screw around with this.
I, for one, don't want a voice cloner in every hand. These things are very, very destructive. Obviously, this is a post about a new Apple feature (even though great voice cloning has existed for months), so all the fanboys are unable to control their enthusiasm.
Apple's assistive and accessibility features (on mobile) have always been way ahead of Android's, but this takes it to the next level. Talk about a moat.
Android has on-device speech to text for what is happening around you and what is going on in phone calls built-in, which iOS still cannot do, and allows third party accessibility services. Apple remains far, far behind.
In general, just like with applications that aren't for accessibility, it is better to use a platform that lets you customize the system to fit your specific needs. We, as technologists, should understand this better than most. iOS simply fails here.
But "We, as technologists" need to understand that that customization is a hard thing for many people. Most people, especially the people that part of what is announced today is targeted at, need something that is baked into the operating system and easier to manage.
If an accessibility feature requires someone to get the help of someone else to setup it is already a failure right out the gate. They are still reliant on someone else to help manage their phone.
Now yes there are situations where this is impossible to avoid, particularly for vision impaired people since you need to first set that up (but even that there are attempts to address this by the phone setup having those systems turned on by default).
But those are the exceptions and should not be the rule for accessibility features.
Edit:
Just to be clear, there is obviously a market for highly customizable accessibility tools, similar to the Xbox Adaptive Controller, which do require someone else's assistance to set up.
But not everyone needs that level of support, and where possible we should be making tools that let someone be fully self-reliant rather than dependent on someone else for setup.
I’ve been digging into Apple’s (existing) accessibility features last week and it made me think about what (for lack of a better term) an “accessibility first” app architecture might look like.
Current app UIs are all about how to visually represent the objects you can interact with and the actions you can take on them, and then accessibility features are layered on top.
(Warning: half-baked idea that’s probably been tried countless times) what if you started with semantic information about the objects and the actions you can take on them and then the GUI is just one of several interaction modes (along with voice, keyboard, CLI, etc)?
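A rough Swift sketch of that half-baked idea (all names here are hypothetical, nothing to do with Apple's actual frameworks): capabilities are declared as semantic actions on the model, and each front end (GUI, voice, CLI, switch control) is just another consumer.

    // Hypothetical "semantics first" app model.
    struct SemanticAction {
        let identifier: String      // stable, machine-readable name
        let label: String           // human-readable, localizable
        let perform: () -> Void

        init(identifier: String, label: String, perform: @escaping () -> Void) {
            self.identifier = identifier
            self.label = label
            self.perform = perform
        }
    }

    protocol SemanticObject {
        var label: String { get }
        var actions: [SemanticAction] { get }
    }

    struct Message: SemanticObject {
        let sender: String
        let body: String
        var label: String { "Message from \(sender)" }
        var actions: [SemanticAction] {
            let name = sender
            return [
                SemanticAction(identifier: "reply", label: "Reply", perform: { print("Replying to \(name)") }),
                SemanticAction(identifier: "archive", label: "Archive", perform: { print("Archiving") }),
            ]
        }
    }

    // A voice or CLI front end can enumerate exactly what the GUI renders:
    func describe(_ object: SemanticObject) {
        print(object.label)
        for action in object.actions {
            print(" - \(action.label) (\(action.identifier))")
        }
    }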
This was the original premise of the Smalltalk Model-View-Controller architecture. (Not to be confused with web MVC, which reused the name for something else.) The “Controller” in Model-View-Controller refers to the input device, and a given Model can have multiple independent Views and Controllers. The “Model” is supposed to be an object that’s a semantic representation of the underlying thing being manipulated.
One of the visible impacts of MVC is that changes occur in real-time: you modify a value in a dialog, and everything affected by that value instantly updates. This is already common in Mac apps (in contrast to Windows apps, which typically want you to press “ok” or “apply”), so it wouldn’t surprise me if Apple was already using a modern MVC variant. It’s a well-known pattern.
In the Apple documentation for MVC, "controller" refers to a class that sits between the model and view. When data changes in the model, it updates the view; and when the user interacts with the view, it passes events to the model.
Like you said, this separation means you can "drive" the same model through different UIs. That's one of the things I always thought was cool about AppleScript support -- the app exposes a different interface to the same model.
Obviously, developers who design semantically for different UIs would be far, far ahead, and Apple's APIs can be used for that.
The harder problem is building accessibility APIs for oblivious developers, so they can retrofit at the last minute before release (as usually happens). Apple has done a pretty good job there, harvesting the semantics of existing visual UIs to adapt them for voice/hearing.
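To that retrofit point, even an "oblivious" custom UIKit view can be given enough semantics for VoiceOver to drive it with a handful of standard UIAccessibility properties. A minimal sketch (the control itself is made up):

    import UIKit

    // Hypothetical custom control retrofitted with accessibility semantics.
    final class WaveformScrubber: UIView {
        var progress: Double = 0.0 {
            didSet {
                // Keep the spoken value in sync with the visual state.
                accessibilityValue = "\(Int(progress * 100)) percent"
            }
        }

        override init(frame: CGRect) {
            super.init(frame: frame)
            isAccessibilityElement = true
            accessibilityLabel = "Playback position"
            accessibilityTraits = .adjustable   // VoiceOver: adjustable via swipe up/down
        }

        required init?(coder: NSCoder) {
            super.init(coder: coder)
            isAccessibilityElement = true
            accessibilityLabel = "Playback position"
            accessibilityTraits = .adjustable
        }

        // VoiceOver's increment/decrement gestures land here.
        override func accessibilityIncrement() { progress = min(1.0, progress + 0.05) }
        override func accessibilityDecrement() { progress = max(0.0, progress - 0.05) }
    }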
It would be totally within their MO to suddenly wake up, and turn this cutting edge AI stuff, which no one is quite sure what to do with, into a killer app with super high quality. Fingers crossed.
This announcement links to demo reels of actual users. More of this, please.
I wish Apple would produce Starfire style demos showing off their products. In context narratives showing how real people use stuff. Covering features old and new.
I so want the voice interface future. I became addicted to audiobooks and podcasts during the apocalypse. Hours and days outside, walking the dogs, hiking, gardening.
I made multiple attempts to adapt to Siri and Voice Activated input. Hot damn, that stuff pissed me off. Repeatedly. So I'd have to stop, take off gloves, fish out the phone, unfsck myself, reassemble, then resume my task. Again and again.
How very modal.
This mobile wearable stuff is supposed to be seamless, effortless. Not demand my full attention with every little hiccup.
So I just gave up. Now I suffer with the touch UI. And aggravations like recording a long video of my pocket lint and invoking emergency services.
Maybe I should just get a Walkman. Transfer content to cassette tapes.
One of the most useful accessibility features, as someone who doesn't require any, has been the live caption feature. I have no hearing or vision impairments, but I do struggle with ADHD. It launched a while ago, along with audio detection for things like doorbells or alarms. I was hoping multi-language support would be announced along with this; currently it's still limited to English (U.S.) or English (Canada) audio and text.
It's been so helpful to be able to read a short transcript of what was just said. The live caption feature works on video calls, and while multi-speaker captioning isn't perfect, it mostly works. I also really like how Android, iOS, and Windows all seem to keep feature parity among the operating systems. I wonder how Google and Microsoft will respond to this and I wonder how Personal Voice will work for communication outside of the Apple ecosystem.
Off-topic, but I'm getting tired of Apple creating a Proper Noun for every new feature they want to include in marketing. They're always so vague and/or obtuse that I keep forgetting them and what they mean. As an example, I thought I knew what ProMotion was but am realizing now that I was confusing it with True Tone.
Honestly, without needing to personally use the accessibility features, I think I might take a look at the new layouts, because I really yearn to make my device more minimalistic; there is just too much going on in modern devices.
But Bravo to Apple once again for doing excellent accessibility features and continuing to improve them.
I have a habit of checking the accessibility features from time to time, there's great stuff in there :)
My favorite is Spoken Content > Speak Screen: you can swipe down with two fingers from the top of the screen and it'll read the written content for you. I use it to read articles in Safari while I'm brushing my teeth or walking my dog
Accessibility makes life better for everybody. A lot of accessibility boils down to rethinking the obvious to enable _more_ use cases. "Things that might make your life better." Dig around in those settings and you might find your phone can do things you never thought of.
There are so many. One feature most people don't know about is macOS can speak the current time every 60, 30, or 15 minutes. (Settings > Control Center > Clock) It's a very old feature.
Most people can't understand why they would want a computer speaking the time, it would drive them nuts. I have ADHD and no sense of time passing. It helps me offload keeping track of time. (Which is otherwise continuously looking at a clock.)
I also turn off every auto-playing feature in any app that supports it. Some types of motion can be highly distracting. That can trigger a panic attack if I'm constantly needing to redirect my focus away from it. (If this sounds strange, it causes me to feel trapped in a tiny closet. Anxiety is a bitch.)
Google's latest video conferencing iteration lets people spam flying emoji. It's a "fun" feature that is absolute hell for me. Fortunately, there is a buried setting to remove it from my view.
I never thought about using that feature, but I should, for the same reasons. I have an LCD digital clock right under my monitor even though the OS has a clock, of course. But as soon as I'm "immersed" in something, or the menu bar is obscured because I keep some apps fullscreen, time does not exist.
I know there's a lot of concerns about generative voices, but it's a shame the only solution is 'insist on a live recording of random phrases'. For those whose voices have already degenerated, but who have hours of recordings available from historical sources (e.g. speech therapy sessions), it's too late to ask them to record something fresh.
Don't get me wrong, it's fantastic to see the tech being used this way. My Dad has lost most of his speech due to some kind of aphasia-causing condition and if this had been available just a year or two ago it could've been a big help for him; it's reassuring that others earlier in the journey will benefit from this.
It's not necessarily the only solution, just the one Apple has started with. My guess is that at least in part there are technical reasons, since even this structured approach will require overnight processing (source: WaPo) by Apple.
Additionally, in a situation like your dad's, I think Apple would need to verify that he is the same person there are hours of audio of. That strikes me as difficult at Apple's scale.
Not sure why everyone thinks local language processing, generation, and voice synthesis has anything to do with privacy. Apple's business model has supported their relatively pro-privacy approach (along with how bad they are at a lot of cloud services), but doing anything useful with a 30B LLM stored on your machine is still going to require the Internet, and your AI personal assistant will be just as tracked and cracked as you are today.
If you haven't looked at the accessibility features on your iPhone, you should check them out. There's some interesting stuff there already. Though it's a little awkward to use, the facial gesture functionality is really interesting.
I feel that many features currently focused on accessibility will soon be integrated into the main user interface as AI becomes a more important part of our computing experience. Talking to your phone without a wake word, hand gestures, object recognition/detection, face and head movements, etc. are all part of the future HCI. Live multimodal input will be the norm within the decade.
It's taken a while for people to get used to having a live mic waiting for wakewords, and it'll take them a while to get used to a live camera (though this is already happening with Face ID), but sooner than later, having a HAL 9000 style camera in our lives will be the norm.
a) These features are truly moving. I didn’t expect to get emotional from a press release.
b) It’s gonna be a bonkers keynote if they’re releasing this before WWDC.
I'm excited to try the Personal Voice feature. If it had existed years ago, I could have created the voices of my parents, grandparents, and other people close to me.
It would be wonderful to hear their voices again.
Hmm I wonder if we can extract a voice from a video and feed it to Personal Voice.
This is great, though they should really fix accessibility basics: keyboard control in the App Store, etc. I contacted Apple Support about keyboard control of the Look Around feature in Maps; I'm not sure they even understood the request. They directed me to an unsolicited-ideas disclaimer.
When my grandfather lost his ability to speak, it was frustrating for him because he was such an intelligent, articulate person. Having a disease that renders your voice useless and then having to communicate via a small whiteboard was not fun for him. He couldn't write fast enough and oftentimes felt like a burden because we would all be waiting to see what he wanted to say.
The Live Speech features will be a game changer for people with ALS. It saddens me this wasn't around before my grandfather passed, but I am optimistic that others with this horrendous disease can use it to keep communicating with their families even after ALS has taken their voice.
Finally! Such a simple idea that's taken so long. My parents and grandparents are going to LOVE this.
The only thing they need now is a simplified user interface for Apple Podcasts: a one-click button to listen to the latest episode of a given podcast.
The main issue I see older people have with iPhones is unrelated to ui complexity. iOS’ worst parts are services/updates where the user is frequently asked for Apple ID passwords and shown confusing ads for Apple services like iCloud Photo Library that end up making photos more complicated to use as some pictures are now unavailable without internet. More parts of the iOS experience are getting unnecessarily tangled with server side services that require unpredictable prompts to get acceptance to terms and conditions, logins, and traps that make you set up two factor auth or buy an unnecessary subscription.
These things are the bane of my existence with my mom's iPhone. She is blind but is not able to use all the swiping/tapping features of VoiceOver due to a tremor. She mostly uses the phone with Siri only, and that works well enough until a popup decides to appear.
I agree that Apple prompts for iCloud passwords way too frequently and doesn't offer the ability to authorize with another device (despite the fact that you can reset your password from another device).
“Type your password to buy this item on the App Store”. Why?
> “Type your password to buy this item on the App Store”. Why?
Because store items can be quite expensive (particularly the microtransactions), and it's using a stored credit card. I too get bugged by the plethora of prompts, but this is a good example of when to use one. Plus, after doing it once you can tie it back to Face ID or Touch ID and not have to enter it again for quite some time.
Financial transactions are held to a higher standard (i.e. there's a bunch of laws around them) than device access. That said, you can make your device passcode as complex as your Apple ID password if it's a tradeoff you're willing to make.
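(Tangent for developers: the same Face ID / Touch ID gate is available to third-party apps via LocalAuthentication; a minimal sketch, with the actual purchase flow left out:)

    import Foundation
    import LocalAuthentication

    // Gate a sensitive action behind Face ID / Touch ID, falling back to the
    // device passcode when biometrics aren't available or fail.
    func authorizePurchase(completion: @escaping (Bool) -> Void) {
        let context = LAContext()
        var error: NSError?

        guard context.canEvaluatePolicy(.deviceOwnerAuthentication, error: &error) else {
            completion(false)   // no passcode set on the device at all
            return
        }

        context.evaluatePolicy(.deviceOwnerAuthentication,
                               localizedReason: "Confirm your purchase") { success, _ in
            DispatchQueue.main.async { completion(success) }
        }
    }

    // Usage: authorizePurchase { ok in if ok { /* proceed with the transaction */ } }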
My device asks me to type in my kid's password. My kid is in my family group. After authorizing my kid, I still have to approve on my device. If I forget my kid's password, I can reset it from my device.
There is zero extra security in forcing me to type his iCloud password on his device, especially given that I have typed it in the past.
It's not a financial requirement, it's laziness.
I wonder if the “make the buttons larger” features will also help apps prepare for when we are using them on virtual reality and augmented reality, where they need to be bigger and with fewer options.
This is completely incomprehensible to me.
Okay, Russia's image is far from the best right now, and it's hard for many to even stand next to it. But Apple hasn't left and continues to collect money there.
And Russian is spoken not only in the Russian Federation; almost the entire former CIS could have used this, but here they cut it down to Ukrainian only.
Are disabled people from Kazakhstan, Belarus, and other countries somehow guilty?
Assistive Access looks great! This is perfect for my grandfather with Parkinson's, who often struggles with his phone due to the deterioration of his motor skills.
Same thought here! It really sucks that the 3G network shutdowns made most flip phones functionally useless. There really aren't any modern smartphones designed for people who want something easy to use, which makes this announcement so exciting.
Another underserved group of users is people with ADHD. I think a "minimize distractions" mode at the OS level would be very welcome by a lot of people.
As @dagmx said above, we've already had a great bunch of features added in the last 18 months: time- and geo-sensitive Focus modes that can lock a device to only certain apps and/or certain features, and Notification Center batching notifications up instead of drip-feeding them.
So I agree that easily available speech controls would be a nice benefit, but Books is fully compatible with VoiceOver already (and has been since VoiceOver’s launch).
This is problematic because it competes with audiobooks. Presumably Apple wants to sell audiobooks and also doesn't want to piss off publishers. Kindle provides a (mediocre) text-to-speech feature, but only for books where publishers have consented to it. Apple's strategy seems to be to have publishers (rather than readers) use their AI-powered digital narration service: https://authors.apple.com/support/4519-digital-narration-aud...
Always nice to see accessibility features. I would love to see a way to share settings. There are so many hidden features, and iOS Settings is getting crowded and very difficult to navigate. A "simplified" mode where only a few selected apps are available, like the one shown in the press release, would be nice even for people who aren't particularly challenged.
Off-topic: when you work with multiple monitors, wouldn't you like some sort of eye-tracking mechanism that could identify what window you're looking at, and immediately shift the focus to that?
I have several consoles open on different monitors, and sometimes I accidentally run commands in the wrong one because focus is somewhere else :/
I’d love to learn about Apple’s Accessibility design process, as their features are much more advanced than other platforms. As in what sort of user research, audience research is done; how they decide on which communities to support, and which features to build.
That assistive access home screen is essentially what I had whittled my parents phones down to. But they would inadvertently swipe in some direction and get stuck somewhere and become confused. Big thanks to Apple for this.
Frankly, I think Assistive Access could be great for a lot of boomer parents/grandparents. My dad is unwilling to learn how to use a smartphone and this could be sufficiently approachable to let him overcome his fear of it.
They would enable 10x more people if they allowed their assistant to work with third-party speech recognition services that support the languages of the remaining 60% of the world's population...
I guess I'm happy about this... but I find it kind of sad too. Apple used to be the minimalist, clean champions of "it just works". Now so many people are getting lost in the gestures, repainting/caching, warring apps, and incompatible standards that this paring down is increasingly necessary.
It strikes me as something of a halfway point to a phone version of CarPlay: a safety feature that needs immediate investment. The Car Focus is overly prescriptive, and Focus in general doesn't make my phone really safe in the car. And DO NOT get me started on Siri.
I could rant forever, so don't let that take away from the meat of this announcement.
Can I read an ebook to myself using personal voice? That’s what I really want. Or have my dad train his voice and listen to him read me books whenever I like?
“Apple has announced new accessibility features for its devices, including Live Speech, Personal Voice, and Point and Speak in Magnifier. These updates will be available later this year and will help people with cognitive, vision, hearing, and mobility disabilities. Live Speech will read aloud text on the screen, while Personal Voice will allow users to create a custom voice for their device. Point and Speak in Magnifier will provide a spoken description of what is being pointed at. These new features are part of Apple's ongoing commitment to making its products accessible to everyone.”
It's not unusual for Apple to build the pieces it needs for "moonshot" efforts (e.g. a search engine good enough to replace Google) over the course of many years. Apple likely started thinking about this over a decade ago, and you can see precursors in things like Siri suggestions and search, App Store and Apple Music search, Apple Photos search, etc.
I 100% believe that Apple is putting real effort into co-design of these features in a way that other similarly positioned companies do not.
As someone with a disability, these features, even the ones that do not cater to my disability, speak to me in a much more direct way than the typical "let's guess what disabled people want" bucket of accessibility features.
I've just set a white wallpaper on my iPad and the clock and date are displayed as black on white. With my dark wallpaper the clock and date are displayed inverted, in white. This works in dark or light mode, and with background dimming on or off.
On the same page is a Zoom option that gives you a magnifying pane you can move around, and I know there's a way to increase the system font size as well.