Blind person using Apple products here, and at least for phones, I agree. I wouldn't say it's exclusively because of the iPhone, but it is definitely a large part of my independence. There have been problems, bugs that go unfixed for years, macOS VoiceOver is quite a disaster even though I do still use and enjoy the platform overall, and anything worth using can be criticized, I think. But iOS has so many features built in that help me every single day: VoiceOver, but also all of the vision-based features like door detection, OCR, etc. They're in the Magnifier as well, so you don't need VoiceOver enabled to play with them, though I think a number of them also require a lidar sensor.
Anyway, my phone is such an important companion wherever I go that I keep several magsafe batteries on me whenever I leave the house for a significant time. It has made an absolutely huge difference in confidence. It is definitely one of the single most important assistive tech devices I have together with my computer.
It is just random bugs. Switching punctuation schemes. The terminal doesn't read very well, VoiceOver loves to say "not responding" in Safari and locks up, live regions don't always read correctly, quick nav (basically automatically holding down the VoiceOver modifier so you can navigate through the screen more quickly) adds a random delay to each key press. It's lots and lots and lots of small issues like this that compound, and this is just a small list of them. None of them is a huge problem by itself, but combined they do make things frustrating sometimes. And then of course the ability to script badly behaving, or completely inaccessible, apps is just missing, so you can't fix apps even if you knew how.
And of course VoiceOver on the Mac is all you get. So if you don't like it, tough luck. You won't ever get a real alternative that can access what VoiceOver can.
I honestly don't know how anyone thinks terminal support in VoiceOver is acceptable--it's virtually unusable. It's so bad that I used to fire up a Windows VM just for a functioning terminal, and while I was at it, I'd browse the web and use Notepad++ there, because Windows accessibility is just better (I used NVDA). But then I discovered Fenrir, figured out that it worked with Vim (NVDA doesn't), shut down the VM, and never looked back. Today, I use Wezterm, which VoiceOver doesn't read at all. In my case, that's good, because the only thing I want talking in the terminal is my terminal screen reader (I started writing my own, and it's my daily driver).
To be fair, reading the terminal is a completely different beast from reading a GUI. In addition to building a static view of the screen for review, you have to handle dynamic updates (auto read): tracking cursor movement, figuring out when to read what and when not to read (an f just appeared on my screen, but I just typed the letter f; if key echo is turned off, I don't want to hear "f"). If a line was just added, it should be read, but if my cursor was moved to a different line, I want to hear the line it moved to, but not if that line was just read because it just appeared. All sorts of rules you sort of discover as you go. But the one thing you definitely don't want is for any new change to interrupt what was already being read, and that's exactly what VoiceOver does.
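To make those rules concrete, here's a toy Python sketch of the kind of decision logic I'm describing; the names are made up and this is not how Fenrir or VoiceOver actually work, just the shape of the rules:

    # Toy sketch of the auto-read rules above: decide what to queue for
    # speech after one screen update. The queue is only ever appended to,
    # so new output never interrupts what is already being read.
    def what_to_queue(new_lines, cursor_line, typed_char, appeared_char,
                      key_echo, recently_spoken):
        to_speak = []

        # A single character appeared and it matches what the user just
        # typed: only echo it if key echo is enabled.
        if appeared_char is not None and appeared_char == typed_char:
            if key_echo:
                to_speak.append(appeared_char)
            return to_speak

        # Freshly added lines are read automatically.
        for line in new_lines:
            to_speak.append(line)
            recently_spoken.add(line)

        # If the cursor jumped to another line, read it, unless it was
        # just read because it only just appeared.
        if cursor_line is not None and cursor_line not in recently_spoken:
            to_speak.append(cursor_line)

        return to_speak

    # Example: key echo off, the user typed "f" and an "f" appeared -> silence.
    print(what_to_queue([], None, "f", "f", key_echo=False, recently_spoken=set()))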
> VoiceOver loves to say "not responding" in Safari and locks up,
I wonder, what's the correct solution for this? Because so many apps I use, including browsers, are definitely "not responding" multiple times per day for various reasons (full RAM, internet stalls, etc.).
Using VoiceOver compounds the not-responding issue. I don't know how its internals work, but I imagine it tries to keep a view of the window's state: a tree of elements, etc. If the window has a lot going on, VoiceOver can get really sluggish, and I think it must somehow block the underlying app's ability to send/receive events, because you will press VO+right arrow to move to the next element, VO says "Safari/Chrome/Brave not responding", and if you open up Force Quit, it reflects the same there. Reading a large diff on GitHub flat out doesn't work for me at all. Also, sometimes when navigating certain webpages, VoiceOver will just outright crash. Luckily, it does restart itself (not that pressing CMD+F5 is hard), but then my focus is moved to a completely different part of the page.
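For what it's worth on the "tree of elements" part, here is a rough Python sketch of walking an app's accessibility tree via pyobjc's ApplicationServices bindings (this assumes pyobjc is installed and the script has been granted Accessibility permission); it only shows how much structure has to be queried per window, not how VoiceOver actually does it:

    # Rough sketch: dump an app's accessibility (AX) element tree.
    # Requires pyobjc and Accessibility permission for this process.
    from ApplicationServices import (
        AXUIElementCreateApplication,
        AXUIElementCopyAttributeValue,
        kAXChildrenAttribute,
        kAXRoleAttribute,
    )

    def dump_tree(element, depth=0, max_depth=4):
        err, role = AXUIElementCopyAttributeValue(element, kAXRoleAttribute, None)
        print("  " * depth + str(role))
        if depth >= max_depth:
            return
        err, children = AXUIElementCopyAttributeValue(element, kAXChildrenAttribute, None)
        for child in (children or []):
            dump_tree(child, depth + 1, max_depth)

    dump_tree(AXUIElementCreateApplication(12345))  # 12345 is a placeholder pid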
Will drop this here in case you’re not aware of it (but I’m guessing you probably are), sorry if a bit off-topic.
I’m low-vision and made great use of Microsoft Soundscape until it got discontinued. I’d been waiting for an alternative for ages and didn’t realise one actually got released and is on the app store!
I absolutely LOVE Voice Vista! It is an amazing bit of software. I wasn't able to use Soundscape when it first came out because it was never made available in my region, but VV is, and I wouldn't want to be without it anymore when traveling. I love it. A lot.
For what it's worth, text selection has been badly broken on iOS for at least a decade and autocorrect has been steadily getting worse for probably the same amount of time, and these are features that affect the mainstream segment of Apple users on a daily basis. Apple seems generally happy to let bugs go unaddressed for years and years regardless of how many people they affect or how often.
It’s really really inconsistent. Sometimes select all is available, sometimes not. Sometimes the handles don’t work. Selecting text in a scrollable region is fiddly, etc.
I’ve seen an insane drop in the quality of swipe typing recently as well. To the point where I’ll often go back to regular typing. I’ve made maybe six or more corrections just to this paragraph alone.
I think swipe typing proposes words that merely match the swiped letter sequence, without consulting any higher-level language model, not even word tuples, to pick the candidate that actually fits the sentence (see the sketch below for what that re-ranking could look like).
and it drives me crazy too.
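To make the word-tuple point concrete, here's a toy Python sketch of what re-ranking swipe candidates with a simple bigram model might look like; the candidates and counts are invented for the example:

    # Toy illustration: re-rank swipe candidates by how often they follow
    # the previous word. The counts here are made up.
    BIGRAM_COUNTS = {
        ("drives", "me"): 120,
        ("drives", "my"): 8,
    }

    def rerank(prev_word, candidates):
        # Higher bigram count -> earlier in the suggestion list.
        return sorted(candidates,
                      key=lambda w: BIGRAM_COUNTS.get((prev_word, w), 0),
                      reverse=True)

    print(rerank("drives", ["mr", "my", "me"]))  # ['me', 'my', 'mr']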
It seems I've just had good luck with text selection.
Have you found any way to do a Find within a span of text on iOS? That would be very useful, but I haven't seen it.
Excuse my language here but: I fucking love this! My mom pretty much mirrors your experience. I purposefully left out macOS and VoiceOver. I would almost call it unusable, sadly. The amount of key layering that VoiceOver, and macOS in general, has makes it very hard to use.
I've been hacking on a macOS app that leans on LLMs, vision, and the macOS AX APIs to try and make VoiceOver less... prickly, haha. Hoping to visit in person soon to watch her use it :)
Not to mention that this seems to completely ignore all the things that we might use computers for. Browsing websites is only one of the things I do. Many of the things I do I think would be extraordinarily clunky through natural language. Also I just do not feel comfortable talking to my computer out loud, especially when I'm anywhere with other people around. Or I don't know... playing games with friends on voice chat. It seems to be common for people to assume that a fix is very easy and simple. LLMs, OCR for screen readers, etc. If it really was as simple as just slapping OCR on everything, it would already have happened. Also I definitely like some privacy and would prefer my computing not to happen entirely through OpenAI, Anthropic or Google, and whether someone can use computers well or not, we shouldn't force them to do that exact thing. At least in my opinion. And that doesn't even go into the costs associated with all of that LLM usage.
You are correct. At least in my case, more synthetic voices like Eloquence are easier to understand at high speeds especially because of their 'formulaic' nature. You don't listen to each individual phoneme or letter, you listen more for groups of syllables, tone, etc. The more unpredictable the text to speech, the harder this is. Also, performance is another big point. If you have large bits of silence at the beginning of the audio, or slow attacks, then the responsiveness will suffer, whether that's because of the actual audio itself, or the generation time.
Some of this is surely subjective, but I'm pretty sure I'm not the only screen reader user with these opinions.
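On the responsiveness point about silence at the start of the audio: here is a minimal numpy sketch of trimming leading silence from a synthesized buffer before playback; the threshold and the example signal are arbitrary assumptions, not anything a real screen reader ships with:

    import numpy as np

    def trim_leading_silence(samples, threshold=0.01):
        # Drop near-silent samples from the start of a mono float waveform
        # so speech starts playing immediately.
        voiced = np.nonzero(np.abs(samples) > threshold)[0]
        if voiced.size == 0:
            return samples[:0]  # nothing but silence
        return samples[voiced[0]:]

    # e.g. 0.2 s of leading silence at 22050 Hz followed by a tone:
    audio = np.concatenate([np.zeros(4410), 0.3 * np.sin(np.linspace(0, 200, 22050))])
    print(len(audio), len(trim_leading_silence(audio)))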
And sadly it is also not accessible to screen readers. VS Code, for all its flaws, is really, really good for screen reader accessibility. In fact, I'd go as far as to say that it's not only one of the most accessible code editors we have, but one of the most accessible Electron apps overall. So losing it to this Microsoft stuff would be a huge deal to anyone who relies on a screen reader or accessibility tools. :(
I just cracked open macOS VoiceOver for the first time in a while and hoo boy, you weren't kidding. I wonder if you could still "stun" an LLM with this technique while also using some aria-* tags so the original text isn't so incredibly hostile to screen readers. Regardless, I think as neat as this tool is, it's an awful pattern, and hopefully no one uses it except as part of bot-capture stuff.
Do screen readers fall back to OCR by now? I could imagine that being critical based on the large amount of text in raster images (often used for bad reasons) on the Internet alone.
Sounds like a potentially useful improvement then.
I've had more success exporting text from some PDFs (not scanned pages, but just text typeset using some extremely cursed process that breaks accessibility) that way than via "normal" PDF-to-text methods.
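For anyone curious, this is roughly the kind of thing I mean; a rough sketch using pdf2image and pytesseract (both assumed installed, along with the poppler and tesseract binaries they wrap), and the filename is just a placeholder:

    # Rough sketch of OCR'ing a PDF whose embedded text layer is unusable.
    from pdf2image import convert_from_path
    import pytesseract

    def pdf_to_text_via_ocr(path, dpi=300):
        pages = convert_from_path(path, dpi=dpi)  # render each page to a PIL image
        return "\n\n".join(pytesseract.image_to_string(page) for page in pages)

    print(pdf_to_text_via_ocr("cursed.pdf"))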
no, it is not. simple ocr is slow and much more expensive than an api call to the process in question. on the "positive" side, it is also error prone and cannot follow the focus in real time. no, adding ai does not make it better. AI is useful when everything else fails and it is worth waiting 10 seconds for an incomplete and partially hallucinated screen description.
Huh? Running a powerful LLM over a screenshot can take longer, but for example macOS's/iOS's default "extract text" feature has been pretty much instant for me.
is "pretty much instant" true when jumping between buttons, partially saying what you are landing on while looking for something else? can it represent a gui in enough detail to navigate it, open combo boxes, multy selects and whatever? can it make a difference between an image of a button and the button itself? can it move fast enough so that you can edit text while moving back and forth? ocr with possible prefetch is not the same as object recognition and manipulation. screen readers are not text readers, they create a model of the screen which could be navigated and interacted with. modern screen readers have ocr capabilities. they have ai addons as well. still, having the information ready to serve in a manner that allows followup action is much better.
Oh, I don't doubt at all that it's a measure of last resort, and I am indeed not familiar with the screen reader context.
I was mostly wondering how well my experience with human-but-not-machine-readable PDFs transferred to that domain, and surprised that OCR performance is still an issue.
Just as a quick datapoint here in case people get worried; yes, it is absolutely possible to program as a blind person, even without language models. Obviously you won't be using your eyes for it, but we have tried and tested tools that help and work. And at the end of the day, someone's going to have to review the code that gets written, so either way, you're not going to get around learning those tools.
Source: Am a blind person coding for many years before language models existed.
Thank you for sharing your experience. It provides me a bit of comfort to know it's possible for me to keep coding in the event of vision loss, and I'm glad tools exist for people that are blind.
A part of me wants to start using the available tools just to expand my modalities of interfacing with technology. If you have the time, any recommendations? What do you use?
The core thing here is the "I'm sorry you feel this way". This immediately deflects all sense of wrong-doing from the people actually doing wrong to the people feeling hurt. There are so many other ways to phrase this that are either more neutral or even acknowledging of some kind of mistake being made that's not on the volunteer's side, but that's not what's happening here. Essentially this means "We did the right thing and now we need to figure out how to make you understand this", not "Something went wrong and we need to figure out how to come to an understanding which might include us having done something wrong".
I'd love to test this; however, it seems to not be accessible with screen readers. I assume this is because the GUI library doesn't support accessibility. I found an open issue about this on the Iced GitHub where in 2024 it was mentioned that the version after next should support it, and the last comment was in February of this year (https://github.com/iced-rs/iced/issues/552).
I bookmarked this so hopefully once that effort gets further along I can give it a try!
I figured I'd leave this comment so that some folks can see that there are real people even on HN who require these features and that accessibility work is always appreciated. We definitely exist :)
Heh, that roadmap is also not accessible to screen readers, at least on Firefox. That's unfortunate. But I understand it's a big undertaking with little reward for most people. I do think there are UI libraries with AccessKit integration, egui I believe?
Ah well. I'll check back on it every now and then either way.
I feel the same way. It also appears to be a lot more difficult to actually find jobs, though that's probably just the general state of the job market and less specifically AI related. All of it is thoroughly discouraging and demotivating, and every week this goes on, the less I want to do it. So for me as well it might be time to look beyond software, which will also be difficult since software is what I've done all my life, and everything else I can do, I don't have any formal qualifications for, even if I am confident I have the relevant skills.
It's not even just that. Every single thing in tech right now seems to be AI this, AI that, and AI is great and all but I'm just so tired. So very tired. Somehow even despite the tools being impressive and getting more impressive by the day, I just can't find it in me to be excited about it all. Maybe it's just burnout I'm not sure, but it definitely feels like a struggle.
I keep coming to the same conclusion, which basically is: if I had an LLM write it for me, I just don't care about it. There are 2 projects out of the maybe 50 or so that are LLM generated, and even for those two I cared enough to make changes myself without an LLM. The rest just sit there because one day I thought huh, wouldn't it be neat if, and then realized I actually cared more about having that thought than having the result of that thought.

Then you end up fighting with different models and implementation details, and then it messes up something and you go back and forth about how you actually want it to work, and somehow this is so much more draining and exhausting than just getting the work done manually, with some slight completion help perhaps, maybe a little bit of boilerplate fill-in. And yes, this is after writing extensive design docs, then having some reasoning LLM figure out the tasks that need to be completed, then having some models talk back and forth about what needs to happen and while it's happening, and then I spent a whole lot of money on what exactly? Questionably working software that kinda sorta does what I wanted it to do?

If I have a clear idea, or an existing codebase, and I end up guiding it along, agents and stuff are pretty cool I guess. But vibe coding? Maybe I'm in the minority here, but as soon as it's a non-trivial app, not just a random small script or bespoke-app kind of deal, it's not fun, I often don't get the results I actually wanted out of it even if I tried to be as specific as I could with my prompting and design docs and example data and all that, it's expensive, the code is still messy as heck, and at the end I feel like I just spent a whole lot of time literally arguing with my computer. Why would I want to do that?
I've written a full stack monorepo with over 1,000 files now. I started with AI doing a lot of the work, but the percentage keeps going down. For me a good codebase is not about how much you've written, but about how it's architected. I want an app that has the best possible user and dev experience, meaning it's easy to maintain and easy to extend. This is achieved by making code easy to understand, for yourself and for others.
In my case it's more about developing a mindset and building a framework than pushing feature after feature. I would think it's like that for most companies. You can get an unpolished version of most apps easily, but polishing takes 3-5x the time.
Let's not even talk about development robustness, backend security, etc. AI just has way too many slippages for me in these cases.
However, I would still consider myself a heavy AI user; I mainly use it to discuss plans (what Google used to be for) or to check whether I've forgotten anything.
For most features in my app I'm faster typing it out exactly the way I want it (with a bit of auto-complete). The whole brain coordination works better.
I guess this turned into a long talk, but you're not alone; trust your instinct. You don't seem narrow-minded.
It's nothing special, not in the realm of anything technically outstanding. I just stated that to emphasize that it's a slightly bigger project than the default single-dev SaaS projects which are just a single wrapper. We have workers, multiple white-labeled applications sharing a common infrastructure, data scraping modules, AI-powered services, and email processing pipelines.
I've had an impossibly steep learning curve over the last year, but even though I should, if anything, be biased toward vibe coding, I still use less AI now to make sure the codebase stays consistent.
I think the two camps honestly differ in skill, but also in needs. Of course you're faster vibe-coding a front-end than writing the code manually, but building a robust backend/processing system is a different tier entirely.
So instead of picking a side, it's usually best to stay as unbiased as possible and choose the right tool for the task.
We just had a story last night about a Python cryptography maintainer using Claude to add formally-verified optimizations to LLVM. I think the ship has sailed on skepticism about whether LLMs are going to produce valuable code; you can follow Simon Willison's blog for more examples.
I don't understand people who are sceptical about whether LLMs can give value. We're way past that, now at the stage where we're trying to figure out how to extract the most value out of them, but I guess humans don't like change much.
The jury is still out; they have spent hundreds of billions, maybe trillions, and want trillions in ROI.
It does really cool stuff now when it is given away for free, but how cool is it when they want you to pay what it actually costs? With ROI and profits on top.