Does anybody find it funny that sci-fi movies have to heavily distort "robot voices" to make them sound "convincingly robotic"? A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations. I don't expect a smart toaster to talk like a BBC host; it'd be enough if the speech is easy to recognize.
> A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations [...] it'd be enough if the speech is easy to recognize.
We've had formant synths for several decades, and they're perfectly understandable and require a tiny amount of computing power, but people tend not to want to listen to them:
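To make the "tiny amount of computing power" point concrete, here is a minimal sketch of the source-filter idea behind formant synthesis, assuming a Klatt-style cascade of second-order resonators driven by a plain impulse train. It uses only the standard library. The formant frequencies are rough textbook values for an "ah" vowel; the bandwidths, pitch and sample rate are assumptions chosen for illustration, not taken from any particular synthesizer.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # Hz; assumed value for illustration


def resonator(signal, freq, bandwidth, rate=SAMPLE_RATE):
    """Klatt-style second-order digital resonator centred on `freq` Hz."""
    t = 1.0 / rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c  # unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vowel(duration=0.5, pitch=120, formants=((730, 90), (1090, 110), (2440, 170))):
    """Impulse-train 'glottal' source passed through a cascade of formant resonators.

    Formants are (frequency Hz, bandwidth Hz) pairs; the bandwidths are assumed.
    """
    n = int(duration * SAMPLE_RATE)
    period = SAMPLE_RATE // pitch  # samples between glottal pulses
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = resonator(signal, freq, bw)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalise to [-1, 1]


def write_wav(path, samples, rate=SAMPLE_RATE):
    """Write mono 16-bit PCM samples in [-1, 1] to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit
        w.setframerate(rate)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


if __name__ == "__main__":
    write_wav("ah.wav", vowel())
```

A real formant synthesizer adds a better glottal source, noise for consonants and time-varying formant tracks, but the per-sample cost stays at a handful of multiply-adds per resonator.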
The YouTube video [1] was published in 2019. The blog spam posts range from Nov 2022 to July 2023.
Other than the video, the only relevant content is on the about page [2]. It says the voice is a collaboration between 5 different entities, including advocacy groups, marketing firms and a music producer.
The video is the only example of the voice in use. There is no API, weights, SDK, etc.
I suspect this was a one-off marketing stunt sponsored by Copenhagen Pride before the pandemic. The initial reaction was strong enough that a couple of years later they were still getting a small but steady flow of traffic. One of the involved marketing firms decided to monetize the asset and defaced it with blog spam.
Huh. Sounds perfectly intelligible and definitively artificial. Feels weakly feminine to me, but only because I was primed to think about gender from the branding.
It’s a good choice for a robot voice. It’s easier to understand than the formant synths or deliberately distorted human voices. The genderless aspect is alien enough to avoid the uncanny valley. You intuitively know you’re dealing with something a little different.
In the Culture novels, Iain Banks imagines that we would become uncomfortable with the uncanny realism of transmitted voices and holograms, and would intentionally include some level of distortion to indicate you're speaking to an image.
Depends on the movie. Ash and Bishop in the Alien franchise sound human until there's a dramatic reason to sound more 'robotic'.
I agree with your wider point. I use Google TTS with Moon+Reader all the time (I tried audiobooks read by real humans, but I prefer the consistency of TTS).
Slightly different there because it's important in both cases that Ripley (and we) can't tell they're androids until it's explicitly uncovered. The whole point is that they're not presented as artificial. Same in Blade Runner: "more human than human". You don't have a film without the ambiguity there.
I remember that the novelization of The Fifth Element describes cops being taught to speak as robotically as possible when using speakers, for some reason. I always found the idea weird that someone would _want_ that.
I got an error when I tried the demo with 6 sentences, but it worked great when I reduced the text to 3 sentences. Is the length limit due to the model or just a limitation for the demo?
"This first Book proposes, first in brief, the whole Subject, Mans disobedience, and the loss thereupon of Paradise wherein he was plac't: Then touches the prime cause of his fall, the Serpent, or rather Satan in the Serpent; who revolting from God, and drawing to his side many Legions of Angels, was by the command of God driven out of Heaven with all his Crew into the great Deep."
It takes a while until it starts generating sound on my i7 cores but it kind of works.
This also works:
"blah. bleh. blih. bloh. blyh. bluh."
So I don't think it's a limit on punctuation. Voice quality is quite bad though, not as far from the old school C64 SAM (https://discordier.github.io/sam/) of the eighties as I expected.
I tried to replicate their demo text but it doesn't sound as good for some reason.
If anyone else wants to try:
> Kitten TTS is an open-source series of tiny and expressive text-to-speech models for on-device applications. Our smallest model is less than 25 megabytes.
> Error generating speech: failed to call OrtRun(). ERROR_CODE: 2, ERROR_MESSAGE: Non-zero status code returned while running Expand node. Name:'/bert/Expand' Status Message: invalid expand shape
Thanks, I was looking for that. While the Reddit demo sounds OK (even if it's at a level we reached a couple of years ago), all the TTS samples I tried were barely understandable.
On PC it's Python dependency hell, but someone managed to package it as self-contained JS code that works offline once it has loaded the model? How is that done?
ONNXRuntime makes it fairly easy, you just need to provide a path to the ONNX file, give it inputs in the correct format, and use the outputs. The ONNXRuntime library handles the rest. You can see this in the main.js file: https://github.com/clowerweb/kitten-tts-web-demo/blob/main/m...
Plus, Python software is dependency hell in general, while webpages have to be self-contained by their nature (thank god we no longer have Silverlight and Java applets...)
yeah, this is just a preview model from an early checkpoint. the full model release will be next week which includes a 15M model and an 80M model, both of which will have much higher quality than this preview.
Not open source. "You will need internet connectivity to validate your AccessKey with Picovoice license servers ... If you wish to increase your limits, you can purchase a subscription plan." https://github.com/Picovoice/orca#accesskey
Going online is a dealbreaker, but if you really need it you could use Ghidra to patch that out. I tried to find a conversion of their model to ONNX (which would make their proprietary pipeline useless) but failed.
Hopefully open source will render them irrelevant in the future.
Does an APK for Android exist that replaces its text-to-speech engine? I tried sherpa-onnx, but it seemed too slow for real-time use, especially for audiobooks when sped up.
I can't test this out right now, is this just a demo or is it actually an apk for replacing the engine? Because those are two different things, the latter can be used any time you want to read something aloud on the page for example. This is the sherpa-onnx one I'm talking about.
I get the feeling we're going to end up in a place where we don't make docs any more. A project will have a trusted agent that can see the actual code, maybe just the API surface, and that agent acts like a customer service rep to a user's agent. It will generate docs on the fly, with specific examples for the task needed. Maybe the agents will find bugs together and update the code too.
Not exactly where I'd like to see us go, but at least we'll never get outdated information.
There are lots of things that neither the code nor the docs cover, so I suspect that's not quite possible, yet.
For example, if you're deploying a Postgres proxy, it will have a TCP timeout setting that you can tweak. Neither the docs nor the code will tell you what the value should be set to though.
Your engineers might know, because they have seen your internal network fail dozens of times and have a good intuition about it.
Software complexity has a wide range. If you're thinking of simple things like Sendgrid, Twilio or Stripe APIs, sure, an agent can easily write some boilerplate. But I think in certain sectors, we would need to attach some more inputs to the model that we currently don't have to get it to a good spot.
The Rust ecosystem needs more high-level frameworks like this. However, I've been shipping Django since 0.96, and I don't think Cot really addresses the main issues Django currently has. Performance isn't in the top 5.
Django's biggest issue is their aging templating system. The `block`, `extend` and `include` style of composition is so limited when compared to the expressiveness of JSX. There are many libraries that try to solve Django's lack of composition/components, but it's all a band-aid. Today, making a relatively complex page with reusable components is fragile and verbose.
The second-biggest issue is the lack of front-end integration. Even just a blessed way of generating an OpenAPI file from models would go a long way. Django Ninja is a great peek at what that could look like. However, new JS frameworks go so much further.
The other big issue Django has _is_ solved by Cot (or Rust), which is cool (but not highlighted): complicated deployments. Shipping a bunch of Python files is painful. Also, Python's threading model means you really have to have Gunicorn (and usually Nginx) in front. Cot could have all that compiled into one binary.
About performance: I agree, and I'm not even trying to make performance a priority in Cot. Of course it's nice to have an actual compiled language, but I think a bigger perk of using Rust is having *a lot* of stuff checked at compile time rather than at runtime. That's what I'm trying to make Cot's main perk, and it is reflected in multiple parts of Cot (templates checked at compile time, an ORM that is fully aware of the database schema at compile time, among many others).
About JSX: I think that's the one I'll need to explore further. In my defense, the templating system Cot currently uses (Rinja) is much more expressive and pleasant to use than Django's, but admittedly the core concepts are very similar. This one might be difficult to address because Rust's ecosystem of templating engines is pretty lacking, but I'll see what I can do to help with this.
About front-end integration: that's something that will be (at least partially) addressed no later than v0.2. Django REST Framework is a pain (mostly because it's never been integrated into Django), and Django Ninja is something I haven't personally used very much, so it's good to have it mentioned as a source of inspiration. The lack of OpenAPI docs generation is even mentioned in the article ("Request Handler API is far from being ergonomic and there’s no automatic OpenAPI docs generation"), so yeah, I'm aware of this.
Deployment is indeed something that's super nice, and part of this is that newer Rust versions generally don't break compatibility with existing code, unlike Python. I agree this should be highlighted, thanks for the suggestion!
You could potentially address both templating and front-end integration by adopting Dioxus which does full stack rendering with React-like components (but in Rust). A "batteries included" full-stack framework could be quite exciting I think.
There is another solution, in this specific case. If all they wanted is to start returning the test results before all the tests are done, a streaming http response can be used.
In Bottle, returning a generator or iterator will send the response in chunks instead of all at once. The effect would be that the test results load in one by one, providing the user with feedback. No JavaScript needed.
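Bottle runs on WSGI, where a response body can be any iterable, so the same trick can be sketched with nothing but the standard library's `wsgiref` server; in Bottle itself the route handler would simply return the generator. This is a minimal sketch, and `run_tests` is a hypothetical stand-in for whatever actually runs the test suite.

```python
import time
from wsgiref.simple_server import make_server


def run_tests(delay=0.2):
    """Hypothetical stand-in for a slow test suite: yields one result at a time."""
    for name in ("test_login", "test_signup", "test_logout"):
        time.sleep(delay)  # simulate the work of running one test
        yield f"{name}: ok\n"


def app(environ, start_response):
    # Under WSGI the body may be any iterable. The server sends each chunk
    # as the generator produces it instead of waiting for the whole body.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return (line.encode("utf-8") for line in run_tests())


if __name__ == "__main__":
    make_server("localhost", 8000, app).serve_forever()
```

Watching it with `curl -N http://localhost:8000/` shows the lines arriving one by one; browsers may buffer small plain-text responses before rendering, so results there can appear in bursts.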
Dart is crazy because it runs on every platform, compiles to native, has real parallelism via isolates, native async, and native type safety.
There's not really a backend that takes advantage of all that. In theory, one server binary could handle REST, web sockets, background workers, and have generated type safe client packages for every platform. Dart also has a great Rust ffi story. It would be great to see that leveraged.
ServerPod is a great start, but it's really Flutter-focused. The web APIs feel second-class.
Additionally, database management isn't a solved problem yet. ServerPod uses yaml to define models, and the other main option is just a Prisma wrapper. Dart needs something like Drizzle.
You could say the same thing as your first sentence about, e.g., Rust or many other languages. Personally, I only see Dart being useful if you already have a Flutter app, don't want to learn another language, and want shared types easily, similar to full-stack web devs using TypeScript for their React and Node apps.
I personally use Rust backends and Flutter frontends for my apps. I'd use pure Rust for the entire thing, but Rust frontends are nowhere near Flutter's capabilities and maturity. I do at least use FFI via flutter_rust_bridge and rinf, as you mention.
I actually can't think of another language that has all of that built in. Rust doesn't; it needs a runtime for async. JavaScript doesn't; it needs TypeScript, and it doesn't compile to native.
That's true about Rust, but that's a feature, not a bug: you can swap out the async runtime if needed, and once you add one it is still as efficient as Dart or more so.
Kotlin Native is a toy for JetBrains to eat some of that Apple pie and capture teams that want to share logic between their mobile codebases.
Kotlin Native has no std, they cut down platform support with K2, performance and compilation speed are atrocious and there are no plans to improve any of that short term.
Kotlin without the JVM can’t hold a candle to Dart. That's a real shame for Dart, because Dart has improved dramatically over the last couple of years, while Kotlin has not introduced anything major in the 5 years since the release of coroutines.
Their K2 compiler, which was supposed to bring major compilation speed improvements, was mostly a flop, and we have yet to see if they’ll do anything good with it. Context receivers are not even close, pattern matching is not even on the roadmap, and they’re refusing to consider union types. Kotlin is living on borrowed time.
1. runs on every platform (KNative runs natively on Linux, Mac, Windows, Android, iOS. It can also run under the JVM non-natively, and anywhere JavaScript runs non-natively. The native code can build for a variety of architectures including ARM and x86)
2. compiles to native (As above, compiles to native on Linux/Mac/Windows/Android/iOS)
3. has real parallelism via isolates (Kotlin can spawn and interact with full processes, OS threads, and/or green threads in any admixture)
4. native async (Kotlin has native async/await support via coroutines, which work under KNative)
5. native type safety (Kotlin has a strong static type system which is available for native code as well and encompasses native types interactive with Kotlin code in either direction)
I don't think anything you said pertains to the listed five features. Especially complaining about compile speed is a strange thing to be doing in the context of this conversation.
On the topic of databases, I think https://drift.simonbinder.eu/ might interest you. I've been using it in a Flutter app with SQLite, but my understanding is that you could use it on the server too. I recall them having support for at least SQLite and Postgres.
In Rust, there's a controversial practice around putting unit tests in the same file as the actual code. I was put off by it at first, but I'm finding LLM autocomplete is able to be much more effective just being able to see the tests.
If the LLM can't complete a task, you add a test that shows it how to do it. This is multi-shot in-context learning and programming by example.
As for real TDD, you start with the tests and write code until they pass. I haven't used an LLM to do this in Rust yet, but in Python, due to its dynamic nature, it is much simpler.
You can write the tests, then have the LLM sketch the code out enough so that they pass or at least exist enough to pass a linter. Dev tools are going to feel like magic 18 months from now.
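A minimal sketch of that test-first loop in Python: the tests are written first and act as examples of the desired behaviour, then just enough code is written (by hand or by a tool) to make them pass. `slugify` is a hypothetical function chosen only for illustration. Because Python resolves names at call time, the test class can be written before `slugify` exists at all.

```python
import re
import unittest


# Step 1: the tests come first. They pin down the behaviour by example,
# which is also what you would show an LLM as a spec.
class TestSlugify(unittest.TestCase):
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation(self):
        self.assertEqual(slugify("Rust, Dart & Python!"), "rust-dart-python")

    def test_collapses_whitespace(self):
        self.assertEqual(slugify("  many   spaces  "), "many-spaces")


# Step 2: just enough code to make the tests pass.
def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


if __name__ == "__main__":
    unittest.main()
```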
The benefit of this approach is that you can directly test any function in the same scope without altering its visibility: it implicitly encourages you to test all functions (and design functions in a way they can be tested, as you are writing tests as you write code), not just those part of the public api contract.
Plus you can update tests, code, and comments in one go, with visibility into them at all times.
I agree with you on Django Ninja, so refreshingly simple compared to DRF. I think Django core needs to adopt something like it.
However, Vite is pretty complicated. I prefer just esbuild if I don't need all the extra features of Vite, which is usually true with Django. I wrote a post[0] with an example repo[1] if anyone wants to see how everything wires up.
With SolidJS, the minimum JS payload is around 9 KB, and you get access to the whole JS ecosystem if you want it.
> I agree with you on Django Ninja, so refreshingly simple compared to DRF. I think Django core needs to adopt something like it.
I was going to ask about this with respect to DRF, but you answered it. I am re-learning Django after having been away from it and Python for ~4 years now, and my previous experience was with DRF in a somewhat toxic group so I had less than ideal feelings about it. I know PTSD is a real thing and I don't mean to sound glib about it, but I think I actually had the beginnings of it from that experience.
This is great, thank you for sharing! The QR code generator alone sold me on getting it. So many online generators demand I make an account for some reason.
It would be amazing if this were extendable with plugins though. I have a ton of custom terminal scripts for my workflows, but some of them would just be better with a simple UI. Global hotkeys that take me right to the tool would be awesome too.
Edit: it looks like global hotkeys can be done with the URL Scheme feature and Raycast. Nice.
Do you need fancy looking ones or just barebones QR codes? Because the latter you can just get from the qrcode Python package and simply go "qr news.ycombinator.com > hn.png" in your terminal.
Nice work! If you’re looking for more questions, my nonprofit specializes in authentic communication in groups. We have a list of prompts for our group moderators, but you’re welcome to use them as well: https://www.totem.org/repos/prompts/
Maybe it's not for you, but the "everything is a string" thing is just the default. SQLite has STRICT table option since 2021 that people really should be using if possible: https://www.sqlite.org/stricttables.html
This brings strict types that people expect from the other server-based databases.
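A minimal sketch with Python's built-in `sqlite3` module showing the difference: an ordinary table silently stores a non-numeric string in an INTEGER column, while a STRICT table rejects it. It assumes the SQLite library linked into your Python is 3.37.0 or newer, the release that introduced STRICT tables; the table and column names are made up.

```python
import sqlite3


def store_text_in_integer_column(strict: bool) -> str:
    """Insert 'lots' into an INTEGER column and report what SQLite did."""
    con = sqlite3.connect(":memory:")
    try:
        suffix = " STRICT" if strict else ""
        con.execute(f"CREATE TABLE t (qty INTEGER){suffix}")
        try:
            con.execute("INSERT INTO t (qty) VALUES ('lots')")
        except sqlite3.IntegrityError:
            # STRICT tables raise a datatype constraint error instead of storing it
            return "rejected"
        (kind,) = con.execute("SELECT typeof(qty) FROM t").fetchone()
        return f"stored as {kind}"
    finally:
        con.close()


if __name__ == "__main__":
    print(store_text_in_integer_column(strict=False))
    print(store_text_in_integer_column(strict=True))
```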
It sounds OK, but it's impressive for the size.