blopker · 2025-08-06T02:00:17 1754445617

Web version: https://clowerweb.github.io/kitten-tts-web-demo/

It sounds ok, but impressive for the size.

nine_k · 2025-08-06T02:06:10 1754445970

Does anybody find it funny that sci-fi movies have to heavily distort "robot voices" to make them sound "convincingly robotic"? A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations. I don't expect a smart toaster to talk like a BBC host; it'd be enough if the speech is easy to recognize.

userbinator · 2025-08-06T05:18:16 1754457496

> A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations [...] it'd be enough if the speech is easy to recognize.

We've had formant synths for several decades, and they're perfectly understandable and require a tiny amount of computing power, but people tend not to want to listen to them:

https://en.wikipedia.org/wiki/Software_Automatic_Mouth

https://simulationcorner.net/index.php?page=sam (try it yourself to hear what it sounds like)

miki123211 · 2025-08-06T07:02:37 1754463757

SAM and the way it works is not what people typically associate with the term "formant synthesizer."

DECtalk[1,2] would be a much better example, that's as formant as you get.

[1] https://en.wikipedia.org/wiki/DECtalk [2] https://webspeak.terminal.ink

saretup · 2025-08-06T05:52:17 1754459537

Well, this one is a bit too jarring to the ears.

rixed · 2025-08-06T06:17:45 1754461065

But there is no latency, as opposed to KittenTTS, so it certainly has its applications too.

cess11 · 2025-08-06T06:24:15 1754461455

Try this demo, which has more knobs:

https://discordier.github.io/sam/

actionfromafar · 2025-08-06T07:27:06 1754465226

I think it's charming

boobsbr · 2025-08-06T15:27:15 1754494035

Huh, now I know what Airdorf used in Faith: Unholy Trinity.

tapper · 2025-08-06T07:59:06 1754467146

Yeah, blind people love Eloquence.

roywiggins · 2025-08-06T02:07:54 1754446074

This one is at least an interesting idea: https://genderlessvoice.com/

cosmojg · 2025-08-06T03:44:26 1754451866

The voice sounds great! I find it quite aesthetically pleasing, but it's far from genderless.

a96 · 2025-08-08T09:22:55 1754644975

So, what's the gender?

dang · 2025-08-06T06:00:31 1754460031

Meet Q, a Genderless Voice - https://news.ycombinator.com/item?id=19505835 - March 2019 (235 comments)

cyberax · 2025-08-06T07:41:31 1754466091

It doesn't sound genderless.

degamad · 2025-08-06T04:15:30 1754453730

Interesting concept, but why is that site filled with Top X blogspam?

pbronez · 2025-08-06T12:17:34 1754482654

The YouTube video [1] was published in 2019. The Blog spam posts range from Nov 2022 to July 2023.

Other than the video, the only relevant content is on the about page [2]. It says the voice is a collaboration between 5 different entities, including advocacy groups, marketing firms and a music producer.

The video is the only example of the voice in use. There is no API, weights, SDK, etc.

I suspect this was a one-off marketing stunt sponsored by Copenhagen Pride before the pandemic. The initial reaction was strong enough that a couple of years later they were still getting a small but steady flow of traffic. One of the involved marketing firms decided to monetize the asset and defaced it with blog spam.

[1] https://www.youtube.com/watch?v=lvv6zYOQqm0

[2] https://genderlessvoice.com/about/

pbronez · 2025-08-06T11:54:59 1754481299

Huh. Sounds perfectly intelligible and definitively artificial. Feels weakly feminine to me, but only because I was primed to think about gender from the branding.

It’s a good choice for a robot voice. It’s easier to understand than the formant synths or deliberately distorted human voices. The genderless aspect is alien enough to avoid the uncanny valley. You intuitively know you’re dealing with something a little different.

qmr · 2025-08-07T23:00:27 1754607627

Thanks, I hate it.

mfro · 2025-08-06T13:53:13 1754488393

In the Culture novels, Iain Banks imagines that we would become uncomfortable with the uncanny realism of transmitted voices / holograms, and intentionally include some level of distortion to indicate you're speaking to an image

incone123 · 2025-08-06T07:01:47 1754463707

Depends on the movie. Ash and Bishop in the Alien franchise sound human until there's a dramatic reason to sound more 'robotic'.

I agree with your wider point. I use Google TTS with Moon+Reader all the time (I tried audio books read by real humans but I prefer the consistency of TTS)

regularfry · 2025-08-06T08:53:25 1754470405

Slightly different there because it's important in both cases that Ripley (and we) can't tell they're androids until it's explicitly uncovered. The whole point is that they're not presented as artificial. Same in Blade Runner: "more human than human". You don't have a film without the ambiguity there.

incone123 · 2025-08-06T17:11:34 1754500294

You're right. I should have used Marvin from Hitchhiker's Guide as an example instead. There's very light processing on his speech.

Twirrim · 2025-08-06T05:32:44 1754458364

> I don't expect a smart toaster to talk like a BBC host;

Well sure, the BBC have already established that it's supposed to sound like a brit doing an impersonation of an American: https://www.youtube.com/watch?v=LRq_SAuQDec

looperhacks · 2025-08-06T08:35:34 1754469334

I remember that the novelization of The Fifth Element describes that the cops are taught to speak as robotically as possible when using speakers, for some reason. Always found the idea weird that someone would _want_ that.

addandsubtract · 2025-08-06T09:51:58 1754473918

If you're on a Mac, you can type "say [thing to say]" into your terminal.

msgodel · 2025-08-06T10:21:11 1754475671

I personally prefer the older synthetic voices for TTS when the text is coming from software or a language model.

bkyan · 2025-08-06T06:05:25 1754460325

I got an error when I tried the demo with 6 sentences, but it worked great when I reduced the text to 3 sentences. Is the length limit due to the model or just a limitation for the demo?

divamgupta · 2025-08-06T07:34:43 1754465683

Currently we don't have chunking enabled yet. We will add it soon. That will remove the length limitations.
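Until that lands, chunking can be approximated on the caller's side: split the text at sentence boundaries, group the sentences into pieces under some size budget, and synthesize each piece separately. A rough sketch follows; the `chunk_text` name and the 200-character default are assumptions made for illustration, not anything taken from the KittenTTS code or the demo.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    max_chars is an arbitrary illustrative budget, not a KittenTTS limit.
    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be passed to the model on its own and the resulting audio concatenated. Splitting only at sentence ends keeps the cuts at places where a pause sounds natural.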

cess11 · 2025-08-06T06:22:22 1754461342

Perhaps a length limit? I tried this:

"This first Book proposes, first in brief, the whole Subject, Mans disobedience, and the loss thereupon of Paradise wherein he was plac't: Then touches the prime cause of his fall, the Serpent, or rather Satan in the Serpent; who revolting from God, and drawing to his side many Legions of Angels, was by the command of God driven out of Heaven with all his Crew into the great Deep."

It takes a while until it starts generating sound on my i7 cores but it kind of works.

This also works:

"blah. bleh. blih. bloh. blyh. bluh."

So I don't think it's a limit on punctuation. Voice quality is quite bad though, not as far from the old school C64 SAM (https://discordier.github.io/sam/) of the eighties as I expected.

Retr0id · 2025-08-06T03:04:08 1754449448

I tried to replicate their demo text but it doesn't sound as good for some reason.

If anyone else wants to try:

> Kitten TTS is an open-source series of tiny and expressive text-to-speech models for on-device applications. Our smallest model is less than 25 megabytes.

cortesoft · 2025-08-06T04:40:48 1754455248

Is the demo using a model other than the smallest one?

Retr0id · 2025-08-06T11:22:38 1754479358

Perhaps, but the 25MB model is the only thing they've released

quantummagic · 2025-08-06T02:21:08 1754446868

Doesn't work here. Backend module returns 404 :

https://clowerweb.github.io/node_modules/onnxruntime-web/dis...

Retr0id · 2025-08-06T02:33:02 1754447582

Looks like this commit 15 minutes ago broke it https://github.com/clowerweb/kitten-tts-web-demo/commit/6b5c...

(seems reverted now)

itake · 2025-08-06T03:52:21 1754452341

> Error generating speech: failed to call OrtRun(). ERROR_CODE: 2, ERROR_MESSAGE: Non-zero status code returned while running Expand node. Name:'/bert/Expand' Status Message: invalid expand shape

Doesn't seem to work with thai.

jainilprajapati · 2025-08-06T04:13:59 1754453639

You can also try on https://clowerweb.github.io/node_modules/onnxruntime-web/dis...

nxnsxnbx · 2025-08-06T05:48:15 1754459295

Thanks, I was looking for that. While the reddit demo sounds ok, even though on a level we reached a couple of years ago, all TTS samples I tried were barely understandable at all.

divamgupta · 2025-08-06T07:36:35 1754465795

This is just an early checkpoint. We hope that the quality will improve in the future.

Aardwolf · 2025-08-06T07:53:22 1754466802

On PC it's a python dependency hell but someone managed to package it in self contained JS code that works offline once it loaded the model? How is that done?

a2128 · 2025-08-06T08:22:54 1754468574

ONNXRuntime makes it fairly easy, you just need to provide a path to the ONNX file, give it inputs in the correct format, and use the outputs. The ONNXRuntime library handles the rest. You can see this in the main.js file: https://github.com/clowerweb/kitten-tts-web-demo/blob/main/m...

Plus, Python software is dependency hell in general, while webpages have to be self-contained by their nature (thank god we no longer have Silverlight and Java applets...)

scotty79 · 2025-08-06T09:27:22 1754472442

It feels like it doesn't handle punctuation well. I don't hear sentence boundaries and commas. It sounds like a continuous stream of words.

rohan_joshi · 2025-08-06T07:33:44 1754465624

yeah, this is just a preview model from an early checkpoint. the full model release will be next week which includes a 15M model and an 80M model, both of which will have much higher quality than this preview.

rldjbpin · 2025-08-13T10:14:24 1755080064

besides issues with webgpu (it is in beta fwiw), it'd be nice to increase voice speed through the setting without affecting the voice pitch.

Jotalea · 2025-08-07T13:26:19 1754573179

Using male voice 2 at 48kHz at 0.5x speed sounds a lot like Madeline's voice lines in Celeste. Seemed funny to me.

belchiorb · 2025-08-06T07:14:08 1754464448

This doesn’t seem to work on Safari. Works great on Chrome, though

divamgupta · 2025-08-06T07:22:56 1754464976

Hmm, we will look into it.

tapper · 2025-08-06T08:05:37 1754467537

You should post on the NVDA email list: https://nvda.groups.io/g/nvda Or the screen reader list: https://winaccess.groups.io/g/winaccess FYI, blind people do not like any lag when reading; that is why so many still use Eloquence and eSpeak.

kenarsa · 2025-08-06T02:35:31 1754447731

[flagged]

gary_0 · 2025-08-06T03:04:17 1754449457

Not open source. "You will need internet connectivity to validate your AccessKey with Picovoice license servers ... If you wish to increase your limits, you can purchase a subscription plan." https://github.com/Picovoice/orca#accesskey

papichulo2023 · 2025-08-06T06:57:41 1754463461

The guy is just spamming the project in a lot of comments.

cakealert · 2025-08-06T06:20:03 1754461203

Going online is a dealbreaker but if you really need it you could use ghidra to fix that. I had tried to find a conversion of their model to onnx (making their proprietary pipeline useless) but failed.

Hopefully open source will render them irrelevant in the future.

satvikpendem · 2025-08-06T02:38:58 1754447938

Does an apk for Android exist for replacing its text-to-speech engine? I tried sherpa-onnx but it seemed too slow for real-time usage, especially for audiobooks when sped up.

kenarsa · 2025-08-06T02:46:48 1754448408

https://github.com/Picovoice/orca/tree/main/demo%2Fandroid

satvikpendem · 2025-08-06T03:19:35 1754450375

I can't test this out right now, is this just a demo or is it actually an apk for replacing the engine? Because those are two different things, the latter can be used any time you want to read something aloud on the page for example. This is the sherpa-onnx one I'm talking about.

https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html

blopker · 2025-05-20T16:08:07 1747757287

I get the feeling we're going to end up in a place where we don't make docs any more. A project will have a trusted agent that can see the actual code, maybe just the API surface, and that agent acts like a customer service rep to a user's agent. It will generate docs on the fly, with specific examples for the task needed. Maybe the agents will find bugs together and update the code too.

Not exactly where I'd like to see us go, but at least we'll never get outdated information.

levkk · 2025-05-20T16:14:47 1747757687

There are lots of things that neither the code nor the docs cover, so I suspect that's not quite possible, yet.

For example, if you're deploying a Postgres proxy, it will have a TCP timeout setting that you can tweak. Neither the docs nor the code will tell you what the value should be set to though.

Your engineers might know, because they have seen your internal network fail dozens of times and have a good intuition about it.

Software complexity has a wide range. If you're thinking of simple things like Sendgrid, Twilio or Stripe APIs, sure, an agent can easily write some boilerplate. But I think in certain sectors, we would need to attach some more inputs to the model that we currently don't have to get it to a good spot.

blopker · 2025-02-18T19:38:23 1739907503

The Rust ecosystem needs more high-level frameworks like this. However, I've been shipping Django since 0.96, and I don't think Cot really addresses the main issues Django currently has. Performance isn't in the top 5.

Django's biggest issue is their aging templating system. The `block`, `extend` and `include` style of composition is so limited when compared to the expressiveness of JSX. There are many libraries that try to solve Django's lack of composition/components, but it's all a band-aid. Today, making a relatively complex page with reusable components is fragile and verbose.

The second-biggest issue is lack of front end integration. Even just a blessed way of generating an OpenAPI file from models would go a long way. Django Ninja is a great peek at what that could look like. However, new JS frameworks go so much further.

The other big issue Django has _is_ solved by Cot (or Rust), which is cool (but not highlighted): complicated deployments. Shipping a bunch of Python files is painful. Also, Python's threading model means you really have to have Gunicorn (and usually Nginx) in front. Cot could have all that compiled into one binary.

m4tx · 2025-02-18T20:04:57 1739909097

Thanks a lot for this extensive feedback!

About performance: I agree, and I'm not even trying to make performance a priority in Cot. I mean, of course, it's nice to have an actual compiled language, but I think a bigger perk in using Rust is having *a lot* of stuff checked in compile time, rather than in runtime. This is something I'm trying to make the main perk of, and it is reflected in multiple parts in Cot (templates checked at compile time, ORM that is fully aware of database schema at compile time, among many others).

About JSX: I think that's the one I'll need to explore further. In my defense, the templating system Cot currently uses (Rinja) is much more expressive and pleasant to use than Django's, but admittedly, the core concepts are very similar. This one might be difficult to address because of an ecosystem of templating engines that is pretty lacking in Rust, but I'll see what I can do to help this.

About front-end integration: that's something that will be (at least partially) addressed no later than v0.2. Django REST Framework is a pain (mostly because it's never been integrated in Django), Django Ninja is something I haven't personally used very much - good to have it mentioned so it can be a source of inspiration. Generating OpenAPI docs is something that's even mentioned in the article "Request Handler API is far from being ergonomic and there’s no automatic OpenAPI docs generation" so yeah, I'm aware of this.

Deployment is indeed something that's super nice – and a part of this is that newer Rust versions generally don't break compatibility with existing code, unlike Python. I agree this should be highlighted, thanks for the suggestion!

nicoburns · 2025-02-19T00:48:04 1739926084

You could potentially address both templating and front-end integration by adopting Dioxus which does full stack rendering with React-like components (but in Rust). A "batteries included" full-stack framework could be quite exciting I think.

(Disclaimer: I work on dioxus's native renderer)

wheregmis · 2025-02-19T02:05:38 1739930738

+1 (Disclaimer: I don't work on Dioxus, but I am an admirer of it)

blopker · 2025-02-10T00:48:05 1739148485

There is another solution, in this specific case. If all they wanted is to start returning the test results before all the tests are done, a streaming http response can be used.

In Bottle, returning a generator or iterator will send the response in chunks, instead of all at once. The effect would be that the test results load in one by one, providing the user with feedback. No JavaScript needed.
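A minimal sketch of the idea, using plain WSGI (the interface Bottle is built on) as a stand-in so it runs with only the standard library. In Bottle the route handler itself would return or yield the pieces. The `run_tests` generator and the test names are hypothetical placeholders for a real test runner.

```python
import time
from typing import Callable, Iterator


def run_tests() -> Iterator[str]:
    """Hypothetical test runner: yields one result line as each test finishes."""
    for name in ("test_login", "test_signup", "test_logout"):
        time.sleep(0.01)  # stand-in for the time a real test would take
        yield f"{name}: ok\n"


def app(environ: dict, start_response: Callable) -> Iterator[bytes]:
    """WSGI app that streams results instead of building the full body first."""
    # No Content-Length header, so a server that does not buffer can send
    # each piece to the client as soon as it is yielded.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    for line in run_tests():
        yield line.encode("utf-8")
```

To try it, serve `app` with `wsgiref.simple_server.make_server` and fetch the page with `curl -N`, which turns off curl's own output buffering. Whether the lines really arrive one by one also depends on any proxy in front of the server not buffering the response.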

blopker · 2025-02-08T16:28:49 1739032129

Nice work! I'm happy to see the server is in Dart, no Firebase!

I wish the Dart server ecosystem was more mature though. Being able to compile Dart into a static binary is so nice for deployment.

hoppp · 2025-02-08T19:09:41 1739041781

I wrote some dart servers years ago, there is a nice framework called aqueduct that makes it very easy and productive.

I stopped using it because I needed to make money and my work was with nodejs.

satvikpendem · 2025-02-08T19:19:22 1739042362

Serverpod and Dart Frog are both pretty good backend Dart frameworks, what are you looking for?

blopker · 2025-02-08T22:31:06 1739053866

Dart is crazy because it runs on every platform, compiles to native, has real parallelism via isolates, native async, and native type safety.

There's not really a backend that takes advantage of all that. In theory, one server binary could handle REST, web sockets, background workers, and have generated type safe client packages for every platform. Dart also has a great Rust ffi story. It would be great to see that leveraged.

ServerPod is a great start, but it's really Flutter focused. The web apis feel like second class.

Additionally, database management isn't a solved problem yet. ServerPod uses yaml to define models, and the other main option is just a Prisma wrapper. Dart needs something like Drizzle.

satvikpendem · 2025-02-08T22:42:34 1739054554

You could state the same thing as your first sentence for e.g. Rust or many other languages, I personally only see Dart being useful if you already have a Flutter app and you don't want to learn another language, and to have shared types easily, similar to fullstack web devs using TypeScript for their React and Node apps.

I personally use Rust backends and Flutter frontends for my apps. I'd use pure Rust for the entire thing but Rust frontends are nowhere near the capabilities and maturity as Flutter, but I use FFI like flutter_rust_bridge and rinf at least, as you mention.

blopker · 2025-02-08T23:03:02 1739055782

I actually can't think of another language that has all of that built in. Rust doesn't, it needs a run time for async. JavaScript doesn't, it needs typescript and it doesn't compile to native.

satvikpendem · 2025-02-09T06:32:59 1739082779

That's true about Rust but that's a feature not a bug as you can swap out async runtime if needed and if you do add it, it is still as or more efficient than Dart.

Borealid · 2025-02-09T00:10:25 1739059825

Kotlin Native has every one of those features.

wiseowise · 2025-02-09T08:52:00 1739091120

Hahahaha, by a long shot not.

Kotlin Native is a toy for JetBrains to eat some of that Apple pie and capture teams that want to share logic between their mobile codebases.

Kotlin Native has no std, they cut down platform support with K2, performance and compilation speed are atrocious and there are no plans to improve any of that short term.

Kotlin without JVM can’t hold a candle to Dart. Which is a real shame for Dart, because Dart has improved dramatically in the last couple of years while Kotlin has not introduced anything major in the last 5 years since the release of coroutines.

Their K2 compiler, that was supposed to promise major compilation speed improvements, was mostly a flop and we are yet to see if they’ll do anything good with it. Context receivers are not even close, pattern matching is not even on a roadmap and they’re refusing to consider union types. Kotlin lives on borrowed time.

Borealid · 2025-02-09T10:41:42 1739097702

The listed features are:

1. runs on every platform (KNative runs natively on Linux, Mac, Windows, Android, iOS. It can also run under the JVM non-natively, and anywhere Javascript runs non-natively. The native code can build for a variety of architectures including ARM and x86)

2. compiles to native (As above, compiles to native on Linux/Mac/Windows/Android/iOS)

3. has real parallelism via isolates (Kotlin can spawn and interact with full processes, OS threads, and/or green threads in any admixture)

4. native async (Kotlin has native async/await support via coroutines, which work under KNative)

5. native type safety (Kotlin has a strong static type system which is available for native code as well and encompasses native types interactive with Kotlin code in either direction)

I don't think anything you said pertains to the listed five features. Especially complaining about compile speed is a strange thing to be doing in the context of this conversation.

vips7L · 2025-02-09T16:21:16 1739118076

While KNative does have these things it still does not have a standard library or ecosystem. Without another runtime KNative is practically useless.

lawgimenez · 2025-02-09T14:27:10 1739111230

What made you think the K2 compiler is a flop?

SpaghettiCthulu · 2025-02-09T00:20:27 1739060427

On the topic of databases, I think https://drift.simonbinder.eu/ might interest you. I've been using it in a Flutter app with SQLite, but my understanding is that you could use it on the server too. I recall them having support for at least SQLite and Postgres.

desumeku · 2025-02-09T03:19:08 1739071148

> I wish the Dart server ecosystem was more mature though. Being able to compile Dart into a static binary is so nice for deployment.

Considering that both are made by Google, I can imagine that they just use Go for everything servers.

igor_st · 2025-02-09T05:52:28 1739080348

Agree, Dart is pretty nice, but the ecosystem is not mature.

blopker · 2025-01-16T18:25:10 1737051910

In Rust, there's a controversial practice around putting unit tests in the same file as the actual code. I was put off by it at first, but I'm finding LLM autocomplete is able to be much more effective just being able to see the tests.

No clunky loop needed.

It's gotten me back into TDD.

sitkack · 2025-01-16T21:04:45 1737061485

If the LLM can't complete a task, you add a test that shows it how to do it. This is multishot in-context learning and programming by example.

As for real TDD, you start with the tests and code until they pass. I haven't used an LLM to do this in Rust yet, but in Python, due to its dynamic nature, it is much simpler.

You can write the tests, then have the LLM sketch the code out enough so that they pass or at least exist enough to pass a linter. Dev tools are going to feel like magic 18 months from now.

ComputerGuru · 2025-01-17T16:29:10 1737131350

The benefit of this approach is that you can directly test any function in the same scope without altering its visibility: it implicitly encourages you to test all functions (and design functions in a way they can be tested, as you are writing tests as you write code), not just those part of the public api contract.

Plus you can update tests, code, and comments in one go, with visibility into them at all times.

regularfry · 2025-01-17T10:46:55 1737110815

I've sometimes done the same in python. I do quite like the ergonomics.

blopker · 2025-01-15T22:52:19 1736981539

I agree with you on Django Ninja, so refreshingly simple compared to DRF. I think Django core needs to adopt something like it.

However, Vite is pretty complicated. I prefer just esbuild if I don't need all the extra features of Vite, which is usually true with Django. I wrote a post[0] with an example repo[1] if anyone wants to see how everything wires up.

With Solidjs, the minimum JS payload is around 9kb, and you get access to the whole JS ecosystem if you want it.

[0] https://blopker.com/writing/07-django-islands-part-1/ [1] https://github.com/blopker/typesafedjango

michaelcampbell · 2025-01-16T00:59:00 1736989140

> I agree with you on Django Ninja, so refreshingly simple compared to DRF. I think Django core needs to adopt something like it.

I was going to ask about this with respect to DRF, but you answered it. I am re-learning Django after having been away from it and Python for ~4 years now, and my previous experience was with DRF in a somewhat toxic group so I had less than ideal feelings about it. I know PTSD is a real thing and I don't mean to sound glib about it, but I think I actually had the beginnings of it from that experience.

blopker · on Sept 9, 2024

This is great, thank you for sharing! The QR code generator alone sold me on getting it. So many online generators demand I make an account for some reason.

It would be amazing if this were extendable with plugins though. I have a ton of custom terminal scripts for my workflows, but some of them would just be better with a simple UI. Global hotkeys that take me right to the tool would be awesome too.

Edit: it looks like global hotkeys can be done with the URL Scheme feature and Raycast. Nice.

llarsson · on Sept 9, 2024

Do you need fancy looking ones or just barebones QR codes? Because the latter you can just get from the qrcode Python package and simply go "qr news.ycombinator.com > hn.png" in your terminal.

https://pypi.org/project/qrcode/

blopker · on Sept 4, 2024

Nice work! If you’re looking for more questions, my nonprofit specializes in authentic communication in groups. We have a list of prompts for our group moderators, but you’re welcome to use them as well: https://www.totem.org/repos/prompts/

blopker · on Aug 28, 2024

Maybe it's not for you, but the "everything is a string" thing is just the default. SQLite has had a STRICT table option since 2021 that people really should be using if possible: https://www.sqlite.org/stricttables.html

This brings strict types that people expect from the other server-based databases.
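A small sketch of the difference using Python's built-in `sqlite3` module. The table and the `try_insert` helper are made up for the example. STRICT needs SQLite 3.37.0 or newer, so the script checks the linked library version first.

```python
import sqlite3

# STRICT tables require SQLite 3.37.0 or newer.
STRICT_SUPPORTED = sqlite3.sqlite_version_info >= (3, 37, 0)


def try_insert(create_sql: str, value: object) -> str:
    """Create table t in a fresh in-memory DB, insert value, report the outcome."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(create_sql)
        conn.execute("INSERT INTO t (n) VALUES (?)", (value,))
        stored_type = conn.execute("SELECT typeof(n) FROM t").fetchone()[0]
        return f"stored as {stored_type}"
    except sqlite3.Error as exc:
        return f"rejected: {type(exc).__name__}"
    finally:
        conn.close()


# Default (flexible typing): a non-numeric string goes into an INTEGER column.
loose = try_insert("CREATE TABLE t (n INTEGER)", "not a number")

# STRICT: the same insert is expected to fail with a datatype error.
strict = (
    try_insert("CREATE TABLE t (n INTEGER) STRICT", "not a number")
    if STRICT_SUPPORTED
    else "skipped: SQLite older than 3.37"
)

print(loose)
print(strict)
```

The non-strict table silently keeps the string as text, while the STRICT table refuses the insert, which is the behavior people coming from server-based databases tend to expect.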