
So, I wanted to like this, but frankly the quality isn't fantastic.

The text-to-speech is alright, but it lacks almost any emotion, and it reads everything literally, which doesn't sound natural when the article/PDF has a weird layout or figures. Though I expect they're just not using their top-of-the-line models for this: I've had much more luck pushing a PDF through Claude to generate the "verbal version" (which is mostly literal, but also describes the layout and figures) and then running the result through the top-of-the-line ElevenLabs model.
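The two-step workflow described above (LLM rewrite first, then text-to-speech) can be sketched roughly as follows. This is only an illustration under stated assumptions: `rewrite` and `synthesize` are hypothetical placeholder callables standing in for the Claude and ElevenLabs calls, not real SDK functions, and the prompt wording is invented here since the comment does not give one.

```python
from typing import Callable

# Hypothetical prompt; the exact wording used is not given in the comment.
VERBAL_PROMPT = (
    "Rewrite the following document as a script meant to be read aloud. "
    "Stay close to the original text, but describe the layout and any "
    "figures in words instead of reading them literally.\n\n"
)


def pdf_to_audio(
    pdf_text: str,
    rewrite: Callable[[str], str],      # placeholder for the LLM call
    synthesize: Callable[[str], bytes],  # placeholder for the TTS call
) -> bytes:
    """Two-stage pipeline: produce a 'verbal version', then voice it."""
    # Stage 1: turn the extracted PDF text into a read-aloud script.
    script = rewrite(VERBAL_PROMPT + pdf_text)
    # Stage 2: hand the script, not the raw PDF text, to text-to-speech.
    return synthesize(script)
```

Keeping the two stages as injected callables means either model can be swapped out without touching the pipeline itself.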

Now, I've also checked out the podcast feature, and it's pretty clear they first generate the script as text and then run simple text-to-speech over it. Again, lack of emotion, very mechanical flow.

I made a podcast of a technical article[0] in both ElevenLabs reader and Google's NotebookLM, and the NotebookLM podcast is a night-and-day improvement. Maybe they use a better model, maybe they use straight "article to podcast" end-to-end multimodal generation, I don't know, but the quality, flow, and emotion are just on a completely different level. I had to quickly turn off the ElevenLabs-generated podcast because I couldn't keep listening to it, while NotebookLM's is legitimately enjoyable.

Now to finish on a more positive note, fingers crossed for the ElevenLabs team improving this, and us getting some competition in the area of article-to-audio, both podcast-style, and direct! I think, in general, it's a very promising product direction. Feature-wise, I would also love to get a daily overview podcast based on all my RSS feed articles for a given day.

[0]: https://huggingface.co/blog/modernbert


