I don’t think this is all that impressive, the generated podcast is pretty shallow - lots of ‘whoa meta’ and the word ‘like’ thrown into every sentence.
Yes, it will generate a middle-of-the-road waffling podcast, but not one with any real depth.
Look I agree with you at a certain level, maybe it can't emulate deep conversations about big topics (maybe it can, I haven't seen an attempt...), but a vast vast majority of podcasts and radio shows are just like this: shallow and incredibly simplified with no more than a nod to the underlying concepts. 70% personality, 20% dumb analogies that the producer thought up in thirty minutes, and <10% actually communicating the material is standard fare for normie podcasts, sadly.
Honestly, given the personalization maybe it's a net improvement.
Kind of feels like looking at an overflowing landfill and thinking "I wonder if we can invent a robot that just generates new trash directly into the landfill".
This holier than thou attitude that crops up in these threads is so annoying, as if people wanting to casually enjoy a mediocre podcast or radio show on the 1 hour commute to their shitty job is a crime.
I don’t think anyone cares about other people’s cheap pleasures. What people do care about is the displacement of quality and craft. For instance, you could say the same thing about the state of the web - say when searching for recipes. Maybe some people like the ads, the consent forms, the backstories? Why so purist? Isn’t it nice with a bit of scrolling and getting in the mood for cooking with a bit of SEO?
Defending craftsmen and attention to detail is not just about purism or gatekeeping. I appreciate people who care, even in fields I don’t personally care about (yet?). The professor who annoyingly insists on making sure every student “really gets it”, or the woodworker who is adamant about what joints are superior, or the kernel hacker who maintains rigor in face of hundreds of feature requests. The integrity of professionals can make or break institutions.
With AI reducing the effort to create garbage to the point of commoditization, people have a right, and arguably even an obligation, to be concerned. Remember, tech doesn’t follow potential, it follows incentive.
Right. Similarly, I criticize the people who worked to make cigarettes more addictive, fast food more 'craveable', freemium games more appealing to whales, gambling more attractive to problem gamblers, etc. but not people who smoke, eat fast food, play freemium games, or gamble. That would be deeply hypocritical.
I'm not criticizing the people who consume garbage, but the people who are enthusiastic about opening new markets in garbage. People should strive to do good, worthwhile things with their lives.
I was blown away by how impressive it was. I honestly thought it was real. I still can't believe these realistic audio capabilities are not being used for pure evil everywhere we look.
> like thrown into every sentence
I think that's actually part of why it sounds real, because tons of people do actually talk like that.
To me what would make it even better is the ability to throw in random jokes and utilize information about their surroundings and recent events.
I have been using MeloTTS for text-to-speech and I thought that was about the best we could do right now, but apparently I was very wrong. Is there an offline model one can download today that sounds as good as this NotebookLM?
Bark can sound as good, but Google is using SoundStorm which was specifically trained on dialogs. Surprisingly Bark can even sort of match it without being trained to do so, but not reliably. (https://x.com/jonathanfly/status/1675987073893904386)
And SoundStorm has more than twice the context window of Bark, so dialogs are a tight fit in Bark.
I just tried the default bark.cpp example from the github readme, and to me it still doesn't sound close enough to realistic, and the audio quality itself was a bit scratchy... maybe I'm doing something wrong.
When I tried my own text with it, it went completely off the rails... skipping completely over random words, and also switching to different voices in the middle of a sentence. Trying to run the large model also crashed entirely.
You aren't doing anything wrong - Bark out of the box uses a randomly generated voice, and I like to think it's modeling the world of random voices, which includes bad microphones/audio-quality. (Even bad 'actors' - see how many Bark voices sound like they are reading a script.)
Presumably it was trained on noisy data. But it can generate and use a clean voice; they are in there. Most of the Suno default voices are not great either - but a great voice can sound perfectly clear. I haven't done much with Bark lately, but on my Twitter there are plenty of clear examples of very realistic voices. Actually, here I ran a prompt based on some copy-and-pasted text 20 times in Bark. I put a couple of the better results up front, but even in later samples you can find lots of evidence of human-sounding voices. https://sndup.net/bzhz5/
Going off the rails and hallucinating is a hard problem. It can be minimized, but it would probably have to be solved with simple brute force (check the output with S2T and retry if needed).
For raw audio you can replace the final decoding step with something like VOCOS or MBD if you want to maximize audio quality, though you don't need to with the best voices.
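For anyone who wants to try the brute-force "check with S2T and retry" idea, here is a rough sketch of what that loop could look like. It is not Bark's API or anything the project ships: `synthesize` and `transcribe` are hypothetical callables you would supply yourself (a TTS call and a speech-to-text call), and the `max_wer` and `max_attempts` defaults are arbitrary guesses. The only real logic is a word error rate comparison between the prompt and the transcript.

```python
import re
from typing import Any, Callable, List, Optional, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase and drop punctuation so formatting differences are ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped from the output
                            curr[j - 1] + 1,      # extra word in the output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def synthesize_with_retry(
    text: str,
    synthesize: Callable[[str], Any],   # hypothetical: text -> audio
    transcribe: Callable[[Any], str],   # hypothetical: audio -> text
    max_wer: float = 0.1,               # assumed threshold, tune for your model
    max_attempts: int = 5,              # assumed retry budget
) -> Tuple[Any, float]:
    """Regenerate until the transcript is close enough; return the best take."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best: Optional[Tuple[Any, float]] = None
    for _ in range(max_attempts):
        audio = synthesize(text)
        wer = word_error_rate(text, transcribe(audio))
        if best is None or wer < best[1]:
            best = (audio, wer)
        if wer <= max_wer:
            break
    assert best is not None
    return best
```

This only catches skipped or invented words. It would not notice a voice switching mid-sentence, which would need some kind of speaker check on top.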
I think it’s “impressive” the first time you use it, but with subsequent runs it’s evident how formulaic it is. The end result, the personalities of the podcast “hosts” and their interactions are similar regardless of the context of inputs.
Basically it’s a neat party trick at the moment. I do hope to see it improve however!
Right?! We call this goalpost moving now, but it is not a new phenomenon.
> It is interesting that nowadays, practically no one feels that sense of awe any longer - even when computers perform operations that are incredibly more sophisticated than those which sent thrills down spines in the early days. The once-exciting phrase "Giant Electronic Brain" remains only as a sort of "camp" cliché, a ridiculous vestige of the era of Flash Gordon and Buck Rogers. It is a bit sad that we become blasé so quickly.
> There is a related "Theorem" about progress in AI: once some mental function is programmed, people soon cease to consider it as an essential ingredient of "real thinking". The ineluctable core of intelligence is always in that next thing which hasn't yet been programmed. This "Theorem" was first proposed to me by Larry Tesler, so I call it Tesler's Theorem: "AI is whatever hasn't been done yet."
This quote is from the 80s, from GEB by Douglas Hofstadter.
(and btw, I just took a grainy, poorly-lit picture from the book, and could automagically select the text from it, since I couldn't find the quote online. Imagine that tech in the 80s. Hell, it was bad even in the 2000s, with OCR being hit and miss for a long time. Now it "just works".)
Think about how comfortable your life is, and how the 17th century version of yourself would kill to live it. Then think about how you aren't in a perpetual state of ecstasy for being given this life.
People quickly adapt to their current circumstances, take them for granted, and immediately want more.
You’re talking about advancements made through multiple lifetimes. This burst in AI has lasted about 15 years.
TBH I think it’s more of a knee jerk reaction from those tired of hearing about AI or who just want to post contrarian opinions (which I totally do sometimes, too).
The content is nothing that special these days, you could probably get it out of Gemini or Claude, but the audio affect is awfully convincing.
You can compare it to Google's Illuminate which also generates conversations by summarizing texts but in a much straighter, less fluffy way. It's less shallow but in some ways less compelling:
This was exactly my reaction to listening to the example podcast. Although, I wonder if the base material weren't so meta-level product overview, maybe it would be better. I do think the liveliness of the conversation was good (interjections, tonal variety, etc), so at least parts of the demo are impressive.
To me, that's just how they tuned the 'audience' of the podcast, which I think we can imagine was at least partly informed by the 'audience' IRL podcasts are aimed at. I, too, would like to be able to 'turn up the technical' on these. But, for example, I dumped a paper about a latchless, mutexless work distribution algorithm into it, one I had read but still had questions about. The podcast accurately summarized and simplified it and answered my questions, which I validated later by re-reading the paper. It was faster than combing through the paper would have been.