What's inconceivable about the lyrics being generated?
A single song's lyrics easily fit into the context window of virtually any LLM, and with some of the bigger and better ones (Opus) you could probably feed it every known song's lyrics in your desired genre before asking it to create a new set of verses.
The lyrics it generated are way too clever. I'm a fan of rap, and no LLM I've ever used can generate anything nearly as witty as some of the lines in the demo.
I'd agree, I'd imagine they had someone write the lyrics for it, but that's perfectly fine. We want drivable models where we can tell it what to say and it coherently turns that shit into smooth words with transitions. Hell, you can even hear the breath sounds it's simulating on the word transitions.