The real endgame in this space would be a tool that first generates a song layout (think FruityLoops), then the corresponding instruments, then the vocals, and as a last step lets you modify any of those layers without nuking the rest. Imagine something like what Suno does now, except you could add an extra verse without altering the rest of the song, swap out a few passages of the lyrics while the rest stays intact, swap the drums for a different kit, etc.
If there's variance in the output, it stands to reason you'd generate several times your desired output count and curate. That's standard practice for creative generation, from Midjourney to LLMs.