The last tropical house one is better than the first one in that it gets the house 4/4 drums right whereas the first one sounds more like a rap track. It’s still got elevator music vibes though.
Really?
This sounds like horrible muzak to me and there is TONS of awesome actually good royalty free music out there.
Maybe it will get there in some time, but right now I would not use this.
This definitely feels like more of a tech demo for things to come. But I suspect it could get dramatically better in a matter of months or single-digit years.
Is it able to synthesize individual elements of a track (kick, snare, hats, a bass note at C4, etc.)? This might be killer for pairing with a sampler. Back in the early days of hip-hop, memory and disk limits encouraged artists to load samples pitched up 2x and then slow them back down on the device. This gave the samples a grainy quality that helped the aesthetics. With a human level of polish, I'm pretty sure the combo would be amazing.
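For the curious, that memory trick is easy to sketch. A pure-Python toy illustration (the sample rate, sine input, and naive resampling here are arbitrary stand-ins for real sampler hardware):

```python
import math

RATE = 8_000  # assumed sample rate for this sketch

def speed_up_2x(samples):
    """Naive 2x speed-up: keep every other sample, halving memory use."""
    return samples[::2]

def slow_down_2x(samples):
    """Naive half-speed playback: repeat each sample (zero-order hold).
    This crude interpolation is what adds the grainy character."""
    return [s for s in samples for _ in (0, 1)]

# A 1-second 440 Hz sine stands in for the sampled break
original = [math.sin(2 * math.pi * 440.0 * i / RATE) for i in range(RATE)]

stored = speed_up_2x(original)   # half the samples -> half the memory
played = slow_down_2x(stored)    # original length and pitch restored, but grainy
```

Storing the sped-up version halves the memory footprint; repeating each sample on playback restores the original pitch and duration, at the cost of the aliasing grit the comment describes.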
OK they're not exactly captivating to listen to - but I don't think they're supposed to be. The use-case, I think, is more along the lines of generating adequate background music for a my-first-self-published-videogame project with next to zero effort.
I feel like that becomes sort of shovelware. There are some games with truly amazing scores, and the above stuff doesn't come close. Is it better than no music, or than stock/open-license music? I'm not sure. What happens if you've built the next Hollow Knight, but instead of the amazing Christopher Larkin score that elevates the game, you get the garbled, terrible music in the above post? It actually makes the game worse.
I fear the path into "low/no effort" versions of music scores, art, etc. is a dilution of great works that actually makes for a worse product than if you had gone to the trouble of finding someone to do it properly. It leads to shovelware, and I don't think we need more shovelware. We need more high-quality, high-intention music and games and images, and personally I think that's going to come from gifted individuals who put in the effort to learn the craft, not from a neural network that can't experience what you want the music to make you feel.
If I logged into a video game and got the music in the above post, it would detract from the game. You'd be better off finding open-license or cheap music that a human has made. It's way better, and it would reflect better on your game and your design process.
If someone has the vision and taste to create the next x, they likely have the appreciation of what decent music sounds like and the work that goes into that.
> The use-case, I think, is more along the lines of generating adequate background music for a my-first-self-published-videogame project with next to zero effort
n+1. #indiedev is full of extremely passionate people who want it exactly their way down to the pixel and every note of music. To write off the entire non-AAA space as only wanting to write shovelware even as a first game is laughable.
Having done music on a few games, I can say the devs were super passionate about a quality product, frequently over-engineering; when the then-available bog-standard audio middleware didn't do the procedural mixing we wanted, they wrote a custom system just to get it spot-on.
I also know a few major AAA legends who went indie who won't publish shite.
Equally, if the original commenter is happy with x being good enough then that's valid. Maybe the game isn't for you?
Keep in mind my initial reply was directly addressing the "What happens if you've built the next hollow knight" in your post. The implication of my first reply is that if you're at the level where you can do that, you very likely aren't going to settle for your music letting you down; music is arguably the other 50% of what a modern game is, alongside the visuals. You're focusing on the commenter being satisfied by AI music. If the game is that good, you'll get a budget to have the music not suck, whether that's via XGP paying to finish the development, a rev-share with a composer, a grant, or even a publisher.
The tl;dr is that if buddy thinks his game is anything more than a learning exercise or something he simply enjoyed making, and it has the potential to actually be a great piece of art or a decently selling product, he absolutely does not have to settle for mediocrity, even without a budget.
>To write off the entire non-AAA space as only wanting to write shovelware even as a first game is laughable.
Clearly I'm not, as I mention Hollow Knight.
>Equally, if the original commenter is happy with x being good enough then that's valid. Maybe the game isn't for you?
Clearly it isn't for me, as I'd have to listen to the above sort of garbage music and would hate the time spent, regardless of the game itself.
> The implication of my first reply is if you're at such a level you can do that
There are lots of people who have great ideas, and even iterations of a game that isn't there yet, but who fail at graphics, audio, and overall design because they underestimate how important those are to the user experience.
I would never say "I'm so glad I have access to copilot, now I can make a game with next to zero effort" but the guy I replied to thinks he can get a score worthy of being in a game from a machine learning model.
The ease of it overrides everything else, and we get (or are going to get) huge collections of shovelware from people who see how easy it is to produce.
They are just very bland and a little musically incoherent. They pass the basic sniff test of, "this sounds like music" but the more you listen the more they fall apart. It's a good muzak generator, but that's about it.
Yeah, I agree. It's not going to replace modern music theory today or tomorrow, but I'm excited to play with this sort of thing in dynamic video game content generation. I'm envisioning a user giving a bit of information about their mood and then getting auto-generated music while they use an app briefly. What I heard here seems good enough to impress some people with that use case.
Don't ask where audio generation is now, ask where it was a year ago and then where it will be in a year from now. Or, say, in five years.
Remember all the people who said, in January and February of 2020, that this "novel coronavirus" was not a big deal? Because it affected only a tiny number of people? Because "much more people die from the flu"? The fallacy is that they ignored the speed at which it was progressing.
A friend was a full-time jazz musician for years (he's not famous, but he has played with many/most of the big names in jazz) and said this <https://google-research.github.io/seanet/musiclm/examples/> was really interesting to him, musically. Sort of like the ideal for a jazz band, because you've got all independent instruments, but coming together from a single "mind".
It seems like it's trained on raw waveforms. I'm curious whether you'd get a better result by training it on transcribed MIDI and then taking the output MIDI and plugging it into VSTs.
Seems like you would still get that "central brain" compositional approach without the garbled sound quality and unidentifiable instrument noises.
Uh - no. That was classic (not classical) formless AI noodling on an orchestral bed.
It has multiple structural layers of a sort, which is progress over a few years ago. But they're still a long way short of the huge but intricate structures in real classical music.
I agree, but it's also really weird that such is the case. It's almost like "classical music" is some kind of code word prompting it to maintain a little musical coherence. The others break down rapidly after you listen to them for 5 seconds or so, but the classical-based one holds up a lot better.
That said, it'll be able to match modern overcompressed human cliche soup pretty easily. There's a lot of production out there which is really low hanging fruit for AI.
It's not unlike how the visual AI can do 'Greg Rutkowski', but has a hell of a time being an actual concept artist in a functional way. If the cliche soup is well defined, you're pretty much all set, particularly if it's not a genre that requires a lot of character.
> The information density of music is much higher than that of text or still images.
That depends on how you encode it. As a sound file (.wav, .mp3, or something like that) it's hard to compress, but as, for instance, a MIDI file it can be very compact. Music is hard to make and hard to reverse-engineer, but it is relatively compact in terms of source material if it can be expressed as MIDI.
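A back-of-the-envelope comparison makes the compactness point concrete (the 4-notes-per-second melody density is an arbitrary assumption, and real MIDI files carry some extra framing):

```python
# CD-quality PCM audio, per second
SAMPLE_RATE = 44_100   # samples per second
BYTES_PER_SAMPLE = 2   # 16-bit
CHANNELS = 2           # stereo

audio_bytes_per_sec = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS  # 176,400

# A MIDI note is roughly a note-on plus a note-off event, each about
# 3 bytes (status, key, velocity), ignoring delta times and meta events.
NOTES_PER_SEC = 4
midi_bytes_per_sec = NOTES_PER_SEC * 2 * 3  # 24

ratio = audio_bytes_per_sec // midi_bytes_per_sec  # 7350
```

On these rough numbers, the symbolic representation is thousands of times smaller than the raw waveform, which is the information-density gap the comment is pointing at.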
Thanks for sharing!
In the 1st example ["computer nerds", "hackernews", "geeky", "computer tones"] seem to make the result sound more like an old-school RPG soundtrack rather than a song.
Thanks for sharing. To me it's another instance of generative AI being a technical marvel, but in absolute terms producing total crap. I'm sure it'll get better, of course, but I'm happy to completely ignore it for now.
I'm curious whether they put any guard rails around what kinds of sounds you can invoke. Can you get it to synthesize offensive speech that's legible? Does it do gun shots, fart noises, orgasm sounds, crying, or screams of pain?
Jokes aside, it’s still very early days for this kind of generative AI. I see a real use case for virtual band mates that play along with whatever type of music you play when jamming at home, for example.
> "Nice, but somehow it feels like I've heard these songs before. Could that be possible?"
According to the paper[0] "We found that only a tiny fraction of examples was memorized exactly, while for 1% of the examples we could identify an approximate match." FWIW I'm pretty sure I heard a fragment of Kalinka (a folk tune in the public domain) in one of the samples I generated.
Maybe it’s just me, but the further the prompts get from concrete descriptions of instruments and compositions toward abstract ideas and emotions, the more random and unpleasant the results become.
In my opinion it isn't exactly great. It doesn't show much creativity. E.g., I asked it to make a dance track like you'd hear in a club, but with classical instruments as well; it was unable to do that. The output is mono and is often out of tune.
We will see how it improves, but it certainly isn't taking jobs away from musicians in its current state.
Those examples are really hard to listen to, except for maybe the techno/dancebeat ones.
Part of the problem is the awful sound; it would be much more interesting to have scores generated by AI and then have musicians record them. As it is, you can't really tell why it sounds so awful.
Give it 3 months. We've gone through this already with generative image AI. Another vibe I'm getting from reading these comments is that musicians clearly have discerning taste. Unfortunately, I don't think the public really does. I certainly don't. All these samples sound "good" to me.
OpenAI's Jukebox, which is maybe two years old now, is musically much more creative and interesting. Its sound quality is horrendous and it's amusingly unstable, but the density of good ideas in what it produces is orders of magnitude better than this.
As a musician, I'm waiting for the Copilot for music, which this is not.
I.e., I want to define the 'problem'. In musical terms: I have an idea, it's 8 bars, maybe 12 or 16. I don't like what's happening at bar x. Give me 3/5/7 options for something different, but maintaining the instrumentation/orchestration I've established. Give me some "temperature" controls to depart, or not, from what I've composed thus far. Etc, etc.
An LLM should be able to handle this. I have used one to remix Midjourney prompts. LLMs currently suffer a little in temporal and spatial reasoning.
And I just checked: GPT-4 will "make music" in ABC notation, though nothing like "Tropical House 120BPM". It is pretty lousy artistically, but that it can do it at all is absolutely amazing.
Since copilot was announced I have been looking for someone to do similar for music. It will be fun when people can start seeding/prompting the tool in just the right way to have it reproduce copies of copyrighted music.
It will be one of the funnier and more novel ways to lossily encode music.
It's not AI-based, but I use Cthulhu to do this. You map chords to single MIDI notes, and there's an optional arpeggio/melody sequencer. The neat thing is that chords don't have to correspond to the note they're mapped to. So C2 can be an Emaj, C#2 an Emaj9, D2 an Esus2, whatever you want, and there are lots of premade chord packs from the community in every key and various styles. I like the Bach-inspired ones.
Because there's no relation between the note you play and the chord mapped to it, you can just noodle randomly without thinking about what you're pressing and just use your ears.
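The chord-to-trigger idea is easy to sketch. A hypothetical Python toy (the pairings mirror the examples above, but the exact voicings and data layout are made up for illustration, not the plugin's actual format):

```python
# MIDI note numbers: C2 = 36, C#2 = 37, D2 = 38; E3 = 52, etc.
CHORD_MAP = {
    36: [52, 56, 59],          # C2  trigger -> E major (E3, G#3, B3)
    37: [52, 56, 59, 63, 66],  # C#2 trigger -> Emaj9   (adds D#4, F#4)
    38: [52, 54, 59],          # D2  trigger -> Esus2   (E3, F#3, B3)
}

def play(trigger_note):
    """Return the chord (as MIDI note numbers) mapped to a trigger key,
    falling back to the plain note when nothing is mapped."""
    return CHORD_MAP.get(trigger_note, [trigger_note])
```

Because the mapping is an arbitrary lookup table, the key you press carries no harmonic meaning of its own, which is exactly what makes ear-driven noodling work.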
I'd take just having more control at a more granular level rather than having it try to generate a full song every time (e.g. generating just foley noises, or just a guitar melody, etc).
Maybe someone more knowledgeable about music theory can chime in, but the generated tunes sound off to me. A bit like a render in the uncanny valley: something is wrong, but I can't put my finger on it.
These models aren't actually doing any musical reasoning -- they are trained on audio, and from that audio they piece apart what a "note", a "melody", and an "instrument" are, using the labelled training data. No intentional theory is involved!
Algorithmic music composition has usually been split into two:
1. Generate notes (re: theory, genre)
2. Generate sound
(i.e., EMI[0], Kulitta[1], MusicNet[2])
Now we are doing both at the same time, and backwards. The model isn't (necessarily) going "write the melody, then generate the sound", but rather, "here are 500 songs described with X, 500 described with Y, and you want XY, so we'll combine the two" :)
(This is my best understanding, so feel free to correct)
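That two-stage split can be sketched as a toy pipeline (everything here is made up for illustration; real systems like EMI or Kulitta are far more sophisticated):

```python
import math
import random

# Stage 1: generate notes (a toy "theory" layer: random picks from a scale)
C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # MIDI note numbers

def generate_notes(n, seed=0):
    rng = random.Random(seed)
    return [rng.choice(C_MAJOR) for _ in range(n)]

# Stage 2: generate sound (a toy "synthesis" layer: one sine per note)
RATE = 8_000
NOTE_SEC = 0.25

def midi_to_hz(note):
    return 440.0 * 2 ** ((note - 69) / 12)

def render(notes):
    samples = []
    for note in notes:
        hz = midi_to_hz(note)
        samples.extend(
            math.sin(2 * math.pi * hz * i / RATE)
            for i in range(int(RATE * NOTE_SEC))
        )
    return samples

melody = generate_notes(8)   # symbolic representation first...
audio = render(melody)       # ...then audio, as a separate step
```

The point is the clean boundary: the symbolic melody exists before any audio does, whereas the text-to-audio models discussed here produce the waveform directly, with no explicit note layer in between.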
The problem is that coherent musical structures are much more constrained. You can't just blend X, Y, Z, ... in a space and get something that makes sense.
That will kind of work for low-density music, which includes a lot of landfill dance + subgenres. But these statistical models are blind to larger and more complex structures, and completely unaware of cultural context and semantics.
It's actually a harder problem than language modelling because the spaces and the grammars are much larger, especially once you start including sound quality and production values as well as arrangement and core composition.
This strikes me as the wrong approach. What is the end goal here? To have an AI black box that spits out an infinite stream of music? I don’t think people are going to be excited by music that has no human in the loop, nor any connection to the physical world.
We are already drowning in music, you can turn on Spotify and have enough music to fill a lifetime. Yet new music is still being produced, why? Because music is ultimately a psychological experience, the human connection is a not-insubstantial part of the experience.
There’s a place for AI in music but it has to be white box, there needs to be scope for a human to jump in there, modify things, and make it their own. Otherwise, who will care?
As a metal lover and hack guitarist, I would love an AI bandmate drummer that jams along with me.
Back when I used to play in bands, I’d often jam with a mate who was a drummer, he’d wait for me to start with a riff and then join in.
Sure, I can record some riffs at home and add realistic drums myself in a DAW, or use a basic drum machine, but the magic was in the unique unpredictability of how he’d interpret my riff and the beats he’d come up with on the fly.
As I got older, I stopped playing in bands, lost touch with this particular mate (who also stopped drumming) and generally don’t play guitar as much as I used to.
Give me a generative AI bandmate that listens to me while I play for a second and joins in and you have my money.
I know this is about text-to-music conversion. But is it just me, or is most of the sample "music" melody-less? The beats and flow make it sound like "music", but it truly isn't. I'd call it text-to-noise for now, lol.
Played around with this a bit and was impressed. It's absolutely not ready to mix you a background playlist for an evening dinner that doesn't leave your guests with a migraine, but it's OK for prototyping different feels of whatever music you are trying to create, if that's a part of your process.
I believe the (tech using) public has had enough exposure to AI generated voices in music that this experiment will likely seem disappointing.
My hot take is that music is vastly underdetermined by text prompts, leaving too much control up to the model, so it's kind of doomed to produce Muzak.
This is not intended as negative, because this looks cool, but I noped out when reading the small print about data privacy for Test Kitchen generally.
Perhaps I was over-cautious or unfair in this case (I'm all in on MS/OpenAI stuff); just a gentle suggestion to give it some consideration if you don't normally pay it much mind.
I was excited to try it, but all the outputs I've gotten feel pretty similar, and it doesn't do many of the things I wanted it to do. I tried things like "make a beat using samples from a telephone ringing", and it wasn't able to generate anything like a real-world sample. "trash metal played on a nylon string guitar" came out as a weird bongo-sounding jam. "trumpet playing over a folk rock riff" offered one output with no trumpet that sounded like a harpsichord playing random notes; the second sounded like some kind of smooth, flamenco-inspired jazz played by a full combo.
I think that something like this could be awesome given the right training data and model but this isn't quite there yet.
I am a musician and this is interesting to me at least. Some of it is laughably bad but there is some goodness in it too.
I could see this being used as a tool for songwriters; even as it is currently, it could be useful. Give the AI a prompt, listen to the result, and get some ideas. I do not think it is coherent enough to write a full song, but I downloaded a few prompts and thought "there is something good in this snippet".
A sentence of a few words delivers (or fails to deliver) music, and all people can do is criticise. Predictably, on the topic of music, an area now apparently 'owned' equally by non-musical and musical people alike, comes a slew of playground responses.
Yes, to an extent. I tried a bunch of things to get it to generate kick drums, but it seemed to always include some other drums like a snare or even the cymbals.
Piano worked fine for me too. I haven't spent too long with it yet but I'm sure it can be done, even if not consistently.
a pretty big missed opportunity for a distributed record label
if i had this i would let people make music with it for free, put it under a de-facto license that allows google & the creator to distribute it, host it on a semi-social platform, and take a cut of the money that rolls in
but that's crazy, it's not like they already have a massive video platform to host music videos on
Waiting list... Bard not available in my country... promises, talks... and I've been a paying customer of Google on several fronts. Meanwhile, I've been a paying customer of OpenAI for several months already. Google needs to replace its CEO ASAP if they want to stay relevant. It's evident how incompetent the management there is.
Maybe I'll be proven wrong but I believe this approach won't lead to unique, popular new songs. There isn't enough high quality training data. I think that for some time, the best approach will involve using AI as an assistant for different components of a song. That's what I've been doing with the melodies: https://www.melodies.ai/.
I like some of the outputs, but as far as I can tell you aren't exposing any way to actually use the model? And the last demos you posted are a year ago?
Human here. I think you mean it’s not music in an academic sense. Or have you forgotten the subjective nature of this? I’m certainly not happy about it, but I mean - people liked Soulja Boy’s “Yah bitch yah” and Cee Lo Green’s “Transformer”. Surely they could find something like these tracks pleasant as well.
I'm sure that there is a good reason why, but I dislike that Google Workspace accounts are often excluded from these kinds of things. I mean, I pay Google real money every month, and in exchange I get less?
Add in the fact that I was denied access to the Bard trial due to it not being available in my country and I'm pretty uninterested in Google's releases in general these days. I had the same experience when the new Pixel Fold was released and instead of getting to see the product page I was redirected to the default Google store home page for my country. I couldn't even learn about the product.
I'm pretty sure Google Workspace accounts have a different privacy policy and ToS than regular accounts. It makes sense that they're separate if the service requires agreement to one ToS but not the other.
> I’m sure that there is a good reason why, but I dislike that Google Workspace accounts are often excluded from these kinds of things. I mean, I pay Google every month real money and in exchange I get less?
No, you pay Google for a business-focused offering that favors stability and proper integration of management tools for enterprise, so you get stability and products that have had proper integration of management tools for enterprise.
You can pay (for more storage, support, and other benefits) for the Google consumer offering in the form of Google One, in which case you get the consumer offerings even when they are unstable and haven’t been integrated with enterprise management, but don’t get the business-focused components that come with Google Workspace business offerings.
Workspace mainly focuses on enterprise customers whose administrators (with good reason) do not like to be surprised when new services are added to their users' accounts without some kind of vetting process by corporate security, legal, etc. I know many "power users" like to sign up because it lets them use "Gmail" but with their own domain, but I suspect this is a niche use case compared to those who use it in corporate/enterprise environments.
I hear you. Twice I have signed up for Google Workspace with my custom domain name and was very happy with the service, except for having to deal with two Google accounts. I subscribe to several Google services using my free Gmail account.
EDIT: I like MusicLM, I was trying it earlier today.
If you want "Google’s free consumer offering, but with more support and storage", that’s Google One.
Workspace is a business-focused offering that favors stability, enterprise-oriented administration, etc.; it gets business-focused features that aren’t in the consumer offering, and it often doesn’t get new consumer features until they have stabilized and additional work has gone into enterprise administration for them.
Before Google One, people bought Workspace (or whatever its name was then; it's been through many name changes) as a way to get as close as possible to "Google consumer services (Docs, Sheets, Gmail, etc.), but with paid support". But that was never what the offering was about, and there is now a clear and explicit offering that is exactly that, so continuing to complain that Workspace isn't what it is explicitly not intended to be (and what Google One explicitly is) is somewhat pointless.
Google has tiers of support for products (i.e. "How comfortable is SRE that they will be able to keep this thing in the air if every SWE that built it gets isekai'd into a magical universe that isn't ours tomorrow?").
Anything they put in front of Workspace customers is max-tier support, because their data shows that customers won't stop paying because they haven't gotten a new feature yet, but will stop paying if they get a new feature and it's broken.
You need a google account, but not a google workspace account, to sign up for a waiting list to get access. That's pretty far from "Google makes its text-to-music AI public".
While S&P500 trended downwards, Google’s stock price is up 9% since PaLM 2 first leaked this week so the market does not agree with your assessment of the tech.
Competition is a critical driver of innovation and openness under capitalist incentives. Google had virtually no competition for a long time and they predictably stagnated. That’s why it’s so critical to curb monopolies–there’s no way to “speak with your wallet” if the only choices are say Comcast and Spectrum. It’s also way easier to regulate effectively and fairly than reform economic incentives in other ways. I’d prefer a free market economy with a vibrant distribution of small-medium entities over a highly regulated marketplace with only a few behemoths. Ironically, corporate behemoths operate in similar ways to bloated government bureaucracies on the complete opposite end of the political spectrum.
https://share.getcloudapp.com/7KuzO6QO
Let's make some tropical house, use the lyrics "call my private number" as the hook, the vibes should be disco, light, fun:
https://share.getcloudapp.com/2Nub4m6B
Let's take some classical music and turn it into techno:
https://share.getcloudapp.com/NQuW6lBP
tropical house, 120bpm, bongos, wind chime, flutes:
https://share.getcloudapp.com/2Nub4mDq
Last one is pretty cool ^^