Even as it faces a lawsuit from the recording industry for using countless copyrighted songs to train its music-generating AI model, Suno has become the fifth most-used generative-AI service in the world — and the company is still pushing its technology forward. A new, notably more realistic model, V4, is available today to paid subscribers, and will eventually reach all users. “I think it crosses into something I actively want to listen to,” says Suno co-founder Mikey Shulman. “Instead of, like, something that I want to keep making it better.”
Shulman is sitting in a brand-new studio space, complete with actual guitars, basses, and a high-end sound system, in the company’s equally new custom-built offices, which take up two floors — soon to be three — right by the Harvard University campus in Cambridge, Mass. “We had to make the model better to justify having bought the fancy speakers,” Shulman jokes. As of February, the company had 12 or so employees; now they’re up to more than 50, with more to come. “It’s hard to compete with OpenAI for really talented researchers,” says Shulman, referring to the AI giant behind ChatGPT. “But the way we compete is if you want to learn to align [AI] models with human taste, there’s no better place to do it.”
Unlike large language models, which have objective benchmarks — you can compare Claude and ChatGPT’s scores on the LSATs, for instance — Suno’s engineers can only use human preferences. Taking note of past users’ preferences between different results from the same prompt has played a substantial role in improving the new model. “Just after a few more months of being able to do that, we have better ideas of what human preferences are,” Shulman says.
AI-generated music, whether from Suno or its most direct competitor, Udio, tends to have a certain tinniness — not unlike low-bitrate MP3s — that’s most evident in the vocals. As we generate song after song over a couple hours in the studio, V4’s productions are crisper than any previous Suno model could muster, with more realistic singers and instrumentation, plus a broader stereo field. Shulman says that the model has improved its composition skills as well. “The music is getting more interesting,” he argues. “You’re getting chord changes you didn’t expect.”
In one of our efforts, which you can hear above, we used a set of lyrics I quickly wrote, paired with a prompt for “organic country,” with fairly impressive results — you can practically see the worn hat on the nonexistent vocalist. That vocalist does sound notably Auto-Tuned, which may reflect the number of electronically enhanced vocals in Suno’s training data.
For AI-music’s opponents — a category that, officially speaking at least, includes almost the entire recording industry and its artists, many of whom have signed anti-AI petitions — the prospect of an even more capable music-generating AI isn’t good news. There are exceptions, though: Timbaland, for one, recently told Rolling Stone he’s using Suno “10 hours a day” to finish incomplete songs, and has partnered with the company as a creative consultant. And Shulman insists he’s hearing from numerous artists, songwriters, and producers who are quietly using Suno, including at least one A-lister who Shulman says signed an anti-AI petition.
Shulman hopes some agreement can be reached about the use of training data, but also thinks artists should be more worried about models that can eventually reproduce their voices even if they’re not trained on them — something Suno doesn’t allow, since artists’ names are prohibited from prompts. “Somebody is going to train a model without any Neil Young in it,” says Shulman. “And then figure out how to prompt a spitting image of Neil Young out of that model by describing it correctly.”
Suno’s capabilities have gone well beyond ChatGPT-style text prompts — you can now upload your own partial compositions, a cappella vocals, loops, or other audio and turn it into songs, in an advanced level of human-AI collaboration. (They also have a beta feature that lets you upload videos or photos to inspire songs.)
Rebecca Hu, a project manager for Suno, says the ability to iterate from existing audio is attracting young beatmakers to the platform. “A lot of our power users are young producers,” says Hu. “They think this is the future…. We’re trying to move to a music-based UI. Text is hard to understand when it comes to music…. I think the interesting use cases are producers or songwriters in a room, iterating.” Still, the company is mostly focused on its original mission of getting non-musicians involved in making music.
V4 also comes with an option to use a new, in-progress lyric-generating model the company is working on, which generates quirkier and more human-seeming lyrics than its previous use of ChatGPT’s model. It’s notably better at generating rap lyrics, though it does bite a line from Drake circa 2015 — “running through the Six” — in one of our demos.
The copyright lawsuit looming over Suno isn’t on the minds of most employees, Shulman says, but it “obviously affects things. It’s not good to get sued. But I think there is a future of music that we are excited about building. And viewed in that light, this is a speed bump, but should not ultimately get in the way of everybody building that future of music.” He adds that he wants to eventually enlist labels and artists as partners: “That future of music, we actually can’t and don’t want to do by ourselves.”