Meta's SAM bot keeps 'em separated as it isolates voices and instruments from audio clips
No mention of protections to stop it being used to snoop on people
by Brandon Vigliarolo · The Register

Want to hear just the guitar riff from a song? How about cutting out the train noise from a voice recording? Meta says its new SAM Audio model can separate and edit sounds using simple prompts, cutting down on the manual work typical of audio-editing tools.
The release of Segment Anything Model (SAM) Audio follows Meta's earlier segmentation models for visual assets. Meta claims SAM Audio is "the first unified multimodal model for audio separation," and it is available today on the company's Segment Anything Playground as well as for download.
By "multimodal," Meta is referring to SAM Audio's ability to interpret three types of prompts for audio segmentation: text prompts, time-segment markings, and visual selections in video used to isolate or remove specific sounds.
Take a video of a band playing, for example, and select the guitarist to have SAM Audio automatically isolate that player. Highlight the waveform of a barking dog in an outdoor recording, tell SAM to remove that sound, and it can trace and eliminate those interruptions throughout the entire file.
"SAM Audio performs reliably across diverse, real-world scenarios — using text, visual, and temporal cues," Meta said in its SAM Audio announcement. "This approach gives people precise and intuitive control over how audio is separated."
The company said it sees a number of use cases for SAM Audio, like cleaning up an audio file, removing background noise, and other tasks that previously required hands-on work in audio-editing software or dedicated sound-mixing tools.
That said, using AI to process audio isn't exactly a new idea; plenty of products already do what SAM Audio does. Meta, however, describes the space as a "fragmented" one, "with a variety of tools designed for single-purpose use cases," in contrast to SAM Audio's so-called unified model.
Because SAM Audio can isolate specific sounds based on user prompts, questions naturally arise about the model's safety and whether it could be used to single out voices or conversations in public recordings, potentially creating a new avenue for snooping. We picked through Meta's SAM Audio page and an associated research paper looking for information on safety features built into the new model, but the company didn't address the subject at all.
When asked about safety, Meta only told us that if it's illegal without AI, you shouldn't use AI to do it.
"As the SAM license notes, use of the SAM Materials must comply with applicable laws and regulations, including Trade Control Laws and applicable privacy and data protection laws," a Meta spokesperson told The Register, making it sound suspiciously like using SAM Audio for evil would be perfectly within its capabilities.
Then again, Meta's own admission that SAM Audio has "some limitations" may mean it's not exactly ready for anyone who wants to use AI to reenact a modern version of The Conversation. It's still "a challenge" for SAM Audio to separate out "highly similar audio events," like picking out one voice among many or isolating a single instrument from an orchestra, Meta noted. SAM Audio also can't perform any audio separation without a prompt, and it can't take audio as a prompt either, so feeding it a sample of the sound you want isolated is still outside the scope of the bot.
One area where SAM Audio could be useful is accessibility, which Meta said it's actively working toward. The company said it has partnered with US hearing aid manufacturer Starkey to look at potential integrations, and is working with 2gether-International, an accelerator for disabled startup founders, to explore other accessibility applications that SAM Audio could serve. ®