Have you ever tried to learn to play a specific instrument’s part on your favorite song? Isolating the audio of different musical instruments, especially from a video of them being played together, has always been tricky.
However, MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) recently developed technology that lets users make individual instruments louder or softer simply by clicking on them in a video of the performance.
The “deep learning” system behind this technology was trained on over 60 hours of video footage and does not require any human annotation telling it what an instrument is or what it’s supposed to sound like. The software, dubbed PixelPlayer by its creators, learned on its own.
It can identify the sounds of over 20 musical instruments by examining every pixel in a video of them being played and pinpointing which sound should be associated with each pixel.
PixelPlayer’s deep learning works by training three “neural networks” on videos of instruments being played. One network analyzes the video’s audio, another examines the visuals, and a third, called the “synthesizer,” puts it all together by linking different soundwaves with the pixels of their instruments.
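To make the division of labor among the three networks more concrete, here is a rough toy sketch in Python. It is not PixelPlayer's actual code: the function names, the tiny hand-picked numbers, and the idea of representing the networks' outputs as short lists are all assumptions made for illustration. The sketch pretends the audio network has already split the soundtrack into a few channels, the visual network has already produced a small feature vector for the clicked pixel, and the "synthesizer" step combines the two into a mask saying how much of each frequency bin belongs to that pixel.

```python
import math


def sigmoid(x):
    """Squash any real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def pixel_mask(pixel_features, audio_channels):
    """Toy 'synthesizer' (hypothetical): weight each audio channel by the
    clicked pixel's visual feature, sum per frequency bin, and squash the
    result into a 0..1 mask."""
    n_bins = len(audio_channels[0])
    mask = []
    for b in range(n_bins):
        score = sum(w * ch[b] for w, ch in zip(pixel_features, audio_channels))
        mask.append(sigmoid(score))
    return mask


def remix(mixture, mask, gain):
    """Scale the masked part of the mixture by `gain`; leave the rest as is.
    Equivalent to mixture * (1 - mask) + gain * mixture * mask."""
    return [m * (1.0 + (gain - 1.0) * k) for m, k in zip(mixture, mask)]


# Made-up stand-ins for what the audio and visual networks would output.
audio_channels = [[4.0, -4.0], [-4.0, 4.0]]  # 2 channels x 2 frequency bins
clicked_pixel = [1.0, 0.0]                   # pixel responds to channel 0 only
mixture = [1.0, 1.0]                         # magnitude of the mixed audio

mask = pixel_mask(clicked_pixel, audio_channels)
louder = remix(mixture, mask, gain=2.0)      # "click to turn this instrument up"
```

In this toy setup, the mask is close to 1 for the first frequency bin and close to 0 for the second, so doubling the gain boosts mostly the first bin. A gain of 1.0 leaves the mixture unchanged, and a gain below 1.0 would make the clicked instrument softer.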
PixelPlayer was created by a team led by Hang Zhao, a PhD student at CSAIL. With more data and more time to train on it, Zhao said, the system will be able to learn and single out even more instruments.
In addition to helping musicians learn new parts, the researchers behind PixelPlayer see it being used to help audio engineers improve or even swap out certain instruments in old concert footage.
Previous attempts at a system like this focused solely on audio, which required exhaustive human labeling. By focusing on the visuals associated with certain instruments, PixelPlayer can train without supervision.
“We were surprised that we could actually spatially locate the instruments at the pixel level,” Zhao said. “Being able to do that opens up a lot of possibilities, like being able to edit the soundtrack audio of individual instruments by a single click on the video.”