
This New Tech from MIT Could Make Life Easier for Musicians


MIT-CSAIL's Music Software
Image credit: Juan Pizarro / EyeEm via Getty Images.

Have you ever tried to learn to play a specific instrument’s part on your favorite song? Isolating the audio of different musical instruments, especially from a video of them being played together, has always been tricky.

However, MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) recently developed technology that allows users to make the audio of individual instruments louder or softer, simply by clicking on them in a video.

The “deep learning” system behind this technology was trained on over 60 hours of video footage and does not require any human annotation telling it what an instrument is or what it’s supposed to sound like. The software, dubbed PixelPlayer by its creators, learned on its own.

It can identify the sounds of over 20 musical instruments by examining every pixel in a video of them being played and pinpointing which sound should be associated with each pixel.

PixelPlayer’s deep learning works by training three “neural networks” on certain videos. One network analyzes the video’s audio, another examines the visuals, and a third called the “synthesizer” puts it all together by linking different soundwaves with the pixels of their instruments.
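The linking step described above can be sketched in miniature. This is a hypothetical illustration, not PixelPlayer's actual code: it assumes the video network has already produced a feature vector per pixel and the audio network a feature vector per time-frequency bin of the mixture spectrogram, and the "synthesizer" scores each pairing to predict a per-pixel mask over the mixture. All dimensions and names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the article)
H, W, K = 4, 4, 16   # video feature map height/width, shared feature channels
F, T = 32, 10        # spectrogram frequency bins, time frames

# Stand-in for the video network's output: one K-dim feature per pixel
pixel_features = rng.standard_normal((H, W, K))

# Stand-in for the audio network's output: one K-dim feature per spectrogram bin
audio_features = rng.standard_normal((F, T, K))

# Magnitude spectrogram of the mixed recording
mixture_spectrogram = np.abs(rng.standard_normal((F, T)))

def separate_pixel(i, j):
    """Toy 'synthesizer': link one pixel's visual feature to the audio by
    scoring it against every time-frequency bin, squashing the scores into
    a (0, 1) mask, and applying the mask to the mixture spectrogram."""
    scores = audio_features @ pixel_features[i, j]  # shape (F, T)
    mask = 1.0 / (1.0 + np.exp(-scores))            # sigmoid mask
    return mask * mixture_spectrogram               # that pixel's estimated sound

estimate = separate_pixel(0, 0)
print(estimate.shape)
```

Clicking a pixel in the video then amounts to picking `(i, j)` and boosting or attenuating its masked spectrogram before resynthesizing the audio.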

PixelPlayer was created by a team led by Hang Zhao, a PhD student at CSAIL. Zhao said that with more data and more time to examine it, the system will be able to learn and single out even more instruments.

In addition to helping musicians learn new parts, the researchers behind PixelPlayer see it helping audio engineers improve or even swap out certain instruments in old concert footage.

Previous attempts at a system like this focused solely on audio, which required exhaustive human labeling. By focusing on the visuals associated with certain instruments, PixelPlayer can train without supervision.

“We were surprised that we could actually spatially locate the instruments at the pixel level,” Zhao said. “Being able to do that opens up a lot of possibilities, like being able to edit the soundtrack audio of individual instruments by a single click on the video.”

