
Unsupervised machine learning: Source separation


Source separation asks whether a set of observations could have arisen from several signal 'sources' in the environment whose signals are mixed together in each measured variable.  A familiar example is several simultaneous voices recorded by several microphones in a room.  Each microphone may pick up every voice, but at a volume that depends on where the voice is relative to that microphone.  Source separation seeks to reconstruct each original source in isolation from the others.
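The mixing described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Laplace-distributed "voices" are a rough stand-in for speech, and the gains in `mixing` are made-up numbers, not values from any real room.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Two hypothetical "voices". Laplace noise is only a rough stand-in for speech.
sources = rng.laplace(size=(2, n_samples))

# Hypothetical gains: row i holds how loudly each voice reaches microphone i.
# Voice 1 is louder on microphone 1, voice 2 is louder on microphone 2.
mixing = np.array([[1.0, 0.4],
                   [0.3, 1.0]])

# Each microphone records a weighted sum of both voices.
recordings = mixing @ sources  # shape (2, n_samples)
```

Source separation starts from `recordings` alone and tries to recover something like `sources` without being told `mixing`.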

One family of approaches to this problem is called independent component analysis, or ICA.  ICA works by learning an alternative way of representing a set of observations, using a 'vocabulary' of learned patterns.  A new observation is represented by adding together some amount, or intensity, of each learned pattern until the sum matches the observation.  In the example of multiple voices in a room, one such pattern might be the relative strength of one voice on each of the microphones, and the intensity of that pattern over time would be the fluctuating intensity of that one voice, isolated from the others.
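As a small worked example of representing an observation with a 'vocabulary' of patterns, the sketch below assumes the two patterns are already known and solves for the intensities of a single two-microphone reading. The numbers in `patterns` and `observation` are invented for illustration.

```python
import numpy as np

# Hypothetical learned patterns: column j is the strength of voice j
# on microphone 1 and microphone 2.
patterns = np.array([[1.0, 0.4],
                     [0.3, 1.0]])

# One made-up reading on the two microphones at a single instant.
observation = np.array([0.9, 1.7])

# Find the intensity of each pattern so the weighted patterns sum to the observation.
intensities = np.linalg.solve(patterns, observation)

# Adding the patterns back together, scaled by their intensities, rebuilds the reading.
reconstruction = (intensities[0] * patterns[:, 0]
                  + intensities[1] * patterns[:, 1])
```

Here the reading decomposes into 0.25 units of the first pattern and 1.625 units of the second. Repeating this for every instant gives one intensity trace per voice.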

ICA learns patterns whose intensities are as statistically independent of one another as possible across a set of observations (hence 'independent').  Because the patterns are pushed to differ from each other while still reconstructing each observation accurately, the stored patterns, or components, come to reveal the sources if any are present.
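One way to make the independence idea concrete is the toy two-channel sketch below. It is not the author's method or any particular library's algorithm: it assumes exactly two channels and non-Gaussian sources, first whitens the recordings (which only removes correlation), then scans rotation angles for the one that makes the outputs most non-Gaussian, using excess kurtosis as a stand-in for independence. The function names `excess_kurtosis` and `toy_ica_2d` are hypothetical.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth-moment measure of non-Gaussianity (about 0 for Gaussian data)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

def toy_ica_2d(recordings, n_angles=180):
    """Toy ICA for exactly two channels: whiten, then search over rotations."""
    centered = recordings - recordings.mean(axis=1, keepdims=True)

    # Whitening: rescale along the covariance eigenvectors so the channels
    # become uncorrelated with unit variance.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered))
    whitener = (eigvecs / np.sqrt(eigvals)).T
    whitened = whitener @ centered

    # Uncorrelated is not yet independent: any rotation of whitened data is
    # still uncorrelated. Pick the rotation whose outputs are most non-Gaussian.
    best_score, best_rotation = -np.inf, np.eye(2)
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        rotation = np.array([[c, s], [-s, c]])
        score = sum(abs(excess_kurtosis(row)) for row in rotation @ whitened)
        if score > best_score:
            best_score, best_rotation = score, rotation

    unmixing = best_rotation @ whitener
    return unmixing @ centered, unmixing

# Synthetic demo with made-up sources and gains.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 20_000))
mixing = np.array([[1.0, 0.4],
                   [0.3, 1.0]])
recordings = mixing @ sources

estimated, unmixing = toy_ica_2d(recordings)

# Absolute correlation between each true source (rows) and each estimate (columns).
match = np.abs(np.corrcoef(sources, estimated)[:2, 2:])
```

ICA can recover sources only up to order, sign, and scale, which is why `match` compares absolute correlations rather than raw values. Practical implementations use more general optimization than this angle scan.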

Figures A and B show histograms of separate segments of speech from two people.  Figure C plots the intensities seen on two microphones in the room when the two speakers never speak at the same time.  The points fall along two diagonal lines with different slopes because one speaker is closer to microphone 1, and the other speaker is closer to microphone 2.  Figure D plots the summed intensities seen on the two microphones when the two people speak at the same time.  Source separation involves learning from the mixed data in D how to find the directions seen in C, and then subtracting the effects of these directions from each other to isolate each voice from the mixture.
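The relationship between the turn-taking data of Figure C and the mixed data of Figure D can be sketched as follows. As a shortcut for illustration only, this sketch reads the two directions directly off synthetic turn-taking segments, whereas source separation has to learn them from the mixed data alone. The gains in `directions`, the Laplace "voices", and the helper `dominant_direction` are all assumptions, not part of the original figures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical directions: column j is how strongly voice j reaches
# microphone 1 and microphone 2.
directions = np.array([[1.0, 0.4],
                       [0.3, 1.0]])

def dominant_direction(segment):
    """Unit vector along which a (2, n) segment of samples is spread out."""
    u, _, _ = np.linalg.svd(segment, full_matrices=False)
    return u[:, 0]

# Figure C situation: each person speaks alone, so every sample lies on one line.
solo_1 = np.outer(directions[:, 0], rng.laplace(size=2_000))
solo_2 = np.outer(directions[:, 1], rng.laplace(size=2_000))
estimated_directions = np.column_stack([dominant_direction(solo_1),
                                        dominant_direction(solo_2)])

# Figure D situation: both people speak at once and the microphones sum them.
voices = rng.laplace(size=(2, 2_000))
mixed = directions @ voices

# Solving against the estimated directions subtracts each voice's leakage
# from the other, isolating the voices up to sign and scale.
isolated = np.linalg.solve(estimated_directions, mixed)
```

Each row of `isolated` follows one row of `voices` up to a constant factor, since the estimated directions are unit vectors with arbitrary sign.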

The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions, or viewpoints expressed by the author.