Blind Source Separation from Single Channel Audio Recording Using ICA Algorithms

Juan S. Calderón Piedras
Universidad Distrital F. J. C.
Bogotá Colombia
jscalderonp@correo.udistrital.edu.co
Álvaro D. Orjuela-Cañón
IEEE CIS – COLOMBIA Chair.
Bogotá Colombia
dorjuela@ieee.org
David A. Sanabria Quiroga
Universidad Distrital F. J. C.
Bogotá Colombia
dasanabriaq@correo.udistrital.edu.co
Abstract – The FastICA method has been proposed for the blind identification and separation of independent components. This paper studies the method in order to measure its performance in separating real audio signals that share the same channel simultaneously. We propose a single channel ICA (SCICA) algorithm based on FastICA, which estimates the mixing matrix and its inverse. In this way, it is possible to find representative bases which, after a clustering process, are used as filter impulse responses to discriminate the source signals. The parameters used in the source identification process are studied to improve the results.
Keywords – Independent component analysis, single channel ICA, blind source separation, power spectrum, filters, audio signals.
I. INTRODUCTION
Current applications in audio engineering demand advances in digital signal processing. Speech segregation and recognition, automatic music transcription, music information systems, forensic audio and signal separation all require powerful techniques.
Independent Component Analysis (ICA) is useful in such audio engineering tasks [1], [2], and procedures for the Blind Source Separation (BSS) of mixtures have been studied extensively for some time [3]. In the applications cited above, separation is usually required for single channel mixtures, a problem that traditional ICA techniques cannot solve. New, complementary algorithms are therefore needed to separate the signals; these are known as Single Channel ICA (SCICA).
Alongside SCICA and the basic ICA model, a collection of perceptually motivated techniques jointly called Computational Auditory Scene Analysis (CASA) is widely used [4]. SCICA solutions are generally limited because they require strong additional assumptions, and separation is often achieved only through fairly intensive computational procedures [5]–[7]. To overcome these limitations, information extracted from the differences between the time-frequency (t-f) distributions of the sources is frequently used; studies of this kind can be found in [8]–[10].
Although the SCICA method can solve the problem, it is necessary to know which SCICA parameters give the best results. As noted in [11], SCICA is a special case of multidimensional independent component analysis (MICA) [12]. In SCICA the input vector must be delayed N times, which yields multiple independent components (ICs) associated with a single independent source; the ICs must therefore be grouped in order to reconstruct the corresponding original signals. As a restriction, the sources must have disjoint spectra, so that the signals can be discriminated through filters whose coefficients are provided by ICA.
The present work describes a methodology to separate two signals mixed in one channel, using SCICA to find and separate the sources. Section II describes the signals used and the implemented methodology, Section III presents the results, and Sections IV and V present the discussion and the conclusions drawn from them.
II. MATERIALS AND METHODS
This section describes the database and the methodology used to obtain the sources. The methodology, based on ICA and SCICA theory, extracts independent components that correspond to the source signals to be separated.
A – Database
Two sets of signals were used to test the algorithm. The first set was synthetic: signals with uniform distribution, zero mean and unit variance. These signals have spectral and statistical characteristics that are well suited to the method, namely: there is at most one Gaussian process; all the independent random processes are band limited with disjoint spectral support; and no non-Gaussian process can be further decomposed into multiple independent or spectrally disjoint processes (maximal decomposition).
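The paper does not say how the synthetic signals were generated, so the sketch below shows only one possible construction that meets the conditions above. It assumes triangle waves: a triangle wave has a uniformly distributed amplitude, and because it contains only odd harmonics of its fundamental, a 200 Hz and a 400 Hz triangle wave have line spectra that never coincide. The sampling rate, fundamentals and duration are illustrative assumptions.

```python
import numpy as np

def triangle_source(f0, fs, n):
    """Zero-mean, unit-variance triangle wave (its amplitude is uniformly distributed)."""
    phase = (np.arange(n) * f0 / fs) % 1.0
    s = 4.0 * np.abs(phase - 0.5) - 1.0       # linear ramps between -1 and 1
    s -= s.mean()
    return s / s.std()

def spectral_overlap(a, b):
    """Normalised inner product of two magnitude spectra; 0 means disjoint support."""
    ma, mb = np.abs(np.fft.rfft(a)), np.abs(np.fft.rfft(b))
    return float(np.dot(ma, mb) / np.sqrt(np.dot(ma, ma) * np.dot(mb, mb)))

def excess_kurtosis(y):
    """Excess kurtosis; -1.2 for a uniform distribution, 0 for a Gaussian."""
    y = y - y.mean()
    return float(np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0)

fs, n = 16000, 16000                  # assumed: 1 s at 16 kHz
s1 = triangle_source(200.0, fs, n)    # odd harmonics of 200 Hz: 200, 600, 1000, ...
s2 = triangle_source(400.0, fs, n)    # odd harmonics of 400 Hz: 400, 1200, 2000, ...
mix = s1 + s2                         # single-channel mixture, as in Experiment 1

print(round(spectral_overlap(s1, s2), 6), round(excess_kurtosis(s1), 2))
```

Odd multiples of 400 Hz are even multiples of 200 Hz, which is why the two spectra are disjoint; any pair of fundamentals with that property would serve equally well.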
The second set consists of a guitar and a male voice, each acquired individually with a Shure SM-57 microphone at a sampling rate of 16 kHz. Environmental noise was avoided during the recordings.
B – Independent Component Analysis (ICA) and Single Channel Independent Component Analysis (SCICA)
Single channel ICA builds on the ICA method, whose analysis starts with a search for independent components. The ICA model is defined as follows:
$$\mathbf{x} = \mathbf{A}\mathbf{s} \tag{1}$$
where x corresponds to the mixed signals and s to the independent random vectors, or source signals.
The main goal of ICA is to find A, the mixing matrix with which the original random vectors were mixed. Once A is known, the blind source separation problem has the immediate solution given in (2):
$$\mathbf{s} = \mathbf{W}\mathbf{x} \tag{2}$$
where W is the inverse of A, that is, $\mathbf{W} = \mathbf{A}^{-1}$. ICA therefore focuses on estimating the mixing matrix A (and its inverse W), which it does by maximizing the non-Gaussianity of the estimated sources. The SCICA model can be seen as an extension of ICA, since it uses the matrices A and W to build the filters that separate the source signals. Standard ICA requires as many mixture signals as there are sources, whereas SCICA needs only a single mixture to separate two signals.
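As an illustration of (1) and (2) in the ordinary multichannel case, the sketch below mixes two independent uniform sources with an assumed 2 × 2 matrix, whitens the mixtures, and estimates W with a minimal symmetric FastICA that uses the cubic (kurtosis) nonlinearity. It is written from scratch in NumPy rather than taken from any ICA library, and the mixing matrix, sample count and iteration limits are illustrative assumptions. ICA recovers sources only up to order and scale, so the product of the estimated W and the true A should be close to a signed permutation matrix rather than the identity.

```python
import numpy as np

def whiten(x):
    """Center the rows of x and whiten them so that cov(z) = I; returns z and V."""
    xc = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(xc))
    v = e @ np.diag(1.0 / np.sqrt(d)) @ e.T
    return v @ xc, v

def fastica_kurtosis(z, n_iter=200, tol=1e-10, seed=0):
    """Symmetric FastICA on whitened data z with the cubic (kurtosis) nonlinearity."""
    m, n = z.shape
    rng = np.random.default_rng(seed)
    w, _ = np.linalg.qr(rng.standard_normal((m, m)))   # random orthogonal start
    for _ in range(n_iter):
        y = w @ z
        # Fixed-point step for every row: w_i <- E[z (w_i^T z)^3] - 3 w_i
        w_new = (y ** 3) @ z.T / n - 3.0 * w
        # Symmetric decorrelation (W W^T)^(-1/2) W, computed through the SVD
        u, _, vt = np.linalg.svd(w_new)
        w_new = u @ vt
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w

rng = np.random.default_rng(1)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20000))  # independent non-Gaussian sources
a = np.array([[1.0, 0.6], [0.4, 1.0]])                      # assumed mixing matrix A
x = a @ s                                                   # eq. (1): x = A s
z, v = whiten(x)
w = fastica_kurtosis(z) @ v                                 # estimated unmixing matrix W
s_hat = w @ x                                               # eq. (2): s = W x
print(np.round(w @ a, 2))   # approximately a signed permutation matrix
```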
The resulting SCICA method is blind and can be used with artificial signals or with real audio signals recorded by any microphone. Its input matrix is built from N delayed copies of the single mixture x, as shown in (3):
$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_{n} \\ \mathbf{x}_{n-1} \\ \vdots \\ \mathbf{x}_{n-N+1} \end{bmatrix} \tag{3}$$
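A minimal NumPy sketch of the delay matrix in (3). It assumes a "valid" convention at the edges, keeping only the samples where all N delayed copies exist; the paper does not state how the edges were handled, and zero-padding would be an equally reasonable choice.

```python
import numpy as np

def delay_matrix(x, n_delays):
    """Eq. (3): stack N delayed copies of the single mixture x, one per row.

    Row k holds x_{n-k}. Only columns where all N copies exist are kept
    (assumed convention), so the shape is (N, len(x) - N + 1).
    """
    x = np.asarray(x, dtype=float)
    n_cols = len(x) - n_delays + 1
    return np.stack([x[n_delays - 1 - k : n_delays - 1 - k + n_cols]
                     for k in range(n_delays)])

x = np.arange(10.0)
X = delay_matrix(x, 3)
print(X)
```

With N = 25, as used in Section III, a mixture of n samples yields a 25 × (n − 24) input matrix.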
FastICA is one of the fastest and most robust ICA algorithms; it uses principal component analysis and whitening as preprocessing [13]. Because of (3), it was necessary to find the right number of delays N for the SCICA technique. This number was found experimentally, using different types of signals and studying the correlation between the source signals and the signals recovered for different numbers of delays.
Once the input matrix of (3) is built, any ICA algorithm can be used to decompose it into independent subcomponents. The difficulty is that ICA returns as many solutions as there are delays (N), because the decomposition is performed on delayed copies of a single mixture: each source signal, which is an independent component, is split across several independent subcomponents. The characteristics of each independent subcomponent can be observed in the mixing matrix by computing the FFT of the columns of A, which yields a distinct spectral shape for each subcomponent and allows the signals to be discriminated.
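The spectral features used to tell the subcomponents apart can be sketched as the magnitude FFT of each column of A, zero-padded for a smoother spectrum. The FFT length, the normalization to unit peak and the two-column stand-in for A are assumptions for illustration; in practice A would be the N × N matrix estimated by ICA.

```python
import numpy as np

def column_spectra(a, n_fft=256):
    """Magnitude spectrum of every column a_i of the mixing matrix A.

    Returns shape (n_columns, n_fft // 2 + 1), each row scaled to a peak of 1,
    so that the rows can be used directly as feature vectors for clustering.
    """
    spec = np.abs(np.fft.rfft(a, n=n_fft, axis=0)).T
    return spec / spec.max(axis=1, keepdims=True)

fs, n_delays = 16000, 25          # assumed values
t = np.arange(n_delays) / fs
# Illustrative stand-in for A: one column behaving like a 2 kHz basis, one like 6 kHz
a = np.column_stack([np.cos(2 * np.pi * 2000 * t), np.cos(2 * np.pi * 6000 * t)])
spec = column_spectra(a)
freqs = np.fft.rfftfreq(256, d=1 / fs)
print(freqs[spec.argmax(axis=1)])  # peak frequency of each column
```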
The k-means algorithm [14] clusters waveforms that are similar to each other, grouping the components into subgroups $\gamma_p$. The number of clusters was determined with the Davies-Bouldin index, which assesses cluster quality by comparing within-cluster scatter with between-cluster separation [15]. The mixing matrix and its inverse (A and W, respectively) are then reorganized according to the groups of submatrices $\gamma_p$, where $a_i$ are the columns of A and $w_i$ are the rows of W. Finally, the separation-reconstruction filter for the original sources is defined as:
$$f_i(t) = \frac{1}{N}\, a_i(-t) * w_i(t), \qquad i \in \gamma_p \tag{4}$$
The filter impulse response coefficients were obtained using (4), with $f_{\gamma_p} = [f_1, f_2, f_3, f_4]'$ representing the filters of the 4 clusters created by the k-means algorithm. A diagram of the method is shown in Fig. 1.
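Equation (4) and the per-cluster filters can be sketched in NumPy as below. Summing the per-component filters $f_i$ over the indices of each cluster is our reading of how $f_{\gamma_p}$ is formed, and the random A and two-cluster grouping are hypothetical stand-ins for the matrices and clusters that ICA and k-means would produce. When W is exactly the inverse of a square A (no dimensionality reduction), the filters of all clusters add up to a pure delay of N − 1 samples, because $\sum_i A_{ki} W_{il} = \delta_{kl}$; the cluster outputs therefore sum back to the delayed mixture, which is consistent with the need to delay the output signals mentioned in the results.

```python
import numpy as np

def scica_filters(a, w, groups):
    """Eq. (4) summed per cluster: f_gamma = sum over i in gamma of (1/N) a_i(-t) * w_i(t).

    a: (N, N) mixing matrix with columns a_i; w: (N, N) unmixing matrix with rows w_i;
    groups: one list of component indices per cluster gamma_p. Each filter has 2N - 1 taps.
    """
    n = a.shape[0]
    return [sum(np.convolve(a[:, i][::-1], w[i, :]) for i in idx) / n for idx in groups]

def apply_filters(mix, filters):
    """Causal FIR filtering of the single-channel mixture with each cluster filter."""
    return [np.convolve(mix, f)[: len(mix)] for f in filters]

rng = np.random.default_rng(0)
n_delays = 6
a = rng.standard_normal((n_delays, n_delays))            # hypothetical stand-in for A
w = np.linalg.inv(a)                                     # W = A^{-1}, as in (2)
filters = scica_filters(a, w, [[0, 2, 4], [1, 3, 5]])    # hypothetical 2-cluster grouping
mix = rng.standard_normal(1000)
outs = apply_filters(mix, filters)
```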
Here n is the length of the signals in X, the input matrix of SCICA defined in (3). Once the matrix X with N delays has been built, dimensionality reduction and whitening are performed. These steps are not strictly necessary, but they are a very effective preprocessing stage that shortens the time needed to search for the independent components; their use is therefore highly recommended when implementing SCICA.
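A minimal sketch of the dimensionality-reduction and whitening stage, assuming principal component analysis is used for the reduction (as in FastICA [13]): the delay matrix is centered, projected onto its n_keep leading eigenvectors and rescaled to unit variance. The rank-3 test matrix is invented for illustration. After reduction the ICA matrices are no longer N × N; full-size versions can be recovered as V_inv @ A_ica and W_ica @ V using the matrices returned here.

```python
import numpy as np

def pca_whiten(x, n_keep):
    """Dimensionality reduction + whitening of the delay matrix X (rows = delays).

    Keeps the n_keep principal components with the largest variance and rescales
    them to unit variance. Returns z (n_keep x samples), the whitening matrix
    V (n_keep x N) and the de-whitening matrix V_inv (N x n_keep).
    """
    xc = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(xc))       # eigenvalues in ascending order
    keep = np.argsort(d)[::-1][:n_keep]     # indices of the largest eigenvalues
    d, e = d[keep], e[:, keep]
    v = np.diag(1.0 / np.sqrt(d)) @ e.T
    v_inv = e @ np.diag(np.sqrt(d))
    return v @ xc, v, v_inv

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5000))  # assumed rank-3 data
z, v, v_inv = pca_whiten(x, 3)
print(np.round(np.cov(z), 6))   # identity matrix after whitening
```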
The reduced and whitened input matrix is then processed by ICA, which searches for the separated components and estimates the matrices A and W. ICA measures the non-Gaussianity of the signals using kurtosis as the contrast function.
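To make the contrast function concrete, the sketch below computes the excess kurtosis on which FastICA's cubic nonlinearity is based. A normalized sum of independent sub-Gaussian sources is closer to Gaussian than either source, so its kurtosis is closer to zero; maximizing the absolute kurtosis therefore drives an estimate back towards a single source. The uniform test signals and the sample size are assumptions for illustration.

```python
import numpy as np

def excess_kurtosis(y):
    """kurt(y) = E[y^4] / (E[y^2])^2 - 3 after centering; zero for a Gaussian."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0)

rng = np.random.default_rng(0)
n = 200_000
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), n)   # sub-Gaussian source, -1.2 in theory
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), n)
g = rng.standard_normal(n)                       # Gaussian reference, 0 in theory
mix = (s1 + s2) / np.sqrt(2)                     # unit-variance mixture, -0.6 in theory
for name, y in [("source", s1), ("mixture", mix), ("gaussian", g)]:
    print(name, round(excess_kurtosis(y), 2))
```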
C – Implemented methodology
Fig. 1 describes the implemented methodology. First, the signal is delayed and its dimensionality is reduced. Then an ICA algorithm is used to find N independent components, which are clustered with the k-means algorithm [14]. Finally, filters are built using the clusters found as impulse responses.
Fig. 1. Flow diagram of the SCICA method.
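The clustering step can be sketched as follows, assuming the feature vectors are the column spectra of A (one row per subcomponent). This is a plain NumPy implementation of Lloyd's k-means with k-means++ seeding, and of the Davies-Bouldin index of [15] (the mean over clusters of the largest ratio of summed within-cluster scatter to centroid distance), scanning 2 to 5 clusters as in the results. The toy features with four well-separated groups are invented for illustration and are not the paper's data.

```python
import numpy as np

def kmeans_pp_init(feats, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D^2."""
    centers = [feats[rng.integers(len(feats))]]
    for _ in range(1, k):
        d2 = np.min(np.linalg.norm(feats[:, None] - np.array(centers)[None], axis=2) ** 2, axis=1)
        centers.append(feats[rng.choice(len(feats), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans(feats, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with restarts; returns labels and centres of the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = kmeans_pp_init(feats, k, rng)
        for _ in range(n_iter):
            labels = np.linalg.norm(feats[:, None] - centers[None], axis=2).argmin(axis=1)
            new = np.array([feats[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = np.linalg.norm(feats[:, None] - centers[None], axis=2).argmin(axis=1)
        inertia = float(np.sum((feats - centers[labels]) ** 2))
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]

def davies_bouldin(feats, labels, centers):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio; lower is better."""
    k = len(centers)
    scatter = np.array([np.linalg.norm(feats[labels == j] - centers[j], axis=1).mean()
                        if np.any(labels == j) else 0.0 for j in range(k)])
    dist = np.linalg.norm(centers[:, None] - centers[None], axis=2)
    worst = [max((scatter[i] + scatter[j]) / dist[i, j] for j in range(k) if j != i)
             for i in range(k)]
    return float(np.mean(worst))

rng = np.random.default_rng(0)
corners = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
feats = np.vstack([c + 0.5 * rng.standard_normal((10, 2)) for c in corners])  # toy features
scores = {k: davies_bouldin(feats, *kmeans(feats, k)) for k in range(2, 6)}
best_k = min(scores, key=scores.get)   # number of clusters with the lowest index
print(best_k)
```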
Finally, three measures were used to validate the results: the correlation index, the signal-to-noise ratio (SNR) and the mean square error (MSE). These measures evaluate the similarity between the source signals and those recovered by SCICA.
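The paper does not give formulas for its three measures, so the sketch below assumes the usual definitions: Pearson correlation for the correlation index, SNR in dB with the estimation error s − ŝ treated as noise, and MSE as the mean of the squared error. The estimate should be normalized and time-aligned with the source before these measures are computed.

```python
import numpy as np

def correlation_index(s, s_hat):
    """Pearson correlation between a source and its estimate."""
    return float(np.corrcoef(s, s_hat)[0, 1])

def snr_db(s, s_hat):
    """Signal-to-noise ratio in dB, treating the error s - s_hat as noise."""
    return float(10 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2)))

def mse(s, s_hat):
    """Mean square error between a source and its estimate."""
    return float(np.mean((s - s_hat) ** 2))

s = np.array([1.0, -1.0, 1.0, -1.0])
s_hat = 0.9 * s          # estimate with a 10% gain error
print(correlation_index(s, s_hat), round(snr_db(s, s_hat), 6), round(mse(s, s_hat), 6))
```

In this toy case the correlation index is 1 while the SNR is only 20 dB: correlation ignores gain errors, whereas SNR and MSE penalize them, which is why the normalization step matters.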
III. RESULTS
Fig. 2 shows the two artificial source signals. The input vector for Experiment 1 is the mixture obtained by adding these two signals.
Fig. 2. SCICA source signals.
Fig. 3. Correlation index for S1/IC1 and S2/IC2.
One of the most important parameters of the SCICA method is the number of delays applied to the single mixture. To choose it, we swept the number of delays and measured the correlation index between each recovered signal and the corresponding original source; the results are shown in Fig. 3.
The highest correlation values for the two signals (Fig. 3) were found between 22 and 32 shifts. Consider now that X has been delayed 25 times, and that s, the output of the ICA system, contains the independent components. The matrix W, the inverse of the mixing matrix A, has size 25 × 25, and X has size 25 × n, where n is the number of samples of the source signals. Consequently, the output s has size 25 × n, corresponding to 25 possible solutions, or 25 independent subcomponents, each as long as the mixture vector.
A clustering method is then needed to group the 25 signals with similar waveforms; these characteristics can be observed in either the time domain or the frequency domain. For this we use the k-means algorithm with four clusters, a choice supported by the Davies-Bouldin index shown in Fig. 5: within the range of 2 to 5 clusters, the lowest index is obtained for $\gamma_p = 4$ clusters. This range must be kept small to avoid an excessive number of solutions that could contaminate the reconstruction of the ICs (independent components).
Fig. 5. Davies-Bouldin index for Experiment 2.
Fig. 4 (a) shows the impulse responses of the filters constructed with (4), and Fig. 4 (b) shows the result of applying these filters to the mixture; the source signals are clearly recovered.
Fig. 4. SCICA: (a) impulse responses of the 4 filters; (b) waveforms of the mixture after filtering.
The key concept in SCICA is the decomposition into independent subcomponents; therefore, each source signal must be considered as the sum of several of the subcomponents found. Table 1 reports three measures of the performance of the method. The correlation index is high, with a similarity above 96% between each resulting signal and the corresponding source signal. It is important to highlight that the SCICA method developed here filters the mixed signal, so if the frequency spectra of the source signals overlap, it is impossible to separate them.
In the second experiment we used the recorded guitar and voice signals. These signals can in general be considered independent random vectors, an essential requirement for good performance of the method. A second requirement is that the source signals have disjoint spectra. We ran SCICA to find the separation-reconstruction filters. The first source signal corresponds to the IC2 component shown in Fig. 6 (b)(2); to construct the second source signal, the components IC1, IC3 and IC4 shown in Fig. 6 (b)(1, 3, 4) must be added.
The signals were normalized by their maximum value, and the output signals also had to be delayed to recover the correct phase of each one. The number of delays applied to the initial mixture determines the order of the filters that the method builds to discriminate the source signals; for the model proposed here, the suitable range is 22 to 32 shifts. Another important performance parameter is the number of clusters, which was fixed at 4 groups to allow good discrimination and classification of the separated subcomponents. A suitable number of delays and clusters yields filters whose spectral response matches that of the target signal.
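The normalization by the maximum value and the delay compensation applied to the output signals can be sketched as follows. The lag search by cross-correlation, the sign correction (ICA leaves the sign of each component undetermined) and the max_lag bound are assumptions of this sketch; the paper does not state how the delay was found. The delayed, sign-flipped test signal is synthetic.

```python
import numpy as np

def normalize_peak(y):
    """Scale a signal so that its largest absolute value is 1."""
    return y / np.max(np.abs(y))

def align_to_reference(y, ref, max_lag):
    """Undo the filter delay: find the lag (0..max_lag) and sign that best match ref."""
    n = len(ref)
    scores = np.array([np.dot(ref[: n - k], y[k:n]) for k in range(max_lag + 1)])
    lag = int(np.argmax(np.abs(scores)))
    aligned = np.zeros(n)
    aligned[: n - lag] = np.sign(scores[lag]) * y[lag:n]   # shift back and fix the sign
    return aligned, lag

rng = np.random.default_rng(0)
ref = rng.standard_normal(1000)
y = np.zeros(1000)
y[24:] = -2.0 * ref[:-24]        # estimate: delayed by 24 samples, sign-flipped, scaled
aligned, lag = align_to_reference(y, ref, max_lag=60)
est = normalize_peak(aligned)
print(lag)
```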
IV. DISCUSSION OF RESULTS
The filters constructed by the SCICA method have non-unity gain and non-zero phase; therefore, the signals must be normalized before obtaining the measurements shown in Table 1.
TABLE 1. SCICA results.
Audio signals generally occupy limited frequency bands within the audible range of 20 Hz to 20 kHz, so the separation-reconstruction filters must have high orders. In SCICA the filter order is directly related to the number of delays applied to the initial mixture; therefore, more than 20 delays are necessary to obtain good performance in the separation of real audio signals.
Fig. 6. Filters and resulting signals: (a) impulse responses of the separation filters; (b) signals obtained by applying the separation filters to the mixture.
V. CONCLUSIONS
The SCICA technique solves the blind source separation problem as long as the spectra of the signals to be separated do not overlap in frequency. It is therefore a method that works only in very specific situations within the wide variety of sounds that humans can recognize.
The number of delays applied to the initial mixture and the number of clusters used to group the independent subcomponents correctly are the most relevant parameters of the SCICA method. In this work we found a suitable range of these parameters (22 to 32 delays and 4 clusters) for discriminating two signals from a single mixture.
The signals recovered by SCICA do not preserve the amplitude and phase of the original signals. Depending on the application, the importance of these features must be considered, since no data are available about the original phase and amplitude.
REFERENCES
[1] P. Comon, "Independent component analysis, a new concept?," Signal Processing, pp. 287–314, 1994.
[2] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4–5, pp. 411–430, 2000.
[3] J.-F. Cardoso, "Blind signal separation: statistical principles," Proceedings of the IEEE, vol. 86, no. 10, pp. 2009–2025, 1998.
[4] D. L. Wang and G. J. Brown, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Hoboken, NJ: IEEE Press/Wiley-Interscience, 2006.
[5] M. E. Davies and N. Mitianoudis, "A simple mixture model for sparse overcomplete ICA," IEE Proc. Vision, Image and Signal Processing, vol. 151, no. 1, pp. 35–43, 2004.
[6] M. Girolami, "A variational method for learning sparse and overcomplete representations," Neural Computation, pp. 2517–2532, 2002.
[7] M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, pp. 337–365, 2000.
[8] D. Mika and P. Kleczkowski, "ICA-based single channel audio separation: new bases and measures of distance," Archives of Acoustics, vol. 36, no. 2, pp. 311–331, 2011.
[10] E. Vincent, H. Sawada, P. Bofill, S. Makino and J. Rosca, "First stereo audio source separation evaluation campaign: data, algorithms and results," in Independent Component Analysis and Signal Separation, Lecture Notes in Computer Science, vol. 4666, pp. 552–559, 2007.
[11] M. E. Davies and C. J. James, "Source separation using single channel ICA," Signal Processing, vol. 87, 2007.
[12] C. J. James, O. Gibson and M. E. Davies, "On the analysis of single versus multiple channels of electromagnetic brain signals," Artificial Intelligence in Medicine, vol. 37, no. 2, pp. 131–143, 2006.
[13] A. Hyvärinen, "Fast and robust fixed-point algorithms for independent component analysis," IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 626–634, 1999.
[14] A. Coates and A. Y. Ng, "Learning feature representations with k-means," in G. Montavon, G. B. Orr and K.-R. Müller (eds.), Neural Networks: Tricks of the Trade, 2nd ed., Springer, 2012.
[15] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.