Context

Fusion or combination of several information sources is a central problem in the machine learning community. It is especially important in signal processing and pattern recognition applications where more than one source of information is available to the classifier. Even though the independence assumption does not hold, and even though the modeling/estimation error may differ from one stream to another, the joint PDF is usually obtained by multiplying the single-stream PDFs. The practical solution to this problem is to use "stream weights" in order to reduce the total classification error.
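
As an illustration, here is a minimal sketch of weighted stream fusion. It assumes the usual exponent form used for multi-stream classifiers: each stream likelihood is raised to its weight, so in the log domain the score of a class is the weighted sum of the per-stream log-likelihoods. The function names, the data layout and the example scores are hypothetical.

```python
def weighted_log_score(stream_loglikes, weights):
    """log prod_s p(x_s|c)**w_s == sum_s w_s * log p(x_s|c) (assumed exponent form)."""
    if len(stream_loglikes) != len(weights):
        raise ValueError("one weight per stream is required")
    return sum(w * ll for w, ll in zip(weights, stream_loglikes))


def classify(loglikes_per_class, weights):
    """Return the class with the highest weighted log score.

    loglikes_per_class maps each class label to its per-stream
    log-likelihoods [log p(x_1|c), log p(x_2|c), ...] (hypothetical layout).
    """
    return max(loglikes_per_class,
               key=lambda c: weighted_log_score(loglikes_per_class[c], weights))


# Hypothetical scores: stream 1 (e.g. audio) favors "a", stream 2 (e.g. video) favors "b".
scores = {"a": [-1.0, -5.0], "b": [-3.0, -2.0]}
print(classify(scores, [0.5, 0.5]))  # prints b (same decision as the plain product)
print(classify(scores, [0.9, 0.1]))  # prints a (trusting stream 1 more flips the decision)
```

With equal weights the decision is the same as with the unweighted product; shifting weight toward the more reliable stream is what lets the weights reduce the classification error.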

Computing these weights is not a trivial problem, since they depend on the application and even on the training and test conditions. One approach computes the weights directly from the streams, based on their individual performance; in this approach, the parameters of a given model can be re-estimated as a function of the weights. A second approach adapts the system to a given context: the parameters are tied to the reliability of the streams under given environmental conditions, which is generally estimated from the SNR. Similarly, the environmental conditions can be inferred from the performance of the system under different models.

Some of the algorithms presented in the previous paragraph have problems in real applications. For example, some of them use external information that is not present in the input signals, or need an extra database recorded under the same test conditions to train their systems. In almost all of these works the parameters, weights included, are computed during the training phase on a held-out database, in a supervised manner. In real conditions, especially when the training data do not reflect the characteristics of the test data, an unsupervised approach could improve system performance. The main goal of our work is therefore to compute the optimal stream weights for the multi-stream classification problem in an unsupervised manner.

Overview

 

Based on the assumption that the modeling/estimation error of the feature PDFs is a random variable, the deviation of the decision boundary from the optimal Bayes boundary is also a random variable, which we assume to be zero-mean Gaussian. The classification decision is then a function of that random variable. The classification error cannot be minimized directly; as an approximation, we compute the weights that minimize the variance of the decision-boundary deviation, that is, the variance of this random variable.

Note that stream weights can reduce the estimation error only when the PDF estimation errors of the single-stream classifiers differ, when their Bayes errors differ, or both. If the two streams are equally informative (equal Bayes classification error), the weight of each stream is inversely proportional to the sum, over the classes of that stream, of the variances of the PDF estimation error.
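
One way to write this relation, with notation introduced here for illustration: let $\sigma^2_{s,k}$ be the variance of the PDF estimation error of class $k$ in stream $s$, and let $w_s$ be the weight of stream $s$, normalized so that $w_1 + w_2 = 1$ (the normalization is an assumption of this sketch). For two classes and two equally informative streams:

```latex
w_s \propto \frac{1}{\sigma^2_{s,1} + \sigma^2_{s,2}}
\quad\Longrightarrow\quad
\frac{w_1}{w_2} = \frac{\sigma^2_{2,1} + \sigma^2_{2,2}}{\sigma^2_{1,1} + \sigma^2_{1,2}},
\qquad
w_1 = \frac{\sigma^2_{2,1} + \sigma^2_{2,2}}{\sum_{s=1}^{2}\left(\sigma^2_{s,1} + \sigma^2_{s,2}\right)}.
```

For example, if stream 1 has twice the total estimation-error variance of stream 2, it receives weight 1/3 and stream 2 receives weight 2/3.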

 

Similarly, if the two streams are equally reliable (equal estimation error variances), the stream weights should be approximately inversely proportional to the classification errors of the single-stream classifiers.
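
A minimal sketch of this second rule, assuming the weights are normalized to sum to one (the normalization and the function name are assumptions, not taken from the original work): each stream's weight is proportional to the inverse of its single-stream error rate.

```python
def weights_from_error_rates(error_rates):
    """Stream weights inversely proportional to the single-stream
    classification error rates, normalized to sum to one (assumed)."""
    if any(e <= 0 for e in error_rates):
        raise ValueError("error rates must be strictly positive")
    inverse = [1.0 / e for e in error_rates]
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical error rates: stream 1 errs 10% of the time, stream 2 errs 30%.
print(weights_from_error_rates([0.10, 0.30]))  # approximately [0.75, 0.25]
```

A stream that makes three times as many errors ends up with one third of the other stream's weight.
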
Since our theoretical results hold for the two-class classification problem, we proposed to estimate the stream weights with models and "anti-models", which turns the multi-class problem into multiple two-class problems. We also use inter- and intra-class distances to avoid requiring knowledge of class membership (see Figure 1).

Figure 1: Two-dimensional representation of the two-class classification problem. Each axis represents one stream.
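
The text does not specify how the anti-models are built. A minimal sketch, assuming a one-versus-rest construction in which the anti-model of a class pools the training frames of all the other classes, and in which each model is summarized by its mean vector in one stream (both assumptions; the function name is hypothetical):

```python
import numpy as np


def model_and_anti_model_means(train_frames, target):
    """Mean vectors of the model and anti-model of `target` for one stream.

    train_frames: dict mapping class label -> array of shape (n_frames, dim).
    The anti-model pools the frames of every class except `target`
    (a one-versus-rest assumption).
    """
    if len(train_frames) < 2:
        raise ValueError("an anti-model needs at least one other class")
    model = np.asarray(train_frames[target], dtype=float)
    others = np.vstack([np.asarray(x, dtype=float)
                        for label, x in train_frames.items() if label != target])
    return model.mean(axis=0), others.mean(axis=0)


# Hypothetical 2-D features for three classes in one stream.
frames = {"a": [[0.0, 0.0], [2.0, 2.0]], "b": [[10.0, 10.0]], "c": [[20.0, 20.0]]}
model_mean, anti_mean = model_and_anti_model_means(frames, "a")
print(model_mean, anti_mean)  # [1. 1.] [15. 15.]
```

Pooling frames means that classes with more training data dominate the anti-model mean; averaging the per-class means instead would be an equally plausible choice.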

In field tests, once the models and anti-models have been created, a k-means algorithm is run on the test file, initialized with the means of the models. This yields new centroids, from which a new set of weights is computed for the given test file. The method is as follows (see Figure 2):
  • Initialize the k-means centroids with the means of the current models,
  • Perform k-means using only the test data,
  • Compute inter- and intra-class distances,
  • Estimate final stream weights.
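
The four steps above can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: each stream is handled separately as a two-class problem (model versus anti-model), k-means with k = 2 is initialized with the model and anti-model means, and the intra- and inter-class distances are combined into a hypothetical separability score (inter-class distance divided by mean intra-class distance) that is normalized across streams to give the weights. The exact mapping from distances to weights in the original method may differ.

```python
import numpy as np


def assign(data, centroids):
    """Index of the nearest centroid for every frame (Euclidean distance)."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(data, init_centroids, n_iter=50):
    """Plain k-means on the test frames, started from the given centroids."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        labels = assign(data, centroids)
        updated = centroids.copy()
        for k in range(len(centroids)):
            members = data[labels == k]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                updated[k] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, assign(data, centroids)


def separability(data, centroids, labels):
    """Hypothetical score: inter-class distance / mean intra-class distance."""
    intra = np.mean(np.linalg.norm(data - centroids[labels], axis=1))
    inter = np.linalg.norm(centroids[0] - centroids[1])
    return inter / max(intra, 1e-12)


def estimate_stream_weights(test_streams, model_means, anti_means):
    """One weight per stream, from a single test file, without labels."""
    scores = []
    for frames, m, a in zip(test_streams, model_means, anti_means):
        frames = np.asarray(frames, dtype=float)
        centroids, labels = kmeans(frames, [m, a])              # steps 1 and 2
        scores.append(separability(frames, centroids, labels))  # step 3
    scores = np.asarray(scores)
    return scores / scores.sum()                                # step 4


# Hypothetical test file: same class centers in both streams, stream 2 much noisier.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
clean = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])
noisy = np.vstack([rng.normal(c, 3.0, size=(50, 2)) for c in centers])
weights = estimate_stream_weights([clean, noisy], [centers[0]] * 2, [centers[1]] * 2)
print(weights)  # the clean stream should receive the larger weight
```

Using a ratio rather than either distance alone makes the score insensitive to a uniform rescaling of a stream's features; that choice belongs to this sketch, not to the original method.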

 

Figure 2: Practical stream weights estimation process.

 

In this way, the proposed method uses only the information contained in the trained models (which may be trained on clean data only) and requires a single utterance to compute the stream weights. The proposed method achieves performance comparable to that of supervised minimum-error estimation of the weights.

Applications

  • Audio-Visual Speech Classification
  • Audio-Visual Speech Recognition

 

Projects

NoE MUSCLE

 

Contributors

Eduardo Sánchez Soto
Khalid Daoudi (contact)

 

Main Publications