English 中文(简体)
How to Get a Quantitative Comparison of Two Signals
原标题:

I’m trying to implement a Blind Source Separation (BSS) algorithm and I’m running into trouble determining the efficacy of the algorithm.

I’m trying to create test cases where I work backwards and start with a signal vector s, which is normally unknown, and then create a mixing matrix A, which I use to transform s to create the observation vector x, which is what is normally observed by things like sensory equipment. Therefore I have a model that looks like

x = A * s.

I then put x into the BSS algorithm and get s’, which is a reconstruction of the signal vector.

Now this is where I have a lot of problems; how can I compare s’ and s, and get a quantitative measure of how similar the two vectors are? The algorithm I am studying can only reconstruct the signal vector up to a negative sign (therefore it is possible for s’ to be similar to -s, or stated in another way, for s to be similar in "shape" to s but flipped) and cannot guarantee the preservation of the amplitudes of the signals. So I want to compare the “shapes” of the signals to each other, while also anticipating the fact that while their "shapes" might be similar, they might be flipped.

Just to clarify, when I say signal, I really mean a matrix that could be something like 50 x 10000 (50 different channels with 10000+ data points taken over time). Another problem that arises from the BSS algorithm is that the ordering of the channels is not guaranteed to be preserved. So given s 1, s 2, s 3, ... , s N, which would be the different channels of s , might be reconstructed (again, perhaps flipped and of different amplitudes compared to the original channels in s), but the ordering is not guaranteed to be preserved. So s 1 might correlate instead to s23, and s 2 to s5, and so on and so forth.

So I was wondering if there s a fast and efficient way of comparing the similarity between two different matrices, which are assumed to be composed of vectors that should correlate to one another, although not in the same order, sign, or amplitude.

What would be the best way to solve this? Appreciate the help!

问题回答

Try to read the recent paper from Science

Detecting novel associations in large datasets D. Reshef, Y. Reshef, H. Finucane, S. Grossman, G. McVean, P. Turnbaugh, E. Lander, M. Mitzenmacher, P. Sabeti Science 334, 6062 (2011) http://www.uvm.edu/~cmplxsys/newsevents/pdfs/2012/reshef-correlation-science-2011.pdf

The code usable using either JAVA (within MATLAB) or R (R-project) is available from http://www.exploredata.net/Usage-instructions

A simple "greedy" method, avoiding the combinatorial explosion

For reconstructed signals q_i and ground truth s_j, let NxN matrix

M[i,j] = abs(corr(q_i,s_j))

with "corr()" some correlation function giving results in [-1,1], e.g. pearson moment-product (MATLAB: corrcoef() ) or spearman rank-corrrelation.

Step 1) Find (i,j) = argmax(l,m) M[l,m]. This is a pair of reconstructed index i,j. Push it on a list. This is the best matching pair of what s currently eligible.

Step 2) Blank out row i and column j with NaN s or something.

Step 3) If you haven t done N of them yet, (matrix is not all NaN s) go to Step 1.

(equivalently you can delete a row & column, remembering the original indices).

For all the pairs in the list, average the original M[i,j]. This is the average correlation (absolute valued) of the pairs you found with the greedy algorithm .

If you don t know which channels are correlated, I assume you want to separate them out and compare them individually, and that this is what the question is about. (I don t know of an algorithm that can do it all in some exceptionally efficient way -- I ll just address the problem of the scale factor)

If there is an arbitrary (nonzero?) scale factor involved, you can take the dot product of the two vectors, and divide by the product of their norms. The result will be in the range of [-1,1], and you can choose the match with the largest absolute value.

You re suffering from combinatoric explosion. Ordering issue aside: I d suggest a base line removal and then use cross-corrlation to find the best (s) and least (-s) correlation signals. The flipping of the signal is thus resolved.

2D cross correlation is offered in many image processing libraries which could be used to determined quickly which columns are ordered properly - those with high (s) and low (-s) correlations. The columns in between need re-ordering.

If you view your matrix as an image, there are many image processing algorithms that can measure the similarity of two images that are invariant across various transforms (shear, scale,...). However your column mixing is a odd nonlinear transform, which doesn t fit the image processing paradigm well.





相关问题
How to add/merge several Big O s into one

If I have an algorithm which is comprised of (let s say) three sub-algorithms, all with different O() characteristics, e.g.: algorithm A: O(n) algorithm B: O(log(n)) algorithm C: O(n log(n)) How do ...

Grokking Timsort

There s a (relatively) new sort on the block called Timsort. It s been used as Python s list.sort, and is now going to be the new Array.sort in Java 7. There s some documentation and a tiny Wikipedia ...

Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#. Apart from needing to add an extra range check on the left/right variables before the recursive call, ...

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

Enumerating All Minimal Directed Cycles Of A Directed Graph

I have a directed graph and my problem is to enumerate all the minimal (cycles that cannot be constructed as the union of other cycles) directed cycles of this graph. This is different from what the ...

Quick padding of a string in Delphi

I was trying to speed up a certain routine in an application, and my profiler, AQTime, identified one method in particular as a bottleneck. The method has been with us for years, and is part of a "...

热门标签