English 中文(简体)
How to efficiently find correlation and discard points outside 3-sigma range in MATLAB?
原标题:

I have a data file m.txt that looks something like this (with a lot more points):

286.842995
3.444398
3.707202
338.227797
3.597597
283.740414
3.514729
3.512116
3.744235
3.365461
3.384880

Some of the values (like 338.227797) are very different from the values I generally expect (smaller numbers).

  • So, I am thinking that I will remove all the points that lie outside the 3-sigma range. How can I do that in MATLAB?

  • Also, the bigger problem is that this file has a separate file t.txt associated with it which stores the corresponding time values for these numbers. So, I ll have to remove the corresponding time values from the t.txt file also.

I am still learning MATLAB, and I know there would be some good way of doing this (better than storing indices of the elements that were removed from m.txt and then removing those elements from the t.txt file)

最佳回答

@Amro is close, but the FIND is unnecessary (look up logical subscripting) and you need to include the mean for a true +/-3 sigma range. I would go with the following:

%# load files 
m = load( m.txt ); 
t = load( t.txt ); 

%# find values within range
z = 3;
meanM = mean(m);
sigmaM = std(m);
I = abs(m - meanM) <= z * sigmaM;

%# keep values within range
m = m(I);
t = t(I); 
问题回答
%# load files
m = load( m.txt );
t = load( t.txt );

%# find outliers indices
z = 3;
idx = find( abs(m-mean(m)) > z*std(m) );

%# remove them from both data and time values
m(idx) = [];
t(idx) = [];




相关问题
How to manage a pageview DB

I am interested in tracking my users pageviews on my site. Being that traffic is expanding very quickly, I am worried about robots, etc, and I also want to be able to use tracked data live to alter ...

Statistics Question

Suppose I conduct a survey of 10 people asking whether to rank a movie as 0 to 4 stars. Allowable answers are 0, 1, 2, 3, and 4. The mean is 2.0 stars. How do I calculate the certainty (or ...

Calculating variance with large numbers

I haven t really used variance calculation that much, and I don t know quite what to expect. Actually I m not too good with math at all. I have a an array of 1000000 random numeric values in the ...

R statistical package: wrapping GOFrame objects

I m trying to generate GOFrame objects to generate a gene ontology mapping in R for unsupported organisms (see http://www.bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/...

Generating correlated numbers

Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array ...

Multivariate time series modelling in R

I want do fit some sort of multi-variate time series model using R. Here is a sample of my data: u cci bci cpi gdp dum1 dum2 dum3 dx 16.50 14.00 53.00 45.70 80....

热门标签