English 中文(简体)
Generating correlated numbers
原标题:

Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each other at a given level of Pearson r. Here is the kicker: Array X and Array Y must be uniform distributions.

I can do this with a normal distribution, but transforming the values without skewing the distribution has me stumped. I tried re-ordering the values in the arrays to increase the correlation, but I will never get arrays correlated at 1.00 or -1.00 just by sorting.

Any ideas?

--

here is the AS3 code for random correlated gaussians, to get the wheels turning:

public static function nextCorrelatedGaussians(r:Number):Array{             
         var d1:Number;
         var d2:Number;
         var n1:Number;
         var n2:Number;
         var lambda:Number;
         var r:Number;
         var arr:Array = new Array();
         var isNeg:Boolean; 

        if (r<0){
            r *= -1;
              isNeg=true;
        }            
        lambda= (   (r*r)  -  Math.sqrt(  (r*r) - (r*r*r*r)  )     )   /   ((  2*r*r ) - 1  );

        n1 = nextGaussian();
        n2 = nextGaussian();           
        d1 = n1;            
        d2 = ((lambda*n1) + ((1-lambda)*n2)) / Math.sqrt( (lambda*lambda) + (1-lambda)*(1-lambda));

        if (isNeg) {d2*= -1}           
        arr.push(d1);
        arr.push(d2);
        return arr;
    }
最佳回答

I ended up writing a short paper on this

It doesn t include your sorting method (although in practice I think it s similar to my first method, in a roundabout way), but does describe two ways that don t require iteration.

问题回答

Here is an implementation of of twolfe18 s algorithm written in Actionscript 3:

for (var j:int=0; j < size; j++) {
 xValues[i]=Math.random());
}
var varX:Number = Util.variance(xValues);
var varianceE:Number = 1/(r*varX) - varX;

for (var i:int=0; i < size; i++) {
 yValues[i] = xValues[i] + boxMuller(0, Math.sqrt(varianceE));
}

boxMuller is just a method that generates a random Gaussian with the arguments (mean, stdDev). size is the size of the distribution.

Sample output

Target p: 0.8
Generated p: 0.04846346291280387
variance of x distribution: 0.0707786253165176
varianceE: 17.589920412141158

As you can see I m still a ways off. Any suggestions?

This apparently simple question has been messing up with my mind since yesterday evening! I looked for the topic of simulating distributions with a dependency, and the best I found is this: simulate dependent random variables. The gist of it is, you can easily simulate 2 normals with given correlation, and they outline a method to transform these non-independent normals, but this won t preserve correlation. The correlation of the transform will be correlated, so to speak, but not identical. See the paragraph "Rank correlation coefficents".

Edit: from what I gather from the second part of the article, the copula method would allow you to simulate / generate random variables with rank correlation.

start with the model y = x + e where e is the error (a normal random variable). e should have a mean of 0 and variance k.

long story short, you can write a formula for the expected value of the Pearson in terms of k, and solve for k. note, you cannot randomly generate data with the Pearson exactly equal to a specific value, only with the expected Pearson of a specific value.

i ll try to come back and edit this post to include a closed form solution when i have access to some paper.

EDIT: ok, i have a hand-wavy solution that is probably correct (but will require testing to confirm). for now, assume desired Pearson = p > 0 (you can figure out the p < 0 case). like i mentioned earlier, set your model for Y = X + E (X is uniform, E is normal).

  1. sample to get your x s
  2. compute var(x)
  3. the variance of E should be: (1/(rsd(x)))^2 - var(x)
  4. generate your y s based on your x s and sample from your normal random variable E

for p < 0, set Y = -X + E. proceed accordingly.

basically, this follows from the definition of Pearson: cov(x,y)/var(x)*var(y). when you add noise to the x s (Y = X + E), the expected covariance cov(x,y) should not change from that with no noise. the var(x) does not change. the var(y) is the sum of var(x) and var(e), hence my solution.

SECOND EDIT: ok, i need to read definitions better. the definition of Pearson is cov(x, y)/(sd(x)sd(y)). from that, i think the true value of var(E) should be (1/(rsd(x)))^2 - var(x). see if that works.

To get a correlation of 1 both X and Y should be the same, so copy X to Y and you have a correlation of 1. To get a -1 correlation, make Y = 1 - X. (assuming X values are [0,1])

A strange problem demands a strange solution -- here is how I solved it.

-Generate array X

-Clone array X to Create array Y

-Sort array X (you can use whatever method you want to sort array X -- quicksort, heapsort anything stable.)

-Measure the starting level of pearson s R with array X sorted and array Y unsorted.

WHILE the correlation is outside of the range you are hoping for

   IF the correlation is to low
         run one iteration of CombSort11 on array Y then recheck correlation
   ELSE IF the correlation is too high
         randomly swap two values and recheck correlation

And thats it! Combsort is the real key, it has the effect of increasing the correlation slowly and steadily. Check out Jason Harrison s demo to see what I mean. To get a negative correlation you can invert the sort or invert one of the arrays after the whole process is complete.

Here is my implementation in AS3:

public static function nextReliableCorrelatedUniforms(r:Number, size:int, error:Number):Array {
        var yValues:Array = new Array;
        var xValues:Array = new Array;
        var coVar:Number = 0;
        for (var e:int=0; e < size; e++) { //create x values            
            xValues.push(Math.random());
    }
    yValues = xValues.concat();
    if(r != 1.0){
        xValues.sort(Array.NUMERIC);
    }
    var trueR:Number = Util.getPearson(xValues, yValues);

        while(Math.abs(trueR-r)>error){
            if (trueR < r-error){   // combsort11 for y     
                var gap:int = yValues.length;
                var swapped:Boolean = true; 
                while (trueR <= r-error) {
                    if (gap > 1) {
                        gap = Math.round(gap / 1.3);
                    }
                    var i:int = 0;
                    swapped = false;
                    while (i + gap < yValues.length  &&  trueR <= r-error) {
                        if (yValues[i] > yValues[i + gap]) {
                            var t:Number = yValues[i];
                            yValues[i] = yValues[i + gap];
                            yValues[i + gap] = t;
                            trueR = Util.getPearson(xValues, yValues)
                            swapped = true;
                        }
                        i++;
                    }
                }
            }

            else { // decorrelate
                while (trueR >= r+error) {
                    var a:int = Random.randomUniformIntegerBetween(0, size-1);
                    var b:int = Random.randomUniformIntegerBetween(0, size-1);
                    var temp:Number = yValues[a];
                    yValues[a] = yValues[b];
                    yValues[b] = temp;
                    trueR = Util.getPearson(xValues, yValues)
               }                
            }
        }
        var correlates:Array = new Array;
        for (var h:int=0; h < size; h++) {
            var pair:Array = new Array(xValues[h], yValues[h]);
            correlates.push(pair);}
        return correlates;
    }




相关问题
Spring Properties File

Hi have this j2ee web application developed using spring framework. I have a problem with rendering mnessages in nihongo characters from the properties file. I tried converting the file to ascii using ...

Logging a global ID in multiple components

I have a system which contains multiple applications connected together using JMS and Spring Integration. Messages get sent along a chain of applications. [App A] -> [App B] -> [App C] We set a ...

Java Library Size

If I m given two Java Libraries in Jar format, 1 having no bells and whistles, and the other having lots of them that will mostly go unused.... my question is: How will the larger, mostly unused ...

How to get the Array Class for a given Class in Java?

I have a Class variable that holds a certain type and I need to get a variable that holds the corresponding array class. The best I could come up with is this: Class arrayOfFooClass = java.lang....

SQLite , Derby vs file system

I m working on a Java desktop application that reads and writes from/to different files. I think a better solution would be to replace the file system by a SQLite database. How hard is it to migrate ...

热门标签