The real goal here is to find the quantile means (or sums, or median, etc.) in Python. Since I m not a power user of Python but have used R for a while, my chosen route is via Rpy. However, I ran into the problem that the returned list of means are not correspondent to the order of the quantiles. In particular, I have the followings in R:
> a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> b = c(2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000)
> prob = seq(0,5)/5
> br = quantile(a,prob)
> rcut = cut(a, br, include.lowest = TRUE)
> quintile_means = tapply(b, rcut, mean)
> quintile_means
[1,2.8] (2.8,4.6] (4.6,6.4] (6.4,8.2] (8.2,10]
3 30 300 3000 30000
which is all very good. However, if I translate the code into Rpy, I got
>>> import rpy
>>> from rpy import r
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
>>> prob = [ x / 5.0 for x in range(6)]
>>> br = r.quantile(a, prob)
>>> rcut = r.cut(a, br, include_lowest=r.TRUE)
>>> quintile_means = r.tapply(b, rcut, r.mean)
>>> print quintile_means
[30.0, 300.0, 3000.0, 30000.0, 3.0]
Note the final list is mis-ordered (we know it because a
and b
are both ordered in this case). In general, I just have no way to recover the correct order from the lowest to highest quantile in Rpy. Any suggestions?
In addition (not in substitution, as I d like to know the answer to the above question), if you can suggest a way to directly perform the analysis in python, that will be great too. (I don t have numpy or scipy installed.) Thx!
EDIT: To clarify, a
and b
are paired but not necessarily ordered. For example, a
is the size of eyes and b
is the size of nose. I m trying to find out that in the various quantiles of a
, what are the means of the correspondent b
s. Thanks.