probability logic statistics

I am not sure whether this is the right place to ask this question, as it is more of a logic question, but there is no harm in asking. Suppose I have a huge list of data (customers), and each record has a data_id. Now I want to split the data in a ratio of, say, 10:90. Rather than stating a condition such as:

the sum of the digits is even... go to bin 1
the sum of the digits is odd... go to bin 2
or: the sum of the last three digits is x... go to bin 1
the sum of the last three digits is not x... go to bin 2

Rules like these can split the data unevenly, so a bin may not end up with the share of the data I intended.

Is there a way (probabilistically speaking) to ensure that the sample size is always greater than x%?


Best answer

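You want to bin on a feature that is uniformly distributed. Hash functions are designed to have this property, so if you compute a hash of the customer ID and then split on the first n bits into 2^n bins, each bin will contain roughly the same number of items. (You can then select 90% of the bins to get 90% of the data.) Hope this helps.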
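A minimal sketch of the hashing approach, assuming the IDs are strings and using SHA-256 from Python's standard `hashlib` as the hash (the answer does not name a specific hash function). The function names `bucket_of` and `in_sample` and the choice of 10 bits (1024 bins) are illustrative assumptions. Python's built-in `hash()` is avoided because string hashing is randomized per process, which would make the split change between runs.

```python
import hashlib


def bucket_of(customer_id: str, n_bits: int = 10) -> int:
    """Map an ID to one of 2**n_bits bins using the leading bits of its SHA-256 digest."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Read the first 8 bytes as a big-endian integer and keep only the top n_bits.
    prefix = int.from_bytes(digest[:8], "big")
    return prefix >> (64 - n_bits)


def in_sample(customer_id: str, fraction: float, n_bits: int = 10) -> bool:
    """Return True if the ID falls in the first `fraction` of the 2**n_bits bins."""
    n_bins = 1 << n_bits
    cutoff = round(fraction * n_bins)  # 10% of 1024 bins -> 102 bins (about 9.96%)
    return bucket_of(customer_id, n_bits) < cutoff


# Demo on synthetic, hypothetical IDs: the sampled share should be close to 10%, not exact.
ids = [f"customer-{i}" for i in range(100_000)]
share = sum(in_sample(c, 0.10) for c in ids) / len(ids)
print(f"sampled share: {share:.4f}")
```

Because the bin depends only on the ID, the same customer always lands on the same side of the split. The size is still only approximately 10%: if the hash behaves like a uniform random function, each ID is sampled independently with probability p, so the sample size follows Binomial(N, p) with standard deviation sqrt(N·p·(1−p)). For N = 1,000,000 and p = 0.1 that is 300 records, or about 0.03 percentage points, so "greater than x%" can be guaranteed only with high probability, not always.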
