Statistics Question
  • Time: 2009-11-13 03:28:27
  • Tags: statistics

Suppose I conduct a survey of 10 people, asking each to rate a movie from 0 to 4 stars. Allowable answers are 0, 1, 2, 3, and 4.

The mean is 2.0 stars.

How do I calculate the certainty (or uncertainty) about this 2.0 star rating? Ideally, I would like a number between 0 and 1, where 0 represents complete uncertainty and 1 represents complete certainty.

It seems clear that the case where the 10 people choose ( 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 ) would be the most certain, while the case where the 10 people choose ( 0, 0, 0, 0, 0, 4, 4, 4, 4, 4 ) would be the least certain. ( 0, 1, 1, 2, 2, 2, 2, 3, 3, 4 ) would be somewhere in the middle.

Best answer

The standard deviation on its own does not have the properties requested: it runs the wrong way round and is not bounded by 1. The sample standard deviation is zero when everyone chooses the same answer, and can be as great as sqrt(40/9) = 2.11 when there are five 0s and five 4s.

I suggest you use 1 - stdev(x)/sqrt(40/9), which takes the value 1 when everyone agrees and the value 0 when there are five 0s and five 4s. Note that the constant sqrt(40/9) is specific to 10 respondents on a 0 to 4 scale.
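For anyone who wants to try this, here is a minimal Python sketch of the formula above. The function name `certainty` and the `low`/`high` parameters are my own, not from the answer, and the generalization to other survey sizes (taking the most polarized split as the worst case) is an assumption on my part; for 10 respondents it reduces to the sqrt(40/9) given above.

```python
import statistics


def certainty(ratings, low=0, high=4):
    """Return 1 - s / s_max, where s is the sample standard deviation.

    s_max is the standard deviation of the most polarized survey of the
    same size: half the votes at `low`, the other half at `high`.
    (`certainty`, `low` and `high` are illustrative names.)
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    # Assumed worst case: votes split as evenly as possible between the extremes.
    worst_case = [low] * (n // 2) + [high] * (n - n // 2)
    s_max = statistics.stdev(worst_case)  # sqrt(40/9) for n=10 on a 0..4 scale
    return 1 - statistics.stdev(ratings) / s_max


unanimous = [2] * 10
polarized = [0] * 5 + [4] * 5
mixed = [0, 1, 1, 2, 2, 2, 2, 3, 3, 4]

for name, votes in [("unanimous", unanimous), ("polarized", polarized), ("mixed", mixed)]:
    print(name, round(certainty(votes), 3))
```

The unanimous survey scores 1, the polarized one scores 0, and the mixed example from the question lands at about 0.45.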

Other answers

The function you're after here is the standard deviation.

The standard deviations of your three examples are 0 (meaning no deviation), 2.1 (large deviation) and 1.15 (in between).
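These figures can be checked with a few lines of Python. This is only a sketch using the standard library's `statistics.stdev`, which computes the sample standard deviation (n - 1 denominator); the variable names are mine.

```python
import statistics

# The three example surveys from the question.
examples = {
    "all twos": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "zeros and fours": [0, 0, 0, 0, 0, 4, 4, 4, 4, 4],
    "spread out": [0, 1, 1, 2, 2, 2, 2, 3, 3, 4],
}

# Sample standard deviation of each survey, rounded to two decimals.
stdevs = {name: round(statistics.stdev(votes), 2) for name, votes in examples.items()}

for name, s in stdevs.items():
    print(f"{name}: {s}")
```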

What you want is called the standard deviation.

You should consider whether or not the mean value is an appropriate statistic for this kind of information, i.e. is a movie rated 4 stars twice as good as one rated 2 stars?

You may be better served by using a percentile measure (such as the median) to represent the central tendency, and a percentile range (such as the IQR) to measure certainty. As in the answers above, certainty is greatest when the measure is 0, because what you are really measuring is deviation from the central tendency.
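A rough sketch of the median/IQR approach in Python follows. The helper name `median_and_iqr` is hypothetical, and it relies on `statistics.quantiles` (Python 3.8+) with its default "exclusive" method; other quartile conventions can give slightly different IQR values on a sample this small.

```python
import statistics


def median_and_iqr(ratings):
    """Return (median, interquartile range) for a list of ratings.

    Hypothetical helper; quartiles use statistics.quantiles' default
    "exclusive" method, which is one convention among several.
    """
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    return statistics.median(ratings), q3 - q1


for votes in ([2] * 10, [0] * 5 + [4] * 5, [0, 1, 1, 2, 2, 2, 2, 3, 3, 4]):
    med, iqr = median_and_iqr(votes)
    print(votes, "median:", med, "IQR:", iqr)
```

All three example surveys share a median of 2, so the IQR alone separates them: 0 for the unanimous case, 4 for the polarized one, and 2 for the mixed one.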

Incidentally, a survey of 10 people is too small to perform much in the way of meaningful statistical analysis.




