Running average in Python
Original title:

Is there a pythonic way to build up a list that contains a running average of some function?

After reading a fun little piece about Martians, black boxes, and the Cauchy distribution, I thought it would be fun to calculate a running average of the Cauchy distribution myself:

import math 
import random

def cauchy(location, scale):
    p = 0.0
    while p == 0.0:
        p = random.random()
    return location + scale*math.tan(math.pi*(p - 0.5))

# is this next block of code a good way to populate running_avg?
sum = 0
count = 0
max = 10
running_avg = []
while count < max:
    num = cauchy(3,1)
    sum += num
    count += 1
    running_avg.append(sum/count)

print running_avg     # or do something else with it, besides printing

I think that this approach works, but I'm curious if there might be a more elegant approach to building up that running_avg list than using loops and counters (e.g. list comprehensions).

There are some related questions, but they address more complicated problems (small window size, exponential weighting) or aren't specific to Python.

Best answer

You could write a generator:

def running_average():
  sum = 0
  count = 0
  while True:
    sum += cauchy(3,1)
    count += 1
    yield sum/count

Or, given a generator for Cauchy numbers and a utility function for a running sum generator, you can have a neat generator expression:

# Cauchy numbers generator
def cauchy_numbers():
  while True:
    yield cauchy(3,1)

# running sum utility function
def running_sum(iterable):
  sum = 0
  for x in iterable:
    sum += x
    yield sum

# Running averages generator expression (** the neat part **)
running_avgs = (sum/(i+1) for (i,sum) in enumerate(running_sum(cauchy_numbers())))

# goes on forever
for avg in running_avgs:
  print avg

# alternatively, take just the first 10
import itertools
for avg in itertools.islice(running_avgs, 10):
  print avg
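
For readers on Python 3, the running-sum helper above has a standard-library counterpart, itertools.accumulate (available since Python 3.2). What follows is only a minimal sketch and not part of the original answer; the name running_averages and the sample numbers are made up for illustration.

```python
import itertools

def running_averages(iterable):
    """Yield the mean of the items seen so far (hypothetical helper)."""
    # accumulate() yields the running totals; enumerate(..., 1) supplies
    # the number of items consumed so far, starting at 1.
    for count, total in enumerate(itertools.accumulate(iterable), 1):
        yield total / count

# Finite input: totals are 2, 6, 12, 20.
print(list(running_averages([2, 4, 6, 8])))  # [2.0, 3.0, 4.0, 5.0]

# Endless input: the generator is lazy, so islice can take a prefix.
first_three = list(itertools.islice(running_averages(itertools.count(1)), 3))
print(first_three)  # [1.0, 1.5, 2.0]
```

Because nothing is computed until a value is requested, the same function can be fed the endless cauchy_numbers() generator and cut off with itertools.islice, as in the answer.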
Other answers

You could use coroutines. They are similar to generators, but allow you to send in values. Coroutines were added in Python 2.5, so this won't work in earlier versions.

def running_average():
    sum = 0.0
    count = 0
    value = yield(float('nan'))
    while True:
        sum += value
        count += 1
        value = yield(sum/count)

ravg = running_average()
next(ravg)   # advance the coroutine to the first yield

for i in xrange(10):
    avg = ravg.send(cauchy(3,1))
    print "Running average: %.6f" % (avg,)

As a list comprehension:

ravg = running_average()
next(ravg)
ravg_list = [ravg.send(cauchy(3,1)) for i in xrange(10)]

Edits:

  • Using the next() function instead of the it.next() method. This is so it also will work with Python 3. The next() function has also been back-ported to Python 2.6+.
    In Python 2.5, you can either replace the calls with it.next(), or define a next function yourself.
    (Thanks Adam Parkin)
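
To make the send() mechanics easier to follow, here is a small trace with fixed inputs instead of random Cauchy samples. It is a sketch written for Python 3; the variable name total (used to avoid shadowing the built-in sum) and the input values 10, 20, 60 are choices made for this illustration only.

```python
import math

def running_average():
    total = 0.0
    count = 0
    # The priming next() call runs up to this yield and receives NaN,
    # since no value has been sent in yet.
    value = yield float('nan')
    while True:
        total += value
        count += 1
        # Hand back the current average and wait for the next send().
        value = yield total / count

ravg = running_average()
primed = next(ravg)            # advance to the first yield
print(math.isnan(primed))      # True

results = [ravg.send(x) for x in (10, 20, 60)]
print(results)                 # [10.0, 15.0, 30.0]
```

Each send() resumes the coroutine at the yield where it paused, so the caller decides when a new sample arrives, whereas a plain generator pulls samples itself.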

I've got two possible solutions here for you. Both are just generic running average functions that work on any list of numbers. (could be made to work with any iterable)

Generator based:

nums = [cauchy(3,1) for x in xrange(10)]

def running_avg(numbers):
    for count in xrange(1, len(numbers)+1):
        yield sum(numbers[:count])/count

print list(running_avg(nums))

List Comprehension based (really the same code as the earlier):

nums = [cauchy(3,1) for x in xrange(10)]

print [sum(nums[:count])/count for count in xrange(1, len(nums)+1)]

Generator-compatible Generator based:

Edit: This one I just tested to see if I could make my solution compatible with generators easily and what its performance would be. This is what I came up with.

def running_avg(numbers):
    sum = 0
    for count, number in enumerate(numbers):
        sum += number
        yield sum/(count+1)

See the performance stats below, well worth it.
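
One caveat: under Python 2, sum/(count+1) performs integer division when every input is an int. The sketch below is a Python 3 variant (where / is always true division) that also starts the total as a float and feeds the function a generator rather than a list. The name total and the squares input are assumptions made for this example.

```python
def running_avg(numbers):
    total = 0.0  # float start keeps the division exact-typed everywhere
    # enumerate(..., 1) counts from 1, so no "+ 1" is needed below.
    for count, number in enumerate(numbers, 1):
        total += number
        yield total / count

# A generator expression can only be consumed once, which is enough here
# because the function makes a single pass over its input.
squares = (n * n for n in range(1, 5))   # 1, 4, 9, 16
averages = list(running_avg(squares))
print(averages)
```

The running totals are 1, 5, 14 and 30, so the first, second and fourth averages come out as 1.0, 2.5 and 7.5.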

Performance characteristics:

Edit: I also decided to test Orip's interesting use of multiple generators to see the impact on performance.

Using timeit and the following (1,000,000 iterations 3 times):

from timeit import Timer

print "Generator based:", ', '.join(str(x) for x in Timer('list(running_avg(nums))', 'from __main__ import nums, running_avg').repeat())
print "LC based:", ', '.join(str(x) for x in Timer('[sum(nums[:count])/count for count in xrange(1, len(nums)+1)]', 'from __main__ import nums').repeat())
print "Orip's:", ', '.join(str(x) for x in Timer('list(itertools.islice(running_avgs, 10))', 'from __main__ import itertools, running_avgs').repeat())

print "Generator-compatible Generator based:", ', '.join(str(x) for x in Timer('list(running_avg(nums))', 'from __main__ import nums, running_avg').repeat())

I get the following results:

Generator based: 17.653908968, 17.8027219772, 18.0342400074
LC based: 14.3925321102, 14.4613749981, 14.4277560711
Orip's: 30.8035550117, 30.3142540455, 30.5146529675

Generator-compatible Generator based: 3.55352187157, 3.54164409637, 3.59098005295

See comments for code:

Orip's genEx based: 4.31488609314, 4.29926609993, 4.30518198013

Results are in seconds and show the new generator-compatible generator method to be consistently faster; your results may vary, though. I expect the massive difference between my original generator and the new one comes from the fact that the original re-sums the whole prefix with sum(numbers[:count]) at every step, while the new one keeps a running total.
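
The comparison can be rerun on Python 3 with a sketch like the following. It is not the code that produced the timings above: the function names prefix_resum and running_total, the 200-element input and the repetition count are all assumptions chosen for illustration, and the absolute numbers will differ by machine.

```python
import timeit

def prefix_resum(numbers):
    # Re-adds the whole prefix at every step: roughly n*n/2 additions.
    return [sum(numbers[:count]) / count
            for count in range(1, len(numbers) + 1)]

def running_total(numbers):
    # Keeps one running total: n additions.
    averages = []
    total = 0.0
    for count, number in enumerate(numbers, 1):
        total += number
        averages.append(total / count)
    return averages

nums = [float(n) for n in range(1, 201)]

# Both approaches should agree before their speed is compared.
same = prefix_resum(nums) == running_total(nums)
print("same results:", same)

for fn in (prefix_resum, running_total):
    seconds = timeit.timeit(lambda: fn(nums), number=200)
    print(fn.__name__, seconds)
```

The gap between the two should widen as the input grows, since the work in prefix_resum grows quadratically with the length of the list while running_total grows linearly.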




