English 中文(简体)
Building an interleaved buffer for pyopengl and numpy
原标题:

I m trying to batch up a bunch of vertices and texture coords in an interleaved array before sending it to pyOpengl s glInterleavedArrays/glDrawArrays. The only problem is that I m unable to find a suitably fast enough way to append data into a numpy array.

Is there a better way to do this? I would have thought it would be quicker to preallocate the array and then fill it with data but instead, generating a python list and converting it to a numpy array is "faster". Although 15ms for 4096 quads seems slow.

I have included some example code and their timings.

#!/usr/bin/python

import timeit
import numpy
import ctypes
import random

USE_RANDOM=True
USE_STATIC_BUFFER=True

STATIC_BUFFER = numpy.empty(4096*20, dtype=numpy.float32)

def render(i):
    # pretend these are different each time
    if USE_RANDOM:
        tex_left, tex_right, tex_top, tex_bottom = random.random(), random.random(), random.random(), random.random()
        left, right, top, bottom = random.random(), random.random(), random.random(), random.random()
    else:
        tex_left, tex_right, tex_top, tex_bottom = 0.0, 1.0, 1.0, 0.0
        left, right, top, bottom = -1.0, 1.0, 1.0, -1.0

    ibuffer = (
        tex_left, tex_bottom,   left, bottom, 0.0,  # Lower left corner
        tex_right, tex_bottom,  right, bottom, 0.0, # Lower right corner
        tex_right, tex_top,     right, top, 0.0,    # Upper right corner
        tex_left, tex_top,      left, top, 0.0,     # upper left
    )

    return ibuffer



# create python list.. convert to numpy array at end
def create_array_1():
    ibuffer = []
    for x in xrange(4096):
        data = render(x)
        ibuffer += data

    ibuffer = numpy.array(ibuffer, dtype=numpy.float32)
    return ibuffer

# numpy.array, placing individually by index
def create_array_2():
    if USE_STATIC_BUFFER:
        ibuffer = STATIC_BUFFER
    else:
        ibuffer = numpy.empty(4096*20, dtype=numpy.float32)
    index = 0
    for x in xrange(4096):
        data = render(x)
        for v in data:
            ibuffer[index] = v
            index += 1
    return ibuffer

# using slicing
def create_array_3():
    if USE_STATIC_BUFFER:
        ibuffer = STATIC_BUFFER
    else:
        ibuffer = numpy.empty(4096*20, dtype=numpy.float32)
    index = 0
    for x in xrange(4096):
        data = render(x)
        ibuffer[index:index+20] = data
        index += 20
    return ibuffer

# using numpy.concat on a list of ibuffers
def create_array_4():
    ibuffer_concat = []
    for x in xrange(4096):
        data = render(x)
        # converting makes a diff!
        data = numpy.array(data, dtype=numpy.float32)
        ibuffer_concat.append(data)
    return numpy.concatenate(ibuffer_concat)

# using numpy array.put
def create_array_5():
    if USE_STATIC_BUFFER:
        ibuffer = STATIC_BUFFER
    else:
        ibuffer = numpy.empty(4096*20, dtype=numpy.float32)
    index = 0
    for x in xrange(4096):
        data = render(x)
        ibuffer.put( xrange(index, index+20), data)
        index += 20
    return ibuffer

# using ctype array
CTYPES_ARRAY = ctypes.c_float*(4096*20)
def create_array_6():
    ibuffer = []
    for x in xrange(4096):
        data = render(x)
        ibuffer += data
    ibuffer = CTYPES_ARRAY(*ibuffer)
    return ibuffer

def equals(a, b):

    for i,v in enumerate(a):
        if b[i] != v:
            return False
    return True



if __name__ == "__main__":
    number = 100

    # if random, don t try and compare arrays
    if not USE_RANDOM and not USE_STATIC_BUFFER:
        a =  create_array_1()
        assert equals( a, create_array_2() )
        assert equals( a, create_array_3() )
        assert equals( a, create_array_4() )
        assert equals( a, create_array_5() )
        assert equals( a, create_array_6() )

    t = timeit.Timer( "testing2.create_array_1()", "import testing2" )
    print  from list: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_2()", "import testing2" )
    print  array: indexed: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_3()", "import testing2" )
    print  array: slicing: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_4()", "import testing2" )
    print  array: concat: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_5()", "import testing2" )
    print  array: put: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_6()", "import testing2" )
    print  ctypes float array: , t.timeit(number)/number*1000.0,  ms 

Timings using random numbers:

$ python testing2.py
from list: 15.0486779213 ms
array: indexed: 24.8184704781 ms
array: slicing: 50.2214789391 ms
array: concat: 44.1691994667 ms
array: put: 73.5879898071 ms
ctypes float array: 20.6674289703 ms

edit note: changed code to produce random numbers for each render to reduce object reuse and to simulate different vertices each time.

edit note2: added static buffer and force all numpy.empty() to use dtype=float32

note 1/Apr/2010: still no progress and I don t really feel that any of the answers have solved the problem yet.

问题回答

The reason that create_array_1 is so much faster seems to be that the items in the (python) list all point to the same object. You can see this if you test:

print (ibuffer[0] is ibuffer[1])

inside the subroutines. In create_array_1 this is true (before you create the numpy array), while in create_array_2 this is always going to be false. I guess this means that data conversion step in the array conversion only has to happen once in create_array_1, while it happens 4096 times in create_array_2.

If this is the reason, I guess the timings will be different if you make render generate random data. Create_array_5 is slowest as it makes a new array each time you add data to the end.

The benefit of numpy is not realized by simply storing the data in an array, it is achieved by performing operations across many elements in an array instead of one by one. Your example can be boiled down and optimized to this trivial solution with orders of magnitude speedup:

numpy.random.standard_normal(4096*20)

...that s not very helpful, but it does kind of hint at where the costs are.

Here is an incremental improvement that beats the list append solution (but only slightly) by eliminating the iteration over 4096 elements.

xs = numpy.arange(4096)
render2 = numpy.vectorize(render)

def create_array_7():
    ibuffer = STATIC_BUFFER
    for i, a in enumerate(render2(xs)):
        ibuffer[i::20] = a
    return ibuffer

... but not the speedup we are looking for.

The real savings will be realized by a recasting of the render routine so that you don t have to create a python object for every value that ends up being placed in the buffer. Where does tex_left, tex_right...etc. come from? Are they calculated or read?

I know it seems strange, but have you tried fromfile?





相关问题
Can Django models use MySQL functions?

Is there a way to force Django models to pass a field to a MySQL function every time the model data is read or loaded? To clarify what I mean in SQL, I want the Django model to produce something like ...

An enterprise scheduler for python (like quartz)

I am looking for an enterprise tasks scheduler for python, like quartz is for Java. Requirements: Persistent: if the process restarts or the machine restarts, then all the jobs must stay there and ...

How to remove unique, then duplicate dictionaries in a list?

Given the following list that contains some duplicate and some unique dictionaries, what is the best method to remove unique dictionaries first, then reduce the duplicate dictionaries to single ...

What is suggested seed value to use with random.seed()?

Simple enough question: I m using python random module to generate random integers. I want to know what is the suggested value to use with the random.seed() function? Currently I am letting this ...

How can I make the PyDev editor selectively ignore errors?

I m using PyDev under Eclipse to write some Jython code. I ve got numerous instances where I need to do something like this: import com.work.project.component.client.Interface.ISubInterface as ...

How do I profile `paster serve` s startup time?

Python s paster serve app.ini is taking longer than I would like to be ready for the first request. I know how to profile requests with middleware, but how do I profile the initialization time? I ...

Pragmatically adding give-aways/freebies to an online store

Our business currently has an online store and recently we ve been offering free specials to our customers. Right now, we simply display the special and give the buyer a notice stating we will add the ...

Converting Dictionary to List? [duplicate]

I m trying to convert a Python dictionary into a Python list, in order to perform some calculations. #My dictionary dict = {} dict[ Capital ]="London" dict[ Food ]="Fish&Chips" dict[ 2012 ]="...

热门标签