Question

I m trying to batch up a bunch of vertices and texture coords in an interleaved array before sending it to pyOpengl s glInterleavedArrays/glDrawArrays. The only problem is that I m unable to find a suitably fast enough way to append data into a numpy array.

Is there a better way to do this? I would have thought it would be quicker to preallocate the array and then fill it with data but instead, generating a python list and converting it to a numpy array is "faster". Although 15ms for 4096 quads seems slow.

I have included some example code and their timings.

#!/usr/bin/python

import timeit
import numpy
import ctypes
import random

USE_RANDOM=True
USE_STATIC_BUFFER=True

STATIC_BUFFER = numpy.empty(4096*20, dtype=numpy.float32)

def render(i):
    # pretend these are different each time
    if USE_RANDOM:
        tex_left, tex_right, tex_top, tex_bottom = random.random(), random.random(), random.random(), random.random()
        left, right, top, bottom = random.random(), random.random(), random.random(), random.random()
    else:
        tex_left, tex_right, tex_top, tex_bottom = 0.0, 1.0, 1.0, 0.0
        left, right, top, bottom = -1.0, 1.0, 1.0, -1.0

    ibuffer = (
        tex_left, tex_bottom,   left, bottom, 0.0,  # Lower left corner
        tex_right, tex_bottom,  right, bottom, 0.0, # Lower right corner
        tex_right, tex_top,     right, top, 0.0,    # Upper right corner
        tex_left, tex_top,      left, top, 0.0,     # upper left
    )

    return ibuffer



# create python list.. convert to numpy array at end
def create_array_1():
    ibuffer = []
    for x in xrange(4096):
        data = render(x)
        ibuffer += data

    ibuffer = numpy.array(ibuffer, dtype=numpy.float32)
    return ibuffer

# numpy.array, placing individually by index
def create_array_2():
    if USE_STATIC_BUFFER:
        ibuffer = STATIC_BUFFER
    else:
        ibuffer = numpy.empty(4096*20, dtype=numpy.float32)
    index = 0
    for x in xrange(4096):
        data = render(x)
        for v in data:
            ibuffer[index] = v
            index += 1
    return ibuffer

# using slicing
def create_array_3():
    if USE_STATIC_BUFFER:
        ibuffer = STATIC_BUFFER
    else:
        ibuffer = numpy.empty(4096*20, dtype=numpy.float32)
    index = 0
    for x in xrange(4096):
        data = render(x)
        ibuffer[index:index+20] = data
        index += 20
    return ibuffer

# using numpy.concat on a list of ibuffers
def create_array_4():
    ibuffer_concat = []
    for x in xrange(4096):
        data = render(x)
        # converting makes a diff!
        data = numpy.array(data, dtype=numpy.float32)
        ibuffer_concat.append(data)
    return numpy.concatenate(ibuffer_concat)

# using numpy array.put
def create_array_5():
    if USE_STATIC_BUFFER:
        ibuffer = STATIC_BUFFER
    else:
        ibuffer = numpy.empty(4096*20, dtype=numpy.float32)
    index = 0
    for x in xrange(4096):
        data = render(x)
        ibuffer.put( xrange(index, index+20), data)
        index += 20
    return ibuffer

# using ctype array
CTYPES_ARRAY = ctypes.c_float*(4096*20)
def create_array_6():
    ibuffer = []
    for x in xrange(4096):
        data = render(x)
        ibuffer += data
    ibuffer = CTYPES_ARRAY(*ibuffer)
    return ibuffer

def equals(a, b):

    for i,v in enumerate(a):
        if b[i] != v:
            return False
    return True



if __name__ == "__main__":
    number = 100

    # if random, don t try and compare arrays
    if not USE_RANDOM and not USE_STATIC_BUFFER:
        a =  create_array_1()
        assert equals( a, create_array_2() )
        assert equals( a, create_array_3() )
        assert equals( a, create_array_4() )
        assert equals( a, create_array_5() )
        assert equals( a, create_array_6() )

    t = timeit.Timer( "testing2.create_array_1()", "import testing2" )
    print  from list: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_2()", "import testing2" )
    print  array: indexed: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_3()", "import testing2" )
    print  array: slicing: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_4()", "import testing2" )
    print  array: concat: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_5()", "import testing2" )
    print  array: put: , t.timeit(number)/number*1000.0,  ms 

    t = timeit.Timer( "testing2.create_array_6()", "import testing2" )
    print  ctypes float array: , t.timeit(number)/number*1000.0,  ms

Timings using random numbers:

$ python testing2.py
from list: 15.0486779213 ms
array: indexed: 24.8184704781 ms
array: slicing: 50.2214789391 ms
array: concat: 44.1691994667 ms
array: put: 73.5879898071 ms
ctypes float array: 20.6674289703 ms

edit note: changed code to produce random numbers for each render to reduce object reuse and to simulate different vertices each time.

edit note2: added static buffer and force all numpy.empty() to use dtype=float32

note 1/Apr/2010: still no progress and I don t really feel that any of the answers have solved the problem yet.

Answer 1

The reason that create_array_1 is so much faster seems to be that the items in the (python) list all point to the same object. You can see this if you test:

print (ibuffer[0] is ibuffer[1])

inside the subroutines. In create_array_1 this is true (before you create the numpy array), while in create_array_2 this is always going to be false. I guess this means that data conversion step in the array conversion only has to happen once in create_array_1, while it happens 4096 times in create_array_2.

If this is the reason, I guess the timings will be different if you make render generate random data. Create_array_5 is slowest as it makes a new array each time you add data to the end.

Answer 2

The benefit of numpy is not realized by simply storing the data in an array, it is achieved by performing operations across many elements in an array instead of one by one. Your example can be boiled down and optimized to this trivial solution with orders of magnitude speedup:

numpy.random.standard_normal(4096*20)

...that s not very helpful, but it does kind of hint at where the costs are.

Here is an incremental improvement that beats the list append solution (but only slightly) by eliminating the iteration over 4096 elements.

xs = numpy.arange(4096)
render2 = numpy.vectorize(render)

def create_array_7():
    ibuffer = STATIC_BUFFER
    for i, a in enumerate(render2(xs)):
        ibuffer[i::20] = a
    return ibuffer

... but not the speedup we are looking for.

The real savings will be realized by a recasting of the render routine so that you don t have to create a python object for every value that ends up being placed in the buffer. Where does tex_left, tex_right...etc. come from? Are they calculated or read?

Answer 3

I know it seems strange, but have you tried fromfile?

友情链接