Column-Slicing in PyCUDA
PyCUDA’s GPUArray class is designed as an analogue of the NumPy ndarray, but while PyCUDA is still under heavy development it lacks some crucial functionality that would make it a true drop-in replacement for NumPy. Most importantly, slicing of GPUArrays was only implemented recently, in version 2013.1, and unfortunately the implementation is a little lacking: it appears to support general slicing, but actually doesn’t. There is also a quite subtle bug lurking that may silently give you the wrong array contents if you don’t pay attention.
First, let’s initialize our GPU and create a GPUArray to work on:
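A minimal setup looks roughly like this (the shape and contents of the matrix are arbitrary and only for illustration):

    import numpy as np
    import pycuda.autoinit            # pick and initialize the first available GPU
    import pycuda.gpuarray as gpuarray

    # A small host matrix, copied to the GPU as a GPUArray.
    X_cpu = np.asarray(np.random.rand(4, 4), dtype=np.float32)
    X = gpuarray.to_gpu(X_cpu)
    print(X_cpu)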
If we take a column-slice of this array, the returned slice is no longer a contiguous block of memory:
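For instance, slicing out two columns of X and checking the layout flags:

    # A column slice is a strided view on X's memory, not a contiguous block.
    Y = X[:, 1:3]
    print(Y.flags.c_contiguous)   # False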
Unfortunately, most of the operations in GPUArray are not implemented for non-contiguous arrays, so using the slicing operator to take a column slice doesn’t have much utility yet. What’s worse, if we take a new view of the non-contiguous array, the flag signaling that the array is non-contiguous is discarded and the view treats the memory as contiguous:
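With the slice from above, the problem shows up as soon as the view is copied back to the host:

    # The view is reported as contiguous again, so fetching it to the host
    # reads the memory linearly and returns wrong data for every row after
    # the first.
    Y_view = Y.view()
    print(Y_view.flags.c_contiguous)   # True, although Y was not contiguous
    print(Y_view.get())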
Ouch, that’s not what we wanted. The GPUArray.view() function does not remember the actual memory layout of Y, and therefore the data in every row after the first is wrong.
To work around this, you can use pycuda.driver.Memcpy2D to copy the data into a new contiguous array. Here is a function that creates a new GPUArray and performs the strided memory copy:
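A sketch of such a helper could look like the following; the name extract_columns and its signature are just illustrative:

    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray

    def extract_columns(mat, start, stop):
        # Copy columns [start, stop) of a 2D GPUArray into a freshly
        # allocated, contiguous GPUArray via a strided device-to-device copy.
        n_rows, n_cols = mat.shape
        n_sliced = stop - start
        itemsize = mat.dtype.itemsize

        dst = gpuarray.empty((n_rows, n_sliced), dtype=mat.dtype)

        copy = drv.Memcpy2D()
        copy.set_src_device(mat.gpudata)
        copy.src_x_in_bytes = start * itemsize    # byte offset of the first sliced column
        copy.src_pitch = n_cols * itemsize        # width of a full source row in bytes
        copy.set_dst_device(dst.gpudata)
        copy.dst_pitch = copy.width_in_bytes = n_sliced * itemsize
        copy.height = n_rows
        copy(aligned=True)

        return dst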
Now we can use the function as follows:
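For example, pulling columns 1 and 2 out of X and comparing against the host-side slice:

    Y_cols = extract_columns(X, 1, 3)
    print(Y_cols.get())
    print(X_cpu[:, 1:3])   # the two should now agree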