Column-Slicing in PyCUDA
PyCUDA's GPUArray class is designed as an analogue to the Numpy ndarray, but PyCUDA is still under heavy development and is missing some crucial functionality that would make it a true drop-in replacement for Numpy. Most importantly, slicing of GPUArrays was only implemented recently, in version 2013.1, and unfortunately the implementation is a little lacking: it appears to support general slicing, but in practice it doesn't. There is also the potential for a quite subtle bug that may alter the contents of your array if you don't pay attention.
First, let’s initialize our GPU and create a GPUArray to work on:
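Any small test matrix will do; here is one possible setup with a 4×5 float32 array (the values and variable names are arbitrary):

    import numpy as np
    import pycuda.autoinit              # choose a device and create a context
    import pycuda.gpuarray as gpuarray

    # A small float32 test matrix, copied from the host to the GPU.
    X_cpu = np.arange(20, dtype=np.float32).reshape(4, 5)
    X = gpuarray.to_gpu(X_cpu)

    print(X.shape)                      # (4, 5)
    print(X.flags.c_contiguous)         # True: rows are laid out back to back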
If we take a column-slice of this array, the returned slice is no longer a contiguous block of memory:
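For example, taking two interior columns of the X above (the bounds 1:3 are an arbitrary choice):

    Y = X[:, 1:3]                       # a view onto columns 1 and 2 of X
    print(Y.shape)                      # (4, 2)
    print(Y.flags.c_contiguous)         # False: consecutive rows of Y are
                                        # separated by the skipped columns of X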
Unfortunately, most of the operations in GPUArray are not implemented for non-contiguous arrays, so using the slicing operator to take a column slice doesn't have much utility yet. What's worse, if we create a new view of the non-contiguous array, the flag signaling that the array is non-contiguous is discarded and the view treats the memory as if it were contiguous:
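For example, taking a plain view of Y and copying it back to the host (again only a sketch):

    # view() reinterprets the same device memory, but the strides of the
    # column slice are not preserved: the new array claims to be contiguous.
    Y_view = Y.view()
    print(Y_view.flags.c_contiguous)    # True, even though the data is not
    print(Y_view.get())
    # Only the first row matches X[:, 1:3]; the remaining rows are read
    # straight out of X's underlying buffer, i.e. from the wrong positions.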
Ouch, that's not what we wanted. The GPUArray.view() function does not remember the actual memory layout of Y, and therefore the data in every row after the first is wrong.
To work around this, you can use pycuda.driver.Memcpy2D to copy the data into a new contiguous array. Here is a function that creates a new GPUArray and performs the strided memory copy:
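A sketch of one way to write such a helper (here called extract_columns, copying the half-open column range [start, stop) and assuming a C-contiguous 2-D input):

    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray

    def extract_columns(mat, start, stop):
        """Copy columns [start, stop) of a 2-D GPUArray into a new,
        contiguous GPUArray using a strided device-to-device copy."""
        n_rows, n_cols = mat.shape
        n_out = stop - start
        assert 0 <= start < stop <= n_cols
        itemsize = mat.dtype.itemsize

        out = gpuarray.empty((n_rows, n_out), dtype=mat.dtype)

        copy = drv.Memcpy2D()
        copy.set_src_device(mat.gpudata)
        copy.src_x_in_bytes = start * itemsize   # byte offset of the first copied column
        copy.src_pitch = n_cols * itemsize       # width in bytes of a full source row
        copy.set_dst_device(out.gpudata)
        copy.dst_pitch = copy.width_in_bytes = n_out * itemsize
        copy.height = n_rows
        copy(aligned=True)

        return out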
Now we can use the function as follows:
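Continuing with the X from above:

    Y_cols = extract_columns(X, 1, 3)
    print(Y_cols.flags.c_contiguous)    # True: Y_cols owns its own contiguous buffer
    print(Y_cols.get())                 # matches X.get()[:, 1:3]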