IPython Slideshows Will Change the Way You Work

If you haven’t been gripped by the IPython Notebook craze yet, let me quickly fill you in: IPython is a suite of tools whose goal is to cover the whole scientific workflow, from interactive data analysis to publication. Its main focus, as is easy to guess, is Python, but the developers are ambitious about including other languages, and IPython can already whizz data back and forth between Python, R, Octave, and soon, Julia.

Just this last Friday, 11 years after the very first version of IPython and following a breathtaking surge in popularity during the last year, the IPython team released version 1.0. This is a major milestone, mainly because of one feature that many people might easily overlook: ipython nbconvert.

Let me explain: IPython Notebook is an interactive notebook that runs in your browser. It has a cell format, where each cell can contain either formatted text or executable code. You can insert typeset mathematics, images, videos, arbitrary HTML, and pretty much everything else you can imagine. Now, nbconvert will take a notebook and convert it to one of many output formats. So you can export to a static HTML page, a LaTeX document, Markdown, or even a slideshow running in your browser, enabled by Reveal.js.

This may not sound like much, but consider what it means: you now have an interactive programming environment that lets you draw on the combined universes of all the Python, R, and Octave (Matlab) packages for interactive data analysis. IPython Notebook keeps all your results, figures, and outputs in a single file in a plain-text format, which you can put under version control, email, edit, and view on any platform. After your analysis is done, you can run your notebook through a simple tool that produces a publishable document in a myriad of formats. If you’re a scientist and you work with data, that’s just enormous.

Now one of the coolest new features is the Reveal.js-based slideshow. Here is an example by the developer of the slideshow feature, Damián Avila, which shows how to turn any IPython Notebook into a slideshow and how to include math, images, videos, tables, etc. However, what’s slightly annoying about all the IPython Notebook formats, including the slideshow, is that there is no way to hide the code used to generate certain output. So, if you create a plot with matplotlib, your readers will always see the code you used to generate that figure. I don’t know why such a seemingly obvious feature hasn’t been implemented yet, but apparently there are some unspecified legal issues, as hinted at by IPython developer Brian Granger.

In practice, your slides will look similar to this:

That’s great if you want to teach a class on Matplotlib, but quite often you’d want to hide that code cell.

Indeed, it is quite straightforward to use a little bit of JavaScript to hide the input cells in the browser when using the HTML or Reveal.js output. Here is a JavaScript function (from Stack Overflow user Felix Kling) that hides the elements passed to it, which lets us hide every element of an arbitrary class in an HTML document:

function hideElements(elements, start) {
    for(var i = 0, length = elements.length; i < length;i++) {
        if(i >= start) {
            elements[i].style.display = "none";
        }
    }
}

To hide the input cells we only need to hide the elements of the input class and, optionally, those of the prompt class. This is easily done by appending the following HTML to the output of nbconvert:

<script type="text/javascript">
function hideElements(elements, start) {
    for(var i = 0, length = elements.length; i < length;i++) {
        if(i >= start) {
            elements[i].style.display = "none";
        }
    }
}
var input_elements = document.getElementsByClassName('input');
hideElements(input_elements, 0);
var prompt_elements = document.getElementsByClassName('prompt');
hideElements(prompt_elements, 0);
</script>

Here’s the result on the slide from before:

Now, magically, you have a clean document, scrubbed of all code, that you can go on to publish or present. Your code is still present in the document and the notebook, so your results remain reproducible. The code is only hidden in the output.

To make this really simple, I created a small command line tool ipy_hide_input (download from gist). You can use the tool either in-place on a file, or on stdin like so:

ipy_hide_input input_file.slides.html # modify file in-place

# work on stdin
ipython nbconvert --to slides input.ipynb --stdout | ipy_hide_input > input.slides.html
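
For readers curious how such a tool could work, here is a minimal Python sketch. It is not the script from the gist: the names HIDE_SCRIPT, hide_input, and hide_input_file are made up for illustration, and the sketch simply appends the snippet from above to the end of the HTML, as described earlier.

```python
# The JavaScript snippet from above, to be appended to nbconvert's HTML output.
HIDE_SCRIPT = """<script type="text/javascript">
function hideElements(elements, start) {
    for(var i = 0, length = elements.length; i < length;i++) {
        if(i >= start) {
            elements[i].style.display = "none";
        }
    }
}
var input_elements = document.getElementsByClassName('input');
hideElements(input_elements, 0);
var prompt_elements = document.getElementsByClassName('prompt');
hideElements(prompt_elements, 0);
</script>
"""


def hide_input(html):
    """Return the HTML text with the hiding script appended (hypothetical helper)."""
    return html.rstrip("\n") + "\n" + HIDE_SCRIPT


def hide_input_file(path):
    """Append the hiding script to an HTML file in-place (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        html = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(hide_input(html))
```

To use this as a filter on stdin, something like sys.stdout.write(hide_input(sys.stdin.read())) would do.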

Column-Slicing in PyCUDA

PyCUDA’s GPUArray class is designed to be an analogue of the NumPy ndarray, but since PyCUDA is still under heavy development, it lacks some crucial functionality that would make it a real drop-in replacement for NumPy. Most importantly, slicing of GPUArrays was only recently implemented in version 2013.1, and unfortunately the implementation is a little lacking: it appears to support general slicing, but actually doesn’t. There is also potential for a quite subtle bug that may alter the content of your array if you don’t pay attention.

First, let’s initialize our GPU and create a GPUArray to work on:

>>> import numpy as np
>>> import pycuda.autoinit
>>> from pycuda import gpuarray
>>> from pycuda.curandom import rand as curand

>>> float = np.float32
>>> height = 100
>>> width = 200
>>> X = curand((height, width), float)

>>> X.flags.c_contiguous               # New GPUArray is C-contiguous
True

If we take a column-slice of this array, the returned slice is no longer a contiguous block of memory:

>>> Y = X[:,:100]
>>> Y.flags.forc                       # Array is no longer contiguous
False
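
To see why the column slice cannot be contiguous, it helps to write out where its elements live in a row-major buffer. The following pure-Python sketch (the helper names are mine, not part of PyCUDA) lists the flat offsets of a column slice and checks whether they form one unbroken run.

```python
def slice_offsets(height, width, col_start, col_stop):
    """Flat offsets (in elements) of X[:, col_start:col_stop] for a
    row-major (C-contiguous) array X of shape (height, width)."""
    return [row * width + col
            for row in range(height)
            for col in range(col_start, col_stop)]


def is_contiguous(offsets):
    """True if the offsets form one unbroken run of consecutive integers."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))


# Full rows occupy one solid block of memory ...
print(is_contiguous(slice_offsets(100, 200, 0, 200)))   # True
# ... but taking only the first 100 columns leaves a gap after every row.
print(is_contiguous(slice_offsets(100, 200, 0, 100)))   # False
```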

Unfortunately, most of the operations in GPUArray are not implemented for non-contiguous arrays, so using the slicing operator to get a column slice actually doesn’t have much utility yet. However, what’s worse, if we get a new view on the non-contiguous array, the flag signaling that the array is non-contiguous is discarded and the view treats the memory as contiguous:

>>> Y_view = Y.view()
>>> Y_view.flags.c_contiguous          # Magically, Y_view appears contiguous now
True
>>> Y_view.get() == X.get()[:,:100]       # compare to slicing on CPU
array([[ True,  True,  True, ...,  True,  True,  True],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]], dtype=bool)

Ouch, that’s not what we wanted. The GPUArray.view() function does not remember the actual memory layout of Y, and therefore the data in all rows after the first is wrong.
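
What goes wrong here can be reproduced on the CPU. The sketch below is only a model of the behaviour described above, not PyCUDA’s actual code: it treats the array as a flat row-major buffer and compares the correct column slice with what you get by reading the first height * ncols elements as if they were contiguous.

```python
height, width, ncols = 100, 200, 100

# Flat row-major buffer standing in for X; every element is distinct.
buf = list(range(height * width))

# Correct slice: row r starts at offset r * width (the pitch of X).
correct = [buf[r * width : r * width + ncols] for r in range(height)]

# Naive view: row r is assumed to start at offset r * ncols instead.
naive = [buf[r * ncols : (r + 1) * ncols] for r in range(height)]

rows_equal = [c == n for c, n in zip(correct, naive)]
print(rows_equal[0])        # True: the first row starts at offset 0 either way
print(any(rows_equal[1:]))  # False: every later row reads from the wrong place
```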

To work around this, you can use the pycuda.driver.Memcpy2D class to copy the data to a new contiguous array. Here is a function that creates a new GPUArray and performs the memory copy:

import numpy as np
import pycuda.driver as drv
from pycuda import gpuarray


def extract_columns(mat, start=0, stop=None):
    """Copy columns [start, stop) of a C-contiguous 2D GPUArray into a new array."""
    dtype = mat.dtype
    itemsize = np.dtype(dtype).itemsize
    N, M = mat.shape
    if stop is None:
        stop = M    # Default: slice up to the last column
    m = stop - start

    assert mat.flags.c_contiguous
    assert 0 <= start < stop <= M

    new_mat = gpuarray.empty((N, m), dtype)

    copy = drv.Memcpy2D()
    copy.set_src_device(mat.gpudata)
    copy.src_x_in_bytes = start * itemsize    # Offset of the first column in bytes
    copy.set_dst_device(new_mat.gpudata)
    copy.src_pitch = M * itemsize   # Width of a row in bytes in the source array
    copy.dst_pitch = copy.width_in_bytes = m * itemsize  # Width of sliced row
    copy.height = N
    copy(aligned=True)

    return new_mat

Now we can use the function as follows:

>>> Y = extract_columns(X, 0, 100)
>>> np.all(Y.get() == X.get()[:,:100]) # Indeed, we got the slice we wanted
True
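
If the Memcpy2D parameters in extract_columns look opaque, the following CPU-only sketch mimics what the copy does with them. It is an illustration of pitched 2D copying in plain Python, not PyCUDA code: the function memcpy2d and its keyword arguments are invented here and merely borrow the attribute names used above.

```python
import struct


def memcpy2d(src, dst, src_x_in_bytes, src_pitch, dst_pitch,
             width_in_bytes, height):
    """Copy `height` rows of `width_in_bytes` bytes from src to dst.

    Consecutive rows are `src_pitch` bytes apart in src and `dst_pitch`
    bytes apart in dst; each source row is read starting at byte offset
    `src_x_in_bytes` within the row.
    """
    for row in range(height):
        s = row * src_pitch + src_x_in_bytes
        d = row * dst_pitch
        dst[d:d + width_in_bytes] = src[s:s + width_in_bytes]


# A 3x4 float32 "matrix" holding 0..11 in row-major order.
N, M, itemsize = 3, 4, 4
src = bytearray(struct.pack("12f", *range(12)))

# Extract columns 1 and 2 (start=1, stop=3) into a new 3x2 buffer.
start, stop = 1, 3
m = stop - start
dst = bytearray(N * m * itemsize)
memcpy2d(src, dst, src_x_in_bytes=start * itemsize, src_pitch=M * itemsize,
         dst_pitch=m * itemsize, width_in_bytes=m * itemsize, height=N)

print(struct.unpack("6f", dst))  # (1.0, 2.0, 5.0, 6.0, 9.0, 10.0)
```

The source pitch is the width of a full row of the original matrix, while the destination pitch equals the width of the sliced row, which is exactly what makes the result contiguous.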