In my previous post, I complained a bit about ipython notebooks not being the solution to mixing metadata, pre- or post-processor directives, and code. But, I mentioned an intriguing snippet of code that I would look into, and indeed this has led to some progress for me in picturing the use of ipython notebooks for literate modeling.
Extensions
YAML_magic is
an ipython notebook extension by Nick Bollweg that enables raw YAML to
be interpreted in a code cell when prefixed by %%yaml. This
extension idea opened my eyes to simple ways in which an extension can
enable literate modeling. For one thing, once I understood what to do
with the file (no documentation was provided in the original Gist),
I realized how easy it is to write custom extensions to hook into other kernel processes that
could help me achieve my ideal exploratory modeling setup.
I also want to auto-load my extensions and modules for my notebooks but not for my regular ipython sessions (partly because those loads create warnings and pollute my default namespace). Autoreload is a built-in extension that simply reloads any module dependencies if they are edited outside of the session.
But first, the instructions. Write these config options (based on uncommenting the appropriate lines in the config file) into .ipython/profile_default/ipython_kernel_config.py, where the .ipython directory normally lives in your home directory:
c.InteractiveShellApp.extensions = ['yaml_magic', 'autoreload']
You can make a copy of an existing config file in that directory if the file doesn’t already exist (or generate a fresh, fully commented set of config files with ipython profile create); just make sure that only kernel-focused config options are uncommented. The precedence rule is that ipython starts by interpreting ipython_config.py, and then the individual application components read their specific config files and overwrite any overlapping options.
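For a one-off session, the same extensions can also be loaded by hand instead of via the config file (assuming yaml_magic.py is importable on your python path):

```
%load_ext yaml_magic
%load_ext autoreload
```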
Using the extensions
After some configuration fuss, I have ‘autoreload’ and ‘yaml_magic’ working and the results are quite inspirational. Autoreload is great for exploratory workflows when you edit module dependencies while still working in a dependent module/session.
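To make the autoreload behaviour concrete, here is the kind of round trip it enables; mymodule and its simulate function are hypothetical stand-ins for my local library code:

```
%autoreload 2             # reload any changed modules before running each cell

import mymodule           # a local module under active development
mymodule.simulate()       # gives one result...
# ...edit simulate() in mymodule.py in an external editor, then...
mymodule.simulate()       # ...the edited code runs, with no kernel restart or re-import
```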
yaml_magic is not fully documented (at the time of writing, but see
comments below about the new repo
and PyPI uploads!) but allows you to write
YAML directly into a code cell, prefix it with %%yaml
and have the
YAML interpreted into a dictionary (as one might expect). Also, adding
a name after the directive assigns this dictionary to a variable with
that name in the local scope. For instance, I defined the cell:
%%yaml x
studycontext-metadata:
    ID: 3
    tag: initial visualization of linearized model's performance
studycontext-header:
    tag: SC_imports
which displays nothing. But introspecting x
yields the dictionary
{'studycontext-header': {'tag': 'SC_imports'},
 'studycontext-metadata': {'ID': 3,
  'tag': "initial visualization of linearized model's performance"}}
I see from the Gist that getting behavior like this is as simple as adding a couple of decorators to a class that invokes a little Javascript. Based on this example, I can see a way to implement a backend project management system that I alluded to in my previous post.
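For the curious, a minimal sketch of that pattern looks something like the following; this is my own stripped-down approximation rather than Nick’s actual code, and it leaves out the Javascript part entirely:

```
import yaml
from IPython.core.magic import Magics, magics_class, cell_magic


@magics_class
class SimpleYamlMagic(Magics):
    @cell_magic
    def yaml(self, line, cell):
        """Parse the cell body as YAML; bind it to the name on the %%yaml line, if any."""
        data = yaml.safe_load(cell)
        name = line.strip()
        if name:
            # push the parsed dictionary into the interactive namespace
            self.shell.push({name: data})
        else:
            return data


def load_ipython_extension(ip):
    # called by %load_ext (or by the extensions list in the kernel config)
    ip.register_magics(SimpleYamlMagic)
```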
Importing from other notebooks
A big issue for me is that my modeling work
involves multiple files and the continued editing of common library
files in the local project. If I’m going to adopt ipython notebooks as
the basis for my workflow then I need to make most, or all, of these
files into .ipynb
notebook files so that they can all be marked up
and processed similarly. This would normally prohibit importing them
directly into other notebooks when they are needed.
However, I found this snippet, which I turned into a module I named notebook_importing.py. I placed it in a local library folder on my python path for safekeeping. I then edited my kernel config file’s option to execute commands on launch:
c.InteractiveShellApp.exec_lines = [
    'import notebook_importing'
]
This ensures that ‘import’ is extended to check for the .ipynb file extension and pre-process such files appropriately.
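For reference, a condensed sketch of roughly what such a module can look like is below, along the lines of the well-known “Importing Jupyter Notebooks as Modules” recipe that this kind of snippet is usually based on; the names and details are illustrative rather than the exact code:

```
# notebook_importing.py (sketch): let `import foo` fall back to foo.ipynb
import io, os, sys, types
from importlib.abc import Loader, MetaPathFinder
from importlib.machinery import ModuleSpec

import nbformat
from IPython import get_ipython
from IPython.core.interactiveshell import InteractiveShell


def find_notebook(fullname, path=None):
    """Return the path to a .ipynb file matching the requested module name, if any."""
    name = fullname.rsplit('.', 1)[-1]
    for d in path or ['']:
        nb_path = os.path.join(d, name + '.ipynb')
        if os.path.isfile(nb_path):
            return nb_path


class NotebookLoader(Loader):
    """Execute a notebook's code cells in a fresh module namespace."""

    def __init__(self, path=None):
        self.shell = InteractiveShell.instance()
        self.path = path

    def create_module(self, spec):
        mod = types.ModuleType(spec.name)
        mod.__file__ = find_notebook(spec.name, self.path)
        mod.__dict__['get_ipython'] = get_ipython  # so cells using IPython features still run
        return mod

    def exec_module(self, mod):
        with io.open(mod.__file__, 'r', encoding='utf-8') as f:
            nb = nbformat.read(f, as_version=4)
        # point the shell's user namespace at the module while its cells execute
        save_user_ns, self.shell.user_ns = self.shell.user_ns, mod.__dict__
        try:
            for cell in nb.cells:
                if cell.cell_type == 'code':
                    # transform %magics into plain Python, then run the cell
                    code = self.shell.input_transformer_manager.transform_cell(cell.source)
                    exec(code, mod.__dict__)
        finally:
            self.shell.user_ns = save_user_ns


class NotebookFinder(MetaPathFinder):
    """Offer the notebook loader whenever a matching .ipynb file exists."""

    def find_spec(self, fullname, path, target=None):
        if find_notebook(fullname, path):
            return ModuleSpec(fullname, NotebookLoader(path))


sys.meta_path.append(NotebookFinder())
```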
Debugging
One of the most useful things I can do in my IDE is to graphically set breakpoints and trace bugs using the live interpreter. In the notebook, there is at least the opportunity to interact post-mortem after a bug, by running %debug in the next cell after the traceback. This works well enough for that aspect of tracing.
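A minimal illustration, with run_model and params standing in for whatever piece of my code just failed:

```
# In one cell: something that raises deep inside the library
run_model(params)

# In the next cell: drop into the post-mortem debugger at the point of failure
%debug
```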
Remaining issues
- I can’t easily see the contents of a notebook-based script once outside of the notebook. I’m not always online to use nbviewer, and a local nbviewer installation is not the easiest thing for an end user (it’s not packaged, per se); there is a docker-based approach that looks relatively simple, but either way it’s an inconvenient extra step just to get a quick view of the code. (Addendum: Jupyter notebooks can now export to .py easily from the File menu)
- There are no line numbers for reference, and if I can’t edit natively in my IDE then I can’t use breakpoints so easily.
- Doing a diff on ipynb-formatted files makes it difficult for the end user to track changes with git. For my literate modeling workflow idea, there is therefore a need to flatten the JSON to create a regular .py file before doing the diff (one way to do that is sketched after this list) – but to also include the markup somehow, in case of comment/tag changes in the YAML! (Addendum: see comments below!)
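For the first and last points, one workaround (besides the File menu export) is to flatten the notebook to a plain .py script programmatically. A sketch using nbconvert’s Python API, with a placeholder filename:

```
# Flatten a notebook to a .py script for viewing or diffing outside the notebook;
# "jupyter nbconvert --to script" does the same from the command line.
from nbconvert import PythonExporter

source, _ = PythonExporter().from_filename('model_notebook.ipynb')
with open('model_notebook.py', 'w') as f:
    f.write(source)
```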
Nonetheless, with some caveats, I now see how I could use ipython notebooks for an interactive command-line workflow along the same lines as what I like to do in regular ipython.