In my previous post, I complained a bit about ipython notebooks not being the solution to mixing metadata, pre- or post-processor directives, and code. But, I mentioned an intriguing snippet of code that I would look into, and indeed this has led to some progress for me in picturing the use of ipython notebooks for literate modeling.
Extensions
YAML_magic is
an ipython notebook extension by Nick Bollweg that enables raw YAML to
be interpreted in a code cell when prefixed by %%yaml. This
extension idea opened my eyes to simple ways in which an extension can
enable literate modeling. For one thing, once I understood what to do
with the file (no documentation was provided in the original Gist),
I realized how easy it is to write custom extensions to hook into other kernel processes that
could help me achieve my ideal exploratory modeling setup.
I also want to auto-load my extensions and modules for my notebooks but not for my regular ipython sessions (partly because those loads create warnings and pollute my default namespace). Autoreload is a built-in extension that simply reloads any module dependencies if they are edited outside of the session.
But first, the instructions. Write these config options (based on uncommenting the appropriate lines in the config file) into .ipython/profile_default/ipython_kernel_config.py, where the .ipython directory normally lives in your home directory:
c.InteractiveShellApp.extensions = ['yaml_magic', 'autoreload']
You can make a copy of an existing config file in that directory if the file doesn’t already exist (or generate a fresh, fully commented set of config files with ipython profile create); just make sure that only kernel-focused config options are uncommented. The precedence rule is that ipython starts by interpreting ipython_config.py, and then the individual application components read their specific config files and overwrite any overlapping options.
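For a one-off session, the same extensions can also be loaded by hand instead of via the config file (assuming yaml_magic.py is importable on your python path):

```
%load_ext yaml_magic
%load_ext autoreload
```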
Using the extensions
After some configuration fuss, I have ‘autoreload’ and ‘yaml_magic’ working and the results are quite inspirational. Autoreload is great for exploratory workflows when you edit module dependencies while still working in a dependent module/session.
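To make the autoreload behaviour concrete, here is the kind of round trip it enables; mymodule and its simulate function are hypothetical stand-ins for my local library code:

```
%autoreload 2             # reload any changed modules before running each cell

import mymodule           # a local module under active development
mymodule.simulate()       # gives one result...
# ...edit simulate() in mymodule.py in an external editor, then...
mymodule.simulate()       # ...the edited code runs, with no kernel restart or re-import
```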
yaml_magic is not fully documented (at the time of writing, but see
comments below about the new repo
and PyPI uploads!) but allows you to write
YAML directly into a code cell, prefix it with %%yaml
and have the
YAML interpreted into a dictionary (as one might expect). Also, adding
a name after the directive assigns this dictionary to a variable with
that name in the local scope. For instance, I defined the cell:
%%yaml x
studycontext-metadata:
    ID: 3
    tag: initial visualization of linearized model's performance
studycontext-header:
    tag: SC_imports
which displays nothing. But introspecting x
yields the dictionary
{'studycontext-header': {'tag': 'SC_imports'},
 'studycontext-metadata': {'ID': 3,
  'tag': "initial visualization of linearized model's performance"}}
I see from the Gist that getting behavior like this is as simple as adding a couple of decorators to a class that invokes a little Javascript. Based on this example, I can see a way to implement a backend project management system that I alluded to in my previous post.
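For the curious, a minimal sketch of that pattern looks something like the following; this is my own stripped-down approximation rather than Nick’s actual code, and it leaves out the Javascript part entirely:

```
import yaml
from IPython.core.magic import Magics, magics_class, cell_magic


@magics_class
class SimpleYamlMagic(Magics):
    @cell_magic
    def yaml(self, line, cell):
        """Parse the cell body as YAML; bind it to the name on the %%yaml line, if any."""
        data = yaml.safe_load(cell)
        name = line.strip()
        if name:
            # push the parsed dictionary into the interactive namespace
            self.shell.push({name: data})
        else:
            return data


def load_ipython_extension(ip):
    # called by %load_ext (or by the extensions list in the kernel config)
    ip.register_magics(SimpleYamlMagic)
```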
Importing from other notebooks
A big issue for me is that my modeling work
involves multiple files and the continued editing of common library
files in the local project. If I’m going to adopt ipython notebooks as
the basis for my workflow then I need to make most, or all, of these
files into .ipynb
notebook files so that they can all be marked up
and processed similarly. This would normally prohibit importing them
directly into other notebooks when they are needed.
However, I found this snippet, which I turned into a module I named notebook_importing.py. I placed it in a local library folder on my python path for safekeeping. I then edited my kernel config file’s option to execute commands on launch:
c.InteractiveShellApp.exec_lines = [
    'import notebook_importing'
]
This ensures that ‘import’ is extended to check for the .ipynb file extension and pre-process such files appropriately.
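For reference, a condensed sketch of roughly what such a module can look like is below, along the lines of the well-known “Importing Jupyter Notebooks as Modules” recipe that this kind of snippet is usually based on; the names and details are illustrative rather than the exact code:

```
# notebook_importing.py (sketch): let `import foo` fall back to foo.ipynb
import io, os, sys, types
from importlib.abc import Loader, MetaPathFinder
from importlib.machinery import ModuleSpec

import nbformat
from IPython import get_ipython
from IPython.core.interactiveshell import InteractiveShell


def find_notebook(fullname, path=None):
    """Return the path to a .ipynb file matching the requested module name, if any."""
    name = fullname.rsplit('.', 1)[-1]
    for d in path or ['']:
        nb_path = os.path.join(d, name + '.ipynb')
        if os.path.isfile(nb_path):
            return nb_path


class NotebookLoader(Loader):
    """Execute a notebook's code cells in a fresh module namespace."""

    def __init__(self, path=None):
        self.shell = InteractiveShell.instance()
        self.path = path

    def create_module(self, spec):
        mod = types.ModuleType(spec.name)
        mod.__file__ = find_notebook(spec.name, self.path)
        mod.__dict__['get_ipython'] = get_ipython  # so cells using IPython features still run
        return mod

    def exec_module(self, mod):
        with io.open(mod.__file__, 'r', encoding='utf-8') as f:
            nb = nbformat.read(f, as_version=4)
        # point the shell's user namespace at the module while its cells execute
        save_user_ns, self.shell.user_ns = self.shell.user_ns, mod.__dict__
        try:
            for cell in nb.cells:
                if cell.cell_type == 'code':
                    # transform %magics into plain Python, then run the cell
                    code = self.shell.input_transformer_manager.transform_cell(cell.source)
                    exec(code, mod.__dict__)
        finally:
            self.shell.user_ns = save_user_ns


class NotebookFinder(MetaPathFinder):
    """Offer the notebook loader whenever a matching .ipynb file exists."""

    def find_spec(self, fullname, path, target=None):
        if find_notebook(fullname, path):
            return ModuleSpec(fullname, NotebookLoader(path))


sys.meta_path.append(NotebookFinder())
```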
Debugging
One of the most useful things I can do in my IDE is to graphically set breakpoints and trace bugs using the live interpreter. In the notebook, there is at least the opportunity to interact post-mortem after a bug, by running %debug in the next cell after the traceback. This works well enough for that aspect of tracing.
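A minimal illustration, with run_model and params standing in for whatever piece of my code just failed:

```
# In one cell: something that raises deep inside the library
run_model(params)

# In the next cell: drop into the post-mortem debugger at the point of failure
%debug
```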
Remaining issues
- I can’t easily see the contents of a notebook-based script once outside of the notebook. I’m not always online to use nbviewer, and a local nbviewer installation is not the easiest thing for an end user (it’s not packaged, per se); there is a docker-based approach that looks relatively simple, but either way it’s an inconvenient extra step just to get a quick view of the code. (Addendum: Jupyter notebooks can now export to .py easily from the File menu)
- There are no line numbers for reference, and if I can’t edit natively in my IDE then I can’t use breakpoints so easily.
- Doing a diff on ipynb-formatted files makes it difficult for the end user to track changes with git. For my literate modeling workflow idea, there is therefore a need to flatten the JSON to create a regular .py file before doing the diff (one way to do that is sketched after this list) – but to also include the markup somehow, in case of comment/tag changes in the YAML! (Addendum: see comments below!)
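For the first and last points, one workaround (besides the File menu export) is to flatten the notebook to a plain .py script programmatically. A sketch using nbconvert’s Python API, with a placeholder filename:

```
# Flatten a notebook to a .py script for viewing or diffing outside the notebook;
# "jupyter nbconvert --to script" does the same from the command line.
from nbconvert import PythonExporter

source, _ = PythonExporter().from_filename('model_notebook.ipynb')
with open('model_notebook.py', 'w') as f:
    f.write(source)
```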
Nonetheless, with some caveats, I now see how I could use ipython notebooks for an interactive command-line workflow along the same lines as what I like to do in regular ipython.