CausalNex, a toolkit for causal reasoning

CausalNex is a library developed by QuantumBlack to facilitate the causal analysis of a dataset. At its root, CausalNex relies on Bayesian Networks.
For more on Bayesian Networks, have a look at Wikipedia and a tutorial by Kevin Murphy. The structure of these Bayesian Networks (i.e. the causal graph) is learned with the algorithm introduced in the paper DAGs with NO TEARS.
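
As far as I understand the paper, the key idea is to replace the combinatorial "no cycles" constraint with a smooth function of the weighted adjacency matrix W, namely h(W) = tr(exp(W∘W)) − d, which is zero exactly when the graph is a DAG. The sketch below is my own pure-Python illustration of that function, not CausalNex code; the function name and the truncation of the series at 30 terms are assumptions made for the example.

```python
import math


def notears_acyclicity(weights, n_terms=30):
    """Approximate h(W) = tr(exp(W * W)) - d, with W * W the elementwise square.

    `weights` is a d x d list of lists; weights[i][j] != 0 means an edge i -> j.
    The matrix exponential is approximated by its first `n_terms` series terms.
    """
    d = len(weights)
    # Elementwise square keeps every entry non-negative.
    a = [[w * w for w in row] for row in weights]
    # power holds A^k, starting from the identity (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace_sum = 0.0
    for k in range(n_terms):
        trace_sum += sum(power[i][i] for i in range(d)) / math.factorial(k)
        # Next power: A^(k+1) = A^k @ A.
        power = [
            [sum(power[i][m] * a[m][j] for m in range(d)) for j in range(d)]
            for i in range(d)
        ]
    return trace_sum - d


dag = [[0.0, 1.5], [0.0, 0.0]]      # single edge 0 -> 1, no cycle
cycle = [[0.0, 1.0], [1.0, 0.0]]    # 0 -> 1 and 1 -> 0
print(notears_acyclicity(dag))      # 0.0 for an acyclic graph
print(notears_acyclicity(cycle))    # strictly positive when a cycle exists
```

Because h is differentiable, the search over graphs can be handed to a standard continuous optimizer instead of enumerating DAGs, which is what makes the approach attractive.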

Installation

The documentation is pretty clear. The library can be easily installed with pip install causalnex. Note that for some reason (not clear to me), we can’t install it via poetry. Also, causalnex requires pandas=0.24.0, which seems to conflict with the current project.

Last, but not least, causalnex requires the library pygraphviz, which has to be installed separately. And of course, pip install pygraphviz returns an error. I ended up having to install everything but causalnex via conda, then pip install causalnex. But this may not be convenient for everyone, and it’s weird that the library is so finicky.

Tutorial

The documentation contains (for now) a single tutorial that I will go through. The first thing we need to do is download the dataset, and unzip it.
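
For the record, here is a minimal sketch of that download-and-unzip step in Python, using only the standard library. The URL and the target folder in the usage comment are placeholders I made up, and the function name is mine; use the link given in the tutorial.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_and_unzip(url, dest_dir):
    """Download the zip archive at `url` and extract it into `dest_dir`.

    Returns the list of file names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "dataset.zip"
        # Stream the download to disk instead of holding it in memory.
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            return zf.namelist()


# Hypothetical usage; replace the placeholder URL with the one from the tutorial:
# files = download_and_unzip("https://example.com/dataset.zip", "data")
```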

Jupyter Extensions

Jupyter Notebook offers really neat extensions that can honestly transform your experience working with notebooks.

How to install

The first step is to install them. There are different ways (conda, pip, poetry,…). You can check out the documentation here. When you install jupyter_contrib_nbextensions, it will automatically install jupyter_nbextensions_configurator (see here), which provides a nice GUI to enable/disable the extensions.

The whole process is pretty easy, but there are 2 actions that you need to take before having the luxury of enjoying all the goodies:

  1. Activate the configurator
    jupyter nbextensions_configurator enable --user
    

    then

  2. Activate the extensions
    jupyter contrib nbextension install --user
    

    I’m honestly not sure of the order. I did them in that order, but maybe it doesn’t matter.

What extensions?

A few useful extensions:

  • table of contents
  • collapsible headings
  • move selected cells
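
To double-check what is actually switched on without opening the GUI, a small sketch like the one below can read Jupyter's notebook config. It assumes the --user setup stores the enabled extensions under the load_extensions key of ~/.jupyter/nbconfig/notebook.json (that is where I would expect them, but your Jupyter config directory may differ), and the function name is mine.

```python
import json
from pathlib import Path


def enabled_nbextensions(config_path):
    """Return the sorted names of extensions marked as enabled in `config_path`."""
    path = Path(config_path)
    if not path.exists():
        return []
    config = json.loads(path.read_text())
    # Each entry maps an extension's require path to True (on) or False (off).
    load_extensions = config.get("load_extensions", {})
    return sorted(name for name, enabled in load_extensions.items() if enabled)


# Assumed default location for a --user setup.
default_config = Path.home() / ".jupyter" / "nbconfig" / "notebook.json"
print(enabled_nbextensions(default_config))
```

With the three extensions above enabled, I would expect entries such as toc2/main, collapsible_headings/main and move_selected_cells/main to show up in the output.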

Article about AI companies

Andreessen Horowitz published a great article that has everyone in the AI space talking. I’ll post some comments later, but I just want to bookmark this one for now.

Running a jupyter notebook on a remote server

Super simple, but because I have the memory of a squirrel I need to write it down. When you have a session running on a remote server, you can start a jupyter notebook on that server. The catch is that you need to specify the IP of that remote server; otherwise the notebook only listens on localhost, and the URL it prints will point at your own machine rather than at the server. If you are in a bash session on that server, you can do

jupyter notebook --ip=$(hostname -I)

If you want to start the notebook directly, without connecting to bash first, you can do something like

<server exec sessionid> -- bash -c 'jupyter notebook --ip=$(hostname -I)'

When running inside a Docker container, you need to take a few more steps. First, you need to publish port 8888 of the container on your machine, i.e.,

docker run -it -p 8888:8888 -v <...> image:version /bin/bash

Then inside your container

jupyter notebook --ip=$(hostname -I) --allow-root

Then on my laptop, the second URL was the one that worked (http://127.0.0.1:8888/?token=...).
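
When none of the printed URLs seems to work, it helps to first check that the published port is reachable at all from the laptop. Here is a small sketch with the standard library; the function name is mine, and the host and port are simply the ones from the docker run command above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# 8888 is the port published with `-p 8888:8888`.
print(port_is_open("127.0.0.1", 8888))
```

If this prints False while the notebook is running in the container, the problem is probably the port mapping or the --ip setting rather than the token.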

How-to Poetry

Poetry is a way to manage virtual environments in Python, a bit similar to Anaconda. The way it is being used in t-s is that there is a pyproject.toml file and a lock file in the root of the repo, which means you can just poetry install to create the virtualenv. By default, the “extras” dependencies will not be installed. To install those, you instead need to do poetry install --extras "<name of extra>"; this is equivalent to doing poetry install and on top of that installing the extra dependencies requested. When you update the version of certain dependencies, you can update your poetry environment by doing poetry install. To add new dependencies without editing the pyproject.toml by hand, you can do poetry add <name_of_dependency>.

Once you’re all set up, you can run some commands inside your virtual environment. For instance, to run ipython, you would do

poetry run ipython

To start a jupyter notebook session, you would do

poetry run jupyter-notebook
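
A quick way to convince yourself that ipython or the notebook really runs inside the Poetry virtualenv is to compare the interpreter prefixes from within the session. This is a generic Python check, nothing Poetry-specific, and the helper name is mine.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv-style environments
    # set sys.base_prefix to the interpreter the environment was created from.
    base_prefix = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base_prefix != sys.prefix


print("interpreter:", sys.executable)
print("inside a virtualenv:", in_virtualenv())
```

The printed interpreter path should point somewhere inside the virtualenv that Poetry created, not at the system Python.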

Their documentation is pretty good.

How to add a dependency

The first time you create a pyproject.toml file and you run poetry install, poetry will resolve all the conflicts and save the version of each dependency in a lock file, poetry.lock. You should version control both files.

If you want to add a new dependency, add it to the pyproject.toml file, then run poetry install, which will update the lock file (recent Poetry versions may ask you to run poetry lock first), and commit both files.

Because poetry resolves conflicts for you, your lock file will not necessarily contain the latest version of every dependency allowed by your pyproject.toml file. If you want to update your dependencies, you need to run poetry update, which is effectively the same as deleting your lock file and installing again.

Note that sometimes, adding a dependency directly to the pyproject.toml file doesn’t work (SolverProblemError…version solving failed) even though poetry should be able to find it. The workaround (for what looks like a bug) is to install that dependency via poetry add <dependency>. This lets poetry add that dependency to the .toml file and then resolve conflicts in the .lock file.