07 Apr 2020
CausalNex is a library
developed by QuantumBlack to facilitate the causal analysis of a dataset. At
its root, CausalNex relies on Bayesian Networks.
For more on Bayesian Networks, have a look at
Wikipedia and a
tutorial by Kevin Murphy.
Learning the structure of these Bayesian Networks (causal discovery) uses the
algorithm introduced in the paper DAGs with NO TEARS.
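As a rough illustration of the idea behind that paper (a sketch of mine, not CausalNex's implementation): NO TEARS replaces the requirement that the graph be acyclic with a smooth function of the weighted adjacency matrix W, h(W) = tr(exp(W∘W)) - d, which is zero exactly when W describes a DAG. The pure-Python helpers below, whose names are made up for this example, evaluate a truncated version of that series on small matrices.

```python
from math import factorial


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def acyclicity(weights, terms=20):
    """Approximate h(W) = tr(exp(W * W)) - d with a truncated power series.

    W * W is the element-wise square, so every entry is non-negative.
    The k-th term is tr(A^k) / k!, which is non-zero only if the graph
    contains a closed walk of length k.
    """
    n = len(weights)
    squared = [[w * w for w in row] for row in weights]
    power = squared  # holds A^k, starting at k = 1
    total = 0.0
    for k in range(1, terms + 1):
        total += sum(power[i][i] for i in range(n)) / factorial(k)
        power = matmul(power, squared)
    return total


# Hypothetical 3-node examples: a chain 0 -> 1 -> 2, and the same chain
# with an extra edge 2 -> 0 that closes a cycle.
dag = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.0, 0.0, 0.0]]
cyclic = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.5, 0.0, 0.0]]

print(acyclicity(dag))     # zero for a DAG
print(acyclicity(cyclic))  # strictly positive when there is a cycle
```

The actual algorithm minimises a data-fitting loss subject to h(W) = 0 with a continuous optimiser; this snippet only shows the constraint itself.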
Installation
The documentation is pretty clear.
The library can be easily installed by doing pip install causalnex.
Note that for some reason (not clear to me), we can’t install via poetry.
Also, causalnex requires pandas=0.24.0, which seems to be a problem with the
current project.
Last, but not least, causalnex requires the library pygraphviz, which has to
be installed separately. And of course, pip install pygraphviz returns an
error. I ended up having to install everything but causalnex via conda, then
pip install causalnex. But this may not be convenient for everyone, and it’s
weird that the library is so finicky.
Tutorial
The documentation contains (for now) a single
tutorial
that I will go through.
The first thing we need to do is download the
dataset, and unzip it.
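A minimal sketch of that step using only the standard library. The URL is a placeholder (use the link given in the tutorial), and the function and file names are my own, not part of CausalNex.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

# Placeholder: replace with the dataset link given in the tutorial.
DATASET_URL = "https://example.com/dataset.zip"


def download(url, target):
    """Download url to target unless the file is already there."""
    target = Path(target)
    if not target.exists():
        urlretrieve(url, str(target))
    return target


def unzip(archive, dest):
    """Extract every member of archive into dest and return their names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Usage (commented out because it needs network access):
# archive = download(DATASET_URL, "dataset.zip")
# print(unzip(archive, "data"))
```

Skipping the download when the archive already exists makes it cheap to re-run the notebook cell.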
31 Mar 2020
Jupyter Notebook offers really neat extensions that can honestly transform your
experience working with notebooks.
How to install
The first step is to install them. There are different ways (conda, pip,
poetry,…). You can check out the documentation here.
When you install jupyter_contrib_nbextensions, it will automatically install
jupyter_nbextensions_configurator (see here), which provides a nice GUI to
enable/disable the extensions.
The whole process is pretty easy, but there are 2 actions that you need to take
before having the luxury of enjoying all the goodies:
- Activate the configurator
jupyter nbextensions_configurator enable --user
then
- Activate the extensions
jupyter contrib nbextension install --user
I’m honestly not sure the order matters. I ran them in that order, but maybe it
doesn’t make a difference.
What extensions?
A few useful extensions:
- table of contents
- collapsible headings
- move selected cells
24 Feb 2020
Andreessen Horowitz published a great article that has everyone in the AI space
talking. I’ll post some comments later, but I just want to bookmark this one
for now.
21 Oct 2019
Super simple, but because I have the memory of a squirrel I need to mark it
down. So when you have a session running on a remote server, you can start a
jupyter notebook on that server. The catch is that you need to specify the IP of
that remote server; otherwise the notebook only listens on localhost and you
can’t reach it from your own machine. If you are in a bash session on that
server, you can do
jupyter notebook --ip=$(hostname -I)
If you want to start the notebook directly, without connecting to bash first,
you can do something like
<server exec sessionid> -- bash -c 'jupyter notebook --ip=$(hostname -I)'
When running inside a Docker container, you need to take a few more steps.
First, you need to publish port 8888 of the container to your machine, i.e.,
docker run -it -p 8888:8888 -v <...> image:version /bin/bash
Then inside your container
jupyter notebook --ip=$(hostname -I) --allow-root
Then on my laptop, the second URL was the one that worked
(http://127.0.0.1:8888/?token=...).
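One caveat with the $(hostname -I) substitution used above: that flag prints every address configured on the host, separated by spaces, so on a machine with several interfaces it expands to more than one word. Here is a small sketch of how one might pick a single address and build the command from Python instead. The helper names are mine, and taking the first address is an assumption that may not suit every network setup.

```python
import subprocess


def first_address(hostname_output):
    """Return the first address from `hostname -I` style output."""
    addresses = hostname_output.split()
    if not addresses:
        raise ValueError("no address found in hostname output")
    return addresses[0]


def notebook_command(ip, allow_root=False):
    """Build the argument list for `jupyter notebook` bound to ip."""
    command = ["jupyter", "notebook", f"--ip={ip}"]
    if allow_root:
        # Needed when running as root, e.g. inside a Docker container.
        command.append("--allow-root")
    return command


def host_addresses():
    """Ask the host for its addresses (Linux `hostname -I`)."""
    result = subprocess.run(
        ["hostname", "-I"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example with made-up output, so nothing is launched here.
sample = "10.0.0.12 172.17.0.1 \n"
print(notebook_command(first_address(sample), allow_root=True))
# prints ['jupyter', 'notebook', '--ip=10.0.0.12', '--allow-root']
```

To actually start the server, pass the result to subprocess.run, for instance subprocess.run(notebook_command(first_address(host_addresses()))).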
31 Jul 2019
Poetry is a way to manage virtual environments in Python, a bit similar to
Anaconda.
The way it is used in t-s is that there is a pyproject.toml file and a lock
file in the root of the repo, which means you can just poetry install to
create the virtualenv. By default, the “extras” dependencies will not be
installed. To install those, you instead need to do
poetry install --extras "<name of package>"; this is equivalent to doing
poetry install and on top of that installing the extra dependencies requested.
When the versions of certain dependencies change, you can update your poetry
environment by doing poetry install.
To add new dependencies without editing the pyproject.toml by hand, you can do
poetry add <name_of_dependency>.
Once you’re all set up, you can run some commands inside your virtual
environment. For instance, to run ipython, you would do
poetry run ipython
To start a jupyter notebook session, you would do
poetry run jupyter-notebook
Their documentation is pretty good.
How to add a dependency
The first time you create a pyproject.toml file and you run poetry install,
poetry will resolve all the conflicts and save the version of each dependency in
a lock file, poetry.lock. You should version control both files.
If you want to add a new dependency, add it in the pyproject.toml file then run
poetry install, which will update the lock file. Then commit both files.
Because poetry resolves conflicts for you, you will not necessarily have, in
your lock file, the latest version of all dependencies as requested in your
pyproject.toml file. If you want to update your dependencies, you need to run
poetry update, which is effectively the same as deleting your lock file and
installing again.
Note that sometimes, adding a dependency directly to the pyproject.toml file
doesn’t work (SolverProblemError…version solving failed) even though poetry
should be able to find it. The workaround (for what is a bug) is to install that
dependency via poetry add <dependency>. This lets poetry add that dependency
to the .toml file and then resolve conflicts in the .lock file.