23 Jul 2020
If you have a Docker container running on a remote server, you can set up
PyCharm to use the Python interpreter inside that container from your local
machine.
To do so, go to Preferences > Project > Project Interpreter, then select the
SSH Interpreter option. You then need to set up your connection, making sure to
indicate the right port.
There is more on this in this article.
Anyway, once this was all set up properly, I was still having an issue running
pytest with this SSH interpreter: pytest would point to my local path. I had
to define a path mapping between my local path and my remote path. To do so, go
to Preferences > Project > Project Interpreter. Below the project interpreter,
you will see Path mappings, where you can define a mapping from your local path
to your remote path. After that, pytest should be able to find your tests.
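To make the idea of a path mapping concrete, here is a minimal sketch of the translation it performs. This is not PyCharm's implementation; the function name and both root directories are hypothetical and only serve as an illustration.

```python
from pathlib import PurePosixPath


def to_remote_path(local_path, local_root, remote_root):
    """Translate a path under local_root into the matching path under remote_root."""
    # relative_to raises ValueError if local_path is not inside local_root.
    relative = PurePosixPath(local_path).relative_to(local_root)
    return str(PurePosixPath(remote_root) / relative)


# Hypothetical roots: where the project lives locally and inside the container.
LOCAL_ROOT = "/Users/me/my_project"
REMOTE_ROOT = "/opt/project"

remote_test = to_remote_path(
    "/Users/me/my_project/tests/test_model.py", LOCAL_ROOT, REMOTE_ROOT
)
print(remote_test)
```

Without such a mapping, the remote interpreter is handed the local path, which does not exist on the remote side.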
21 Jul 2020
Installing xgboost requires cmake version 3.17.3 or later, but apt-get was only
installing cmake 3.5.x, or something like that. A nice solution is described in
this post. I picked the second approach, installing from the binary. This is the
code I included in my Dockerfile:
RUN mkdir /opt/cmake && \
cd /opt/cmake && \
wget https://github.com/Kitware/CMake/releases/download/v3.17.3/cmake-3.17.3-Linux-x86_64.sh && \
bash cmake-3.17.3-Linux-x86_64.sh --skip-license && \
ln -s /opt/cmake/bin/cmake /usr/local/bin/cmake && \
cmake --version
The last line is just to check that it's working.
Also, note that you can install different versions of cmake. You can find your
favorite one on their website.
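If you want to check the version requirement programmatically rather than by eye, the following sketch parses the first line of the `cmake --version` output and compares it with a minimum. The helper names are hypothetical, and the sample strings stand in for real command output.

```python
import re


def parse_cmake_version(version_output):
    """Extract (major, minor, patch) from the output of `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a cmake version in the output")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_output, minimum=(3, 17, 3)):
    """Compare numerically: as plain strings, '3.5.1' would sort after '3.17.3'."""
    return parse_cmake_version(version_output) >= minimum


# Sample outputs; in practice these would come from running `cmake --version`.
print(meets_minimum("cmake version 3.17.3"))
print(meets_minimum("cmake version 3.5.1"))
```

Comparing tuples of integers avoids the classic mistake of comparing version strings character by character.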
21 Jul 2020
It can be useful to pass environment variables from your local environment to
your Docker build. I ran into this situation when I had to pass PyPI keys to
install specific packages.
But since a Docker build is isolated from your local environment, you need to
take a couple of steps to make it happen. Note that I'm assuming you're building
your Docker image through a Dockerfile.
- You need to pass the environment variables to your build command using the
flag --build-arg. For instance,
docker build --build-arg DOCKER_ENV_VAR=$MY_LOCAL_ENV_VAR -f Dockerfile -t my_image:my_tag .
- You need to define these variables in your Dockerfile. Continuing the
previous example, this would mean adding the following lines to your Dockerfile:
ARG DOCKER_ENV_VAR
...
RUN apt-get install <something> --flag=$DOCKER_ENV_VAR
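If you launch the build from a script, the first step above can be sketched as follows. The helper `docker_build_command` is hypothetical; it only assembles the same `docker build` command shown earlier, reading the value from the local environment. The `setdefault` line is there purely so the example runs on its own.

```python
import os
import shlex


def docker_build_command(build_args, tag, dockerfile="Dockerfile", context="."):
    """Build the argument list for `docker build`, one --build-arg per entry.

    build_args maps the ARG name used in the Dockerfile to the name of the
    local environment variable that holds its value.
    """
    command = ["docker", "build"]
    for arg_name, env_name in build_args.items():
        value = os.environ[env_name]  # KeyError if the local variable is unset
        command += ["--build-arg", f"{arg_name}={value}"]
    command += ["-f", dockerfile, "-t", tag, context]
    return command


# Placeholder value so the sketch is self-contained.
os.environ.setdefault("MY_LOCAL_ENV_VAR", "example-value")

cmd = docker_build_command({"DOCKER_ENV_VAR": "MY_LOCAL_ENV_VAR"}, tag="my_image:my_tag")
print(shlex.join(cmd))
```

Keeping the command as a list means it can be handed to `subprocess.run(cmd, check=True)` without worrying about quoting values that contain spaces.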
02 Jul 2020
There are a few ways to manage dependencies: conda, poetry, pipenv. I recently
discovered a different way, pip-tools.
It's actually very easy to use and, in particular, easy to integrate with a
Docker image. You simply create a requirements.in file, which pip-compile
converts to a requirements.txt file that you can then install inside your image
by running pip install -r requirements.txt.
There are multiple comparisons of poetry, pipenv, and pip-tools out there,
including this one, which compares them specifically in the context of combining
with Docker, and that one, which was updated in Dec 2019 and still declares
pip-tools the winner. I also found that blog post useful, as it shows a quick
example of how to write a requirements.in.
You can install pip-tools through pip: pip install pip-tools. The only things
you need to be careful with are the Python version and OS you use to convert
your requirements.in file to a requirements.txt file. These need to be the same
as what you'll use for your virtual environment. With Docker, this can be
controlled by running pip-tools inside a running container, then re-building
that image.
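One way to check that the compile environment matches the target environment is to print a small fingerprint in both places and compare them. The sketch below uses only the standard library; the function names and the two sample dictionaries are hypothetical.

```python
import platform


def environment_fingerprint():
    """Describe the interpreter and OS that dependency resolution depends on."""
    major, minor, _patch = platform.python_version_tuple()
    return {
        "python": f"{major}.{minor}",
        "system": platform.system(),
        "machine": platform.machine(),
    }


def mismatches(compiled_in, target):
    """Return the fingerprint keys whose values differ between two environments."""
    return sorted(key for key in compiled_in if compiled_in[key] != target.get(key))


# Run this both where you call pip-compile and inside the container.
print(environment_fingerprint())

# Made-up fingerprints showing what a mismatch looks like.
laptop = {"python": "3.8", "system": "Darwin", "machine": "x86_64"}
container = {"python": "3.7", "system": "Linux", "machine": "x86_64"}
print(mismatches(laptop, container))
```

An empty list from `mismatches` suggests the requirements.txt was compiled in an environment comparable to the one it will be installed in.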
14 Apr 2020
Bayesian Networks are probabilistic graphical models that offer a convenient,
compact way of representing a joint probability distribution. A Bayesian Network
consists of a Directed Acyclic Graph (DAG) that connects different variables
(nodes), each edge indicating a dependence (there is an edge from node A to
node B if the variable A helps explain B). Each node (random variable) is
associated with a distribution in the form of a Conditional Probability
Distribution (CPD), the condition being on all the parents of that node. By
definition, Bayesian Networks do not contain cycles, which is not the case for
Markov Random Fields. For that reason, Bayesian Networks are most often used
when one tries to understand a causal relationship between the variables.
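As an illustration of how a DAG plus CPDs encodes a joint distribution, here is a minimal sketch in plain Python. The three-node network (rain, sprinkler, wet_grass) and all probabilities are made up for the example; the joint probability is the product of each node's CPD entry given its parents.

```python
from itertools import product

# Hypothetical structure: each node lists its parents in the DAG.
PARENTS = {
    "rain": (),
    "sprinkler": (),
    "wet_grass": ("rain", "sprinkler"),
}

# Hypothetical CPDs: P(node is True | parent values), keyed by the parent values.
CPDS = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet_grass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}


def joint_probability(assignment):
    """Multiply P(node | parents) over all nodes for one full assignment."""
    probability = 1.0
    for node, parents in PARENTS.items():
        parent_values = tuple(assignment[parent] for parent in parents)
        p_true = CPDS[node][parent_values]
        probability *= p_true if assignment[node] else 1.0 - p_true
    return probability


example = {"rain": True, "sprinkler": False, "wet_grass": True}
print(joint_probability(example))

# Sanity check: the joint probabilities of all 8 assignments should sum to 1.
total = sum(
    joint_probability(dict(zip(PARENTS, values)))
    for values in product([True, False], repeat=len(PARENTS))
)
print(total)
```

The compactness comes from storing 6 numbers here instead of the 7 needed for a full joint table over three binary variables, and the gap widens quickly as the number of variables grows.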
Constructing a Bayesian Network involves at least two steps:
- Generating the structure of the DAG (i.e., which nodes are connected and in
what direction). That is what the DAGs with NO TEARS algorithm does, in an
efficient way (along with code).
- Estimating the CPDs. This can be done by maximum likelihood estimation (MLE)
or Bayesian estimation.
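For discrete variables, the maximum likelihood estimate of a CPD is just a ratio of counts. The sketch below assumes the structure is already known and that the data is a list of fully observed samples; the function name and the toy data are hypothetical, and this is not the causalnex API.

```python
from collections import Counter


def estimate_cpd(samples, node, parents):
    """MLE of P(node | parents): count(parents, node) / count(parents)."""
    joint_counts = Counter()
    parent_counts = Counter()
    for sample in samples:
        parent_values = tuple(sample[parent] for parent in parents)
        parent_counts[parent_values] += 1
        joint_counts[(parent_values, sample[node])] += 1
    return {
        (parent_values, value): count / parent_counts[parent_values]
        for (parent_values, value), count in joint_counts.items()
    }


# Toy observations for a two-node network: rain -> wet_grass.
samples = (
    [{"rain": True, "wet_grass": True}] * 3
    + [{"rain": True, "wet_grass": False}] * 1
    + [{"rain": False, "wet_grass": False}] * 2
)

cpd = estimate_cpd(samples, node="wet_grass", parents=("rain",))
for (parent_values, value), probability in sorted(cpd.items()):
    print(parent_values, value, probability)
```

Combinations that never appear in the data get no entry at all, which is one reason Bayesian estimation with a prior is often preferred on small datasets.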
The website for the QuantumBlack library causalnex contains a brief
introduction to Bayesian Networks.
A longer, more in-depth explanation can be found in this Stanford class on
Probabilistic Graphical Models.
An in-between solution might be to look at the slides for these two
presentations, 1 and 2.
For sequential or temporal models, Dynamic Bayesian Networks were developed.
Kevin Murphy has a tutorial on his webpage.