03 Jan 2019
LSTM
RNNs suffer from memory problems. The solution is to use gated units, i.e., neurons with a system of masks/filters that process the information flowing in. A classical first read on the subject is Colah’s blog post.
The main difference from a typical cell is the presence, in addition to the input and the hidden state, of a cell state, intended to capture the long-term memory of the network. That cell state is then modified through 3 gates:
- forget gate layer: applies a pointwise multiplication to the cell state with the output of a sigmoid, thereby deciding which values of the cell state to let flow.
- input gate layer: same sigmoid filtering as above, except that it is applied to the output of a tanh layer (the candidate values); the result is then added (pointwise) to the output of the forget gate layer.
- output gate layer: decides what is output as the hidden state. The cell state is left untouched after the forget and input gate layers; to produce the hidden state, the cell state is passed through a tanh (between -1 and 1), then filtered by a sigmoid (pointwise multiplication).
So in short, an LSTM unit decides what to keep from the cell state, how to update some of the entries of the cell state, and then, from that new cell state, what to output as the hidden state.
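The three gates above can be sketched in a few lines of numpy. This is a minimal illustration, not a full implementation: the stacked weight matrix `W`, the bias `b`, and the function name `lstm_step` are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step: x = input, h = hidden state, c = cell state.
    W maps the concatenation [h; x] to the 4 gate pre-activations."""
    z = W @ np.concatenate([h, x]) + b
    n = h.size
    f = sigmoid(z[:n])          # forget gate
    i = sigmoid(z[n:2*n])       # input gate
    g = np.tanh(z[2*n:3*n])     # candidate values
    o = sigmoid(z[3*n:])        # output gate
    c_new = f * c + i * g       # forget, then add new information
    h_new = o * np.tanh(c_new)  # output a filtered view of the cell state
    return h_new, c_new
```

The cell state `c_new` is only touched by the forget and input gates; the output gate only shapes the hidden state.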
There are, of course, variants:
- peepholes: each or some of the gates can have direct access to the cell state
- coupled forget & input: only update (input) the entries you forgot (the forget and input weights are one minus each other)
- GRU: combines the above idea (coupled forget and input gates) with a merging of the cell state and hidden state.
21 Dec 2018
Identify convex constraints
Inequality constraints
- If $f$ is convex, the constraint $f(x) \leq a \in \mathbb{R}$ is convex.
Take $x_1, x_2$ that satisfy that constraint and $\alpha \in [0,1]$; then, since $f$ is convex, we have
\(f(\alpha x_1 + (1-\alpha) x_2) \leq \alpha f(x_1) + (1-\alpha) f(x_2) \leq a\).
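A quick numerical sanity check of this argument, taking $f(x) = x^2$ (a convex function chosen here purely for illustration):

```python
import numpy as np

# f(x) = x**2 is convex; check that its sublevel set {x : f(x) <= a} is convex
f = lambda x: x**2
a = 4.0
x1, x2 = -2.0, 1.5  # both satisfy f(x) <= a
for alpha in np.linspace(0.0, 1.0, 11):
    xm = alpha * x1 + (1 - alpha) * x2
    # convexity gives the first inequality, feasibility of x1, x2 the second
    assert f(xm) <= alpha * f(x1) + (1 - alpha) * f(x2) <= a
```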
04 Dec 2018
I received a Mac for work. It’s beautiful, it’s fancy, and I have no idea how to use it. I’m going to summarize here the steps I followed to set up my laptop, in particular getting git, Python, and a compilation environment.
Git
First up, git of course. Mac does not ship with an equivalent of apt-get, but you can install one of several package managers. I went for Homebrew.
I didn’t want to have to install it in sudo mode (not sure this was such a big deal, in the end), so I decided to install it in my personal folder (/Users/local).
But to be able to execute the software installed via Homebrew from any directory, I had to add the directory where I installed Homebrew to my PATH. This is done by modifying the file /etc/paths.
Also, during the install, I had to install the command-line tools from Xcode. This part was handled automatically by the Mac App Store.
Once Homebrew was installed, I updated and upgraded (brew update, then brew upgrade). With Homebrew installed, I could next install git with a simple command, brew install git.
This installation of git ships with a bunch of other apps, like gitk. The latter requires the jdk to be installed, which I did from the website (can’t find the link now).
Next, I needed autocompletion for git. For that you can simply install bash-completion
brew install bash-completion
Then you need to point your bashrc file to that completion script.
Update 2021-02-17
I didn’t have to change the file /etc/paths. I directly changed $PATH in my .zshrc file. I suppose modifying /etc/paths is more robust, but so far I haven’t run into any complications.
For the auto-completion, I added
autoload -Uz compinit && compinit
to my .zshrc file.
To add the git branch to my terminal prompt (btw, brew install iterm), I added
source /Users/bencrestel/Work/other/zsh_setup/zsh-git-prompt/zshrc.sh
after cloning the zsh-git-prompt repo. I fine-tuned the look of the prompt and ended up with
PROMPT='%B%F{blue}%n%f@%F{green}%m%f:%F{red}%~%b%f$(git_super_status) %# '
Docker
Still using Homebrew, I could install docker. However, to get the nice Docker GUI with it, I had to use cask.
Update 2021-02-17
To specify a cask, you now need to do
brew install --cask docker
I also installed the desktop app.
Jekyll
Jekyll is nice for powering simple, efficient blogs. You install it via Ruby, and the instructions provided here were sufficient, except for a few Jekyll modules (jekyll-gist, jekyll-seo-tag, …), which I had to install using gem again. But it worked in the end.
Update 2021-02-17
This time I installed Ruby 3.0, which seems to work a bit differently. The first steps in the above link are still required, but then, to install the missing dependencies (webrick, kramdown-parser-gfm, jekyll-watch, …), I had to use bundle add <...>. This installs the missing dependencies locally, only for your project. The only piece I was missing was a Gemfile; you can simply create a text file with that name and add the single line source "https://rubygems.org".
Then every time you do bundle add <...>, it adds a new line to that Gemfile with the new dependency.
pipenv
Next, and still using Homebrew, I installed pipenv, which seems to be a nice lightweight environment manager that can be useful for software development. A nice little intro video is posted on that website.
pyenv
Pyenv is meant to be a simpler way to manage Python versions and environments. I installed it via Homebrew. Then to create a Python 3.6 environment, you do
pyenv init
pyenv install 3.6.0
Note: It never worked for me
01 Dec 2018
Running a Docker image
Docker is a way to run a specific application without having to install it, or compile it and deal with all the dependencies. In Ubuntu, it was very easy to install the docker software from the command line. On Mac, you can look at that post. Once this is done, you need to identify the image you want to run. A Docker image is a container for a certain set of applications. To run that image, you can type in the command line
docker run <image>
If you have never run that image before, the first time you execute that command, docker will download the image and all the other stuff it needs. If you want to download the image without running it, you can instead do docker pull <image>.
A few options make the whole Docker experience a lot more useful. In particular, you typically want that image to be opened in an interactive shell. For that, you need the option -it. Another useful feature is to be able to access some folders of your local hard drive from within the image; this can be done with the option -v LOCAL_FOLDER:DOCKER_FOLDER.
For example, when running the Fenics Docker image, I would do
docker run -t -i -v /home/ben/Work/fenicstools:/home/fenics/fenicstools -v /home/ben/Work/hippylib:/home/fenics/hippylib fenics20171
to have access to my local fenicstools and hippylib folders.
In another example, to run a tensorflow Docker image, I did
docker run -t -i -v /home/ben/Work/Programmation/Python/mlds/tensorflow/:/home/tf/ tensorflow/tensorflow bash
The bash was required here, as by default the tensorflow image starts a notebook. Actually, you start in the /notebook folder and need to navigate to the folder you defined with cd ../home/tf. But once you figure this out, everything works great.
Running a jupyter notebook using a Docker image
I found the solution in this StackOverflow post. First you need to publish a port of the container to the host (your laptop),
docker run <...> -p 8888:8888 <your_image> bash
Inside your Docker image, you can start the jupyter notebook,
jupyter-notebook --ip 0.0.0.0 --no-browser --allow-root
Now on your local machine, from your browser, navigate to localhost:8888/tree. You will be prompted with a menu asking for a token. After starting the jupyter notebook, you’ll get an HTTP address containing the sequence :8888/?token=<...>. Your token is made of all the alphanumeric characters following the equal sign.
Creating a Docker image
You can save all the characteristics of the image you want to create in a Dockerfile, then build the corresponding image by doing, if you’re in the same directory as the Dockerfile,
docker build -t <name>:<tag> .
For instance, docker build -t local/ben:latest .
You can also specify the modules you want installed in a Pipfile that you load as part of your Dockerfile, then install with pipenv. In that case, you need to first generate the lock file, then install all the modules. For instance,
# Python dependencies
RUN pip install pipenv==2018.11.26
ADD ./Pipfile ./
RUN pipenv lock
RUN pipenv install --system
Another way to build the image, which may be more flexible, is to use a docker-compose.yml file (see here) and execute it through
docker-compose up <name_specified_in_yml>
By default, up will start the container after building and creating it. To prevent this from happening, you can pass the option --no-start.
Note: I had a lot of trouble with docker-compose.
Dockerfile and absolute path
Sometimes you want to create a docker image with files that are in a different directory. However, you can’t do that with docker: paths inside your Dockerfile are resolved relative to the build context. A work-around is to build your docker image from a place from which you can access (via relative paths) all the files you need. And if you want your Dockerfile to be somewhere else, you can use the -f option,
docker build -f /home/Dockerfile -t mytag .
Pruning Docker
Docker has a very conservative approach to garbage collection, it seems, and keeps everything unless you ask it to delete it. The problem is that if you’re not careful, you can end up filling all your available disk space, at which point you can’t build/pull any images. The solution is either to (1) increase the amount of disk space Docker can use, or (2) prune all the images/containers/… that you don’t need anymore. To prune everything at once, just do
docker system prune --volumes
28 Nov 2018
First off, a disclaimer: this post is not an extensive review of state-of-the-art techniques to compute eigenvalues or eigenvectors of a matrix. I’m just summarizing a simple result on the power iteration. That being said, I think it’s fair that I try to motivate the use of the power iteration, given how much bad press this algorithm gets.
The power iteration is a simple way to compute the dominant eigenvector of a diagonalizable matrix $A$. It is generally slow to converge. What are the alternatives? Typically, you could use an eigenvalue-revealing factorization, then get the eigenvectors with the Rayleigh quotient iteration. You could even stop the factorization early and refine the eigenvalue and compute the eigenvector at the same time with the Rayleigh quotient iteration, which converges at a cubic rate(!).
However, the eigenvalue-revealing factorizations (that I am aware of) all
require access to the entries of the matrix. And one of the steps in the
Rayleigh quotient iteration is an
inverse iteration,
which involves solving a linear system with the matrix $A$.
All of this to say that in some situations, e.g., if the matrix $A$ is not
assembled and you can only compute a matvec, and/or if the matrix $A$ is very
large and sparse such that the matvec is cheap but the inversion costly, you may
want to rely on the power iteration.
The algorithm is pretty simple. You sample a random vector $v$, then repeat the following steps:
- multiply by $A$, i.e., $v = A.v$
- normalize $v$, i.e., $v = v / \| v \|$
Then $v$ will converge to the dominant eigenvector. Why? Since $A$ is diagonalizable, its eigenvectors form a basis. Let’s call these eigenvectors $q_i$ and the corresponding eigenvalues $\lambda_i$. Then we can write any random vector as \(v = \sum_i (v^T.q_i) q_i\). And then
\[A^n. v = \sum_i \lambda_i^n (v^T.q_i) q_i\]
After sufficiently many iterations, the term with the largest $|\lambda_i|$ dominates, so $A^n . v$ points toward the dominant eigenvector. To avoid blowing everything up, we normalize $v$ after each step.
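The two steps above can be sketched in a few lines of numpy. The tolerance, iteration cap, and function name are illustrative choices; the sign flip handles the fact that an eigenvector is only defined up to a sign.

```python
import numpy as np

def power_iteration(A, eps=1e-10, max_iter=1000):
    """Return the dominant eigenvector of a symmetric matrix A."""
    v = np.random.randn(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(max_iter):
        v_old = v
        v = A @ v                  # multiply by A
        v /= np.linalg.norm(v)     # normalize
        if v @ v_old < 0:          # eigenvectors are defined up to a sign
            v = -v
        if np.linalg.norm(v - v_old) < eps:
            break
    return v
```

For instance, on A = diag(3, 1, 0.5), this converges to the first basis vector (up to a sign).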
Now the next question is: how to apply power iteration to compute the first $k$
eigenvectors? No problem, we can do that. Let’s think about the second dominant
eigenvector. It will be the dominant eigenvector if we look in the hyperplane
defined by the dominant eigenvector, that is $q_1^\perp$. One idea would be to
first compute $q_1$ using the power iteration, then repeat the same procedure
but projecting $v$ onto $q_1^\perp$ at each step. That would be:
- multiply by $A$, i.e., $v = A.v$
- project onto $q_1^\perp$, i.e., $v = v - (v^T.q_1)q_1$
- normalize $v$, i.e., $v = v / \| v \|$
This algorithm would converge to $q_2$. After doing so, we could repeat the same
procedure but project onto $(q_1,q_2)^\perp$. And so on and so forth. Now, we can
actually do all the steps at the same time. Instead of sampling a single vector,
sample a matrix $V$ with as many columns as you want eigenvectors. Then after
each left-multiplication by $A$, instead of projecting then normalizing, simply
do a QR decomposition
of $V$ and keep the $Q$ matrix.
- left-multiply by $A$, i.e., $V = A.V$
- project and normalize with a QR decomposition, i.e., $V=Q$ where $Q,R = QR(V)$
That matrix will converge
toward the first $k$ dominant eigenvectors. Here is the code in Python:
import numpy as np

def power_iteration_k(A, k, eps=1e-10, max_iter=1000):
    """
    Inputs:
        A = matrix (symmetric)
        k = nb of eigenvectors to compute
        eps = precision
        max_iter = maximum number of iterations
    Outputs:
        v = matrix of the k dominant eigenvectors
    """
    m, n = A.shape
    v = np.random.randn(n, k)
    v, _ = np.linalg.qr(v)
    for _ in range(max_iter):
        v_old = v.copy()
        v = A.dot(v)
        v, _ = np.linalg.qr(v)
        # flip column signs to match the previous iterate; QR is only
        # unique up to the sign of each column, and without this the
        # convergence test can fail for negative eigenvalues
        v *= np.sign(np.sum(v * v_old, axis=0))
        diff = np.max(np.sqrt(np.sum((v - v_old)**2, axis=0)))
        if diff < eps:
            break
    return v