Recurrent Neural Networks

LSTM

RNNs suffer from memory problems: they struggle to retain information over long sequences. The solution is to use gated units, i.e., neurons that have a complex system of masks/filters/… that process the information flowing in. A classical first read on the subject is Colah's blog post. The main difference with a typical cell is the presence, in addition to the input and the hidden state, of a cell state, intended to capture the long-term memory of the network. That cell state is then modified through 3 gates:

  • forget gate layer: applies a pointwise multiplication to the cell state with the output of a sigmoid, thereby deciding which values of the cell state to let through.
  • input gate layer: same sigmoid filtering as above, except that it is applied to the output of a tanh layer (the candidate values), which is then added (pointwise) to the output of the forget gate layer.
  • output gate layer: decides what is output as the hidden state. The cell state itself is not modified further at this stage; instead, it is passed through a tanh (values between -1 and 1), then filtered by a sigmoid (pointwise multiplication) to produce the hidden state. So in short, an LSTM unit decides what to keep from the cell state, how to update some of its entries, and then, from that cell state, what to output as the hidden state (see the sketch below).
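
To make this concrete, here is a minimal numpy sketch of a single LSTM step (my own illustration of the standard equations; the weight matrices and biases are placeholders that would be learned during training):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM step: returns the new hidden state h and cell state c."""
    z = np.concatenate([h_prev, x])       # gates look at the previous hidden state and the input
    f = sigmoid(W_f.dot(z) + b_f)         # forget gate: what to keep from the old cell state
    i = sigmoid(W_i.dot(z) + b_i)         # input gate: which candidate values to write
    c_tilde = np.tanh(W_c.dot(z) + b_c)   # candidate values for the cell state
    c = f * c_prev + i * c_tilde          # forget, then add the filtered candidates
    o = sigmoid(W_o.dot(z) + b_o)         # output gate: what to expose as the hidden state
    h = o * np.tanh(c)                    # hidden state: squashed, filtered cell state
    return h, c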

There exist, of course, variants:

  • peepholes: each or some of the gates can have direct access to the cell state
  • couple forget & input: only update (input) the entries you forgot (weights for forget and input are one minus the other)
  • GRU: combines the above idea (coupling forget and input) with a combination of the cell state and the hidden state.

Notes on convex optimization

Identify convex constraints

Inequality constraints

  • If $f$ is convex, the constraint $f(x) \leq a$, with $a \in \mathbb{R}$, defines a convex set.

Take $x_1, x_2$ that satisfy that constraint and $\alpha \in [0,1]$; then if $f$ is convex, we have \(f(\alpha x_1 + (1-\alpha) x_2) \leq \alpha f(x_1) + (1-\alpha) f(x_2) \leq \alpha a + (1-\alpha) a = a\), so the convex combination also satisfies the constraint.
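
For a concrete example, take $f(x) = \|x\|^2$ and $a = 1$: the constraint $\|x\|^2 \leq 1$ describes the unit ball, which is indeed convex.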

Setting up my MacBook Pro

I received a Mac for work. It’s beautiful, it’s fancy, and I have no idea how to use it. I’m going to summarize here the steps I followed to set up my laptop, in particular getting git, python, and a compile environment.

Git

First up, git of course. Mac does not ship with an equivalent of apt-get, but you can install one of several package managers. I went for Homebrew. I didn't want to have to install it with sudo (not sure this was such a big deal, in the end), so I decided to install it in my personal folder (/Users/local). But to make it possible to execute the software installed via Homebrew from any directory, I had to add the directory where I installed Homebrew to my PATH. This is done by modifying the file /etc/paths

sudo vim /etc/paths
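
Inside that file, each directory goes on its own line, so I just appended the Homebrew bin directory, something like the following (the exact prefix depends on where you installed Homebrew, so take this line as a placeholder):

/Users/local/homebrew/bin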

Also, during the install, I had to install the command-line tools from Xcode. This part was handled automatically by the Mac App Store. Once Homebrew was installed, I updated and upgraded

brew update
brew upgrade

With Homebrew installed, I could next install git with a simple command

brew install git

This installation of git ships with a bunch of other apps, like gitk. The latter requires the JDK to be installed, which I did from the website (can't find the link now).

Next, I wanted autocompletion for git. For that, you can simply install bash-completion

brew install bash-completion

Then you need to point your .bashrc file to that completion script.

Update 2021-02-17

I didn’t have to change the file /etc/paths. I directly changed $PATH in my .zshrc file. I suppose modifying /etc/paths is more robust, but so far I haven’t run into any complications. For the auto-completion, I added

autoload -Uz compinit && compinit

to my .zshrc file. To add the git branch in my terminal prompt (btw, brew install iterm), I added

source /Users/bencrestel/Work/other/zsh_setup/zsh-git-prompt/zshrc.sh

after cloning the zsh-git-prompt repo. I fine-tuned the look of the prompt and ended up with

PROMPT='%B%F{blue}%n%f@%F{green}%m%f:%F{red}%~%b%f$(git_super_status) %# '

Docker

Still using Homebrew, I could install docker. However, to get the nice Docker GUI with it, I had to use a cask,

brew cask install docker

Update 2021-02-17

To specify a cask, now you need to do

brew install --cask docker

I also installed the desktop app.

Jekyll

Jekyll is nice for powering simple, efficient blogs. You install it via Ruby, and the instructions provided here were sufficient, except for a few Jekyll modules (jekyll-gist, jekyll-seo-tag, …), which I had to install using gem again. But it worked in the end.

Update 2021-02-17

This time I installed Ruby 3.0, which seems to work a bit differently. The first steps in the above link are still required, but then, to install missing dependencies (webrick, kramdown-parser-gfm, jekyll-watch, …), I had to use bundle add <...>. This installs the missing dependencies locally, only for your project. The only piece I was missing was a Gemfile; you can simply create a text file with that name and add the single line source "https://rubygems.org". Then, every time you do bundle add <...>, it adds a new line to that Gemfile with that new dependency.
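
To illustrate, after a couple of bundle add calls the Gemfile might end up looking like this (a sketch; the exact dependencies and version constraints that bundle add pins will vary):

source "https://rubygems.org"

gem "webrick", "~> 1.7"
gem "kramdown-parser-gfm", "~> 1.1"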

pipenv

Next, and still using Homebrew, I installed pipenv, which seems to be a nice lightweight environment manager that can be useful for software development. A nice little intro video is posted on that website.

pyenv

Pyenv is meant to be a simpler way to define environments. I installed it via Homebrew. Then, to create a Python 3.6 environment, you do

pyenv init
pyenv install 3.6.0

Note: it never worked for me.

How to get started with Docker

Running a Docker image

Docker is a way to run a specific application without having to install it, or compile it and deal with all the dependencies. In Ubuntu, it was very easy to install the docker software from the command line. On Mac, you can look at that post. Once this is done, you need to identify the image you want to run. A Docker image packages a certain set of applications. To run that image, you can type in the command line docker run <image>. If you have never run that image before, the first time you execute that command, docker will download the image and all the other stuff it needs. If you want to download the image without running it, you can instead do docker pull <image>.

A few options make the whole Docker experience a lot more useful. In particular, you typically want that image to be opened in an interactive shell. For that, you need the options -it. Another useful feature is to be able to access some folders of your local hard drive from within the image; this can be done with the option -v LOCAL_FOLDER:DOCKER_FOLDER.

For example, when running the Fenics Docker image, I would do

docker run -t -i -v /home/ben/Work/fenicstools:/home/fenics/fenicstools -v /home/ben/Work/hippylib:/home/fenics/hippylib fenics20171

to have access to my local fenicstools and hippylib folders. In another example, to run a tensorflow Docker image, I did

docker run -t -i -v /home/ben/Work/Programmation/Python/mlds/tensorflow/:/home/tf/ tensorflow/tensorflow bash

The bash was required here as, by default, the tensorflow image starts a notebook. Also, you start in the /notebook folder, and need to navigate to the folder you defined: cd ../home/tf. But once you figure this out, everything works great.

Running a jupyter notebook using a Docker image

I found the solution in this StackOverflow post. First you need to publish a port of the container to the host (your laptop),

docker run <...> -p 8888:8888 <your_image> bash

Inside your Docker image, you can start the jupyter notebook,

jupyter-notebook --ip 0.0.0.0 --no-browser --allow-root

Now, on your local machine, from your browser, navigate to localhost:8888/tree. You will be prompted with a menu asking for a token. After starting the jupyter notebook, you’ll get an http address which contains the sequence :8888/?token=<...>. Your token is made of all the alphanumeric characters following the equal sign.

Creating a Docker image

You can save all the characteristics of the image you want to create in a Dockerfile, then build the corresponding image by doing, if you’re in the same directory as the Dockerfile,

docker build -t <name>:<tag> .

For instance, docker build -t local/ben:latest .. You can also specify the modules you want installed in a Pipfile that you load as part of your Dockerfile, then install with pipenv. In that case, you first need to generate the lock file, then install all the modules. For instance,

# Python dependencies
RUN pip install pipenv==2018.11.26
ADD ./Pipfile ./
RUN pipenv lock
RUN pipenv install --system
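
For completeness, the Pipfile itself is a small TOML file; a minimal sketch (the packages listed here are placeholders) could be

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
numpy = "*"
requests = "*"

[requires]
python_version = "3.6"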

Another way to build the image which may be more flexible is to use a docker-compose.yml file (see here) and execute it through

docker-compose up <name_specified_in_yml>

The up command, by default, will start the container after building and creating it. To prevent this from happening, you can pass the option --no-start. Note: I had a lot of trouble with docker-compose.
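
For reference, the docker-compose.yml it refers to could look something like the following (the service name, image tag, and volume here are placeholders):

version: "3"
services:
  myservice:
    build: .
    image: local/ben:latest
    volumes:
      - /home/ben/Work/data:/home/data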

Dockerfile and absolute path

Sometimes you want to create a Docker image with files that are in a different directory. However, you can’t do that with Docker: when you define an absolute path inside your Dockerfile, it refers to a path inside the build context. A work-around is to build your Docker image from a place where you can access (via relative paths) all the files you need. And if you want your Dockerfile to be somewhere else, you can use the -f option,

docker build -f /home/Dockerfile -t mytag .

Pruning Docker

Docker has a very conservative approach to garbage collection, it seems, and keeps everything unless you ask it to delete it. The problem is that if you’re not careful, you can end up filling all the disk space allocated to Docker, and then you can’t build/pull any images. The solution is either to (1) increase the amount of disk space Docker can use, or (2) prune all the images/containers/… that you don’t need anymore. To prune everything at once, just do

docker system prune --volumes

Power iteration for the $k$ dominant eigenvectors

First off, a disclaimer: this post is not an extensive review of state-of-the-art techniques to compute eigenvalues or eigenvectors of a matrix. I’m just summarizing a simple result on the power iteration. This being said, I think it’s fair that I try to motivate the use of the power iteration, given how much bad press this algorithm gets.

The power iteration is a simple way to compute the dominant eigenvector of a diagonalizable matrix $A$. It is generally slow to converge. What are the alternatives? Typically, you could use an eigenvalue-revealing factorization, then get the eigenvectors with the Rayleigh quotient iteration. You could even stop the factorization early and refine the eigenvalue and compute the eigenvector at the same time with the Rayleigh quotient iteration, which converges at a cubic rate(!). However, the eigenvalue-revealing factorizations (that I am aware of) all require access to the entries of the matrix. And one of the steps in the Rayleigh quotient iteration is an inverse iteration, which involves solving a linear system with the matrix $A$. All of this to say that in some situations, e.g., if the matrix $A$ is not assembled and you can only compute a matvec, and/or if the matrix $A$ is very large and sparse such that the matvec is cheap but the inversion costly, you may want to rely on the power iteration.

The algorithm is pretty simple. You sample a random vector $v$, then repeat the following steps

  • multiply by $A$, i.e., $v = A.v$
  • normalize $v$, i.e., $v = v / \|v\|$

Then $v$ will converge to the dominant eigenvector. Why? Since $A$ is diagonalizable, its eigenvectors form a basis. Let’s call these eigenvectors $q_i$ and the corresponding eigenvalues $\lambda_i$ (for simplicity, assume the $q_i$ are orthonormal, e.g., $A$ symmetric, as in the code below). Then we can write any random vector as \(v = \sum_i (v^T.q_i) q_i\). And then

\[A^n. v = \sum_i \lambda_i^n (v^T.q_i) q_i\]

After sufficiently many iterations, $A^n . v$ will point toward the dominant eigenvector, since the term with the largest $|\lambda_i|$ grows the fastest. To avoid blowing everything up (or shrinking it to zero), we normalize $v$ after each step.

Now the next question is: how to apply the power iteration to compute the first $k$ eigenvectors? No problem, we can do that. Let’s think about the second dominant eigenvector. It is the dominant eigenvector if we restrict ourselves to the hyperplane orthogonal to the dominant eigenvector, that is, $q_1^\perp$. One idea would be to first compute $q_1$ using the power iteration, then repeat the same procedure but projecting $v$ onto $q_1^\perp$ at each step. That would be:

  • multiply by $A$, i.e., $v = A.v$
  • project onto $q_1^\perp$, i.e., $v = v - (v^T.q_1)q_1$
  • normalize $v$, i.e., $v = v / \|v\|$

This algorithm would converge to $q_2$. After doing so, we could repeat the same procedure but project onto $(q_1,q_2)^\perp$. And so on, so forth. Now, we can actually do all the steps at the same time. Instead of sampling a single vector, sample a matrix $V$ with as many columns as you want eigenvectors. Then after each left-multiplication by $A$, instead of projecting then normalizing, simply do a QR decomposition of $V$ and keep the $Q$ matrix.

  • left-multiply by $A$, i.e., $V = A.V$
  • project and normalize with a QR decomposition, i.e., $V=Q$ where $Q,R = QR(V)$

The columns of that matrix will converge toward the first $k$ dominant eigenvectors. Here is the code in Python

import numpy as np

def power_iteration_k(A, k, eps=1e-10):
    """
    Inputs:
        A = matrix (symmetric)
        k = nb of eigenvectors to compute
        eps = precision
    Outputs:
        v = matrix whose columns are the k dominant eigenvectors
    """
    m, n = A.shape
    # random starting block, orthonormalized
    v = np.random.randn(n*k).reshape((-1,k))
    v,_ = np.linalg.qr(v)
    for kk in range(1000):
        v_old = v.copy()
        v = A.dot(v)            # left-multiply by A
        v,_ = np.linalg.qr(v)   # re-orthonormalize: this both projects and normalizes
        # stop when no column has moved by more than eps
        # (note: a sign flip between iterations, e.g., for a negative eigenvalue,
        # can prevent this criterion from ever being met)
        diff = np.max(np.sqrt(np.sum((v-v_old)**2, axis=0)))
        if diff < eps:
            return v
    # no convergence within 1000 iterations: return the current iterate anyway
    return v
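
As a quick sanity check (a sketch, using a random symmetric matrix), you can compare the output against numpy’s full eigendecomposition:

import numpy as np

A = np.random.randn(100, 100)
A = A.dot(A.T)   # symmetric and positive semi-definite, so all eigenvalues are >= 0

V = power_iteration_k(A, k=3)

# np.linalg.eigh returns eigenvalues in increasing order,
# so the 3 dominant eigenvectors are the last 3 columns of Q
w, Q = np.linalg.eigh(A)
# each column of V should match one of those columns, up to sign and ordering
print(np.abs(V.T.dot(Q[:, -3:])))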