Bias adjustment in Box-Cox transformation

The Box-Cox transformation is a parametric transformation that includes the logarithmic transformation as a special case. It is defined as

\[w_t = \left\{ \begin{aligned} & \log(y_t) , \text{ if } \lambda = 0 \\ & \frac{(y_t)^\lambda - 1}{\lambda} , \text{ otherwise} \end{aligned} \right.\]

Quick note: the inclusion of the log-transformation is justified by the fact that $(y_t)^\lambda = e^{\lambda \log(y_t)} \approx 1 + \lambda \log(y_t)$ when $\lambda \rightarrow 0$, so that $\frac{(y_t)^\lambda - 1}{\lambda} \rightarrow \log(y_t)$.
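As a minimal sketch of the forward transform (using NumPy; the function name is my own):

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform of a positive series y for a given lambda."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)             # limiting case lambda -> 0
    return (y**lam - 1.0) / lam      # general case
```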

Assuming $w_t$ has a better-behaved distribution, you can run your inference on that variable. But once this is done, you still need to map the results back to the quantity of interest, that is $y_t$. The inverse Box-Cox transformation is given by

\[y_t = \left\{ \begin{aligned} & \exp(w_t) , \text{ if } \lambda = 0 \\ & \left(\lambda w_t + 1 \right)^{1/\lambda} , \text{ otherwise} \end{aligned} \right.\]
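Continuing the sketch above, the inverse transform could be written as:

```python
def inv_boxcox(w, lam):
    """Inverse Box-Cox transform: maps w back to the original scale."""
    w = np.asarray(w, dtype=float)
    if lam == 0:
        return np.exp(w)
    return (lam * w + 1.0) ** (1.0 / lam)
```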

However, one needs to be careful about the distribution of the inverted prediction $y_t$. A general (non-parametric) way of handling this, and the way chosen by Hyndman, is to do a Taylor expansion around the mean. That is, calling $f$ the inverse Box-Cox transformation, and calling $\mu$ and $\sigma^2$ the mean and variance of the transformed variable $w_t$, we would have

\[y_t = f(w_t) \approx f(\mu) + (w_t - \mu) f'(\mu) + \frac12 (w_t - \mu)^2 f''(\mu).\]

Then taking the mean of that expression, and using $\mathbb{E}[w_t - \mu] = 0$ and $\mathbb{E}[(w_t - \mu)^2] = \sigma^2$, we get

\[\mathbb{E}[y_t] \approx f(\mu) + \frac12 \sigma^2 f''(\mu).\]
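For the inverse Box-Cox transform $f(w) = (\lambda w + 1)^{1/\lambda}$, the second derivative is $f''(w) = (1-\lambda)(\lambda w + 1)^{1/\lambda - 2}$ (and simply $e^w$ when $\lambda = 0$). A minimal sketch of the resulting bias-adjusted mean, again with a made-up function name:

```python
def inv_boxcox_mean(mu, sigma2, lam):
    """Bias-adjusted back-transformed mean: f(mu) + 0.5 * sigma2 * f''(mu),
    where f is the inverse Box-Cox transform."""
    if lam == 0:
        return np.exp(mu) * (1.0 + 0.5 * sigma2)   # f = f'' = exp
    base = lam * mu + 1.0
    # f(mu) = base**(1/lam),  f''(mu) = (1 - lam) * base**(1/lam - 2)
    return base ** (1.0 / lam) * (1.0 + 0.5 * sigma2 * (1.0 - lam) / base**2)
```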

However, in the special case of the log-transformation, and if the transformed variable $w_t$ is assumed to be normal (a large class of models makes that assumption when calculating the uncertainty around the mean), we can simply use the properties of a log-Normal distribution, which tell us that (1) $\exp(\mu)$ is the median of $y_t$, and (2) the mean of $y_t$ is given by $\exp(\mu + \sigma^2/2)$. In the case $\sigma^2 \ll 1$, we recover the expression derived from the Taylor expansion.
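A quick numerical check (the values are arbitrary), comparing the exact log-Normal mean with the Taylor-based adjustment:

```python
mu, sigma2 = 2.0, 0.04                        # small variance on the log scale
exact  = np.exp(mu + sigma2 / 2.0)            # log-Normal mean
taylor = np.exp(mu) * (1.0 + 0.5 * sigma2)    # Taylor-based bias adjustment
print(exact, taylor)                          # nearly identical when sigma2 << 1
```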

[ statistics  transformation  ]