
Bias adjustment in Box-Cox transformation

The Box-Cox transformation is a parametric transformation that includes the logarithmic transformation as a special case, and is defined as

$$w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0, \\ \dfrac{y_t^\lambda - 1}{\lambda} & \text{otherwise.} \end{cases}$$

Quick note: the inclusion of the log-transformation is justified by the fact that $y_t^\lambda \approx 1 + \lambda \log(y_t)$ as $\lambda \to 0$, so that $(y_t^\lambda - 1)/\lambda \to \log(y_t)$.
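As a minimal sketch, the transform can be written as follows (the function name is mine; `scipy.special.boxcox` offers an equivalent):

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform of positive data y with parameter lam."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1) / lam
```

For a very small `lam`, the second branch numerically approaches the log branch, matching the limit argument above.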

Assuming $w_t$ has a better-behaved distribution, you can run your inference on that variable. But once this is done, you still need to map the result back to the quantity of interest, that is $y_t$. The inverse Box-Cox transformation is given by

$$y_t = \begin{cases} \exp(w_t) & \text{if } \lambda = 0, \\ (\lambda w_t + 1)^{1/\lambda} & \text{otherwise.} \end{cases}$$
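A matching sketch of the inverse map (again, the name is mine; `scipy.special.inv_boxcox` is a library equivalent):

```python
import numpy as np

def inv_boxcox(w, lam):
    """Inverse Box-Cox: map the transformed variable w back to the original scale."""
    w = np.asarray(w, dtype=float)
    if lam == 0:
        return np.exp(w)
    return (lam * w + 1) ** (1 / lam)
```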

However, one needs to be careful about the distribution of the back-transformed prediction $y_t$. A general (non-parametric) way of handling this, and the one chosen by Hyndman, is a Taylor expansion around the mean. That is, calling $f$ the inverse Box-Cox transformation, and calling $\mu$ and $\sigma^2$ the mean and variance of the transformed variable $w_t$, we would have

$$y_t = f(w_t) \approx f(\mu) + (w_t - \mu) f'(\mu) + \frac{1}{2}(w_t - \mu)^2 f''(\mu).$$

Taking the expectation of that expression, the first-order term vanishes since $\mathbb{E}[w_t - \mu] = 0$, and $\mathbb{E}[(w_t - \mu)^2] = \sigma^2$, so we get

$$\mathbb{E}[y_t] \approx f(\mu) + \frac{1}{2}\sigma^2 f''(\mu).$$
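Putting the pieces together, here is a sketch of the bias-adjusted mean, with $f''$ computed analytically for the inverse transform (for $\lambda \neq 0$, $f''(\mu) = (1-\lambda)(\lambda\mu+1)^{1/\lambda - 2}$; the function name is mine):

```python
import numpy as np

def inv_boxcox_mean(mu, sigma2, lam):
    """Second-order Taylor approximation of E[y_t] = E[f(w_t)],
    where w_t has mean mu and variance sigma2 on the transformed scale."""
    if lam == 0:
        # f = exp, so f'' = exp as well
        return np.exp(mu) * (1 + sigma2 / 2)
    fmu = (lam * mu + 1) ** (1 / lam)                   # f(mu)
    fpp = (1 - lam) * (lam * mu + 1) ** (1 / lam - 2)   # f''(mu)
    return fmu + 0.5 * sigma2 * fpp
```

A convenient sanity check: for $\lambda = 1/2$ the inverse transform is exactly quadratic, so the second-order expansion is exact.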

However, in the special case of the log-transformation, and if the transformed variable $w_t$ is assumed to be normal (a large class of models makes that assumption when quantifying the uncertainty around the mean), we can simply use the properties of the log-normal distribution, which tell us that (1) the naive back-transform $\exp(\mu)$ is the median of $y_t$, not its mean, and (2) the mean of $y_t$ is given by $\exp(\mu + \sigma^2/2)$. In the case $\sigma^2 \ll 1$, expanding $\exp(\sigma^2/2) \approx 1 + \sigma^2/2$ recovers the expression derived from the Taylor expansion, since $f = f'' = \exp$.
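A quick numerical illustration of that agreement, with made-up values of $\mu$ and $\sigma^2$: for small $\sigma^2$, the exact log-normal mean and the Taylor-based adjustment nearly coincide, and both sit above the median.

```python
import numpy as np

mu, sigma2 = 0.3, 0.05  # illustrative values, assumed small sigma2

exact = np.exp(mu + sigma2 / 2)          # exact mean of a log-normal
taylor = np.exp(mu) * (1 + sigma2 / 2)   # f(mu) + (sigma2/2) f''(mu), with f = f'' = exp
median = np.exp(mu)                      # log-normal median: the naive back-transform
```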

[ statistics  transformation  ]