- For decades, probability scientists have been continuously improving algorithms to draw random samples from a pdf known only up to a normalizing constant: \[ \pi(\theta) = \frac{f(\theta)}{\int f(\theta)\,d\theta} \] (\(f(\theta)\) being a positive measure.)

Especially convenient for Bayesians, \[ \pi(\theta) \propto \mathrm{prior}(\theta) \times \mathrm{likelihood}(\theta),\] but also useful for frequentists working on latent class models with SEM-based algorithms.

Unleashed Bayesian modelers tend to develop *ad hoc*, complex models, but at the same time wish to rely effortlessly on available inference toolboxes. The capabilities of common software (BUGS, JAGS, etc.) quickly become saturated, and the MCMC burn-in and cruising phases may take an unacceptably long time (because standard MCMC provides strongly correlated visits of the target distribution): hence the need for efficient algorithms (with less correlated trajectories, such as the ones obtained from *Piecewise Deterministic Markov Processes*) as well as quick inference (recourse to C++).

A new class of gradient-based MC algorithms is becoming incredibly *fashionable* in many domains of applied statistics. Stan is a well-documented statistical language, supported by many big names of the statistical community.

A word of warning:

*He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may be cast.* (L. da Vinci)

…, although most friends of mine (with the exception of some Britons, maybe) would go and sail, blindly relying on modern GPS.

Much of the following material has been stolen from:

- *MCMC using Hamiltonian Dynamics*, by R. M. Neal, chapter 5 of the Handbook of Markov Chain Monte Carlo, edited by Brooks, Gelman, Jones & Meng
- *Efficient Bayesian Inference with Hamiltonian Monte Carlo* (60 mn \(\times 2\)), by M. Betancourt, video, MLSS Iceland 2014, parts 1 & 2, youtube.com/watch?v=xWQpEAyI5s8
- *Stan: A probabilistic programming language*, by Carpenter B., Gelman A., Hoffman M., Lee D., Goodrich B., Betancourt M., Brubaker M., Guo J., Li J. & Riddell A., Journal of Statistical Software, 2015
- *Faster estimation of Bayesian models in ecology using HMC*, by Monnahan C. C., Thorson J. T. & Branch T. A., Methods Ecol Evol, 2016, doi:10.1111/2041-210X.12681
- *Scalable Bayesian Inference with Hamiltonian Monte Carlo* (40-minute video), by M. Betancourt, ICERM Video Archive, 2016

Part 1: a bit of theory

- At the beginning was **Physics**: Hamiltonian dynamics
- From Physics to **Maths**: Hamiltonian properties
- From Maths to **Stats**: Hamiltonian MCMC

Part 2: How to launch and run Stan (on a cherry tree)

Part 3: Going further with Stan

With the *dot* notation for derivatives:

\[ \underset{\text{Energy}}{\underbrace{H(v,x)}}=\underset{\text{Potential}}{\underbrace{U(x)}}+\underset{\text{Kinetic}}{\underbrace{K(v)}} \]


Usually, the total energy is written as:

\[\underset{\text{Energy}}{\underbrace{H(q,p)}}=\underset{\text{Potential}}{\underbrace{U(q)}}+\underset{\text{Kinetic}}{\underbrace{K(p)}}\]

\(p\) : generalized momentum vector \(\in\mathbb{R}^{d}\)

\(q\) : generalized position vector \(\in\mathbb{R}^{d}\)

\[\begin{align} \frac{dq}{dt} & =\frac{\partial H(q,p)}{\partial p}\\ \frac{dp}{dt} & =-\frac{\partial H(q,p)}{\partial q} \end{align}\]

What are the properties of a function \(H(q,p)\) … such that:

\[\begin{align*} \frac{dq}{dt} & =\frac{\partial H(q,p)}{\partial p}\\ \frac{dp}{dt} & =-\frac{\partial H(q,p)}{\partial q} \end{align*}\]

More globally, in just a single equation:

\[\frac{dz}{dt}=J\times\nabla H\]

with \(z=\left(\begin{array}[c]{c} q\\ p \end{array} \right)\), \(J=\left(\begin{array}[c]{cc} 0_{d\times d} & 1_{d\times d}\\ -1_{d\times d} & 0_{d\times d} \end{array} \right)\)
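As a quick worked check, consider the toy Hamiltonian \(H(q,p)=q^{2}/2+p^{2}/2\) that reappears in the Euler example below:

\[\begin{align*} \frac{dq}{dt} & =\frac{\partial H}{\partial p}=p\\ \frac{dp}{dt} & =-\frac{\partial H}{\partial q}=-q \end{align*}\]

whose exact solution \(q(t)=q_{0}\cos t+p_{0}\sin t\), \(p(t)=p_{0}\cos t-q_{0}\sin t\) moves along circles of constant energy \(H\).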

The mapping \(\mathcal{T}_{s}\) : \(\left( t,\left( \begin{array}[c]{c} q(t)\\ p(t) \end{array} \right) \right) \longmapsto\left( t+s,\left( \begin{array}[c]{c} q(t+s)\\ p(t+s) \end{array}\right)\right)\) is one-to-one, thus reversible.

If \(H(q,p)=U(q)+K(p)\) with \(K(p)=K(-p)\), \(\mathcal{T}_{-s}\) is obtained by the sequence of operations \((p\rightarrow-p),\mathcal{T}_{s},(p\rightarrow-p)\)
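As a sanity check with the same toy Hamiltonian (so \(K(p)=p^{2}/2=K(-p)\)), \(\mathcal{T}_{s}\) is the rotation \(\left(\begin{array}[c]{cc} \cos s & \sin s\\ -\sin s & \cos s \end{array}\right)\) in the \((q,p)\) plane, and

\[\left(\begin{array}[c]{cc} 1 & 0\\ 0 & -1 \end{array}\right) \left(\begin{array}[c]{cc} \cos s & \sin s\\ -\sin s & \cos s \end{array}\right) \left(\begin{array}[c]{cc} 1 & 0\\ 0 & -1 \end{array}\right) =\left(\begin{array}[c]{cc} \cos s & -\sin s\\ \sin s & \cos s \end{array}\right),\]

i.e. flipping \(p\), applying \(\mathcal{T}_{s}\) and flipping \(p\) back does produce the rotation by the opposite angle, namely \(\mathcal{T}_{-s}\).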

A vector field \(\mathcal{T}\) with zero divergence preserves volume:

\[\begin{align*} \nabla\cdot\mathcal{T} & =\left( \frac{\partial}{\partial p}\right) \frac{dp}{dt}+\left( \frac{\partial}{\partial q}\right) \frac{dq}{dt}\\ & =\frac{\partial}{\partial p}\left( -\frac{\partial H(q,p)}{\partial q}\right) +\frac{\partial}{\partial q}\left( \frac{\partial H(q,p)}{\partial p}\right) \\ & =-\left( \frac{{\partial^{2}}H(q,p)}{{\partial p}{\partial q}}\right) +\left( \frac{{\partial^{2}}H(q,p)}{{\partial p}{\partial q}}\right) =0 \end{align*}\]

i.e. preservation of the pdf under a change of variables, since the determinant of the Jacobian matrix \(J_{\delta}\) of the change of coordinates \(\mathcal{T}_{\delta}\) from \(s\) to \(s+\delta\) is \(1\)!
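Again for the toy Hamiltonian \(H(q,p)=q^{2}/2+p^{2}/2\), the exact flow \(\mathcal{T}_{s}\) is the rotation written above, and its Jacobian determinant is \[\det\left(\begin{array}[c]{cc} \cos s & \sin s\\ -\sin s & \cos s \end{array}\right)=\cos^{2}s+\sin^{2}s=1,\] so areas in the \((q,p)\) plane are indeed carried along unchanged.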

How to practically evaluate \(\mathcal{T}_{\delta}\left(\begin{array}[c]{c} q(t)\\ p(t) \end{array}\right)\) for a small \(\delta\)?

If \(H(q,p)=U(q)+K(p)\) with \(K(p)=\frac{1}{2}p^{\prime}M^{-1}p\), Euler's discretisation gives:

\[\begin{align*} \left(\begin{array}[c]{c} q(t+\delta)\\ p(t+\delta) \end{array} \right) & \approx\left( \begin{array}[c]{c} q(t)+\delta\,\frac{\partial K}{\partial p}(p(t))\\ p(t)-\delta\,\frac{\partial U}{\partial q}(q(t)) \end{array} \right) \\ & \approx\left( \begin{array}[c]{c} q(t)+\delta\times M^{-1}p(t)\\ p(t)-\delta\times\frac{\partial U}{\partial q}(q(t)) \end{array} \right) \end{align*}\]

Euler's scheme may encounter problems… Let's examine the example reproduced from FIG 5.1 of Neal, with \(H(q,p)=q^{2}/2+p^{2}/2\), whose exact trajectories are circles. (The initial state is \(q=0,p=1\), and we take \(L=20\) steps.)

```
# One step of Euler's discretisation of Hamiltonian dynamics for
# H(q, p) = q^2/2 + p^2/2 (so dU/dq = q and dK/dp = p), iterated L times
Eulerstep <- function(epsilon, L, current_q, current_p) {
  Q <- current_q
  P <- current_p
  for (i in 1:L) {
    q <- current_q + epsilon * current_p   # q(t + delta) = q(t) + delta * dK/dp
    p <- current_p - epsilon * current_q   # p(t + delta) = p(t) - delta * dU/dq
    current_q <- q
    current_p <- p
    Q <- c(Q, q)
    P <- c(P, p)
  }
  return(list(P = P, Q = Q))
}

# The exact trajectory is the unit circle; overlay the Euler trajectory
angle <- seq(0, 2 * pi, length.out = 1000)
plot(cos(angle), sin(angle), lwd = 4, type = 'l', lty = 3, las = 1,
     xlab = "position q", ylab = "momentum p", xlim = c(-2, 2), ylim = c(-2, 2))
xy <- Eulerstep(epsilon = 0.3, L = 20, current_q = 0, current_p = 1)
lines(xy$Q, xy$P, type = "b", pch = 19, col = 'red')
```
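To quantify how badly plain Euler drifts, a small added check (reusing the `xy` object returned above) tracks the total energy \(H=q^{2}/2+p^{2}/2\) along the simulated trajectory; the exact dynamics would keep it constant at \(0.5\), but here it grows at every step, which is why the red path spirals outwards.

```
# Energy along the Euler trajectory: constant (0.5) for the exact dynamics,
# but each Euler step inflates it by a factor (1 + epsilon^2)
H_euler <- xy$Q^2 / 2 + xy$P^2 / 2
round(H_euler, 3)
```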

There are ways to improve the discretisation; for instance, the modified Euler scheme (each update is a shear transformation) works as follows:

\[\begin{align*} q(t+\delta) & \approx q(t)+\delta\times M^{-1}p(\mathbf{t})\\ p(t+\delta) & \approx p(t)-\delta\times\frac{\partial U}{\partial q}(q(\mathbf{t+\delta})) \end{align*}\]

```
# Modified (shear) Euler step: update q first, then update p using the NEW q
EulerModifstep <- function(epsilon, L, current_q, current_p) {
  Q <- current_q
  P <- current_p
  for (i in 1:L) {
    current_q <- current_q + epsilon * current_p   # q(t + delta) uses p(t)
    current_p <- current_p - epsilon * current_q   # p(t + delta) uses q(t + delta)
    Q <- c(Q, current_q)
    P <- c(P, current_p)
  }
  return(list(P = P, Q = Q))
}

# Overlay the modified Euler trajectory on the exact unit circle
plot(cos(angle), sin(angle), lwd = 4, type = 'l', lty = 3, las = 1,
     xlab = "position q", ylab = "momentum p", xlim = c(-2, 2), ylim = c(-2, 2))
xymod <- EulerModifstep(epsilon = 0.3, L = 20, current_q = 0, current_p = 1)
lines(xymod$Q, xymod$P, col = 'blue', type = "b", pch = 19)
```
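The same added energy check on the modified Euler trajectory (`xymod`) shows the improvement: the total energy now just oscillates around \(0.5\) instead of growing without bound, so the blue path stays close to the true circle.

```
# Energy along the modified Euler trajectory: bounded oscillation around 0.5
H_modif <- xymod$Q^2 / 2 + xymod$P^2 / 2
round(range(H_modif), 3)
```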