
GIST: Now with local step size adjustment for NUTS

After 12 years, we have finally figured out how to do local step-size adjustment within NUTS while preserving detailed balance. This starts with Michael Betancourt's revised multinomial NUTS, as used in Stan, with a preference for the last doubling.

We just published the arXiv paper and would love to hear feedback on it.

  • Nawaf Bou-Rabee, Bob Carpenter, Tore Selland Kleppe, Milo Marsden. 2024. Integrating local step size adaptivity into the No-U-Turn sampler using Gibbs Self Tuning. arXiv:2408.08259.

Nawaf recently gave a talk summarizing our work on GIST at the Bernoulli conference in Germany and I gave a talk on it at Sam Livingstone’s workshop in London in June.

It works!

Here’s a graph from the article showing the marginal draws of the logarithmic scale parameter of Neal’s funnel in both our new local stepwise adaptation of GIST and Stan’s implementation of NUTS.


[Figure: marginal draws of the funnel's log scale parameter under GIST's local step-size adaptation and Stan's NUTS.]

Note that NUTS does not get into the neck of the funnel, where the log scale parameter is low. This would have been easier to see with a shaded histogram fill, so we need to fix that in the revised paper.
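For readers who want to try this target themselves, here is a minimal sketch of Neal's funnel log density in its common parameterization (the log scale parameter v is normal(0, 3) and each x_i given v is normal with standard deviation exp(v/2)); the exact dimensionality and parameterization used in the paper's experiments may differ.

```python
import numpy as np

def funnel_logpdf(v, x):
    """Log density of Neal's funnel, up to an additive constant.

    v ~ normal(0, 3); x_i | v ~ normal(0, exp(v / 2)),
    so the conditional variance of each x_i is exp(v).
    """
    x = np.asarray(x)
    D = x.size
    return (-0.5 * (v / 3.0) ** 2     # prior on log scale parameter
            - 0.5 * D * v             # log normalizer of the conditionals
            - 0.5 * np.exp(-v) * np.sum(x ** 2))
```

The vanishing conditional scale as v decreases is exactly what defeats a single global step size: the neck needs tiny leapfrog steps, the mouth needs large ones.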

The idea behind GIST

In GIST we couple the tuning parameters (step size, number of steps, mass matrix) to the position and momentum, in the same way that HMC itself couples momentum to position. More specifically, we design a conditional distribution of tuning parameters given position and momentum, then resample the tuning parameters in a Gibbs step, just as we resample momentum. The Hamiltonian (in this case NUTS) component is then a simple Metropolis-within-Gibbs step (as in vanilla HMC). Unlike in HMC, where the momentum distribution is independent of the position distribution, in GIST the tuning parameters depend on position and momentum, so we need a non-trivial Metropolis-Hastings correction rather than a plain Metropolis accept step. The trick is to design a conditional distribution of tuning parameters that does the right local adjustment and leads to a high acceptance rate – that is what the papers show how to do.
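A toy sketch of one GIST transition may make the structure concrete. Everything here is a stand-in: the target is a standard normal, and `step_size_conditional` is a hypothetical gradient-based weighting over a step-size grid, not the conditional from the paper. The point is the shape of the update: Gibbs resampling of momentum and tuning parameter, then a Metropolis-Hastings accept step whose ratio includes the forward and reverse tuning-parameter probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

def logp(theta):                      # toy target: standard normal
    return -0.5 * np.dot(theta, theta)

def grad_logp(theta):
    return -theta

def leapfrog(theta, rho, eps, n_steps):
    """Standard leapfrog integrator (time-reversible, volume-preserving)."""
    theta, rho = theta.copy(), rho.copy()
    rho = rho + 0.5 * eps * grad_logp(theta)
    for _ in range(n_steps - 1):
        theta = theta + eps * rho
        rho = rho + eps * grad_logp(theta)
    theta = theta + eps * rho
    rho = rho + 0.5 * eps * grad_logp(theta)
    return theta, rho

EPS_GRID = np.array([0.1, 0.2, 0.4, 0.8])

def step_size_conditional(theta, rho):
    # Hypothetical conditional p(eps | theta, rho): downweight large steps
    # where the gradient is large. A stand-in for the paper's construction.
    scale = 1.0 + np.linalg.norm(grad_logp(theta))
    w = np.exp(-EPS_GRID * scale)
    return w / w.sum()

def gist_transition(theta, n_steps=8):
    rho = rng.standard_normal(theta.shape)     # Gibbs: resample momentum
    p_fwd = step_size_conditional(theta, rho)
    eps = rng.choice(EPS_GRID, p=p_fwd)        # Gibbs: resample tuning parameter
    theta_new, rho_new = leapfrog(theta, rho, eps, n_steps)
    p_rev = step_size_conditional(theta_new, -rho_new)
    i = int(np.where(EPS_GRID == eps)[0][0])
    # MH accept: Hamiltonian ratio times the tuning-parameter proposal
    # ratio -- the Hastings correction that makes the adaptation valid.
    log_accept = (logp(theta_new) - 0.5 * rho_new @ rho_new
                  - logp(theta) + 0.5 * rho @ rho
                  + np.log(p_rev[i]) - np.log(p_fwd[i]))
    return theta_new if np.log(rng.uniform()) < log_accept else theta
```

If the conditional were independent of position and momentum, the correction term would cancel and this would reduce to ordinary randomized-step-size HMC.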

It may be easier to start with the first GIST paper, which introduces the framework and shows that NUTS, the apogee-to-apogee sampler, and randomized and multinomial HMC can all be framed as GIST samplers. It also introduces a simpler alternative to NUTS based on a U-turn condition.

  • Nawaf Bou-Rabee, Bob Carpenter, Milo Marsden. 2024. GIST: Gibbs self-tuning for locally adaptive Hamiltonian Monte Carlo. arXiv:2404.15253.
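The U-turn condition itself is simple to state: stop extending the trajectory once the momentum points back toward where the trajectory started. A one-line sketch (this is the standard NUTS-style check, not code from the paper):

```python
import numpy as np

def u_turned(theta_start, theta_current, rho_current):
    # The trajectory has made a U-turn when further leapfrog steps would
    # shrink the distance to the starting point, i.e. when the momentum
    # has a negative component along the displacement from the start.
    return np.dot(theta_current - theta_start, rho_current) < 0.0
```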

Anyone interested in mass matrix customization?

I have a working proof-of-concept that performs local mass matrix fitting, and while it works almost perfectly for conditioning multivariate normals, I have not yet been able to get it to work for locally conditioning non-log-concave targets like Neal's funnel.

Reproducible code

Code to reproduce the results of the paper and also for our ongoing experiments is available in our public GitHub repository.

  • github.com/bob-carpenter/adaptive-hmc.

Delayed rejection (generalized) HMC

This is a follow-up to other work I did with collaborators here at the Flatiron Institute on local step size adjustment using delayed rejection for Hamiltonian Monte Carlo,

  • Chirag Modi, Alex Barnett, Bob Carpenter. 2024. Delayed rejection Hamiltonian Monte Carlo for sampling multiscale distributions. Bayesian Analysis 19(3).

We recently extended this work to use generalized HMC, which is much more efficient and allows for more “local” adjustment of step size than what we do globally in GIST and delayed rejection HMC,

  • Gilad Turok, Chirag Modi, Bob Carpenter. 2024. Sampling multiscale densities with delayed rejection generalized Hamiltonian Monte Carlo. arXiv:2406.02741.
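The "more local" flavor of generalized HMC comes from replacing the full momentum resample with a partial refresh between single leapfrog steps, so the step size can effectively change from one step to the next. A minimal sketch of the partial refresh (the parameter name `alpha` and its default are illustrative, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(1)

def partial_momentum_refresh(rho, alpha=0.9):
    """Generalized HMC momentum update.

    Mixes the old momentum with fresh Gaussian noise so that the standard
    normal momentum distribution is preserved: alpha = 0 gives HMC's full
    refresh, alpha close to 1 keeps the chain moving in a persistent
    direction between single leapfrog steps.
    """
    noise = rng.standard_normal(np.shape(rho))
    return alpha * np.asarray(rho) + np.sqrt(1.0 - alpha ** 2) * noise
```

Because each transition is a single leapfrog step, rejection (or delayed rejection with a smaller step) applies locally rather than to a whole long trajectory.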

Chirag extends the DR-G-HMC work with GIST and an efficient L-BFGS-like approach for estimating the mass matrix. This seems to work better than what I have tried purely within GIST. Stay tuned!

Applicants

Milo Marsden, who did much of the heavy lifting for the GIST articles, is a PhD student in applied mathematics at Stanford who will graduate this year and is seeking a postdoc or faculty position.

Gilad Turok, who took the lead on the DR-G-HMC paper, was a Columbia University applied math intern who stuck around this year as a research analyst at Flatiron. He plans to apply to graduate school in computational stats/ML next year. Keep an eye out for our Blackjax package that implements Agrawal and Domke's approach to realNVP normalizing-flow VI.