I will describe a breakthrough strategy for “exact-approximate” Bayesian inference. The apparent contradiction in the terminology is due to the surprising result in Beaumont (2003) and Andrieu and Roberts (2009) where it is shown that plugging a non-negative and unbiased stochastic approximation of the likelihood function into an MCMC sampler is enough to obtain exact Bayesian inference for model parameters.
The pseudo-marginal approach
I have decided to write about pseudo-marginal MCMC methods since the original literature introducing these methods can be fairly intimidating. Though this need not be. I will show how it is very simple to prove the validity of the approach. Theoretical difficulties may arise depending on the type of pseudo-marginal method we choose to consider for a specific case study. However the basic idea is fairly simple to describe and I proceed at a slow step-by-step pace. Other accessible intro to the topic are in the Resources section at the end of this post.
Suppose we wish to use Markov chain Monte Carlo (MCMC) to sample from a distribution having probability density (probability mass) function . Suppose that
is not available in closed form and that its pointwise evaluation is not possible at any given
. Finally, suppose that a non-negative and “unbiased” random estimator of
is available (we define what we mean with unbiased estimator further below), and we denote this estimator with
. Then an intuitive thing to do is to use
in place of
into a Metropolis-Hastings algorithm. This way the (approximate) Metropolis-Hastings ratio for a proposal
becomes (see this introductory post to Metropolis-Hastings if necessary)
and is accepted with probability
. Although it is not evident yet, what we have just written is an example of a pseudo-marginal method, but I will clarify this point below. Therefore in practice applying the method is relatively simple.
The astonishing result provided by a pseudo-marginal method is that the Markov chain constructed as above — where proposals are accepted according to
instead of the true and unavailable acceptance ratio
(where
uses the exact
) — has
as stationary distribution, the “exact” target. Therefore, we can use the approximate
into an MCMC sampler, while still obtaining exact Monte Carlo sampling from
.
In this post whenever we write “exact sampling” we mean “exact up to Monte Carlo error”, where the latter can be arbitrarily decreased by increasing the number of MCMC iterations.
A simple construction
We want to show that the Markov chain consisting in proposals accepted with probability has stationary distribution
, that is as
the generated chain is sampled from
, with
the number of MCMC iterations.
First of all we need to obtain a non-negative and unbiased random estimator of . We now take the opportunity to define what we mean with “unbiased estimator” in this context.
Unbiasedness
An unbiased estimator of
should be such that
for every
, where
denotes expectation. However which probability measure do we consider for this expectation? Answering this question shows why the topic of this post is particularly relevant for sampling from intractable distributions or, in Bayesian inference, sampling from posterior distributions having intractable likelihoods.
Before trying to find unbiased estimators let’s first see what unbiasedness mean when using Monte Carlo sampling.
It is often the case that while is unknown in closed form, it is instead easy to write the joint distribution of
and some other random quantity
we are not really interested in (we call it a “disturbance parameter” or an “auxiliary variable”). Let’s
be their joint distribution. Assume the domain of the marginalized
to be continuous (without loss of generality), then marginalization implies integration and
where basically the “unwanted” source of variability is integrated out. But then again we assume that analytic integration is out of reach.
A possibility to approximate the integral above is given by Monte Carlo methods, resulting into a stochastic approximation that we show to be unbiased. We simulate iid samples
from the distribution of
which we denote with
The second identity rewrites the joint distribution using both conditional probabilities and . Finally we carry out Monte Carlo integration. Notice, we do not need to know the expression of
, but we need to be able to simulate from it. Clearly, the above assumes that we are able to evaluate
at every
(this construction naturally applies to state-space models, when we consider
as “data” and the
as “particles”).
From the second identity it should be clear that can be considered as the expectation of
taken with respect to
, as we have
and the notation
emphasizes this fact.
Finally, it is easy to show that the Monte Carlo approximation gives an unbiased estimator of , as for any simulated
we have shown that
and therefore by denoting with
the approximated density returned by Monte Carlo we define
and
this holding for any and for any number of samples
. Also, the estimate is clearly non-negative.
Having discussed unbiasedness for simple Monte Carlo it is now easy to show how to use this fact for statistical inference.
Pseudo-marginal MCMC for exact Bayesian inference with intractable likelihoods
In the previous section we have considered a generic . Here we use Metropolis-Hastings in a Bayesian context to simulate from a specific distribution, the posterior distribution
for a parameter
, given data
. From Bayes’ theorem we have
, with
the prior of
,
the likelihood function and
the “evidence” (if you are interested in state-space models you might want to check an earlier post).
In most realistic modelling scenario, the likelihood function is unavailable in closed form and we say that the statistical model we are considering has an “intractable likelihood”. However it might be possible to obtain an unbiased non-negative approximation of the likelihood . We show that in this case it is also possible to obtain
samples
from
, that is from the exact posterior of
.
The previous section used Monte Carlo integration, by independently sampling from a distribution employing , an “auxiliary” variable we do not care for. Also in this case we consider sampling from an “augmented” space, which is the space
over which the pair
is defined.
Therefore we now artifically sample from the augmented approximate posterior (it is approximate because we have an approximate likelihood) even if we are ultimately interested in
. In this context, in order to write the joint (augmented) posterior we need to introduce the prior for
, which we call
. Recall we are not interested in conducting inference for
, therefore the definition of its prior is only guided by convenience as we are required to be able to sample from
. Also, we assume
and
a-priori independent, that is their joint prior can be written as
.
As mentioned above, we assume the availability of a non-negative unbiased approximation to the likelihood which we denote . By introducing the usual auxiliary variable, this approximation can be written as
.
Notice we took bacause of the assumed a-priori independence between
and
.
Then we require the key assumption of unbiasedness. Again we consider as an
-expectation and we want that
.
Now that we have set these requirements on the likelihood function, we proceed to write the approximate augmented posterior:
[Notice we put not
at the denominator: this follows from the unbiasedness assumption as we have
and the latter is by definition the evidence
.]
We know the exact (unavailable) posterior of is
therefore the marginal likelihood (evidence) is
and then we plug the evidence into
:
Now, we know that applying an MCMC targeting then discarding the output pertaining to
corresponds to integrating-out
from the posterior, that is marginalize
out. Therefore, assuming we use MCMC to perform such marginalization, the draws
we are returned with have the following stationary distribution
.
Notice we have recovered the exact posterior of .
We have therefore performed a pseudo-marginal approach: “marginal” because we disregard ; pseudo because we use
not
.
In conclusion we have shown that using MCMC on an (artificially) augmented posterior, then discard from the output all the auxiliary variates we generated during the entire MCMC procedure, implies that the sampled parameter draws are the product of an exact Bayesian inference procedure for
.
Besides the theoretical construction we have described, in practice all that is needed is running a regular MCMC for , without caring of storing the generated samples for
.
Follow up
Pseudo-marginal MCMC methods are due to Beaumont (2003) and Andrieu and Roberts (2009) and have revolutionised the application of Bayesian inference for models having intractable likelihoods. In my opinion they constitute one of the most important statistical advancements of the last thirty years.
In particular, the methodology has found an enormous success in inference for state-space models, as sequential Monte Carlo produces unbiased approximations of the likelihood for any number of particles (though unbiasedness is not trivial to prove in this case, but see Pitt et al or section 4A in Naesseth et al. which are way more accessible than the original Proposition 7.4.2 of Del Moral (2004)). It is even possible to obtain exact inference simultaneously for the states and the parameters of state-space models.
However, as shown above, the generality of the approach makes pseudo-marginal methods appealing for a wide range of scenarios, including approximate Bayesian computation and other methods for likelihood-free inference such as synthetic likelihoods.
Of extreme importance is the fact that the results presented above hold irrespectively of how we have obtained the unbiased approximation of the likelihood. If has been approximated via Monte Carlo using a “large” number
of auxiliary variates or a “small”
, the theoretical result still hold and we sample exactly from
regardless of
. However, for practical applications tuning
appropriately is important, as its value influence the precision of the likelihood approximation which in turn has an effect on the mixing of the chain. See this post and references therein.
Some resources
- Dahlin and Schön 2017: intro paper focussed on state-space models and computer implementation in R;
- D. Wilkinson’s blog post on particle marginal methods for parameter inference;
- D. Wilkinson’s blog post on particle marginal methods for parameter and state inference in state-space models.
- Fasiolo et al. 2016 : paper comparing several algorithmic strategies for intractable likelihoods.
- Technical papers on tuning particle marginal algorithms: Doucet et al. 2012; Pitt et al 2012; Sherlock et al. 2015; Sherlock 2016;
- Some demo software: my Matlab example; my R code (see the pomp_ricker-pmcmc file);
- Some serious R software: the pomp package (function pmcmc); smfsb (type demo(“PMCMC”) at command line); Nimble.
- More serious software for state-space models: LibBi; Biips (not sure if this one is actively maintained).
Pingback: State space models and intractable likelihoods | Umberto Picchini's research blog
Pingback: Monte Carlo sampling for likelihood approximation in state space models | Umberto Picchini's research blog
Pingback: Sequential Monte Carlo and the bootstrap filter | Umberto Picchini's research blog