In this first series of posts, I introduce important tools to construct inference methods for the estimation of parameters in stochastic models. Stochastic models are characterized by randomness in their mathematical nature, and since at first I focus on models having dynamic features, these models are defined by stochastic processes.

I will start by introducing a class of dynamic models known as *state space models*.

For general (nonlinear, non-Gaussian) state space models, a class of algorithms for exact parameter inference was devised only relatively recently, in the Bayesian framework. In a series of 4-5 posts I will construct the simplest example of this class of *pseudo-marginal* algorithms, now considered the state-of-the-art tool for parameter estimation in nonlinear state space models. Pseudo-marginal methods do not exclusively target state space models: they produce exact Bayesian inference whenever a positive and unbiased approximation of the likelihood function is available, no matter the underlying model.

I will first define a state space model, then introduce its likelihood function, which turns out to be *intractable*. I postpone to the next post the construction of Monte Carlo methods for approximating the likelihood function.

### State space models

A very important class of models for engineering applications, signal processing, biomathematics, systems biology, ecology etc., is the class of state-space models (SSM). *[In some literature the terms SSM and hidden Markov model (HMM) have been used interchangeably, though some sources make the explicit distinction that in HMM states are defined over a discrete space while in SSM states vary over a continuous space.]*

Suppose we are interested in modelling a system represented by a (possibly multidimensional) continuous-time stochastic process $\{X_t\}_{t \geq t_0}$, where $X_t$ denotes the state of the system at a time $t \geq t_0$. The notation $\{X_t\}_{t \geq t_0}$ denotes the ensemble of possible values taken by the system for a continuous time $t \geq t_0$.

However, in many experimental situations the experimenter does not have access to measurements of $\{X_t\}$ but rather to noisy versions corrupted by “measurement error”. In other words, the true state of the system is unknown, because $\{X_t\}$ is latent (unobservable), and we can only learn about the system via noisy measurements. I denote the available measurements (data) with $y_1, y_2, \ldots, y_T$ and use $\{Y_t\}$ to denote the process producing the actual observations at discrete time points. For simplicity of notation I assume that measurements are obtained at integer observational times $t = 1, 2, \ldots, T$. Each $y_t$ can be multidimensional ($t = 1, \ldots, T$) but it does not need to have the same dimension as the corresponding $X_t$; for example, some coordinate of $X_t$ might be unobserved. Therefore, the information we get from our system is rather partial: (i) the system of interest is continuous in time but measurements are obtained at discrete times, and (ii) measurements do not reflect the true state of the system $\{X_t\}$, because the $y_t$ are affected by some measurement noise. For example we could have $y_t = X_t + \varepsilon_t$, with $\varepsilon_t$ some random noise.

In general $\{X_t\}$ and $\{Y_t\}$ can be either continuous- or discrete-valued stochastic processes. However, in the following I assume both processes to be defined on continuous spaces.

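I use the notation $z_{1:T}$ to denote a sequence $\{z_1, \ldots, z_T\}$. Therefore, data can be written $y_{1:T}$. For the continuous-time process I use $x_{1:T} = \{x_1, \ldots, x_T\}$ to denote realizations of the process at times $\{1, 2, \ldots, T\}$. Clearly, none of the $x_t$ values is known.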

Assume that the dynamics of $\{X_t\}$ are *parametrized* by a model having a (vector) parameter $\theta$. The value of $\theta$ is unknown and our goal is to learn something about $\theta$ using available data. That is to say, we wish to produce inference about $\theta$. I could write $\{X_t(\theta)\}$ in place of $\{X_t\}$, but this makes the notation a bit heavy.

State space models are characterized by two properties: *Markovianity* of the latent states and *conditional independence* of measurements.

**Markovianity:** $\{X_t\}$ is assumed to be a Markov stochastic process, with transition density $p(x_t \mid x_s)$ for $s < t$. That is, “given the present state, the past is independent of the future”, so if time $s$ is the “present”, then $p(x_t \mid x_s, \{x_u\}_{u < s}) = p(x_t \mid x_s)$ for any collection of past states $\{x_u\}_{u < s}$. Also in this case, for simplicity we leave the conditioning on $\theta$ implicit, instead of writing $p(x_t \mid x_s, \theta)$. Specifically for our inference goals, we are interested in transitions between states corresponding to contiguous (integer) observational times, that is $p(x_t \mid x_{t-1})$. Also “the past is independent of the future, given the present”, meaning that $p(x_{t-1} \mid x_{t:T}, y_{t:T}) = p(x_{t-1} \mid x_t)$.

**Conditional independence:** measurements are assumed conditionally independent given a realization of the corresponding latent state $x_t$. That is, $p(y_t \mid x_{0:t}, y_{1:t-1}) = p(y_t \mid x_t)$.

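Markovianity and conditional independence can be represented graphically as a directed graph: a chain of white nodes $x_0 \to x_1 \to \cdots \to x_T$ for the latent states, with an arrow from each $x_t$ to a grey node $y_t$ for the corresponding measurement.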

Notice each white node $x_t$ is only able to directly influence the next white node $x_{t+1}$ and $y_t$. Also, each grey node $y_t$ is unable to influence other measurements *directly*. [This does not mean observations are independent: for example $p(y_2 \mid y_1) = \int p(y_2 \mid x_2)\, p(x_2 \mid y_1)\, dx_2$, and evidently it results in $p(y_2 \mid y_1) \neq p(y_2)$ in general. If equality did hold, $y_2$ would be independent of $y_1$.]

Notice $x_0$ is the initial state of the system at some arbitrary time prior to the first observational time $t = 1$. By convention we can set this arbitrary time to be $t_0 = 0$.

In summary, a compact, probabilistic representation of a state-space model is given by the conditional distribution of the observable variable at $t$ given the latent state, $p(y_t \mid x_t)$, and the distribution of the evolution of the (Markovian) state, that is the transition density $p(x_t \mid x_{t-1})$. Optionally, the initial state $x_0$ can be a fixed deterministic value or have its own (unconditional) distribution $p(x_0)$, which might or might not depend on $\theta$.

**Example: Gaussian random walk**

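A trivial example of a (linear, Gaussian) state space model is

$$y_t = x_t + \varepsilon_t, \qquad x_t = x_{t-1} + \xi_t, \qquad t = 1, \ldots, T,$$

with $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$ and $\xi_t \sim N(0, \sigma_\xi^2)$ independent. Therefore

$$p(y_t \mid x_t) = N(y_t;\, x_t, \sigma_\varepsilon^2), \qquad p(x_t \mid x_{t-1}) = N(x_t;\, x_{t-1}, \sigma_\xi^2).$$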
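To make the example concrete, here is a minimal simulation sketch of a Gaussian random walk observed with additive Gaussian noise. The function name `simulate_random_walk_ssm`, the noise standard deviations `sigma_xi` and `sigma_eps`, and the fixed starting value `x0` are assumptions made for this illustration.

```python
import numpy as np

def simulate_random_walk_ssm(T, sigma_xi=1.0, sigma_eps=1.0, x0=0.0, seed=0):
    """Simulate latent states x_{1:T} and measurements y_{1:T}.

    Assumed model (illustrative): x_t = x_{t-1} + xi_t, y_t = x_t + eps_t,
    with xi_t ~ N(0, sigma_xi^2) and eps_t ~ N(0, sigma_eps^2) independent.
    """
    rng = np.random.default_rng(seed)
    # Latent Markov states: cumulative sum of Gaussian increments from x0.
    x = x0 + np.cumsum(rng.normal(0.0, sigma_xi, size=T))
    # Measurements: each y_t depends on the latent path only through x_t.
    y = x + rng.normal(0.0, sigma_eps, size=T)
    return x, y

x, y = simulate_random_walk_ssm(T=100)
print(x[:3], y[:3])
```

In a real application only `y` would be available; `x` is returned here purely so that the latent path can be compared with the data.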

### Parameter inference and the likelihood function for SSMs

As anticipated, I intend to cover tools for statistical inference for the vector of parameters $\theta$, and in particular discuss Bayesian inference methods for SSM.

This requires introducing some quantities:

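- $p(y_{1:T} \mid \theta)$ is the likelihood function of $\theta$ based on measurements $y_{1:T}$.
- In the Bayesian framework $\theta$ is a random quantity and $\pi(\theta)$ is its prior density (I always assume continuous-valued parameters). It encodes our knowledge about $\theta$ before we “see” the current data $y_{1:T}$.
- Bayes' theorem gives the *posterior distribution* of $\theta$, enclosing uncertainty about $\theta$ for given data: $\pi(\theta \mid y_{1:T}) = \dfrac{p(y_{1:T} \mid \theta)\, \pi(\theta)}{p(y_{1:T})}$.
- $p(y_{1:T})$ is the marginal likelihood (evidence), given by $p(y_{1:T}) = \int p(y_{1:T} \mid \theta)\, \pi(\theta)\, d\theta$.
- Inference based on $\pi(\theta \mid y_{1:T})$ is called *Bayesian inference*.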

**Goal**: we wish to produce Bayesian inference on $\theta$. In principle this involves writing down the analytic expression of $\pi(\theta \mid y_{1:T})$ and studying its properties. However, for models of realistic complexity, what is typically performed is some kind of Monte Carlo sampling of pseudo-random draws from the posterior $\pi(\theta \mid y_{1:T})$. Then we can obtain a finite-sample approximation of the marginal posteriors $\pi(\theta_j \mid y_{1:T})$ ($j = 1, \ldots, \dim(\theta)$) and compute the sample means of the marginals, quantiles etc. This way we perform uncertainty quantification for all components of $\theta$, for a given model and available data.
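As a rough sketch of this last step, suppose posterior draws are already stored as an array with one row per draw and one column per component of the parameter vector. The draws below are only a stand-in generated from a Gaussian (they are not the output of any inference algorithm), and the helper name `summarize_posterior` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for N posterior draws of a 2-dimensional parameter, shape (N, 2).
# Assumption: in practice these would come from a Monte Carlo sampler.
draws = rng.multivariate_normal(
    mean=[1.0, -0.5], cov=[[0.04, 0.01], [0.01, 0.09]], size=5000
)

def summarize_posterior(draws, probs=(0.025, 0.5, 0.975)):
    """Return per-component sample means and quantiles of the draws."""
    draws = np.asarray(draws)
    means = draws.mean(axis=0)                      # shape (d,)
    quantiles = np.quantile(draws, probs, axis=0)   # shape (len(probs), d)
    return means, quantiles

means, quantiles = summarize_posterior(draws)
print("posterior means:", means)
print("2.5%, 50%, 97.5% quantiles per component:")
print(quantiles)
```

Each column of `draws` approximates one marginal posterior, so the summaries are computed column by column.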

Now, the main problem preventing straightforward sampling from the posterior is that for nonlinear, non-Gaussian SSM the likelihood function is not available in closed form, nor is it possible to derive a closed-form approximation. It is *intractable*. Let’s see why.

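In an SSM data are not independent, they are only *conditionally* independent. This means that we cannot write $p(y_{1:T} \mid \theta)$ as a product of unconditional densities: instead we have

$$p(y_{1:T} \mid \theta) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, \theta)$$

with the convention $p(y_1 \mid y_{1:0}, \theta) = p(y_1 \mid \theta)$.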

The problem is that all densities in the expression above are unavailable in closed form, hence unknown. If these were available we could either use an algorithm to find a (local) maximizer of $p(y_{1:T} \mid \theta)$ (the maximum likelihood estimate of $\theta$), or plug the likelihood into the posterior $\pi(\theta \mid y_{1:T})$ and perform Bayesian inference.

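The reason for the unavailability of a closed-form expression for the likelihood is the latency of process $\{X_t\}$, on which data depend. In fact, using Markovianity and conditional independence, we have:

$$p(y_{1:T} \mid \theta) = \int p(y_{1:T}, x_{0:T} \mid \theta)\, dx_{0:T} = \int p(x_0 \mid \theta) \prod_{t=1}^{T} p(y_t \mid x_t, \theta)\, p(x_t \mid x_{t-1}, \theta)\, dx_{0:T}.$$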

The expression above is *intractable* for two reasons:

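- it is a $(T+1)$-dimensional integral (for a scalar state; the dimension is larger still when the state is multidimensional), and
- for most (nontrivial) models, the transition densities $p(x_t \mid x_{t-1})$ are **unknown**.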

Basically the only way to solve the integral when $T$ gets large is via Monte Carlo methods. A special case amenable to closed-form computation is the linear SSM with Gaussian noise (see the Gaussian random walk example): for this case the Kalman filter can be employed to return exact maximum likelihood inference. In the SSM literature important (Gaussian) approximations are given by the extended and unscented Kalman filters.
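For the linear Gaussian case, the factorization of the likelihood into one-step predictive densities can be evaluated exactly. The following is a minimal sketch of a scalar Kalman filter that accumulates the log-likelihood for a Gaussian random walk observed with noise. The model form, the function name `kalman_loglik`, the parameters `sigma_xi` and `sigma_eps`, and the initial mean and variance `x0`, `p0` are assumptions of this illustration, not a general-purpose implementation.

```python
import numpy as np

def kalman_loglik(y, sigma_xi, sigma_eps, x0=0.0, p0=0.0):
    """Exact log-likelihood log p(y_{1:T} | theta) for the assumed model
    x_t = x_{t-1} + xi_t,  y_t = x_t + eps_t,
    xi_t ~ N(0, sigma_xi^2), eps_t ~ N(0, sigma_eps^2), x_0 ~ N(x0, p0).
    """
    m, p = x0, p0          # mean and variance of x_{t-1} given y_{1:t-1}
    loglik = 0.0
    for y_t in y:
        # Prediction step: distribution of x_t given y_{1:t-1}.
        m_pred = m
        p_pred = p + sigma_xi**2
        # One-step predictive density p(y_t | y_{1:t-1}) = N(m_pred, s).
        s = p_pred + sigma_eps**2
        loglik += -0.5 * (np.log(2.0 * np.pi * s) + (y_t - m_pred) ** 2 / s)
        # Update step: distribution of x_t given y_{1:t}.
        k = p_pred / s     # Kalman gain
        m = m_pred + k * (y_t - m_pred)
        p = (1.0 - k) * p_pred
    return loglik

# Usage on synthetic data generated from the same assumed model.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=50)) + rng.normal(size=50)
print(kalman_loglik(y, sigma_xi=1.0, sigma_eps=1.0))
```

Each pass through the loop adds one term $\log p(y_t \mid y_{1:t-1}, \theta)$, so the returned value is the logarithm of the product of predictive densities. Maximizing it over `sigma_xi` and `sigma_eps` would give the maximum likelihood estimate for this toy model.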

However, for nonlinear and non-Gaussian SSM, sequential Monte Carlo methods (or “particle filters”) are presently the state-of-the-art methodology. Sequential Monte Carlo (SMC) methods will be considered in a future post. SMC is a convenient tool to implement the pseudo-marginal methodology for exact Bayesian inference that I intend to outline in this first series of posts.

### Summary

I have introduced minimal notions to set the inference problem for parameters of state space models (SSM). The final goal is to summarize a methodology for exact Bayesian inference, the pseudo-marginal method. This will be outlined in future posts, with focus on SSM. I have also stated some of the computational issues preventing a straightforward implementation of likelihood based methods for the parameters of SSM. In the next two posts I consider some Monte Carlo strategies for approximating the likelihood function.

### Further reading

An excellent, accessible introduction to filtering and parameter estimation for state space models (and recommended for self study) is Simo Särkkä (2013) Bayesian Filtering and Smoothing, Cambridge University Press. The author kindly provides free access to a PDF version at his webpage. Companion software is available at the publisher’s page (see the Resources tab).

State space modelling is a fairly standard topic that can be found treated in most (all?) signal processing and filtering references, so it does not make sense to expand the list of references for this post. However, I will add more suggestions in future posts, in connection with the further topics I introduce.
