**Mar 05, 2009**

Statistics and Computing

This paper presents a methodology for model fitting and inference in the context of Bayesian models of the type f(Y | X, theta) f(X | theta) f(theta), where Y is the (set of) observed data, theta is a set of model parameters and X is an unobserved (latent) stationary stochastic process induced by the first-order transition model f(X^(t+1) | X^(t), theta), where X^(t) denotes the state of the process at time (or generation) t. The crucial feature of this type of model is that, given theta, the transition model f(X^(t+1) | X^(t), theta) is known, but the distribution of the stochastic process in equilibrium, that is f(X | theta), is intractable, and hence unknown, except in very special cases. A further point to note is that the data Y are assumed to be observed when the underlying process is in equilibrium. In other words, the data are not collected dynamically over time.

We refer to such a specification as a latent equilibrium process (LEP) model. It is motivated by problems in population genetics (though other applications are discussed), where it is of interest to learn about parameters such as mutation and migration rates and population sizes, given a sample of allele frequencies at one or more loci. In such problems it is natural to assume that the distribution of the observed allele frequencies depends on the true (unobserved) population allele frequencies, whereas the distribution of the true allele frequencies is only indirectly specified through a transition model.

As a hierarchical specification, it is natural to fit the LEP within a Bayesian framework. Fitting such models is usually done via Markov chain Monte Carlo (MCMC). However, we demonstrate that, in the case of LEP models, implementation of MCMC is far from straightforward. The main contribution of this paper is to provide a methodology to implement MCMC for LEP models. We demonstrate our approach in population genetics problems with both simulated and real data sets. The resulting model fitting is computationally intensive, so we also discuss parallel implementation of the procedure in special cases.
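
To make the LEP structure concrete, the following is a minimal sketch under stated assumptions, not the paper's method. It assumes a single biallelic locus whose population frequency evolves by a Wright-Fisher step with symmetric mutation (an illustrative choice of transition model), takes theta to be the mutation probability `mu`, and takes Y to be a binomial sample given X. All function and parameter names are hypothetical. The point it illustrates is that one step of f(X^(t+1) | X^(t), theta) is easy to simulate, while f(X | theta) is only reachable here by running the chain for a long time.

```python
import random


def wright_fisher_step(x, n_genes, mu, rng):
    """One draw from an assumed transition model f(X^(t+1) | X^(t), theta).

    x is the current population frequency of one allele, n_genes is the
    number of gene copies, and mu is a symmetric per-generation mutation
    probability (the illustrative theta in this sketch).
    """
    # Expected allele frequency after symmetric mutation between the two alleles.
    p = x * (1.0 - mu) + (1.0 - x) * mu
    # Binomial resampling of the n_genes gene copies for the next generation.
    count = sum(1 for _ in range(n_genes) if rng.random() < p)
    return count / n_genes


def simulate_lep(n_genes, mu, sample_size, burn_in, rng):
    """Draw (X, Y): X approximately from equilibrium, then Y | X binomial."""
    x = 0.5  # arbitrary starting frequency
    # A fixed burn-in only approximates the equilibrium f(X | theta).
    for _ in range(burn_in):
        x = wright_fisher_step(x, n_genes, mu, rng)
    # Observation model f(Y | X, theta): allele count in a sample of size sample_size.
    y = sum(1 for _ in range(sample_size) if rng.random() < x)
    return x, y


rng = random.Random(0)
x, y = simulate_lep(n_genes=100, mu=0.01, sample_size=20, burn_in=500, rng=rng)
print(f"latent frequency X = {x:.2f}, observed count Y = {y} of 20")
```

Forward simulation of this kind yields draws of (X, Y) for a given theta, but it does not evaluate the density f(X | theta), which is why a standard MCMC scheme cannot be applied directly to such a model.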