Another recent paper that I’ve both enjoyed and found a lot of practical benefit from is Nathan Stein and Xiao-Li Meng’s “Practical perfect sampling using composite bounding chains: the Dirichlet-multinomial model” (Biometrika, 2013). In addition to constructing a perfect sampler for Dirichlet-multinomial (DM) distributions, this paper gives two easily constructed Gibbs samplers for DMs. What’s cool about these samplers is that they both use a variable-augmentation strategy that places the DM within an urn-replacement scheme. This yields two different ways of looking at the DM, corresponding to two parameterizations of the distribution that arise naturally in a lot of situations.
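To make the urn view concrete, here is a minimal sketch of the classic Pólya-urn representation of the DM (this is the textbook urn scheme, not Stein and Meng’s augmented replacement scheme itself; the function name and `alpha` parameterization are my own):

```python
import random

def dm_sample_urn(alpha, n, rng=random.Random(0)):
    """Draw one count vector from a Dirichlet-multinomial via a Polya urn:
    each of the n draws picks category k with probability proportional to
    alpha[k] plus the number of times k has already been drawn, then
    'replaces' the ball by incrementing that category's count."""
    counts = [0] * len(alpha)
    for _ in range(n):
        weights = [a + c for a, c in zip(alpha, counts)]
        k = rng.choices(range(len(alpha)), weights=weights)[0]
        counts[k] += 1
    return counts
```

The self-reinforcing replacement step (drawn categories become more likely to be drawn again) is exactly what produces the DM’s extra-multinomial variability.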
The DM on $K$ categories is usually parameterized in terms of a concentration vector $\alpha = (\alpha_1, \ldots, \alpha_K)$, where each $\alpha_k$ pulls counts toward category $k$ as it increases. There is also a somewhat more intuitive presentation, more reminiscent of the multinomial distribution, with parameters $(\pi_1, \ldots, \pi_K, \theta)$. The $\pi_k$’s are the expected frequencies of each category (as in the multinomial) and $\theta$ is an inverse-variance parameter. The relationship between the two is straightforward: $\pi_k = \alpha_k / \sum_j \alpha_j$ and $\theta = \sum_j \alpha_j$. What’s cool about Stein and Meng’s work (or at least the start of it; there’s a lot of even cooler stuff in the construction of the composite bounding chain) is that they show that both of these presentations can be embedded in the same replacement scheme to realize two complementary Gibbs samplers. This means that folks can build MCMC schemes that can be reasonably efficient even for the generally difficult-to-sample DM distribution.
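The mapping between the two parameterizations is a one-liner in each direction. A small sketch, assuming the common convention $\pi_k = \alpha_k / \sum_j \alpha_j$ and $\theta = \sum_j \alpha_j$ (the paper’s exact notation may differ; function names are mine):

```python
def alpha_to_mean_dispersion(alpha):
    """Map concentrations alpha_k to (pi, theta): pi_k are expected
    category frequencies, theta is the total concentration (larger
    theta -> counts behave more like a plain multinomial with probs pi)."""
    theta = sum(alpha)
    pi = [a / theta for a in alpha]
    return pi, theta

def mean_dispersion_to_alpha(pi, theta):
    """Inverse map: alpha_k = theta * pi_k."""
    return [theta * p for p in pi]
```

Which parameterization is more convenient depends on whether your model reasons about category frequencies directly (the $\pi$’s) or about pseudo-count weights (the $\alpha$’s).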
For applied folks such as myself, the upside is that you can use DM distributions in a lot more cases than you could before (large numbers of categories, large counts). I’ve put these samplers to fairly good use in a couple of recent papers (shameless self-promotion: http://arxiv.org/abs/1511.05185 and http://biorxiv.org/content/early/2016/03/24/045468). However, the data sets in those papers made for easy work since there was no missingness: every sample had the potential to observe every one of the categories. Unfortunately, my current data sets (one in ecology, one in genomics, one in political science) all have the same underlying issue: each has samples where some number of categories are not observed for structural reasons. All of which creates a big ole headache, since I can’t seem to re-derive these samplers for the case of missing data. Being able to do so would be fantastic since, while DM-based models are definitely on the rise, not being able to deal thoughtfully with missing data is going to hold back their wide deployment.
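For readers who want to see the problem setup concretely, here is one hedged way to simulate data with structural missingness: each sample carries a mask of observable categories, and counts are drawn only among those (the `observed` mask and the choice to restrict $\alpha$ to the observed subset are illustrative assumptions, not a solution to the inference problem posed above):

```python
import random

def dm_sample_urn(alpha, n, rng):
    """Polya-urn draw of one Dirichlet-multinomial count vector."""
    counts = [0] * len(alpha)
    for _ in range(n):
        weights = [a + c for a, c in zip(alpha, counts)]
        k = rng.choices(range(len(alpha)), weights=weights)[0]
        counts[k] += 1
    return counts

def dm_with_structural_zeros(alpha, observed, n, rng=random.Random(1)):
    """Simulate a sample where categories with observed[k] == False are
    structurally unobservable: draw n counts among the observable
    categories only, and report zeros for the missing ones."""
    sub_alpha = [a for a, obs in zip(alpha, observed) if obs]
    sub_counts = iter(dm_sample_urn(sub_alpha, n, rng))
    return [next(sub_counts) if obs else 0 for obs in observed]
```

The inferential headache is the reverse direction: given many such masked count vectors with different masks, re-deriving Stein-and-Meng-style Gibbs updates for the shared $\alpha$’s.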
Any help, interwebs?