Stick-breaking made easy!

So, one of the papers I’ve gotten the most out of in the last month or so is this really superlative effort by Scott Linderman, Matt Johnson, and Ryan Adams on multinomial stick-breaking using Polya-Gamma augmentations (a draft of the paper can be found here). It’s excellent!

But why is it excellent? And how is it going to help me better understand species or pottery or voter-preference distributions? Well, a few years ago Nicholas Polson and others showed how you could augment very specific likelihoods with additional variables (called Polya-Gamma augmentations; PG for short) to make them vastly easier to sample by (effectively) transforming them into Gaussians. That doesn’t sound like much until you realize that the logistic is one of those likelihoods (so are Bernoullis, binomials, and negative binomials). That logistic-regression models are often hard to sample is a well-known problem in statistics, and a lot of smart people (Chris Holmes, Leonhard Held, Siddhartha Chib, David Mimno) have tried their hand at it in different ways. I didn’t really appreciate this (even though I’ve both used and taught logistic regression), but I did notice that when I tried to build a logistic-regression framework for the DMM, convergence was horribly unreliable. Now I see why!
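
To make the “transforming them into Gaussians” bit concrete: the key identity from the Polson et al. paper, as I remember it (do check it against the original), is

$$ \frac{(e^{\psi})^{a}}{(1+e^{\psi})^{b}} \;=\; 2^{-b}\, e^{\kappa\psi} \int_{0}^{\infty} e^{-\omega\psi^{2}/2}\, p(\omega)\, d\omega, \qquad \kappa = a - b/2, $$

where p(ω) is the density of a Polya-Gamma PG(b, 0) random variable. Conditioned on the augmenting variable ω, the likelihood in ψ is proportional to exp(κψ − ωψ²/2), i.e. Gaussian in ψ, which is exactly what makes the Gibbs updates conditionally conjugate and easy.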

Basically, Polson’s work shows how to rid statistics of this problem in a lot of practical contexts. What Linderman et al. do is extend this to multinomial distributions: they cleverly use an identity that comes up in a first-semester probability course (you can test yourself: show that a multinomial on K categories can be written as a product of K-1 binomials; there’s a quick numerical check below) to carry the PG idea over to multinomials. That has a lot of applications (they show some nice ones in the paper), and, having thought a bit about applications for the DMM, I can come up with a few more! Anyway, since I just spent the day implementing an extension to their Gaussian-process framework, I thought I’d do a shout-out so that others can look at this really cool work.
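
If you’d rather see that identity in code than prove it, here’s a minimal numerical sketch (the category probabilities and counts below are made up for illustration; it needs numpy and scipy):

```python
import numpy as np
from scipy.stats import binom, multinomial

# Made-up example: K = 4 categories, N = 10 total counts.
pi = np.array([0.2, 0.5, 0.1, 0.2])
x = np.array([3, 5, 1, 1])
N = x.sum()

# Left-hand side: the multinomial pmf directly.
lhs = multinomial.pmf(x, n=N, p=pi)

# Right-hand side: K - 1 binomials, each conditioned on the counts
# and probability mass left over from the earlier categories.
rhs = 1.0
n_rem, p_rem = N, 1.0
for k in range(len(pi) - 1):
    rhs *= binom.pmf(x[k], n_rem, pi[k] / p_rem)
    n_rem -= x[k]
    p_rem -= pi[k]

print(lhs, rhs)  # the two should agree up to floating-point error
```

Note that the last category needs no factor of its own: once the first K-1 counts are fixed, it’s fully determined.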

(While I do love the paper, be forewarned: there are a couple of typos in inconvenient places, so you can’t just use the formulas as written; you’ll have to re-derive a couple of them. Personally, I found that helpful, since it meant I had to really understand what they were saying, but others might not.)

5 comments

  1. Nice post, Jack. Good to hear that your sabbatical is going well. Thanks for reminding me of the Polya-Gamma augmentation. I made a note to look into it a couple of years ago, but never did. The Linderman et al. article is full of nice goodies; it’s definitely going on the journal club list.

    -BBV

  2. Hello,

    I just gave this paper a first read. Can you please share the typos that you discovered?

    Thanks!
    EE

    1. Hi EE,

      There are a couple of others without much consequence that don’t need mentioning, but I think Equation (13) doesn’t quite fit with Equation (14), which does look right to me. That matters if you’d like to get the mixture model working. See Equation (9) for comparison.

      Hope that helps,

      Jack

      1. Thank you for the information. Apparently I had an earlier version of the paper, as mine did not have equations 13 and 14 enumerated.

        That said, I am still unpacking the paper. I am trying to arrive at equation 9 by grouping/cancelling terms and completing the square, so far without luck. Would love a blog post deriving equation 9!
