Microsoft Research Community

error occurs when constructing a lda model with simple corpus

rated by 0 users
This post has 22 Replies | 7 Followers

Top 50 Contributor
Posts 14

I run the proposed LDA code and commented out the symmetry breaking codes. So the inference returns uniform results for the inferred variables.

I read the mixture of Gaussians tutorial and it states that breaking symmetry is a consequence of using approximate inference algorithms such as VMP. Can I confirm my understanding?

  1. We break symmetry because of the approximate inference algorithms.
  2. If we use exact algorithms, do we still break symmetry?
  3. Will exact inference algorithms such as Junction Trees be supported in the future?

Top 25 Contributor
Posts 31

The reason we need to break symmetry is that the model is not identifiable. Supposing we generated some data from the model with known phis e.g. corresponding to topics 1=education, 2=health and 3=economy. Now suppose we relabel the topics so that 1=health and 2=education and swap the parameters accordingly i.e. we swap phi1 and phi2 and we swap the first two elements of theta. If we generate from this new model, then we get data with exactly the same distribution as before the swap.  In fact, this will be true of any permutation of how we label the topics, because the model is symmetric with respect to the topics.

Now suppose we have some data and don't know the phis but wish to infer their posterior distribution.  The true posterior will be a multi-modal distribution with one mode for every possible permutation of the topics.  However, both EP and VMP can only capture a single posterior mode. Since there is no reason for the inference prodecure to favour one mode over another - the symmetry in the model means that the updates will be exactly the same for all phis and all elements of theta and the inference will get stuck in an unstable equilibrium where all phis are the same and theta is uniform.  To escape from this symmetrical fixed point, we need to perturb the system somehow, to arbitrarily break the symmetry between the different topic permutations. We can do this by making the initial messages slightly different for each topic - this mean that the algorithm is started slightly closer to one posterior mode than the others and it can converge on that mode, corresponding to some particular permutation of the topics.

In summary, there is nothing wrong with the phis being identically distributed in the model before we see data.  After seeing data, the non-identifiable model will have a set of posterior modes corresponding to all possible permutations of the topics.  Neither VMP nor EP can capture such highly multi-modal distributions so we need to nudge them slightly towards one of the modes.

Top 25 Contributor
Posts 31

To expand further on your second question, if we were able to perform exact inference, we would recover the full multi-modal posterior and symmetry breaking would not be necessary.  However, exact inference is not tractable in this model.  In general, exact inference is only tractable for relatively small, discrete models (with some exceptions).  Hence, most kinds of models that people are interested in using today (such as LDA!) are not tractable for exact inference. For this reason, supporting junction trees is relatively low on our priority list - note also that there are plenty of existing software package for junction tree inference in discrete models.

Top 25 Contributor
Posts 37

With LDA we want to learn topics that are hidden in text documents. Phi refers to word distributions that are characteristic for each topic. If all those phis are identical, we found that all topics are identical. The question is whether all topics in the data are indeed identical (which I doubt is true if you have realistic data sets) or whether the inference got stuck at "a saddle point in optimization space". (I use the word saddle point here in a somewhat figurative manner.)

The problem with LDA is that any permutation of topic indices give an equally good solution.

Breaking the symmetry refers to initializing messages to non-null message, i.e. instead of starting the inference loop at a position that is likely to be a saddle point, we start a bit next to it. This initialization should "wash out" during the iterations (such as a gibbs sampling initialization will not have any effect on the final result).

You can achieve a similar effect by perturbing each of the phi's prior a bit. But this will not wash out during inference.

Laura

 

Top 50 Contributor
Posts 14

Thank you all for the replies. I think things will be clearer when I learn about variational inference in my graphical model course.

Top 75 Contributor
Posts 9

Hi John,

I ran this model with Gibbs sampling, but failed during compilation. Any idea?

 

// Use Gibbs sampling

            GibbsSampling gs = new GibbsSampling();

            gs.BurnIn = 100;

            gs.Thin = 10;

            InferenceEngine ie = new InferenceEngine(gs);

            ie.NumberOfIterations = 2000;

            Console.WriteLine(ie.Infer(Z));

 

Top 10 Contributor
Posts 50

The Gibbs sampler is still in the experimental stages and it does not yet support 'Variable.Switch' or 'Variable.If'.

Top 75 Contributor
Posts 9

Thanks for reply. Now I got another question.

Not sure if I understand it correctly, but please help.

In this implementation, each document is denoted by indexed words. And each word is sampled from a topic’s word distribution.  The example shows that each word only appears once in a document.

I come across a question here. There is no dimensionality reduction for documents since word counts are not used in this model. If the documents include several repeated words, then each individual word would be regarded different, and the output of the code is inference for each individual word.

For example, if I replace the docs in John’s code with

// Documents of variable length

            int[] block1 = System.Linq.Enumerable.Repeat(0, 1000).ToArray();

            int[] block2 = System.Linq.Enumerable.Repeat(1, 2000).ToArray();

            int[] block3 = System.Linq.Enumerable.Repeat(8, 1000).ToArray();

            int[] block4 = System.Linq.Enumerable.Repeat(11, 1500).ToArray();

 

            int[] doc1 = block1.Concat(block2).ToArray();

            int[] doc2 = block3.Concat(block4).ToArray();

            int[] doc3 = block1.Concat(block4).ToArray();

            int[] doc4 = block2.Concat(block3).ToArray();  

 

            int[][] docs = {

                               doc1,

                               doc2,

                               doc3,

                               doc4

                           };

 Even though there are only 4 unique words (indexed by 0, 1, 8, 11) in the corpus, the model treat each single word in the document as different. The efficiency is not good in this way.

Did I understand it right? How do we handle this situation?

Thank you.

Page 2 of 2 (23 items) < Previous 1 2 | RSS
©2009 Microsoft Corporation. All rights reserved. Terms of Use | Trademarks | Privacy Statement | Feedback