Does Infer.NET have factors for Bayesian logistic regression? If there are some in-house factors for VMP, do they use Tommi Jaakkola's bound with auxiliary variational parameters?
Hi Ferrum
Apologies for the delay in replying. Infer.NET does not have built-in factors for logistic regression. However, it does support probit regression: you can build up a linear model, either with the InnerProduct factor or, using plates (variable arrays), with product and summation factors. The output of the linear model can then be fed through an IsPositive factor. The IsPositive factor uses EP rather than VMP (so the answer to the VMP question is no). We have, in the past, implemented an EP operator for a logistic link function (as I recall, it took Binomial rather than Bernoulli observations); this never made it into Infer.NET, as we have tended to make do with probit regression for internal applications. We also have an in-house version of a deterministic logistic factor where the message types are Gaussian on input and Beta on output (again EP), but this is probably not what you were looking for. If there are special reasons for needing a VMP treatment, or a logistic rather than a probit link, then we can discuss further.
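For readers who want to see the shape of the probit model described above, here is a minimal sketch. It assumes a release that still uses the MicrosoftResearch.Infer namespaces and has Vector.FromArray and Vector.Zero (other releases may spell vector construction differently). The toy data, the noise variance of 1.0, and all variable names are hypothetical and only there for illustration.

```csharp
using System;
using MicrosoftResearch.Infer;
using MicrosoftResearch.Infer.Models;
using MicrosoftResearch.Infer.Maths;
using MicrosoftResearch.Infer.Distributions;

public static class ProbitRegressionSketch
{
    public static void Main()
    {
        // Hypothetical toy data: one feature plus a constant bias term.
        Vector[] xData = {
            Vector.FromArray(-2.0, 1.0), Vector.FromArray(-0.5, 1.0),
            Vector.FromArray(0.7, 1.0), Vector.FromArray(1.9, 1.0)
        };
        bool[] yData = { false, false, true, true };

        Range n = new Range(xData.Length);
        VariableArray<Vector> x = Variable.Observed(xData, n);
        VariableArray<bool> y = Variable.Array<bool>(n);

        // Gaussian prior on the weight vector (zero mean, identity variance).
        Variable<Vector> w = Variable.Random(
            new VectorGaussian(Vector.Zero(2), PositiveDefiniteMatrix.Identity(2)));

        // Linear model -> additive Gaussian noise -> IsPositive (probit link).
        double noiseVariance = 1.0; // assumed value, not taken from the thread
        y[n] = Variable.IsPositive(
            Variable.GaussianFromMeanAndVariance(
                Variable.InnerProduct(w, x[n]), noiseVariance));
        y.ObservedValue = yData;

        // IsPositive is handled by EP, as noted above.
        InferenceEngine engine = new InferenceEngine(new ExpectationPropagation());
        VectorGaussian wPosterior = engine.Infer<VectorGaussian>(w);
        Console.WriteLine(wPosterior);
    }
}
```

The structure follows the Bayes Point Machine tutorial as I remember it, so treat that tutorial as the authoritative version.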
Thanks
John G.
Hi John, thank you for the reply.
Hi John,
Hope all is well at MSR-C.
In fact, multiple logistic regression (i.e., softmax) would be really useful because it naturally handles multiclass problems. I need to do logistic regression very soon. A potentially dumb question: is it really not possible to build such a model by constructing a linear model, applying the logistic transform (using the Variable.Exp factor), and tying the result to the 0/1 labels? I would think that this may work for binary classification.
Thanks,
Vincent
Hi Ferrum and John G,
If I'm not mistaken, the probit regression model you mentioned in this post is exactly the same as the Bayes Point Machine model/example that can be found in the Tutorials. So that would be a good place to start to build the model.
Best regards,
Hi Vincent
In theory you might be able to do something like that, and this is certainly in the spirit of Infer.NET. However, you cannot do it with the existing set of operators. For example, (a) the current Exp operator projects to a Gamma, (b) the ratio operator only supports Gaussian and does not support a random denominator, and (c) you would need to project the ratio to a Beta.
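To make the three obstacles concrete, the construction under discussion writes the logistic function as a ratio of exponentials. This is the standard identity; the symbols a (linear-model output) and u are my own notation, not from the posts above.

```latex
a = w^\top x, \qquad u = e^{a}, \qquad
\sigma(a) = \frac{1}{1 + e^{-a}} = \frac{u}{1 + u}
```

With a Gaussian message on a, the Exp operator has to approximate u (obstacle a), the ratio u/(1+u) has a random denominator (obstacle b), and the result lives in (0,1), so it would need a Beta projection before it could parameterise a Bernoulli (obstacle c).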
John
Yes, thanks for pointing that out. The Bayes Point machine shows probit regression - using the IsPositive constraint on a Gaussian distributed variable. You can also use IsPositive and IsBetween to do ordinal regression.
You can do multi-class probit regression using http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Multi-class%20classification.aspx.
Hi John and Vincent,
In my view, one of the practical motivations for implementing VMP for logistic regression (in addition to the existing EP for probit regression) is the better convergence behaviour of VMP (possibly at the cost of under-estimating the variance). For some datasets, it can be rather awkward to handle the improper messages that arise in EP (see my thread on Infer.NET: improper distribution exceptions during EP inference, and a link to John's and Tom's excellent NIPS workshop presentation). Annealing alpha in power EP may be a way forward, though I can see that adding it to Infer.NET and working out the details may take some time. In the meantime, could it be useful to include an arguably more stable (VMP?) solution for a GLM classifier?
Agreed. I also encounter the error message "improper distributions exception during EP inference" when running a standard BPM on a dataset.
One quick fix that I have found useful is the following. One needs to add Gaussian noise (with small precision => high uncertainty) to the inner product <w,x> when constructing the model. The precision of the noise has to be fixed. Ideally, one would hope that this can be inferred but I have run into the same improper distributions exception when I try to set a Gamma prior on the prec. So I simply set the precision to be a fixed double, say 0.1d (sometimes larger precs don't work). Then everything seems to work fine. Any scientific explanation for this phenomenon? Perhaps the intuition is that one needs to incorporate more uncertainty when dealing with noisy datasets.
Tom has now written a VMP operator for logistic regression which uses the Jaakkola and Jordan bound. It will be available in the next beta release (tentatively scheduled for the second week in July). Meanwhile, the code is small enough that I can paste it in here. It uses the BernoulliFromLogOdds factor, which already exists in the current beta. I have renamed the operator class below to TempBernoulliFromLogOddsOp so as not to conflict with the largely unimplemented operator class which exists in the current beta. You can put this code in your assembly, and Infer.NET should pick it up, though somewhere in your assembly you will need to put the following annotation, which tells Infer.NET that your assembly contains message functions:
[assembly: MicrosoftResearch.Infer.Factors.HasMessageFunctions]
Here is the code for the operator. I will put example usage code in a follow-up post.
// (C) Copyright 2009 Microsoft Research Cambridge
using System;
using System.Collections.Generic;
using System.Text;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Maths;

namespace MicrosoftResearch.Infer.Factors
{
    /// <summary>
    /// Provides outgoing messages for <see cref="Factor.BernoulliFromLogOdds"/>, given random arguments to the function.
    /// </summary>
    [FactorMethod(typeof(Factor), "BernoulliFromLogOdds")]
    public static class TempBernoulliFromLogOddsOp
    {
        /// Evidence message for VMP
        public static double AverageLogFactor(bool sample, Gaussian logOdds)
        {
            double m, v;
            logOdds.GetMeanAndVariance(out m, out v);
            double t = Double.IsPositiveInfinity(v) ? 2.4 : Math.Sqrt(m*m + v);
            double a = Math.Tanh(t/2) / (2*t);
            double s = sample ? 1 : -1;
            double m2 = m*m + v;
            return MMath.LogisticLn(t) + (s*m - t)/2 - a/2*(m2 - t*t);
        }

        /// VMP message to 'LogOdds'
        public static Gaussian LogOddsAverageLogarithm(bool sample, Gaussian logOdds)
        {
            double m, v;
            logOdds.GetMeanAndVariance(out m, out v);
            double t = Double.IsPositiveInfinity(v) ? 2.4 : Math.Sqrt(m*m + v);
            double a = Math.Tanh(t/2) / (2*t);
            double s = sample ? 1 : -1;
            return Gaussian.FromMeanAndPrecision(s/(2*a), a);
        }
    }
}
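For anyone mapping the operator back to the paper, here is how I read the code against the Jaakkola and Jordan bound. The symbols follow the code: x is the log-odds, s is +1 or -1 for sample true or false, t is the variational parameter, and q(x) is the incoming Gaussian with mean m and variance v. Note that the code's a equals twice the usual lambda.

```latex
\log \sigma(s x) \;\ge\; \log \sigma(t) + \frac{s x - t}{2} - \lambda(t)\,\bigl(x^{2} - t^{2}\bigr),
\qquad \lambda(t) = \frac{\tanh(t/2)}{4t} = \frac{a}{2}

\mathbb{E}_{q}\bigl[\text{bound}\bigr]
 = \log \sigma(t) + \frac{s m - t}{2} - \frac{a}{2}\,\bigl(m^{2} + v - t^{2}\bigr),
\qquad t^{2} = \mathbb{E}_{q}[x^{2}] = m^{2} + v

\exp\!\Bigl(\tfrac{s x}{2} - \tfrac{a}{2} x^{2}\Bigr) \;\propto\;
\mathcal{N}\!\Bigl(x;\ \tfrac{s}{2a},\ \tfrac{1}{a}\Bigr)
```

The second line is what AverageLogFactor returns (the last term vanishes whenever t is set from m and v), and the third line is the Gaussian with mean s/(2a) and precision a returned by LogOddsAverageLogarithm.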
Here's the usage for the previous post. Simplest example for a single observation:
Variable<double> w = Variable.GaussianFromMeanAndPrecision(1.2, 0.4);
Variable<bool> y = Variable.BernoulliFromLogOdds(w);
InferenceEngine ie = new InferenceEngine(new VariationalMessagePassing());
y.ObservedValue = true;
Gaussian wPosterior = ie.Infer<Gaussian>(w);

If you want to compute evidence, you can put in an If block as usual:
Variable<bool> evidence = Variable.Bernoulli(0.5);
IfBlock block = Variable.If(evidence);
Variable<double> w = Variable.GaussianFromMeanAndPrecision(1.2, 0.4);
Variable<bool> y = Variable.BernoulliFromLogOdds(w);
block.CloseBlock();
// Same engine setup as in the first example.
InferenceEngine ie = new InferenceEngine(new VariationalMessagePassing());
y.ObservedValue = true;
Gaussian wPosterior = ie.Infer<Gaussian>(w);
Bernoulli e = ie.Infer<Bernoulli>(evidence);
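As a sanity check on the single-observation example, the VMP fixed point can be iterated by hand in plain C# without Infer.NET. This is only a sketch of the update implied by the operator's formulas (Gaussian prior times the message with precision a and mean s/(2a)); the class and method names are hypothetical, and I have not compared its output against the engine.

```csharp
using System;

public static class JaakkolaJordanSketch
{
    // Iterates the VMP update for a Gaussian prior on the log-odds and one
    // Bernoulli observation, using the same t and a as the operator above.
    public static void Posterior(double priorMean, double priorPrecision, bool sample,
                                 int iterations, out double mean, out double precision)
    {
        double s = sample ? 1 : -1;
        mean = priorMean;
        precision = priorPrecision;
        for (int i = 0; i < iterations; i++)
        {
            double v = 1.0 / precision;
            double t = Math.Sqrt(mean * mean + v);   // variational parameter
            double a = Math.Tanh(t / 2) / (2 * t);   // twice the usual lambda(t)
            // The message has precision a and mean s/(2a), i.e. mean*precision = s/2.
            // Multiplying Gaussians adds precisions and precision-weighted means.
            precision = priorPrecision + a;
            mean = (priorPrecision * priorMean + s / 2) / precision;
        }
    }

    public static void Main()
    {
        double m, p;
        Posterior(1.2, 0.4, true, 50, out m, out p);
        Console.WriteLine("posterior mean = {0}, precision = {1}", m, p);
    }
}
```

Since tanh(t/2)/(2t) never exceeds 0.25, a single observation can add at most 0.25 to the precision, which is a quick way to spot-check whatever the engine reports.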
Let us know how it goes if you decide to use these.
John G
Excellent, thanks!
Yes, the Gaussian noise here is important since it encodes how much noise is expected in the dataset. If you use zero noise, then you are implying that the classes are perfectly separable by a line. If this is not the case, then EP will usually crash since the posterior distribution is empty.
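In equation form (my notation, with noise variance sigma squared, so a fixed precision of 0.1 corresponds to a variance of 10):

```latex
y_i = \mathbb{1}\bigl[\, w^\top x_i + \epsilon_i > 0 \,\bigr], \quad
\epsilon_i \sim \mathcal{N}(0, \sigma^{2})
\;\;\Longrightarrow\;\;
P(y_i = 1 \mid w, x_i) = \Phi\!\left(\frac{w^\top x_i}{\sigma}\right)
```

As sigma goes to zero the likelihood becomes a hard step, so any w that misclassifies even one point gets zero probability; if no w classifies every point correctly, nothing is left for the posterior.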
Will you add support for binomial observations in Bayesian logistic regression?
I would like to implement the following BUGS model using Infer.NET:
model {
  for (i in 1:N) {
    y[i] ~ dbin(p[i], n[i])
    logit(p[i]) <- inprod(x[i,], w[])
  }
  for (j in 1:m) {
    w[j] ~ dnorm(0, tau[j])
    tau[j] ~ dgamma(0.001, 0.001)
  }
}
Best regards
Alvin
Hi John G,
This is not really an Infer.NET question but rather a question about the multiclass BPM model in the link you posted.
Am I right to say that in the multiclass probit model, there is no indeterminacy in the weights w_1, ..., w_K because we impose a VectorGaussian(0,I) prior on each weight vector? I ask this because the multiclass model does not seem to reduce to the binary classification model since there are two sets of weights in the multiclass model (when K=2) but for the BPM model in the tutorials, there is only 1 set of weights.
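One way to frame the K=2 case, assuming the multiclass model assigns the class with the largest score w_k^T x and ignoring any additive score noise (both are my assumptions about the linked model, and u is my notation):

```latex
y = 1 \iff (w_1 - w_2)^\top x > 0, \qquad
w_1, w_2 \sim \mathcal{N}(0, I)\ \text{independently}
\;\Longrightarrow\; u = w_1 - w_2 \sim \mathcal{N}(0, 2I)
```

Under those assumptions the likelihood only depends on the difference u, so the model behaves like a binary BPM with a single weight vector u and prior variance 2I. The sum w_1 + w_2 is not informed by the data and keeps its prior, which is why the posterior stays proper even though the individual weight vectors are not pinned down by the likelihood.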
Thanks once again, Vincent.