Microsoft Research Community

Infer.NET: Bayesian logistic regression


Top 100 Contributor
Posts 8
Ferrum Posted: 05-14-2009 9:23 AM

Does Infer.NET have factors for Bayesian logistic regression? If there are some in-house factors for VMP, do they use Tommi Jaakkola's bound with auxiliary variational parameters?

Top 10 Contributor
Posts 56

Hi Ferrum

Apologies for the delay in replying. Infer.NET does not have built-in factors for logistic regression. However, it does support probit regression: you can build up a linear model, either with the InnerProduct factor or, using plates (variable arrays), with product and summation factors, and then feed the output of the linear model through an IsPositive factor. The IsPositive factor uses EP rather than VMP (so the answer to the VMP question is no). We have, in the past, implemented an EP operator for a logistic link function (as I recall, it took Binomial rather than Bernoulli observations); this never made it into Infer.NET, as we have tended to make do with probit regression for internal applications. We also have an in-house version of a deterministic logistic factor where the message types are Gaussian on input and Beta on output (again EP), but this is probably not what you were looking for. If there are special reasons for needing a VMP treatment, or a logistic rather than a probit link, then we can discuss further.
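For readers who want to try the probit route described above, here is a minimal sketch in the style of the Bayes Point Machine tutorial. It is an illustration under assumptions, not code from the Infer.NET team: the class and method names (ProbitRegressionSketch, InferWeights) are made up, the unit noise variance is my choice to give the standard probit link, and calls such as Vector.Zero may be spelled differently in your release.

```csharp
// Sketch only: probit regression with InnerProduct + IsPositive under EP.
// Namespaces follow the MicrosoftResearch.Infer release used in this thread.
using MicrosoftResearch.Infer;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Maths;
using MicrosoftResearch.Infer.Models;

public static class ProbitRegressionSketch   // hypothetical name
{
    public static VectorGaussian InferWeights(Vector[] xData, bool[] yData)
    {
        int dim = xData[0].Count;

        // Gaussian prior N(0, I) over the weight vector (assumed prior).
        Variable<Vector> w = Variable.Random(
            new VectorGaussian(Vector.Zero(dim), PositiveDefiniteMatrix.Identity(dim)));

        // Observed labels and inputs share one plate (range) over data points.
        VariableArray<bool> y = Variable.Observed(yData);
        Range j = y.Range;
        VariableArray<Vector> x = Variable.Observed(xData, j);

        // Linear model, plus unit-variance Gaussian noise, fed through IsPositive.
        Variable<double> score = Variable.InnerProduct(w, x[j]);
        y[j] = Variable.IsPositive(Variable.GaussianFromMeanAndVariance(score, 1.0));

        // IsPositive is an EP factor, so use Expectation Propagation.
        InferenceEngine ie = new InferenceEngine(new ExpectationPropagation());
        return ie.Infer<VectorGaussian>(w);
    }
}
```

Append a constant 1 to each input vector if you want a bias term.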

Thanks

John G.

Top 100 Contributor
Posts 8

Hi John, thank you for the reply.

I would think that having a logistic factor would be quite useful. It arises quite naturally as a marginal in a Boltzmann machine, which could be useful for dealing with deep architectures (e.g. by using RBMs). The logistic function is also a solution of some common differential equations used, for example, in applied medical research. Some people working in the area like to interpret the logistic regression coefficients as estimates of log odds ratios. Logistic regression may also be motivated by the (conditional) maximum entropy principle.
 
Generally, since logistic regression is so widely used in statistics and machine learning, I think it would be quite useful to include it in Infer.NET. In addition to the reasons mentioned above, some people may want to use it as a benchmark for comparison with fancier classifiers. If one really needed to, one could probably use EP for a scaled probit as an alternative. However, given the availability of the lower bound for the logistic function (Jaakkola and Jordan, 1997), could VMP for a logistic have better convergence properties?
Thanks!
Top 75 Contributor
Posts 11

Hi John,

Hope all is well at MSR-C.

In fact, multiple logistic regression (i.e., softmax) would be really useful because it can naturally handle multiclass problems. I need to do logistic regression very soon. A potentially dumb question: is it really not possible to come up with such a model by building a linear model, applying a logistic transform (using the Variable.Exp factor) and tying the result to the (0-1) labels? I would think that this may work for binary classification.

Thanks,

Vincent

 

Top 75 Contributor
Posts 11

Hi Ferrum and John G,

If I'm not mistaken, the probit regression model you mentioned in this post is exactly the same as the Bayes Point Machine model/example that can be found in the Tutorials. So that would be a good place to start to build the model.

Best regards,

Vincent

Top 10 Contributor
Posts 56

Hi Vincent

In theory you might be able to do something like that, and this is certainly in the spirit of Infer.NET. However, you cannot do it with the existing set of operators. For example, (a) the current Exp operator projects to a Gamma, (b) the ratio operator only supports Gaussian and does not support a random denominator, and (c) you would need to project the ratio to a Beta.

John

 

Top 10 Contributor
Posts 56

Vincent

Yes, thanks for pointing that out. The Bayes Point Machine shows probit regression, using the IsPositive constraint on a Gaussian-distributed variable. You can also use IsPositive and IsBetween to do ordinal regression.

You can do multi-class probit regression using http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Multi-class%20classification.aspx.

John

Top 100 Contributor
Posts 8


Hi John and Vincent,

In my view, one of the practical motivations for implementing VMP for logistic regression (in addition to the existing EP for probit regression) is the better convergence behaviour of VMP (possibly at the cost of under-estimating the variance). For some datasets, it may be rather awkward to handle the improper messages that arise in EP (see my thread on Infer.NET: improper distribution exceptions during EP inference, and a link to John's and Tom's excellent NIPS workshop presentation). Annealing alpha in power EP may be a possible way forward, though I can see that including it in Infer.NET and working out the details may take some time. In the meantime, could it be useful to include an arguably more stable (VMP?) solution for a GLM classifier?

Top 75 Contributor
Posts 11

Hi Ferrum and John G,

Agreed. I also encounter the error message "improper distributions exception during EP inference" when running a standard BPM on a dataset.

One quick fix that I have found useful is the following. Add Gaussian noise (with small precision => high uncertainty) to the inner product <w,x> when constructing the model. The precision of the noise has to be fixed. Ideally, one would hope that it can be inferred, but I ran into the same improper distributions exception when I tried to put a Gamma prior on the precision. So I simply set the precision to a fixed double, say 0.1d (sometimes larger precisions don't work). Then everything seems to work fine. Is there a scientific explanation for this phenomenon? Perhaps the intuition is that one needs to incorporate more uncertainty when dealing with noisy datasets.

Vincent

Top 10 Contributor
Posts 56

Tom has now written a VMP operator for logistic regression which uses the Jaakkola and Jordan bound. It will be available in the next beta release (tentatively scheduled for the second week in July). Meanwhile, the code is small enough that I can paste it in here. It uses the BernoulliFromLogOdds factor, which already exists in the current beta. I have renamed the operator class below to TempBernoulliFromLogOddsOp so as not to conflict with the largely unimplemented operator class which exists in the current beta. You can put this code in your assembly, and Infer.NET should pick it up, though somewhere in your assembly you will need to put the following annotation, which tells Infer.NET that your assembly contains message functions:

[assembly: MicrosoftResearch.Infer.Factors.HasMessageFunctions]

Here is the code for the operator. I will put example usage code in a follow-up post.

// (C) Copyright 2009 Microsoft Research Cambridge

using System;
using System.Collections.Generic;
using System.Text;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Maths;

namespace MicrosoftResearch.Infer.Factors
{
    /// <summary>
    /// Provides outgoing messages for <see cref="Factor.BernoulliFromLogOdds"/>, given random arguments to the function.
    /// </summary>
    [FactorMethod(typeof(Factor), "BernoulliFromLogOdds")]
    public static class TempBernoulliFromLogOddsOp
    {
        /// Evidence message for VMP 
        public static double AverageLogFactor(bool sample, Gaussian logOdds)
        {
            double m,v;
            logOdds.GetMeanAndVariance(out m, out v);
            double t = Double.IsPositiveInfinity(v) ? 2.4 : Math.Sqrt(m*m+v);
            double a = Math.Tanh(t/2)/(2*t);
            double s = sample ? 1 : -1;
            double m2 = m*m+v;
            return MMath.LogisticLn(t) + (s*m-t)/2 - a/2*(m2 - t*t);
        }

        /// VMP message to 'LogOdds' 
        public static Gaussian LogOddsAverageLogarithm(bool sample, Gaussian logOdds)
        {
            double m,v;
            logOdds.GetMeanAndVariance(out m, out v);
            double t = Double.IsPositiveInfinity(v) ? 2.4 : Math.Sqrt(m*m+v);
            double a = Math.Tanh(t/2)/(2*t);
            double s = sample ? 1 : -1;
            return Gaussian.FromMeanAndPrecision(s/(2*a), a);
        }
    }
}
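For anyone matching the operator to the paper, the code appears to implement the Jaakkola and Jordan bound in the form below. This is my reading of the code above rather than a statement from its author; the symbols m, v, t, a and s mirror the local variables, and x stands for the log odds.

```latex
% Lower bound on the log of the logistic function \sigma(x) = 1/(1+e^{-x}),
% with s = +1 for sample = true and s = -1 for sample = false:
\log \sigma(s x) \;\ge\; \log \sigma(t) + \frac{s x - t}{2} - \frac{a}{2}\,\bigl(x^2 - t^2\bigr),
\qquad a = \frac{\tanh(t/2)}{2t}.

% Expectation under logOdds ~ N(m, v), using E[x^2] = m^2 + v (AverageLogFactor):
\mathbb{E}\bigl[\log \sigma(s x)\bigr] \;\ge\;
\log \sigma(t) + \frac{s m - t}{2} - \frac{a}{2}\,\bigl(m^2 + v - t^2\bigr).

% The bound is quadratic in x, so the VMP message to logOdds is Gaussian
% with precision a and mean s/(2a) (LogOddsAverageLogarithm):
\exp\!\Bigl(\frac{s x}{2} - \frac{a}{2}\,x^2\Bigr) \;\propto\;
\mathcal{N}\!\Bigl(x;\; \frac{s}{2a},\; a^{-1}\Bigr).
```

Setting t = sqrt(m^2 + v) is the choice that makes the expected bound tightest, and with that choice the last term of the second line vanishes.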

 

Top 10 Contributor
Posts 56

Here's the usage for the previous post. Simplest example for a single observation:

Variable<double> w = Variable.GaussianFromMeanAndPrecision(1.2, 0.4);
Variable<bool> y = Variable.BernoulliFromLogOdds(w);
InferenceEngine ie = new InferenceEngine(new VariationalMessagePassing());
y.ObservedValue = true;
Gaussian wPosterior = ie.Infer<Gaussian>(w);

If you want to compute evidence, you can put in an If block as usual:

InferenceEngine ie = new InferenceEngine(new VariationalMessagePassing());
Variable<bool> evidence = Variable.Bernoulli(0.5);
IfBlock block = Variable.If(evidence);
Variable<double> w = Variable.GaussianFromMeanAndPrecision(1.2, 0.4);
Variable<bool> y = Variable.BernoulliFromLogOdds(w);
block.CloseBlock();
y.ObservedValue = true;
Gaussian wPosterior = ie.Infer<Gaussian>(w);
// e.LogOdds is the log evidence (here a VMP lower bound on it)
Bernoulli e = ie.Infer<Bernoulli>(evidence);
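For a full regression rather than a single observation, the same factor can in principle be driven by an inner product over a plate of data points. The following is only a sketch under assumptions: the class and method names are hypothetical, the N(0, I) prior is an arbitrary choice, and it assumes that VMP operators for InnerProduct are available in the release you are running, which I have not checked against the current beta.

```csharp
// Sketch only: Bayesian logistic regression with BernoulliFromLogOdds under VMP.
using MicrosoftResearch.Infer;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Maths;
using MicrosoftResearch.Infer.Models;

public static class LogisticRegressionSketch   // hypothetical name
{
    public static VectorGaussian InferWeights(Vector[] xData, bool[] yData)
    {
        int dim = xData[0].Count;

        // Assumed prior: independent unit-variance Gaussians on the weights.
        Variable<Vector> w = Variable.Random(
            new VectorGaussian(Vector.Zero(dim), PositiveDefiniteMatrix.Identity(dim)));

        // One plate over the observations.
        VariableArray<bool> y = Variable.Observed(yData);
        Range n = y.Range;
        VariableArray<Vector> x = Variable.Observed(xData, n);

        // The log odds of each label is the linear score w . x[n].
        y[n] = Variable.BernoulliFromLogOdds(Variable.InnerProduct(w, x[n]));

        // The logistic operator in this thread provides VMP messages only.
        InferenceEngine ie = new InferenceEngine(new VariationalMessagePassing());
        return ie.Infer<VectorGaussian>(w);
    }
}
```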

Let us know how it goes if you decide to use these.

John G

Top 100 Contributor
Posts 8

Excellent, thanks!

Top 10 Contributor
Posts 50

Yes, the Gaussian noise here is important since it encodes how much noise is expected in the dataset.  If you use zero noise, then you are implying that the classes are perfectly separable by a line.  If this is not the case, then EP will usually crash since the posterior distribution is empty.
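To make the role of the noise concrete, here is the link function it implies, written out as a short derivation. The notation (tau for the noise precision, Phi for the standard normal CDF) is introduced here for illustration and does not come from the Infer.NET documentation.

```latex
% Noisy linear score: y = 1 if w^\top x + \epsilon > 0, with \epsilon \sim N(0, 1/\tau)
P(y = 1 \mid w, x) \;=\; P\bigl(\epsilon > -w^\top x\bigr) \;=\; \Phi\!\bigl(\sqrt{\tau}\; w^\top x\bigr)

% A small precision (e.g. \tau = 0.1, so \sqrt{\tau} \approx 0.32) flattens the link,
% so a misclassified point still gets non-zero likelihood.
% Zero noise (\tau \to \infty) turns the link into a hard step for w^\top x \neq 0:
\lim_{\tau \to \infty} \Phi\!\bigl(\sqrt{\tau}\; w^\top x\bigr) \;=\; \mathbb{1}\bigl[\, w^\top x > 0 \,\bigr]
```

With the hard step, any w that misclassifies even one point has zero likelihood, so if no w separates the data the posterior has no support.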

Not Ranked
Posts 1

Hi John,

Will you add support for binomial observations in Bayesian logistic regression?

I would like to implement the following BUGS model in Infer.NET:

model {
    for (i in 1:N)
    {
        y[i] ~ dbin(p[i], n[i])
        logit(p[i]) <- inprod(x[i,], w[])
    }
    for (j in 1:m)
    {
        w[j] ~ dnorm(0, tau[j])
        tau[j] ~ dgamma(0.001, 0.001)
    }
}

Best regards   

Alvin

Top 75 Contributor
Posts 11

Hi John G,

This is not really an Infer.NET question but rather a question about the multiclass BPM model in the link you posted.

Am I right to say that in the multiclass probit model, there is no indeterminacy in the weights w_1, ..., w_K because we impose a VectorGaussian(0,I) prior on each weight vector? I ask this because the multiclass model does not seem to reduce to the binary classification model since there are two sets of weights in the multiclass model (when K=2) but for the BPM model in the tutorials, there is only 1 set of weights.

Thanks once again, Vincent.
