Microsoft Research Community
Parallel inference in Infer.NET 2.3

Hello Infernauts!!

One of the many changes we made in the recently released Infer.NET 2.3 beta was support for parallel inference on multiple cores.  We are keen to make Infer.NET scale to ever larger datasets, and supporting parallelism has always been an important part of that goal.  Here I'll show how to use multi-core parallelism to speed up inference. Of course, you'll only see the speedup on a multi-core machine.

The first thing you need to do is install the Microsoft Parallel Extensions CTP.  This is the library that Infer.NET uses for multi-core parallelism.  We have chosen to work with the CTP for the time being: the Parallel Extensions library is planned to be part of version 4.0 of the Microsoft .NET Framework, and we will move to that version in the future. You can read more about current and forthcoming Microsoft parallelism technologies at the MSDN Parallel Computing Developer Center.

OK, so you've installed the CTP.  Now pick a model you want to parallelise. I'll use the click model, which you can run from inside the Infer.NET examples browser under 'Applications'. Make sure that inference on your model works as you expect.  Now you can configure your inference engine to use parallel for loops by adding the following line:

engine.Compiler.UseParallelForLoops = true;

The click model uses two engines, one for training and one for testing, so I added this line for both. You must configure each inference engine before any calls to Infer() for the setting to take effect. With this change, certain for loops in the generated inference code are replaced with Parallel.For loops. To see how this affects the speed of inference, you can use the handy built-in option that reports timings for the various stages of inference:

engine.ShowTimings = true;
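The post doesn't show the generated inference code, but the kind of loop rewrite described above can be sketched in plain C#. This is a hypothetical stand-in, not Infer.NET output: `UpdateMessage` is a made-up per-item computation, run first in an ordinary `for` loop and then through `Parallel.For`. The sketch targets the `System.Threading.Tasks.Parallel` class from .NET 4.0 onwards, which is where the post says Parallel Extensions is heading; the CTP may expose `Parallel` under a different namespace, so treat the `using` line as an assumption.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical per-item update standing in for one message computation in
// generated inference code. It reads only its own input value.
static double UpdateMessage(double x)
{
    double acc = x;
    for (int k = 0; k < 100; k++)
        acc = Math.Sqrt(acc * acc + 1.0);
    return acc;
}

// Ordinary sequential loop over independent items.
static double[] SequentialLoop(double[] data)
{
    var result = new double[data.Length];
    for (int i = 0; i < data.Length; i++)
        result[i] = UpdateMessage(data[i]);
    return result;
}

// The same loop body handed to Parallel.For. Iterations may run on different
// cores in any order, which is safe here because iteration i writes only result[i].
static double[] ParallelLoop(double[] data)
{
    var result = new double[data.Length];
    Parallel.For(0, data.Length, i =>
    {
        result[i] = UpdateMessage(data[i]);
    });
    return result;
}

var input = new double[1000];
for (int i = 0; i < input.Length; i++) input[i] = i * 0.01;

double[] seq = SequentialLoop(input);
double[] par = ParallelLoop(input);

// Both versions should produce exactly the same values.
bool same = true;
for (int i = 0; i < seq.Length; i++)
    if (seq[i] != par[i]) same = false;
Console.WriteLine(same ? "identical results" : "results differ");
```

The key design constraint is independence: a loop can only be turned into `Parallel.For` like this when no iteration reads a value that another iteration writes.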

Typical results on my 8-core machine are shown in the table below.  For comparison, I have also included results for a Hidden Markov Model run on a large data set.
Time per iteration      | Normal for loops | Parallel for loops
------------------------|------------------|-------------------
Click Model Training    |           218 ms |             177 ms
Click Model Test        |            22 ms |              30 ms
Hidden Markov Model     |          5341 ms |            1352 ms
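To put numbers on the table, the speedup for each row is the normal time divided by the parallel time, so values above 1 mean parallel for loops helped and values below 1 mean they hurt. A small C# sketch of that arithmetic, using only the timings reported above (`Speedup` is just a helper name for illustration):

```csharp
using System;

// Speedup = time with normal for loops / time with parallel for loops.
static double Speedup(double normalMs, double parallelMs) => normalMs / parallelMs;

var rows = new (string Name, double NormalMs, double ParallelMs)[]
{
    ("Click Model Training", 218, 177),
    ("Click Model Test", 22, 30),
    ("Hidden Markov Model", 5341, 1352),
};

foreach (var (name, normal, parallel) in rows)
{
    double s = Speedup(normal, parallel);
    // Fraction of per-iteration time saved (negative means a slowdown).
    double saved = 1.0 - parallel / normal;
    Console.WriteLine($"{name}: speedup {s:F2}x, time saved {saved:P0}");
}
```

This gives roughly 1.23x for click model training, about 0.73x (a slowdown) for click model testing and just under 4x for the Hidden Markov Model.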

These results may be surprising.  Parallel for loops speed up click model training by about 20%, but for click model testing they actually slow things down.  This is because parallel for loops introduce additional overhead compared to ordinary for loops, so you will only get a benefit if the loop is large and a reasonable amount of work is done inside it.  In the test case, the iterations are quick and the time is dominated by the parallel overhead. The other side of this is illustrated by the Hidden Markov Model results, where each iteration does much more work and an almost 4x speedup is achieved using parallel for loops - quite a significant improvement for one line of code!!
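The overhead argument can be tried out with a rough micro-benchmark. The sketch below is hypothetical and not Infer.NET code: `Work` and `Compare` are made-up helpers that time the same loop sequentially and with `Parallel.For` (from `System.Threading.Tasks` in .NET 4.0 and later), once for a small loop with cheap iterations and once for a larger loop with expensive ones. Timings depend on hardware and a single `Stopwatch` run is noisy, so treat it as a way to experiment rather than a rigorous benchmark. On a multi-core machine one would expect the cheap loop to gain little or even lose, and the expensive loop to gain more.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical loop body whose cost grows with workPerItem.
static double Work(int i, int workPerItem)
{
    double acc = i;
    for (int k = 0; k < workPerItem; k++)
        acc = Math.Sin(acc) + 1.0;
    return acc;
}

// Times one sequential pass and one Parallel.For pass over n items,
// returning both elapsed times plus a checksum of each pass's results.
static (double SeqMs, double ParMs, double SeqSum, double ParSum) Compare(int n, int workPerItem)
{
    var seqOut = new double[n];
    var parOut = new double[n];

    var sw = Stopwatch.StartNew();
    for (int i = 0; i < n; i++)
        seqOut[i] = Work(i, workPerItem);
    double seqMs = sw.Elapsed.TotalMilliseconds;

    sw.Restart();
    Parallel.For(0, n, i => { parOut[i] = Work(i, workPerItem); });
    double parMs = sw.Elapsed.TotalMilliseconds;

    double seqSum = 0, parSum = 0;
    for (int i = 0; i < n; i++) { seqSum += seqOut[i]; parSum += parOut[i]; }
    return (seqMs, parMs, seqSum, parSum);
}

// Warm up the JIT and thread pool so the first timing isn't dominated by startup.
Compare(1000, 10);

var small = Compare(n: 100, workPerItem: 10);        // quick iterations: overhead dominates
var large = Compare(n: 10_000, workPerItem: 1_000);  // substantial work per iteration

Console.WriteLine($"small loop: {small.SeqMs:F3} ms sequential, {small.ParMs:F3} ms parallel");
Console.WriteLine($"large loop: {large.SeqMs:F3} ms sequential, {large.ParMs:F3} ms parallel");
```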

So, in summary, parallel for loops can speed up inference, sometimes considerably, especially in larger models.  The exact speedup you get depends on your hardware, model and inference algorithm, and the easiest way to find out is simply to try it and see.

Whilst this post has discussed multi-core parallelism, it is also possible to distribute Infer.NET inference across a cluster by dividing your model into chunks using shared variables. At the moment, you have to wire the chunks together manually, e.g. using MPI. We will be looking at how to make this more automatic in a future version of Infer.NET, and we'll be sure to blog about it when we do!
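Why chunking works can be seen in the simplest possible case. The sketch below is not Infer.NET's shared-variable API and not MPI code; it is a hypothetical illustration assuming a single shared Gaussian mean, observations with known noise variance, and exact conjugate updates. Each chunk summarises its data as a Gaussian message in natural parameters (precision and precision times mean), and multiplying the messages with the prior, which amounts to adding natural parameters, recovers the same posterior as processing all the data at once. `ChunkMessage` and `Posterior` are made-up names, and `Parallel.For` merely stands in for separate machines.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Message about the shared mean from one chunk, in natural parameters,
// assuming each observation ~ N(mean, noiseVar) with noiseVar known.
static (double Prec, double PrecMean) ChunkMessage(double[] chunk, double noiseVar)
{
    double prec = chunk.Length / noiseVar;
    double precMean = chunk.Sum() / noiseVar;
    return (prec, precMean);
}

// Combine a Gaussian prior with chunk messages by adding natural parameters,
// then convert back to mean and variance.
static (double Mean, double Var) Posterior(double priorMean, double priorVar,
                                           (double Prec, double PrecMean)[] msgs)
{
    double prec = 1.0 / priorVar;
    double precMean = priorMean / priorVar;
    foreach (var m in msgs) { prec += m.Prec; precMean += m.PrecMean; }
    return (precMean / prec, 1.0 / prec);
}

double[] data = { 1.2, 0.7, 2.3, 1.9, 0.4, 1.1, 1.6, 2.0 };
const double observationNoiseVar = 0.5;

// Split the data into chunks; each chunk could be processed on a different machine.
double[][] chunks = { data.Take(3).ToArray(), data.Skip(3).Take(3).ToArray(), data.Skip(6).ToArray() };
var messages = new (double Prec, double PrecMean)[chunks.Length];
Parallel.For(0, chunks.Length, c => { messages[c] = ChunkMessage(chunks[c], observationNoiseVar); });

var chunked = Posterior(0.0, 10.0, messages);
var full = Posterior(0.0, 10.0, new[] { ChunkMessage(data, observationNoiseVar) });
Console.WriteLine($"chunked: mean {chunked.Mean:F4}, var {chunked.Var:F4}");
Console.WriteLine($"full:    mean {full.Mean:F4}, var {full.Var:F4}");
```

In non-conjugate models or with approximate algorithms, the chunks would generally need to exchange messages repeatedly rather than once, which this sketch does not attempt.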

John W.


Posted 08-13-2009 2:55 PM by jwinn

Comments

laura wrote re: Parallel inference in Infer.NET 2.3
on 08-14-2009 5:32 AM

This is excellent!

How much does parallel inference cost in terms of memory?

Cheers,

Laura

jwinn wrote re: Parallel inference in Infer.NET 2.3
on 08-14-2009 5:59 AM

I've not done extensive tests on this, but informally the memory costs seem small, presumably dominated by the overhead of the additional threads in the thread pool (e.g. the stack for each thread).

NeillC wrote re: Parallel inference in Infer.NET 2.3
on 10-27-2009 1:21 PM

Hi John,

I've been using the VMP algorithm to fit a GMM to a 7D data set and it is working well :-) but quite slowly.  I'm very keen to use the parallel extensions, which I hope will speed things up, but I get the following error message:

Compiling model...done.

Compilation time was 2605ms.

Compilation failed with 1 error(s)

error CS1647: An expression is too long or complex to compile near 'MicrosoftResearch.Infer.Models.User.Model_VMP.Reset()'

Do you know if I'm doing something stupid or rather being too ambitious with the model (~8000 data points in 7 dimensions)?  It works correctly if I just remove the ie.Compiler.UseParallelForLoops = true; line.

Thanks,

Neill

jwinn wrote re: Parallel inference in Infer.NET 2.3
on 10-29-2009 5:02 AM

NeillC:

I'm sorry this isn't working for you - it looks like the code generated internally by Infer.NET is defeating the C# compiler. Can you post your model code on the Infer.NET forums (or email it to infersup@microsoft.com), so that I can reproduce the problem and try to work out what is going wrong?

Thanks, John

NeillC wrote re: Parallel inference in Infer.NET 2.3
on 10-29-2009 12:10 PM

I have emailed the code (basically the GMM fitting example) and a test data file if you have a chance to check it out :-) - I hope it's not too troublesome.

Cheers, Neill

shakey wrote re: Parallel inference in Infer.NET 2.3
on 11-25-2009 12:19 AM

The Microsoft Parallel Extensions CTP link is broken. I downloaded the System.Threading.dll file but can't get Infer.NET to recognise the Parallel type in it. Any advice?

©2009 Microsoft Corporation. All rights reserved.