Settings of the model:
Number of documents in corpus: 12
Number of topics: 3
Number of words (terms) in corpus: 12
For simplicity, suppose that each document is composed of 2 words.
The whole corpus is shown in the table below, with each line representing a document.
Original corpus          After indexing, the whole corpus is denoted as
university test          0, 3
teacher student          0, 1
teacher university       0, 2
university student       1, 2
economy bank             4, 7
economy money            4, 5
stock economy            4, 6
money stock              5, 6
government policy        8, 11
government president     8, 9
government military      8, 10
president policy         9, 10
After running the program, the console window shows:
Compile model.....compilation failed.
Then a "transform chain" window shows the message "can only indexed by loop variables,not index0". The error seems to occur in or near the two nested using statements in the source code.
By the way, can a jagged array provide an array of arrays where the length of the inner arrays is not fixed, so that I can remove the restriction that each document is composed of 2 words?
Your help is appreciated!
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
static void Main(string[] args)
{
    int M = 12;  // number of documents in corpus
    int K = 3;   // number of topics
    int V = 12;  // number of words (terms) in corpus
    int Nm = 2;  // suppose that each document is composed of 2 words

    Range CorpusSize = new Range(M);
    Range TopicsNum = new Range(K);
    Range WordsNum = new Range(V);
    Range DocSize = new Range(Nm);

    double[] alpha = { 0.5, 0.5, 0.5 };
    double[] beta = { 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 };

    VariableArray<Vector> theta = Variable.Array<Vector>(CorpusSize);
    VariableArray<Vector> phi = Variable.Array<Vector>(TopicsNum);
    theta[CorpusSize] = Variable.Dirichlet(alpha).ForEach(CorpusSize);
    phi[TopicsNum] = Variable.Dirichlet(beta).ForEach(TopicsNum);

    VariableArray2D<int> W = Variable.Array<int>(CorpusSize, DocSize);
    VariableArray2D<int> Z = Variable.Array<int>(CorpusSize, DocSize);

    using (Variable.ForEach(CorpusSize))
    using (Variable.ForEach(DocSize))
    {
        Z[CorpusSize, DocSize] = Variable.Discrete(theta[CorpusSize]);
        W[CorpusSize, DocSize] = Variable.Discrete(phi[Z[CorpusSize, DocSize]]);
    }

    W = Variable.Observed(new int[,] { { 0, 3 }, { 0, 1 }, { 0, 2 }, { 1, 2 }, { 4, 7 }, { 4, 5 }, { 4, 6 }, { 5, 6 }, { 8, 11 }, { 8, 9 }, { 8, 10 }, { 9, 10 } }, CorpusSize, DocSize);

    InferenceEngine engine = new InferenceEngine();
    Console.WriteLine(engine.Infer(Z));
}
Hi,
Since W depends on the value chosen for Z, you have to add a gate (Variable.Switch).
Furthermore, you have to set the ValueRange attribute on Z, so that Infer.NET knows which values the gate ranges over.
Use the following code and your model compiles.
Z[CorpusSize, DocSize] = Variable.Discrete(theta[CorpusSize]).Attrib(new ValueRange(TopicsNum));
using (Variable.Switch(Z[CorpusSize, DocSize]))
{
    W[CorpusSize, DocSize] = Variable.Discrete(phi[Z[CorpusSize, DocSize]]);
}
Laura
I just came across a flaw in your code.
In your example, you first create a data structure for W and wire it to the model. Then you redefine W using a new observed data structure, which is not linked to the model. Since the data is not linked, Infer() returns inference results based only on the prior.
You have to define your observed variable W as observed upfront.
Instead of
VariableArray2D<int> W = Variable.Array<int>(CorpusSize, DocSize).Named("W");
use the following line (and omit the later assignment to W):
VariableArray2D<int> W = Variable.Observed(new int[,] { { 0, 3 }, { 0, 1 }, { 0, 2 }, { 1, 2 }, { 4, 7 }, { 4, 5 }, { 4, 6 }, { 5, 6 }, { 8, 11 }, { 8, 9 }, { 8, 10 }, { 9, 10 } }, CorpusSize, DocSize);
Another thing is that you have to break symmetry, otherwise all phis will be identical.
To break symmetry slightly, create a dense Dirichlet (denseBeta). Draw K samples from it using dirich.Sample(), wrap each sample in a Dirichlet, convert the result to an Infer.NET distribution array, and call phi.InitialiseTo():
double[] denseBeta = new double[V];
for (int v = 0; v < V; v++) denseBeta[v] = 10.0;
Dirichlet[] initPhi = new Dirichlet[K];
Dirichlet dirich = new Dirichlet(denseBeta);
for (int k = 0; k < K; k++)
{
    initPhi[k] = new Dirichlet(dirich.Sample());
}
phi.InitialiseTo(Distribution<Vector>.Array(initPhi));
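A rough sketch of why identical phis result without this step, written for mean-field (variational message passing) style updates. That choice of algorithm is an assumption made for illustration and is not stated in this thread; the notation (alpha, beta, theta, phi, Z, W, K) follows the code above, and e_w is introduced here as the unit vector for word w. The same permutation-symmetry argument should carry over to other deterministic message-passing schemes.

```latex
% Assumed factorisation: q(\theta)\,q(\phi)\,q(Z); m indexes documents, n word positions, k topics.
\begin{align*}
q(Z_{mn}=k) &\propto \exp\!\Big(\mathbb{E}_q[\log\theta_{m,k}] + \mathbb{E}_q[\log\phi_{k,W_{mn}}]\Big) \\
q(\phi_k)   &= \mathrm{Dir}\Big(\beta + \sum_{m,n} q(Z_{mn}=k)\, e_{W_{mn}}\Big) \\
q(\theta_m) &= \mathrm{Dir}\Big(\alpha + \sum_{n} \big(q(Z_{mn}=0),\dots,q(Z_{mn}=K-1)\big)\Big)
\end{align*}
% With a symmetric \alpha and identical starting marginals q(\phi_0)=\dots=q(\phi_{K-1}),
% both expectations in the first line are the same for every k, hence
\begin{align*}
q(Z_{mn}=k) = \tfrac{1}{K}
\quad\Longrightarrow\quad
q(\phi_k) = \mathrm{Dir}\Big(\beta + \tfrac{1}{K}\sum_{m,n} e_{W_{mn}}\Big)
\quad \text{for every } k .
\end{align*}
```

Under these assumptions the symmetric state reproduces itself at every iteration, so the data alone never pulls the topics apart. Starting each phi from a different random marginal, as in the code above, moves the algorithm off that fixed point.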
To answer your final question: yes, using jagged arrays, documents can have different lengths. If you need an example, in John Guiver's post in the Bernoulli thread (http://community.research.microsoft.com/forums/p/2779/4511.aspx#4511 ) "e" is a jagged random variable array. Note that "sRange" is a variable range depending on "uRange".
Just to summarise everything Laura has noted (many thanks Laura), including the jagged array stuff, here is a modified version of your C# code that will compile and run:
static void Main(string[] args)
{
    int K = 3;   // number of topics
    int V = 12;  // number of words (terms) in corpus

    // Documents of variable length
    int[][] docs =
    {
        new int[] { 0, 3, 4 },
        new int[] { 0, 1 },
        new int[] { 0, 2, 4, 5 },
        new int[] { 1, 2 },
        new int[] { 4, 7 },
        new int[] { 4, 5 },
        new int[] { 4, 6 },
        new int[] { 5, 6 },
        new int[] { 8, 11 },
        new int[] { 8, 9 },
        new int[] { 8, 10 },
        new int[] { 9, 10 }
    };

    // Put the sizes into an array
    int M = docs.Length;
    int[] sizes = new int[M];
    for (int i = 0; i < M; i++)
        sizes[i] = docs[i].Length;

    // Set up the ranges
    Range CorpusSize = new Range(M);
    Range TopicsNum = new Range(K);
    Range WordsNum = new Range(V);
    VariableArray<int> docSizeVar = Variable.Observed(sizes, CorpusSize);
    Range DocSize = new Range(docSizeVar[CorpusSize]);

    double[] alpha = { 0.5, 0.5, 0.5 };
    double[] beta = { 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 };

    VariableArray<Vector> theta = Variable.Array<Vector>(CorpusSize);
    VariableArray<Vector> phi = Variable.Array<Vector>(TopicsNum);
    theta[CorpusSize] = Variable.Dirichlet(alpha).ForEach(CorpusSize);
    phi[TopicsNum] = Variable.Dirichlet(beta).ForEach(TopicsNum);

    // Break symmetry by initialising phi marginals
    Vector denseBeta = new Vector(V, 10.0);
    Dirichlet[] initPhi = new Dirichlet[K];
    Dirichlet dirich = new Dirichlet(denseBeta);
    for (int k = 0; k < K; k++)
        initPhi[k] = new Dirichlet(dirich.Sample());
    phi.InitialiseTo(Distribution<Vector>.Array(initPhi));

    var Z = Variable.Array(Variable.Array<int>(DocSize), CorpusSize);
    var W = Variable.Array(Variable.Array<int>(DocSize), CorpusSize);
    W.ObservedValue = docs;

    using (Variable.ForEach(CorpusSize))
    {
        using (Variable.ForEach(DocSize))
        {
            Z[CorpusSize][DocSize] = Variable.Discrete(theta[CorpusSize]).Attrib(new ValueRange(TopicsNum));
            using (Variable.Switch(Z[CorpusSize][DocSize]))
            {
                W[CorpusSize][DocSize] = Variable.Discrete(phi[Z[CorpusSize][DocSize]]);
            }
        }
    }

    InferenceEngine engine = new InferenceEngine();
    Console.WriteLine(engine.Infer(Z));
}
thanks
What should the return type of engine.Infer(Z) in the last line be? I want to store the posterior distribution of Z in a local variable for future use. I tried several types, but none seemed to work.
Oh, it seems to work if I use a variable of type DistributionArray<DistributionRefArray<Discrete, int>>
Thanks all
Although what you have is correct in this case, DistributionArray, DistributionRefArray, and other distribution array classes are not designed to be used in the API. Infer.NET may use any one of a number of classes to internally represent distribution arrays, choosing the most efficient representation for the model. However, they can all be referenced via the IDistribution<> interface.
We encourage you to use one of the following two approaches, depending on what you want to do with the posterior.
IDistribution<int[][]> ZPostAsDistribution = engine.Infer<IDistribution<int[][]>>(Z);
Discrete[][] ZPostAsArray = Distribution.ToArray<Discrete[][]>(engine.Infer(Z));
We are looking at possibly making the second case more succinct in a future release by just allowing Discrete[][] to be a type parameter for the Infer method.
John G.
Hi John,
Hiding the Ref/Struct arrays is a cool thing I wasn't yet aware of.
Unfortunately I cannot make it work in F#. I tried the following, but the compiler complains "The field, constructor or member 'ToArray' is not defined." This is particularly odd since I can select the method from the member list of the Distribution class.
let infResult = inferenceEngine.Infer<IDistribution<Beta[]>>(epsilon)
let infResultObj = inferenceEngine.Infer<obj>(epsilon)
let epsilonPostAsArray = Distribution.ToArray<Beta[]>(infResultObj)
Is there anything special about this method?
I think that in F# you currently need to use Distribution< >.ToArray rather than Distribution.ToArray. This is an F# bug that has been logged. It occurs when you have a generic and a non-generic version of the same class name, and the non-generic version (Distribution in our case) has a generic method (ToArray in our case).
John
I tried the following as well and still get the same error. I also tried rebuilding everything, just in case, but still no success.
let epsilonPostAsArray = Distribution<_>.ToArray<Beta[]>(infResultObj)
let epsilonPostAsArray = Distribution<Beta>.ToArray<Beta[]>(infResultObj)
// just in case I was referencing the wrong class
let epsilonPostAsArray = MicrosoftResearch.Infer.Distributions.Distribution<_>.ToArray<Beta[]>(infResultObj)
I find it strange that the following expression does not give compile errors.
let x = Distribution.Equals(infResult, infResultObj)
That is why I wonder what might be so special about the ToArray method.
You must have a space rather than an underscore in Distribution< >.
Thanks, John!
May I know the mathematical reason for breaking symmetry? What's so bad about all phis being identical? If we supply the data, the model learns and adapts accordingly, so I am not sure why we have to break symmetry.