
Moises Goldszmidt (above), principal researcher at Microsoft Research Silicon Valley, is showing a pair of demos, in conjunction with lab colleague Mihai Budiu, that examine performance in data centers.
"The challenge," Goldszmidt says, "is: How do I summarize thousands of machines and hundreds of metrics and find the key elements over that huge space that's giving us surprises, such that I can let it retrieve that fingerprint? How do I do that automatically?"
The first demo is called Predicting Problems in the Data Center.
"We are using very sophisticated machine-learning techniques," Goldszmidt states, "that build automated models that are able to extract the main characteristics of each one of these crises."
The value of such work is readily apparent.
"Eighty percent of the time, we're predicting one hour in advance a set of actions we need to do to mitigate a problem," he says, resulting in "less downtime, less latency for our clients using our services. Our services are more efficient to run, because we don't have to have that many people look at the problem."
The second demo in the booth is named Profiling the Performance of Distributed Systems. It features a colorful analytics engine that enables the monitoring of vast data centers.
"Once you have a large cluster that you can run your services and applications on, it's very hard to understand, if something goes wrong, what's wrong," explains Budiu (left). "It could be a hardware problem. It could be your application has partitioned the data wrongly. It could be something in between, such as the network being down.
"This is a tool which pulls a lot of metrics from the machines in your cluster and allows you to easily visualize the data and find correlations in the data. You can assign colors to metrics and drag and drop the colors into other windows to see how the metrics correlate with other metrics. You can assign colors according to how many CPU cycles are utilized and immediately drop the color to see where the high CPU cycles are being used, in which machine."
In both demos, Goldszmidt declares, the objective is the same.
"Reduce the time to understand what the heck is going on," he says. "Build a better service. That's the final goal."
Posted 02-24-2009 12:50 PM by robk