<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://community.research.microsoft.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>TechFest Live! : privacy</title><link>http://community.research.microsoft.com/blogs/techfestlive/archive/tags/privacy/default.aspx</link><description>Tags: privacy</description><dc:language>en</dc:language><generator>CommunityServer 2008.5 SP1 (Build: 31106.3070)</generator><item><title>Privacy Integrated Queries</title><link>http://community.research.microsoft.com/blogs/techfestlive/archive/2008/03/06/privacy-integrated-queries.aspx</link><pubDate>Thu, 06 Mar 2008 21:57:00 GMT</pubDate><guid isPermaLink="false">eaca9afb-5ccf-4c08-b3f3-369c7e6f1a06:776</guid><dc:creator>robk</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://community.research.microsoft.com/blogs/techfestlive/rsscomments.aspx?PostID=776</wfw:commentRss><wfw:comment xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://community.research.microsoft.com/blogs/techfestlive/commentapi.aspx?PostID=776</wfw:comment><comments>http://community.research.microsoft.com/blogs/techfestlive/archive/2008/03/06/privacy-integrated-queries.aspx#comments</comments><description>&lt;p&gt;Data abounds in the digital age. Bits by the billions are collected on a daily basis from a variety of sources: Web services, financial programs, governmental agencies. Those who specialize in data mining and analysis could spend a hundred lifetimes sifting through such data, looking for patterns and clues that could help explain and fine-tune 21st-century life.&lt;/p&gt;
&lt;p&gt;But they can&amp;#39;t.&lt;/p&gt;
&lt;p&gt;Much of that never-ending stream of data is privacy-protected--as it should be. Nobody wants their personal information accessible to any and all. Privacy is the cornerstone of the Internet era. Without it, society would devolve into digital chaos. But those same privacy protections deny experts access to all that massive, tantalizing data.&lt;/p&gt;
&lt;p&gt;&lt;a class="" title="Frank McSherry" href="http://research.microsoft.com/users/mcsherry/"&gt;Frank McSherry&lt;/a&gt; aims to change all that. McSherry, a researcher at Microsoft Research Silicon Valley, and colleague &lt;a class="" title="Cynthia Dwork" href="http://research.microsoft.com/users/dwork/"&gt;Cynthia Dwork&lt;/a&gt;, a principal researcher at the same lab, are demonstrating Privacy Integrated Queries, a project designed to enable the mining of huge data collections without putting individuals&amp;#39; private information at risk.&lt;/p&gt;
&lt;p&gt;&amp;quot;We&amp;#39;re looking to put together a&amp;nbsp;privacy-preserving data-mining platform,&amp;quot; McSherry says,&amp;nbsp;&amp;quot;tools that analysts can use, even without privacy training, to interact with and mine data from sensitive data sets that they wouldn&amp;#39;t otherwise have access to. &lt;/p&gt;
&lt;p&gt;&amp;quot;Cynthia Dwork and I have had a lot of prior experience with this privacy-preserving data analysis over the last few years. There&amp;#39;s a lot of really formal mathematics behind it, but every time we do something new, we start from scratch in some sense: prove theorems, write papers--this is the model for convincing people things are private. We thought it would be smart to try to factor out the common technology we&amp;#39;ve been using in each of these results and package it&amp;nbsp;in a framework that people could use to put together their own analyses.&amp;quot;&lt;/p&gt;
&lt;p&gt;One scenario in which such a platform could play a useful role relates to recent troubles in the financial sector.&lt;/p&gt;
&lt;p&gt;&amp;quot;Some folks,&amp;quot; McSherry says,&amp;nbsp;&amp;quot;are really excited about finding what went&amp;nbsp;wrong in the subprime collapse. Unfortunately, all that data is locked up--all the mortgage information, who bought what, at what rates--sealed up for privacy reasons. People can&amp;#39;t sort out where the next collapse will be and how to counteract it, and that&amp;#39;s unfortunate, that privacy is getting in the way, in some sense, of a real common-good happening.&amp;nbsp;They didn&amp;#39;t want to know who had what mortgage, but what parts of the country are most at risk.&amp;quot;&lt;/p&gt;
&lt;p&gt;Perhaps that episode would have developed differently if data analysts had access to the Privacy Integrated Queries technology devised by McSherry and Dwork.&lt;/p&gt;
&lt;p&gt;&amp;quot;We put together something that looks a lot like LINQ, Language Integrated Query, a sort of SQL-style, programmatic data access,&amp;quot; McSherry explains. &amp;quot;To the user, it&amp;#39;s basically indistinguishable. But under the covers, the privacy thing is going on, introspecting about what you&amp;#39;re asking for and communicating back with the data center, trying to determine if this is OK and pushing a lot of formal mathematics around, making sure that, at the end of the day, you haven&amp;#39;t compromised privacy.&lt;br /&gt;&lt;br /&gt;&amp;quot;The goal was to try to make it transparent to the users, so they didn&amp;#39;t have to worry about what weird, funny machinations underneath are going on. They could just program against it as if it were LINQ.&amp;quot;&lt;/p&gt;
&lt;p&gt;But this latest project has one significant distinction that sets it apart from LINQ.&lt;/p&gt;
&lt;p&gt;&amp;quot;Unlike LINQ,&amp;quot; McSherry says,&amp;nbsp;&amp;quot;you don&amp;#39;t get to just enumerate a data set. You have to stay one step removed. You explain what you&amp;#39;d like to do with the data set and how you&amp;#39;d like it to be aggregated, and the results come back to you, perturbed a little bit. You get a little bit of noise, and the noise introduces uncertainty about the answer, which turns into this formal notion of privacy.&amp;quot;&lt;/p&gt;
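&lt;p&gt;McSherry does not spell out the noise mechanism here. As a minimal sketch of the idea, assuming Laplace noise (a common choice in the privacy literature, not confirmed by this article) and a hypothetical helper named noisy_count with a hypothetical privacy parameter epsilon, an analyst might supply a predicate and receive only a perturbed count, never the records themselves:&lt;/p&gt;
&lt;pre&gt;
```python
import random


def noisy_count(records, predicate, epsilon, rng=None):
    """Count the records matching predicate, then add Laplace noise.

    Hypothetical helper for illustration only; it is not the project's API.
    Assumption: the noise is Laplace with scale 1/epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two exponential draws with rate epsilon
    # follows a Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    # Made-up records; the analyst never enumerates them directly.
    mortgages = [
        {"region": "west", "rate": 7.5},
        {"region": "east", "rate": 6.1},
        {"region": "west", "rate": 8.2},
    ]
    # The result is the true count (2) plus random noise, so it varies per run.
    print(noisy_count(mortgages, lambda m: m["region"] == "west", epsilon=0.5))
```
&lt;/pre&gt;
&lt;p&gt;In this sketch, a smaller epsilon produces more noise and therefore more uncertainty about any one record, at the cost of a less accurate answer.&lt;/p&gt;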
&lt;p&gt;Privacy Integrated Queries could prove beneficial in a number of settings. The medical field, for example, could benefit greatly if its data could be safely aggregated without disclosing the associated personal information. Another potential use lies within Microsoft itself.&lt;/p&gt;
&lt;p&gt;&amp;quot;We have all sorts of data on who searched for what,&amp;quot; McSherry notes. &amp;quot;It&amp;#39;s good stuff. Microsoft would love to collaborate with external researchers, people interested in Web research who don&amp;#39;t have access to the scale of data that we have, but we&amp;#39;re really concerned about privacy. That sort of stalls research, to some extent, for&amp;nbsp;Web researchers who don&amp;#39;t have anywhere to go. They can&amp;#39;t start up their own&amp;nbsp;Web-search engine. &lt;/p&gt;
&lt;p&gt;&amp;quot;With this sort of technology, we can really start the ball rolling on this type of work, work that people just haven&amp;#39;t been able to do before.&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://community.research.microsoft.com/blogs/techfestlive/McSherry-Dwork4.jpg"&gt;&lt;img src="http://community.research.microsoft.com/blogs/techfestlive/McSherry-Dwork4.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Frank McSherry and Cynthia Dwork of Microsoft Research Silicon Valley in front of their TechFest poster on Privacy Integrated Queries.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/2008/default.aspx">2008</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Research/default.aspx">Research</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/TechFest/default.aspx">TechFest</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Frank+McSherry/default.aspx">Frank McSherry</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/privacy/default.aspx">privacy</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Silicon+Valley/default.aspx">Silicon Valley</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Cynthia+Dwork/default.aspx">Cynthia Dwork</category></item></channel></rss>