<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://community.research.microsoft.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>TechFest Live! : Aditya Nori</title><link>http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Aditya+Nori/default.aspx</link><description>Tags: Aditya Nori</description><dc:language>en</dc:language><generator>CommunityServer 2008.5 SP1 (Build: 31106.3070)</generator><item><title>Specification Inference for Security</title><link>http://community.research.microsoft.com/blogs/techfestlive/archive/2009/02/26/specification-inference-for-security.aspx</link><pubDate>Fri, 27 Feb 2009 01:04:00 GMT</pubDate><guid isPermaLink="false">eaca9afb-5ccf-4c08-b3f3-369c7e6f1a06:4711</guid><dc:creator>robk</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://community.research.microsoft.com/blogs/techfestlive/rsscomments.aspx?PostID=4711</wfw:commentRss><wfw:comment xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://community.research.microsoft.com/blogs/techfestlive/commentapi.aspx?PostID=4711</wfw:comment><comments>http://community.research.microsoft.com/blogs/techfestlive/archive/2009/02/26/specification-inference-for-security.aspx#comments</comments><description>&lt;p&gt;&lt;a href="http://research.microsoft.com/en-us/people/adityan/"&gt;Aditya Nori&lt;/a&gt;, a researcher in the &lt;a href="http://research.microsoft.com/en-us/groups/rse/"&gt;Rigorous Software Engineering&lt;/a&gt; team at &lt;a href="http://research.microsoft.com/en-us/labs/india/"&gt;Microsoft Research India&lt;/a&gt;, just gave me a brief overview of his demo, entitled Specification Inference for Security, and as he made repeated references to the poster in his TechFest booth, I thought it would be instructive to share:&lt;/p&gt;
&lt;p&gt;&lt;img border="0" src="http://www.microsoft.com/presspass/events/msrtechfest/Posters/id121_15x20.jpg" alt="Specification Inference for Security" style="max-width:550px;border:0;float:left;" /&gt;&lt;/p&gt;
&lt;p&gt;A higher-resolution version is &lt;a href="http://www.microsoft.com/presspass/events/msrtechfest/Posters/id121_15x20.jpg"&gt;available&lt;/a&gt; that enables you to enlarge the poster to read through everything, but the key to the project--a new algorithm that automatically infers explicit information-flow security specifications from program code--is located under Specification on the left-hand side of the poster, which defines the classification of nodes in a data-flow graph of program code as &amp;quot;sources,&amp;quot; &amp;quot;sinks,&amp;quot; or &amp;quot;sanitizers.&amp;quot; A source is a node that returns tainted, bad data. A sink receives that bad data, and a sanitizer cleans the data, so that even if it receives tainted data, it does not pass it along. It&amp;#39;s a case of garbage in/no garbage out.&lt;/p&gt;
&lt;p&gt;&amp;quot;This project is about improving the quality of existing static-analysis tools for security,&amp;quot; Nori explains. &amp;quot;Most automated tools for security rely on specifications, because they really need to know what they&amp;#39;re searching for. Our job here is to go through the programs and automatically infer specifications in the program.&amp;quot;&lt;/p&gt;
&lt;p&gt;He points to the &amp;quot;Information flow vulnerabilities&amp;quot; portion of the poster and to the node named ReadData1.&lt;/p&gt;
&lt;p&gt;&amp;quot;If you look at this code fragment,&amp;quot; Nori&amp;nbsp;continues,&amp;nbsp;&amp;quot;here&amp;#39;s a data-flow graph of the response to this code fragment. Information flows from this method or function call into this function call [Prop1]. If I just went through this graph and asked if there was something wrong, if there&amp;#39;s a bug over here, it&amp;#39;s hard to say, because you need more context.&lt;/p&gt;
&lt;p&gt;&amp;quot;That&amp;#39;s exactly what a static-analysis tool is going to say: You need to provide more information, and you need to say what the problem is. On the other hand, if I consider this a source, a producer of taint,&amp;nbsp;and that one a sink, a consumer of taint, then the static-analysis tool will be able to say that the first part&amp;nbsp;is a bad part.&amp;nbsp;There could be a malicious user sending some data that screws up your database.&amp;nbsp;If this [Cleanse] was a cleanser, which actually checks that whatever is being passed on to the database is safe data, then this part is OK.&amp;nbsp;What the static-analysis tool needs to know, in addition to your program, is the role of every function in your program: Here&amp;#39;s a function of source, here&amp;#39;s a function of sink, here&amp;#39;s a function of sanitizer.&amp;quot;&lt;/p&gt;
&lt;p&gt;In the real world of software development, though, that&amp;#39;s an improbable scenario.&lt;/p&gt;
&lt;p&gt;&amp;quot;It&amp;#39;s really&amp;nbsp;unreasonable to assume that if somebody is presented with 1 million lines of code, they&amp;#39;re actually, mindfully going to go through the code and annotate every method as a source, sink, or sanitizer,&amp;quot; Nori stipulates. &amp;quot;That&amp;#39;s unacceptable. No developer&amp;nbsp;is going to do it. So what we do is go through the code and automatically analyze the code and annotate every function as a source, sink, or sanitizer.&amp;quot;&lt;/p&gt;
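&lt;p&gt;The role a specification plays in the analysis Nori describes can be illustrated with a minimal sketch. It is not the project code: only ReadData1, Prop1, and Cleanse are named in the post, while the other node names, the edges, and the function tainted_sinks are assumptions made for illustration.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import deque

# Hypothetical data-flow graph loosely modeled on the poster example.
# Only ReadData1, Prop1, and Cleanse are named in the post; the other
# node names and all edges are assumptions made for illustration.
FLOW_GRAPH = {
    "ReadData1": ["Prop1"],
    "Prop1": ["WriteData", "Cleanse"],
    "Cleanse": ["SafeWrite"],
    "WriteData": [],
    "SafeWrite": [],
}

# Assumed role assignment: the specification a static-analysis tool needs.
ROLES = {
    "ReadData1": "source",
    "Cleanse": "sanitizer",
    "WriteData": "sink",
    "SafeWrite": "sink",
}


def tainted_sinks(graph, roles):
    """Return the sorted sinks reachable from a source without crossing a sanitizer."""
    queue = deque(n for n, r in roles.items() if r == "source")
    seen = set(queue)
    hits = set()
    while queue:
        node = queue.popleft()
        if roles.get(node) == "sink":
            hits.add(node)
        for succ in graph.get(node, []):
            # A sanitizer absorbs taint, so the search does not continue through it.
            if roles.get(succ) == "sanitizer" or succ in seen:
                continue
            seen.add(succ)
            queue.append(succ)
    return sorted(hits)


print(tainted_sinks(FLOW_GRAPH, ROLES))
```
&lt;/pre&gt;
&lt;p&gt;With an empty role table the same function reports nothing, which mirrors the point that the graph alone does not say whether a flow is a bug.&lt;/p&gt;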
&lt;p&gt;He turns to the Architecture section of the poster.&lt;/p&gt;
&lt;p&gt;&amp;quot;Here&amp;#39;s a high-level overview,&amp;quot; he says. &amp;quot;We take the program, do static analysis, and convert it into a&amp;nbsp;data-flow graph. The data-flow graph includes a bunch of constraints. For technical reasons, it&amp;#39;s helpful to look upon these constraints as probabilistic constraints. We take the data-flow graph, convert it into this probabilistic model, and feed that to a constraint solver. The solution to the set of constraints precisely tells us&amp;nbsp;which methods in our program correspond to sources, sinks, and sanitizers.&amp;quot;&lt;/p&gt;
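&lt;p&gt;The idea of treating constraints as probabilistic and handing them to a solver can be sketched on a toy scale. Everything below is an assumption made for illustration: the node names, the weights, the single soft constraint (a flow from a source to a sink with no sanitizer in between is unlikely), and the brute-force search that stands in for a real constraint solver. The post does not say which constraints or solver the project uses.&lt;/p&gt;
&lt;pre&gt;
```python
from itertools import product

ROLES = ("source", "sanitizer", "sink", "none")

# Hypothetical three-call chain: data flows read_input to check to run_query.
# All names, weights, and constraints here are assumptions for illustration.
NODES = ("read_input", "check", "run_query")
PATHS = [("read_input", "check", "run_query")]

# Assumed prior knowledge: two roles are already believed with high confidence.
PRIOR = {"read_input": "source", "run_query": "sink"}


def score(assignment):
    """Multiply the weights of all soft constraints for one role assignment."""
    weight = 1.0
    for node, role in PRIOR.items():
        # Soft prior: agreeing with the seed specification is likely, not certain.
        weight *= 0.9 if assignment[node] == role else 0.1
    for first, middle, last in PATHS:
        unsanitized = (
            assignment[first] == "source"
            and assignment[last] == "sink"
            and assignment[middle] != "sanitizer"
        )
        # Soft constraint: a source-to-sink flow with no sanitizer is unlikely.
        weight *= 0.2 if unsanitized else 0.8
    return weight


def most_likely_assignment():
    """Brute-force stand-in for a constraint solver: try every assignment."""
    candidates = (dict(zip(NODES, roles)) for roles in product(ROLES, repeat=len(NODES)))
    return max(candidates, key=score)


print(most_likely_assignment())
```
&lt;/pre&gt;
&lt;p&gt;Under these assumed weights the highest-scoring assignment labels the middle call a sanitizer. Exhaustive search only works for a handful of nodes, which is one reason a dedicated solver would be needed on real programs.&lt;/p&gt;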
&lt;p&gt;And in a real-life test, the technique worked wonders.&lt;/p&gt;
&lt;p&gt;&amp;quot;We ran our tool on 10 critical Microsoft business applications,&amp;quot; Nori reports,&amp;nbsp;&amp;quot;and we discovered 67 new sources, 25 new sanitizers, and 75 new sinks. The next step for us was to assess the quality of these specifications.&amp;nbsp;Did these specifications really improve the quality of an existing&amp;nbsp;static-analysis tool? We took a static-analysis tool that is being developed by Microsoft for security and ran that tool on these applications. The tool discovered 89 vulnerabilities, of which 20 were false positives. Then we ran the tool again with our new specifications, and we discovered 335 vulnerabilities. We were very excited about that. Another nice thing about our specifications is that they eliminated 13 of the 20 false positives from this set.&amp;quot;&lt;/p&gt;
&lt;p&gt;Such numbers grab attention within Microsoft, and there is a strong possibility that the specification-inference algorithm might&amp;nbsp;be included in an upcoming product release. Successful technology transfer makes researchers smile, but Nori knows there is much left to be done.&lt;/p&gt;
&lt;p&gt;&amp;quot;My collaborators believe this is a new way of analyzing programs, combining program analysis with statistical analysis,&amp;quot; he says. &amp;quot;We are a long ways from applying this to other domains and program analysis. That&amp;#39;s the future of this project. We look at this project as the starting point for combining program analysis and statistical analysis.&amp;quot;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://community.research.microsoft.com/aggbug.aspx?PostID=4711" width="1" height="1"&gt;</description><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Research/default.aspx">Research</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/TechFest/default.aspx">TechFest</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/2009/default.aspx">2009</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/infer/default.aspx">infer</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/security/default.aspx">security</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/code/default.aspx">code</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/India/default.aspx">India</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Aditya+Nori/default.aspx">Aditya Nori</category><category domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/specification/default.aspx">specification</category><category 
domain="http://community.research.microsoft.com/blogs/techfestlive/archive/tags/Rigorous+Software+Engineering/default.aspx">Rigorous Software Engineering</category></item></channel></rss>