<div dir="ltr"><div>Vedant and Mats,</div><span style="font-size:12.8px"><b><div><span style="font-size:12.8px"><b><br></b></span></div>Have you taken a look at this paper?</b></span><br><div><span style="font-size:12.8px"><b><br></b></span></div><div><span style="font-size:12.8px">I looked at that paper and here are the cons the paper points out: </span></div><div><span style="font-size:12.8px"><br></span></div><div><span><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:14.6667px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14.6667px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">If a data structure is used in network communication, the communicating parties may not understand each other if the data structure is randomized.</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:14.6667px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14.6667px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">If a data structure definition is public (e.g., defined in shared library stdio.h), it cannot be randomized.</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:14.6667px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14.6667px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">There is a special case in GNU C that allows zero-length arrays to be the last element of a structure (a zero-length array is actually the header of a variable-length object). 
If a zero-length array is declared as the last element in a struct, that element cannot be randomized, otherwise, it cannot pass GCC syntax checking.</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:14.6667px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14.6667px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">A programmer may directly use the data offset to access some fields. (This is particularly true in programs which mix assembly and C code.)</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:14.6667px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14.6667px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">To initialize the value of a structure, the programmer uses the order declared to initialize the structure. 
These fields cannot be randomized, as the program may crash.</span></p></li></ul><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:14.6667px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><span style="font-size:14.6667px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">The programmer needs to decide which data structure is randomizable in the program by inserting a new keyword</span><span><span style="font-size:14.6667px;font-family:arial;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"> “__obfuscate__”.</span></span></li></ul><div><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap">Actually, there is another paper that solves some of the cons. Here is the link: <a href="http://link.springer.com/chapter/10.1007%2F978-3-642-18178-8_16">http://link.springer.com/chapter/10.1007%2F978-3-642-18178-8_16</a></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap">This paper solves three of the cons listed above. They are as follows:</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap"><br></span></font></div><div><span><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:14.6667px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14.6667px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">To initialize the value of a structure, the programmer uses the order declared to initialize the structure. 
These fields cannot be randomized, as the program may crash.</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:14.6667px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14.6667px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">There is a special case in GNU C that allows zero-length arrays to be the last element of a structure (a zero-length array is actually the header of a variable-length object). If a zero-length array is declared as the last element in a struct, that element cannot be randomized, otherwise, it cannot pass GCC syntax checking. → Solved by keeping the zero-length array in the same position.</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:14.6667px;font-family:arial;color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><span style="font-size:14.6667px;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">The programmer needs to decide which data structure is randomizable in the program by inserting a new keyword.</span></li></ul><div><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap">There is also another paper that solves some of the other cons by performing the randomization dynamically.</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap">My aim for now is to implement a frontend plugin, or something similar, that handles these three cons in clang.</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap"><br></span></font></div><div><span 
style="font-size:12.8px"><b>Having an explicit "reorder" attribute is one way to work around this.</b></span><font color="#000000" face="arial"><span style="font-size:14.6667px;white-space:pre-wrap"><br></span></font></div><div><span style="font-size:12.8px"><b><br></b></span></div><div><span style="font-size:12.8px">I did not know that such a thing existed. I googled it and found that it exists in GCC but is no longer supported. Even if it were supported, I could not use it, because I am not supposed to modify the source code myself and I need to do this in clang. I could not find out whether clang supports it; if you have any information, I would be happy to hear it.</span></div><div><span style="font-size:12.8px"><br></span></div><div><b><span style="font-size:12.8px">It sounds like you need some rule that tells you whether or not it's OK to</span><br style="font-size:12.8px"><span style="font-size:12.8px">randomize a struct, regardless of how many translation units you have already</span><br style="font-size:12.8px"><span style="font-size:12.8px">processed.</span></b><span style="font-size:12.8px"><br></span></div><div><b><span style="font-size:12.8px"><br></span></b></div><div>My plan is as follows:</div><div>1. To decide whether a function is a library function, I will check whether it has only a declaration or both a declaration and a definition.</div><div>2. If the function has only a declaration (in other words, it is a library function), I will not randomize structs that are passed to it as parameters or that it returns a pointer to.</div><div>3. I will randomize the rest, handling zero-length arrays and the initialization of structs by values.</div><div><br></div><div>I am just not sure whether my plan will always eliminate the three cons if I randomize as described above, one translation unit at a time. That is why I wanted to see the ASTs of all translation units at the same time. What do you think about it? 
Do you have an example showing that it will not work if I process one translation unit at a time? If so, what do you suggest, and how should I proceed?</div><div><br></div><div>I implemented this randomization as an LLVM pass, but it only shuffled fields of the same size among themselves. If I randomize all the fields with each other, the program crashes because of the alignment information in the store and load instructions.</div><div><br></div><div>Anil</div><div><br></div><div> <br></div></span></div></span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 5, 2016 at 2:54 AM, mats petersson <span dir="ltr"><<a href="mailto:mats@planetcatfish.com" target="_blank">mats@planetcatfish.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>There are plenty of "not a library" places where re-ordering/changing the data structure will cause trouble. For example storing data in binary form in a file (and plenty of simple applications just write the struct data straight to a file, so order is important) - rebuilding the app, or having multiple applications that are built using the same header. <br><br></div>Applications that share content using shared memory or memory mapped files would be another case where re-ordering the data in any unpredictable way will be disastrous. 
(I've written applications where there are several instances of the actual application, and then a separate status application, where the actual application is storing "what I'm doing right now" in shared memory, and the status application reads a snapshot every second or so, displaying any changes in what's going on in each of the applications)<br><br></div><div>So for sure, this sort of thing needs to be done in a way that can either be predictable over two different binaries, and/or be possible to turn off for certain data structures.<br><br>--<br></div><div>Mats<br></div><div><br></div><br></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On 5 October 2016 at 05:55, Vedant Kumar via cfe-dev <span dir="ltr"><<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><span>> I am trying to change the layout of fields(randomize) in a struct in a c program.<br>
<br>
</span>Have you taken a look at this paper?<br>
<br>
<a href="https://www.utdallas.edu/~zxl111930/file/DIMVA09.pdf" rel="noreferrer" target="_blank">https://www.utdallas.edu/~zxl1<wbr>11930/file/DIMVA09.pdf</a><br>
<br>
The authors describe their approach to data structure layout randomization in<br>
some detail, including pros/cons of implementing the feature at the AST level.<br>
There are some other interesting bits, like their decision to introduce an<br>
explicit obfuscation struct attribute.<br>
<br>
> ... there are structs that I should not touch like structs defined in libraries.<br>
<br>
Having an explicit "reorder" attribute is one way to work around this.<br>
<span><br>
> So if I randomize a struct in one compilation unit and then realize that actually, I shouldn't have randomized it when clang was working on another compilation unit, there is no way to go back and revert the layout of the struct that I already randomized in previous translation unit because it is already over.<br>
<br>
</span>It sounds like you need some rule that tells you whether or not it's OK to<br>
randomize a struct, regardless of how many translation units you have already<br>
processed.<br>
<span><br>
> So what I am thinking is I should look all the translation units in AST level before they create llvm IR and decide which structs I should randomize, then randomize the structs I have decided to randomize, then let clang to create llvm IR using modified ASTs.<br>
<br>
</span>Let's say you take this approach, and you collate the AST's for every source<br>
file in a project. What information do you plan on gathering that will help you<br>
determine the right structs to reorder? Can you guarantee that your decision<br>
procedure will never reorder a struct that is not meant to be reordered, and<br>
will always reorder all other structs?<br>
<span><br>
> I am trying to transform programs like apache.<br>
<br>
</span>I suspect that you'd need to manually audit that codebase and apply "reorder"<br>
attributes to get good results. I could be wrong though :).<br>
<span><br>
> Also I am not sure about one thing. Can I make sure that a struct is defined in a library or in the source code of the program by looking only one translation unit without any false flag? If I can, then there is no need for what I am asking for.<br>
<br>
</span>In the paper I linked to, the authors mention several other conditions under<br>
which it's inappropriate to randomize structs.<br>
<span class="m_-7240220751659244643HOEnZb"><font color="#888888"><br>
vedant<br>
</font></span><div class="m_-7240220751659244643HOEnZb"><div class="m_-7240220751659244643h5"><br>
> On Oct 4, 2016, at 6:57 PM, Anil Altinay <<a href="mailto:aaltinay101@gmail.com" target="_blank">aaltinay101@gmail.com</a>> wrote:<br>
><br>
> Hi Vedant,<br>
><br>
> What kind of transformation are you interested in, and what kind of programs<br>
> are you looking to transform?<br>
><br>
> I am trying to change the layout of fields(randomize) in a struct in a c program. I already figured out how to change the layout of fields in a struct but there are structs that I should not touch like structs defined in libraries. So if I randomize a struct in one compilation unit and then realize that actually, I shouldn't have randomized it when clang was working on another compilation unit, there is no way to go back and revert the layout of the struct that I already randomized in previous translation unit because it is already over. So what I am thinking is I should look all the translation units in AST level before they create llvm IR and decide which structs I should randomize, then randomize the structs I have decided to randomize, then let clang to create llvm IR using modified ASTs.<br>
><br>
> I am trying to transform programs like apache.<br>
><br>
> By 'AST of whole program', do you mean AST's for the source from all libraries<br>
> linked into the program?<br>
><br>
> I am not sure if I understand your question but I will try to explain what I meant. For each translation unit, AST gets created. The problem is I can only see AST of current translation unit. I cannot see AST of next translation unit because clang works on one translation unit at a time. Maybe I should dump AST of each translation unit to the disk, decide which structs can be randomized, change the AST on the disk and start compilation from modified ASTs. But this may be so slow and I do not really know how I can do this.<br>
><br>
> Also I am not sure about one thing. Can I make sure that a struct is defined in a library or in the source code of the program by looking only one translation unit without any false flag? If I can, then there is no need for what I am asking for.<br>
><br>
> I hope that I explained what I am trying to do clearly. If you have any suggestion how I should do this, I would really appreciate hearing your opinion. Thank you very much for your quick response.<br>
><br>
> Anil<br>
><br>
><br>
><br>
><br>
> On Tue, Oct 4, 2016 at 5:58 PM, Vedant Kumar <<a href="mailto:vsk@apple.com" target="_blank">vsk@apple.com</a>> wrote:<br>
> - llvm-dev, + cfe-dev<br>
><br>
> Hi,<br>
><br>
> What kind of transformation are you interested in, and what kind of programs<br>
> are you looking to transform?<br>
><br>
> By 'AST of whole program', do you mean AST's for the source from all libraries<br>
> linked into the program?<br>
><br>
> vedant<br>
><br>
><br>
> > On Oct 4, 2016, at 5:50 PM, Anil Altinay via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>
> ><br>
> > Hello,<br>
> ><br>
> > I would like to do transformations on AST of a c program but I need to have access to all ASTs created for the program to do right changes. LLVM processes one translation unit at a time and because of it, I do not have access to AST of all the translation units at the same time. Do you have any suggestion how I can access all the ASTs created for a program, do analysis on the ASTs and do modifications on the ASTs?<br>
> ><br>
> > As a summary:<br>
> ><br>
> > • I need to have access to ASTs of the program at the same time.<br>
> > • Do analysis on ASTs.<br>
> > • Modify ASTs based on my analysis and create llvm IR from modified ASTs.<br>
> > Thank you,<br>
> > Anil<br>
> > ______________________________<wbr>_________________<br>
> > LLVM Developers mailing list<br>
> > <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> > <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
><br>
><br>
<br>
</div></div></div></div><div class="m_-7240220751659244643HOEnZb"><div class="m_-7240220751659244643h5">______________________________<wbr>_________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/cfe-dev</a><br>
</div></div></blockquote></div><br></div>
</blockquote></div><br></div>