[cfe-dev] [llvm-dev] AST of whole program

Vedant Kumar via cfe-dev cfe-dev at lists.llvm.org
Tue Oct 4 21:55:43 PDT 2016


> I am trying to change the layout of fields (randomize) in a struct in a C program.

Have you taken a look at this paper?

    https://www.utdallas.edu/~zxl111930/file/DIMVA09.pdf

The authors describe their approach to data structure layout randomization in
some detail, including pros/cons of implementing the feature at the AST level.
There are some other interesting bits, like their decision to introduce an
explicit obfuscation struct attribute.

> ... there are structs that I should not touch like structs defined in libraries.

Having an explicit "reorder" attribute is one way to work around this.
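
To make the opt-in idea concrete, here is a minimal sketch of what marking
structs could look like in source. The REORDER macro and the "reorder" string
are names I made up for illustration; they are not the attribute from the paper
and not an existing Clang feature. On Clang the macro expands to the generic
annotate attribute, which only attaches a string to the declaration so that a
custom pass could look for it; a stock compiler does not reorder anything.

```c
#include <stddef.h>

/* Hypothetical opt-in marker (assumption: the name REORDER and the string
 * "reorder" are illustrative only). Clang's annotate attribute just records
 * the string on the declaration; other compilers get an empty macro. */
#if defined(__clang__)
#define REORDER __attribute__((annotate("reorder")))
#else
#define REORDER
#endif

/* Private to the program: explicitly opted in to reordering. */
struct REORDER session {
    int  id;
    char tag;
    long expires;
};

/* Layout shared with a library or a wire format: left unmarked, so a
 * randomizing pass that honors the marker would never touch it. */
struct wire_header {
    unsigned short type;
    unsigned short length;
};

/* Returns 1 when the unmarked struct still has its declared field order. */
int wire_header_order_is_declared(void) {
    return offsetof(struct wire_header, type) <
           offsetof(struct wire_header, length);
}
```

The appeal of this design is that the decision lives next to the struct
definition, so every translation unit that includes the header sees the same
answer without any whole-program view.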

> So if I randomize a struct in one compilation unit and then realize, when clang is working on another compilation unit, that I shouldn't have randomized it, there is no way to go back and revert the layout, because the earlier translation unit has already been processed.

It sounds like you need some rule that tells you whether or not it's OK to
randomize a struct, regardless of how many translation units you have already
processed.
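
One way to get that property is to make the layout a pure function of facts
that are visible at the struct's definition, so each translation unit arrives
at the same answer on its own. The sketch below is only an illustration under
my own assumptions: field_order, the use of the struct name as the key, and the
per-build seed are hypothetical choices, not something the paper or clang
prescribes. It covers only the "same layout everywhere" half of the problem;
deciding which structs are safe to reorder is a separate question.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 64-bit hash of a NUL-terminated string. */
static uint64_t fnv1a64(const char *s) {
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h;
}

/* One xorshift64 step; the state must be nonzero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Fills order[0..n-1] with a permutation of 0..n-1 that depends only on
 * struct_name and build_seed (both hypothetical inputs). Any translation
 * unit that sees the same name and seed computes the same permutation.
 * The modulo introduces a slight bias, which is acceptable for a sketch. */
void field_order(const char *struct_name, uint64_t build_seed,
                 size_t *order, size_t n) {
    uint64_t state = fnv1a64(struct_name) ^ build_seed;
    if (state == 0)
        state = 1;                           /* xorshift needs nonzero state */
    for (size_t i = 0; i < n; ++i)
        order[i] = i;
    /* Fisher-Yates shuffle driven by the deterministic generator. */
    for (size_t i = n; i > 1; --i) {
        size_t j = (size_t)(xorshift64(&state) % i);
        size_t tmp = order[i - 1];
        order[i - 1] = order[j];
        order[j] = tmp;
    }
}
```

With something like this, no translation unit ever needs to "go back": a struct
is either eligible by a local rule or it is not, and if it is, its field order
is already determined.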

> So what I am thinking is that I should look at all the translation units at the AST level before they are lowered to LLVM IR, decide which structs to randomize, randomize them, and then let clang create LLVM IR from the modified ASTs.

Let's say you take this approach, and you collate the ASTs for every source
file in a project. What information do you plan on gathering that will help you
determine the right structs to reorder? Can you guarantee that your decision
procedure will never reorder a struct that is not meant to be reordered, and
will always reorder all other structs?

> I am trying to transform programs like apache.

I suspect that you'd need to manually audit that codebase and apply "reorder"
attributes to get good results. I could be wrong though :).

> Also I am not sure about one thing. Can I tell whether a struct is defined in a library or in the program's own source code by looking at only one translation unit, without any false positives? If I can, then there is no need for what I am asking for.

In the paper I linked to, the authors mention several other conditions under
which it's inappropriate to randomize structs.

vedant

> On Oct 4, 2016, at 6:57 PM, Anil Altinay <aaltinay101 at gmail.com> wrote:
> 
> Hi Vedant,
> 
> > What kind of transformation are you interested in, and what kind of programs
> > are you looking to transform?
> 
> I am trying to change the layout of fields (randomize) in a struct in a C program. I have already figured out how to change the layout of fields in a struct, but there are structs that I should not touch, like structs defined in libraries. So if I randomize a struct in one compilation unit and then realize, when clang is working on another compilation unit, that I shouldn't have randomized it, there is no way to go back and revert the layout, because the earlier translation unit has already been processed. So what I am thinking is that I should look at all the translation units at the AST level before they are lowered to LLVM IR, decide which structs to randomize, randomize them, and then let clang create LLVM IR from the modified ASTs.
> 
> I am trying to transform programs like apache.
> 
> > By 'AST of whole program', do you mean AST's for the source from all libraries
> > linked into the program?
> 
> I am not sure I understand your question, but I will try to explain what I meant. An AST is created for each translation unit. The problem is that I can only see the AST of the current translation unit; I cannot see the AST of the next one, because clang works on one translation unit at a time. Maybe I should dump the AST of each translation unit to disk, decide which structs can be randomized, change the ASTs on disk, and start compilation from the modified ASTs. But this may be very slow, and I do not really know how to do it.
> 
> Also I am not sure about one thing. Can I tell whether a struct is defined in a library or in the program's own source code by looking at only one translation unit, without any false positives? If I can, then there is no need for what I am asking for.
> 
> I hope that I explained what I am trying to do clearly. If you have any suggestion how I should do this, I would really appreciate hearing your opinion. Thank you very much for your quick response.
> 
> Anil 
> 
> 
> 
> 
> On Tue, Oct 4, 2016 at 5:58 PM, Vedant Kumar <vsk at apple.com> wrote:
> - llvm-dev, + cfe-dev
> 
> Hi,
> 
> What kind of transformation are you interested in, and what kind of programs
> are you looking to transform?
> 
> By 'AST of whole program', do you mean AST's for the source from all libraries
> linked into the program?
> 
> vedant
> 
> 
> > On Oct 4, 2016, at 5:50 PM, Anil Altinay via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > Hello,
> >
> > I would like to do transformations on the AST of a C program, but I need access to all the ASTs created for the program to make the right changes. LLVM processes one translation unit at a time, and because of that I do not have access to the ASTs of all the translation units at the same time. Do you have any suggestions for how I can access all the ASTs created for a program, analyze them, and modify them?
> >
> > As a summary:
> >
> >       • I need to have access to ASTs of the program at the same time.
> >       • Do analysis on ASTs.
> >       • Modify ASTs based on my analysis and create llvm IR from modified ASTs.
> > Thank you,
> > Anil
> 
> 



