[cfe-dev] Getting out body of a while Statement

mats petersson via cfe-dev cfe-dev at lists.llvm.org
Fri Nov 18 09:03:35 PST 2016


Not sure I can help on the rewriter - I only know of its existence.

However, I'm reasonably sure you:
1. Should make absolutely sure this is worth doing, by first doing it by
hand on a real file that is part of, or resembles, the real project.
2. Need to take into account what to do with cases like

   a)
     T1 x;
     while(...) { T2 x; ... use x ... }

   b)
     while(...) { ... break; ... }

   c)
     while(...) { ... continue; ... }

   d)
     while(...) { ... goto ...; ... }

   e)
     while(...) { ... while(...) { ... } ... }

   f)
     try { ... while(...) ... } catch(...) { ... }

   g)
     while(...) { std::lock(...) ... }

These are just a few examples I can think of that will "make things
difficult" - and I'm pretty sure there are MANY others, unless your
programming style is sufficiently restrictive to forbid the majority of
those.
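
To make case a) concrete, here is a minimal hand-written sketch in plain C++ (no Clang APIs involved). The function names `original` and `flattened` and the label names are made up for illustration, and this is only one way a hoisted body could be written, not what any tool produces. It shows that the hoisted body has to keep its own block so that the inner `x` still shadows the outer one.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <string>

// Original loop: the body declares its own x, shadowing the outer x.
std::string original() {
    std::string log;
    int x = 100;                 // outer x (the "T1 x" of case a)
    int i = 0;
    while (i < 2) {
        int x = i * 10;          // inner x (the "T2 x")
        log += std::to_string(x) + ";";
        ++i;
    }
    log += std::to_string(x);    // refers to the outer x again
    return log;
}

// Hand-flattened version: the body is moved behind a label outside the loop.
std::string flattened() {
    std::string log;
    int x = 100;
    int i = 0;
    while (i < 2) {
        goto body;
    resume:;
    }
    goto done;

body:
    {   // These braces matter. Without them, "int x" would redeclare the
        // outer x in the same scope, and "goto done" would jump past an
        // initialised declaration, which C++ rejects.
        int x = i * 10;
        log += std::to_string(x) + ";";
        ++i;
    }
    goto resume;

done:
    log += std::to_string(x);    // still the outer x
    return log;
}
```

Both functions should produce the same string, which can be checked with `assert(flattened() == original());`. The same braces also keep destructors of body-local objects running once per iteration, which is relevant to case g).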

And of course, if you plan to do this for `for`, `do - while` and `switch`,
you'd better cover the same sort of problems there too.
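
As an illustration of how this carries over to `for`, here is a small hand-written sketch in plain C++. The function and label names are hypothetical, and the rewrite shown is just one possible choice. Two things change compared with `while`: the loop variable has to be declared outside the `for` so the hoisted body can see it, and `continue` has to become a jump to a label at the end of the loop body so that the increment still runs.

```cpp
#include <cassert>   // only needed for the checks in the usage note

// Original: sums the odd numbers below n, using continue to skip even ones.
int sum_odd_original(int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0)
            continue;            // ++i still runs before the next test
        sum += i;
    }
    return sum;
}

// Hand-flattened version with the body moved out of the loop.
int sum_odd_flattened(int n) {
    int sum = 0;
    int i = 0;                   // hoisted: the body below is outside the for
    for (; i < n; ++i) {
        goto body;
    next:;                       // reaching the end of the body runs ++i
    }
    goto done;

body:
    if (i % 2 == 0)
        goto next;               // was "continue"; a "break" would be "goto done"
    sum += i;
    goto next;

done:
    return sum;
}
```

A quick check such as `assert(sum_odd_flattened(7) == sum_odd_original(7));` should hold. Pasting the body verbatim would not even compile here, because `continue` is not allowed outside a loop.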

--
Mats


On 18 November 2016 at 04:59, John Tan <NewSelleron at hotmail.com> wrote:

> Update on what I have done:
>
> const WhileStmt *WS =
>     Result.Nodes.getNodeAs<clang::WhileStmt>("whileStmt");
> const Stmt *s = WS->getBody(); <-- I am assuming this gets the body of
> the while statement
>
> As for the rewriter functionality, I am not able to print out the Stmt.
>
>
> Any advice?
>
>
> ------------------------------
> *From:* cfe-dev <cfe-dev-bounces at lists.llvm.org> on behalf of John Tan
> via cfe-dev <cfe-dev at lists.llvm.org>
> *Sent:* Friday, November 18, 2016 8:21:30 AM
> *To:* mats petersson; Clang Dev
>
> *Subject:* Re: [cfe-dev] Getting out body of a while Statement
>
> while (a>3){
>
> goto label1;
>
>
> }
>
> label1:
>
> cout << "hello" << endl;
> goto label2;
>
> label2:
>     break;
>
> This would be the final result. All the labels would be outside of the
> while loop.
>
> I know WhileStmt has a getBody(), but I am unsure how to use it
> together with the ASTMatcher.
>
> I use the rewriter functionality to write to the body.
>
> But now I am stumped on how to use the rewriter functionality to get the
> body.
>
> I did this in my method that's bound to the ASTMatcher for while loops:
>
> const WhileStmt *WS =
>     Result.Nodes.getNodeAs<clang::WhileStmt>("whileStmt");
> const Stmt *s1 = WS->getBody();   <-- I am not able to print this out with
> the rewriter functionality.
>
> My ASTMatcher:
> Matcher.addMatcher(
>     whileStmt(hasDescendant(compoundStmt())).bind("whileStmt"),
>     &HandlerForWhile);
> ------------------------------
> *From:* mats.o.petersson at googlemail.com <mats.o.petersson at googlemail.com>
> on behalf of mats petersson <mats at planetcatfish.com>
> *Sent:* Friday, November 18, 2016 2:40:18 AM
> *To:* John Tan; Clang Dev
> *Subject:* Re: [cfe-dev] Getting out body of a while Statement
>
> Please: unless there are specific reasons not to (e.g. discussing
> personal things), always reply to the mailing list and to everyone
> taking part in the thread. It lets other people chime in if they have
> better/different suggestions, and lets anyone else who sees the thread
> understand what the outcome was.
>
> On 17 November 2016 at 17:49, John Tan <NewSelleron at hotmail.com> wrote:
>
>> What I want is really simple.
>>
>> I just want to replace the original content of the while loop body with
>> a goto statement that points to a label outside the statement. The
>> reason is that my project wants to do control flow flattening, so the main
>> purpose is to make reverse engineering harder.
>>
>>
>>
>> while (a > 3) {
>>
>> cout << "hello" << endl;
>>
>> }
>>
>> Will become
>>
>> while (a>3){
>>
>> goto label1;
>>
>> }
>>
>> label1:
>>
>> cout << "hello" << endl;
>>
>>
> Surely you mean:
> while (a>3){
>
> goto label1;
>
> label2:
> }
> goto label3;
> label1:
>
> cout << "hello" << endl;
> goto label2;
> label3:
>
> And since LLVM is pretty decent at figuring out "useless jumps", I'm not
> at all sure that this will actually achieve anything useful - the "useless"
> goto will just be removed. For any sufficiently complex project, with a bit
> of inline code and a general large code-base, the compiler is pretty good
> at obfuscating the code anyway.
>
>> I am not sure how to use the ASTMatcher to get to the body of the while
>> stmt .
>>
> The ASTMatcher will give you the AST statement for the while-loop,
> WhileStmt, which has a "getBody", which gives you the statements inside the
> body - you can get the source-location for the first and last. Or you can
> perhaps use the Clang Rewriter functionality.
>
> However, before you do that, do check that LLVM doesn't just remove your
> gotos and turn the code back into what you had before adding the gotos - I'd
> be very surprised if it doesn't optimise that away at some stage. At least
> my small examples:
>
> a.c:
>
> #include <stdio.h>
>
> int main()
> {
>     int a = 0;
>     while (a < 3)
>     {
>         printf("a=%d\n", a);
>         a++;
>     }
> }
>
> b.c:
>
> #include <stdio.h>
>
> int main()
> {
>     int a = 0;
>     while (a < 3)
>     {
>         goto L1;
>     L2:;
>     }
>     goto L3;
>
> L1:
>     printf("a=%d\n", a);
>     a++;
>     goto L2;
>
> L3:
>
>     return 0;
> }
>
> clang -S -O2 a.c
> clang -S -O2 b.c
>
> compiles to identical assembly-code in clang 3.8 (aside from the .file
> line which obviously shows 'a.c' and 'b.c' respectively)
>
> [And this is of course ignoring interesting effects of scoping in C++,
> which will have to be dealt with in your translator if you don't want the
> converted code to behave differently]
>
> Modern compilers aren't very easy to trick into generating different code
> just by adding goto's.
>
> --
> Mats
>
>> I can get the source location of both the start and end of the body of the
>> while stmt, but there is no method for me to extract information by
>> source location. I hope you can help me with the method and the ASTMatcher
>> needed.
>> ------------------------------
>> *From:* mats.o.petersson at googlemail.com <mats.o.petersson at googlemail.com>
>> on behalf of mats petersson <mats at planetcatfish.com>
>> *Sent:* Friday, November 18, 2016 1:42:13 AM
>> *To:* John Tan
>> *Cc:* cfe-dev at lists.llvm.org
>> *Subject:* Re: [cfe-dev] Getting out body of a while Statement
>>
>> Really depends on what you want to achieve [in the big picture, not "I
>> want a variable holding the content inside the while", but what you are
>> actually planning to do beyond getting it into a variable - do you want to
>> edit the source file to add or remove something, check that the body
>> does/doesn't do something]
>>
>> Something involving the ASTMatcher would be a starting point:
>> http://clang.llvm.org/docs/LibASTMatchersReference.html?
>>
>> If you want the actual source-code, then you'll also need to get out the
>> source location, and use sourcemanager to get the "section of source code
>> within the body into a string", but consider that you can have really
>> "interesting" code:
>>
>>     while( a > 3 )
>>     {
>>     #include "mycode.h"
>>     }
>>
>> or:
>>     while( a > 3 )
>>      #include "mycode.h"
>>
>> [where the content in mycode.h contains not just the loop body, but also
>> further code that continues AFTER the loop.]
>>
>> or:
>>
>>     #define SOME_MACRO(x) while(a < (x))
>>
>>     SOME_MACRO(3)
>>     {
>>       ...
>>     }
>>
>> or:
>>     while( a > 3 )
>>     {
>>         SOME_MACRO(foo);
>>     }
>>
>> where SOME_MACRO expands to some rather large chunk of code - and knowing
>> the "source" is not really helpful in either of these cases. And of course,
>> like the #include sample, you can have the loop body end part way through
>> the macro, so you probably don't really want to rely on the "string
>> contains the body of the loop" if you want to do something with the content
>> of the loop that is of any importance. These are of course simple examples
>> of "unusual programming", but I guarantee that if you look at enough code,
>> you'll find SOMETHING like that.
>>
>>
>> So, depending on what you actually want to achieve, you may want to NOT
>> try to deal with this as files/text strings, but as AST-code.
>>
>> --
>> Mats
>>
>> On 17 November 2016 at 16:04, John Tan via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> I need help to get out the body of a while statement.
>>>
>>>
>>> while (a > 3) {
>>>
>>>
>>> cout << "hello" << endl;   << --  I want to copy out this line and store
>>> it into a variable.
>>>
>>> }
>>>
>>>
>>> This is an example. I want to take out what's inside the while
>>> statement and, if possible, store it in a variable so I can print the
>>> result out somewhere.
>>>
>>>
>>> Much appreciated
>>>
>>> John Tan.
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>>
>>
>