[cfe-dev] Getting out body of a while Statement

John Tan via cfe-dev cfe-dev at lists.llvm.org
Thu Nov 17 20:59:30 PST 2016


Update on what I have done:

const WhileStmt *WS = Result.Nodes.getNodeAs<clang::WhileStmt>("whileStmt");
const Stmt *s = WS->getBody(); // I am assuming this gets the body of the while statement


As for the Rewriter functionality, I am not able to print out the Stmt.


Any advice?


________________________________
From: cfe-dev <cfe-dev-bounces at lists.llvm.org> on behalf of John Tan via cfe-dev <cfe-dev at lists.llvm.org>
Sent: Friday, November 18, 2016 8:21:30 AM
To: mats petersson; Clang Dev
Subject: Re: [cfe-dev] Getting out body of a while Statement

while (a>3){

goto label1;


}

label1:

cout << "hello" << endl;
goto label2;

label2:
    break;

This would be the final result. All the labels would be outside of the while loop.

I know WhileStmt has a getBody(), but I am unsure how to use it together with the AST matcher.

I use the Rewriter functionality to write to the body.

But now I am stumped on how to use the Rewriter functionality to get the body.

I did this in my method that is bound to the AST matcher for the while loop:

const WhileStmt *WS = Result.Nodes.getNodeAs<clang::WhileStmt>("whileStmt");
const Stmt *s1 = WS->getBody();   // I am not able to print this out with the Rewriter functionality.

My AST matcher:  Matcher.addMatcher(whileStmt(hasDescendant(compoundStmt())).bind("whileStmt"), &HandlerForWhile);
________________________________
From: mats.o.petersson at googlemail.com <mats.o.petersson at googlemail.com> on behalf of mats petersson <mats at planetcatfish.com>
Sent: Friday, November 18, 2016 2:40:18 AM
To: John Tan; Clang Dev
Subject: Re: [cfe-dev] Getting out body of a while Statement

Please: unless there are specific reasons not to (e.g. discussing personal things), always reply to the mailing list and to everyone taking part in the thread. It lets other people chime in if they have better or different suggestions, and it lets anyone who finds the thread later see what the outcome was.

On 17 November 2016 at 17:49, John Tan <NewSelleron at hotmail.com<mailto:NewSelleron at hotmail.com>> wrote:

What I want is really simple.

I just want to replace the original content of the while loop body with a goto statement that points to a label outside the loop. The reason is that my project does control flow flattening, so the main purpose is to make reverse engineering harder.


while (a > 3) {

cout << "hello" << endl;

}

Will become

while (a>3){

goto label1;

}

label1:

cout << "hello" << endl;



Surely you mean:
while (a>3){

goto label1;

label2:
}
goto label3;
label1:

cout << "hello" << endl;
goto label2;
label3:

And since LLVM is pretty decent at figuring out "useless jumps", I'm not at all sure that this will actually achieve anything useful - the "useless" goto will just be removed. For any sufficiently complex project, with a bit of inline code and a general large code-base, the compiler is pretty good at obfuscating the code anyway.

I am not sure how to use the ASTMatcher to get to the body of the while stmt .

The ASTMatcher will give you the AST statement for the while-loop, WhileStmt, which has a "getBody" that returns the body statement (usually a CompoundStmt holding the statements inside the braces) - you can get the source location of its start and end. Or you can perhaps use the Clang Rewriter functionality.
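As a rough illustration of the rewrite step, the sketch below builds the replacement text for one loop from plain strings. It deliberately does not use Clang: in a real tool the body text might come from Lexer::getSourceText and the result might be applied with Rewriter::ReplaceText. The helper name flattenWhile and the label names are hypothetical, and the sketch assumes the body contains no break or continue.

```cpp
#include <string>

// Hypothetical helper: returns the flattened source text for one while loop.
// `cond` is the text of the loop condition, `body` the text of the statements
// inside the braces, and `id` keeps the generated label names unique.
std::string flattenWhile(const std::string &cond, const std::string &body, int id) {
    const std::string n = std::to_string(id);
    const std::string entry = "body" + n;  // label where the moved body starts
    const std::string back = "back" + n;   // label inside the loop to return to
    const std::string done = "done" + n;   // label after the moved body

    std::string out;
    out += "while (" + cond + ") {\n";
    out += "    goto " + entry + ";\n";
    out += back + ":;\n";
    out += "}\n";
    out += "goto " + done + ";\n";          // skip the moved body once the loop ends
    out += entry + ":\n";
    out += "{\n" + body + "\n}\n";          // braces keep the body's locals scoped
    out += "goto " + back + ";\n";
    out += done + ":;\n";
    return out;
}
```

Calling flattenWhile("a < 3", "    a++;", 0) yields text with the same shape as the b.c example in this thread, with the moved body wrapped in its own block.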

However, before you do that, do check that LLVM doesn't just remove your gotos and turn the code into the same as you had before adding them - I'd be very surprised if it doesn't optimise that away at some stage. At least my small examples:

a.c:

#include <stdio.h>

int main()
{
    int a = 0;
    while (a < 3)
    {
        printf("a=%d\n", a);
        a++;
    }
}

b.c:

#include <stdio.h>

int main()
{
    int a = 0;
    while (a < 3)
    {
        goto L1;
    L2:;
    }
    goto L3;

L1:
    printf("a=%d\n", a);
    a++;
    goto L2;

L3:

    return 0;
}

clang -S -O2 a.c
clang -S -O2 b.c

compiles to identical assembly-code in clang 3.8 (aside from the .file line which obviously shows 'a.c' and 'b.c' respectively)

[And this is of course ignoring interesting effects of scoping in C++, which will have to be dealt with in your translator if you don't want the converted code to behave differently]
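To make the scoping concern concrete, here is a small sketch (not taken from the thread; the Tracer type and function names are made up for illustration). A local object in the loop body is constructed and destroyed on every iteration, so the moved body has to sit in its own block. Without the braces, "goto L3" would jump past the initialization of t, which C++ rejects.

```cpp
#include <string>

// Appends "c" on construction and "d" on destruction, so object lifetimes
// show up in the log string.
struct Tracer {
    std::string &log;
    explicit Tracer(std::string &l) : log(l) { log += "c"; }
    ~Tracer() { log += "d"; }
};

// Original form: t lives for exactly one iteration.
std::string originalLoop() {
    std::string log;
    int a = 0;
    while (a < 2) {
        Tracer t(log);
        log += "b";
        a++;
    }
    return log;
}

// Flattened form: the body is moved out of the loop and wrapped in a block.
std::string flattenedLoop() {
    std::string log;
    int a = 0;
    while (a < 2) {
        goto L1;
    L2:;
    }
    goto L3;
L1:
    {
        Tracer t(log);
        log += "b";
        a++;
        goto L2;  // leaving the block runs ~Tracer() before the jump lands
    }
L3:
    return log;
}
```

Both functions produce the same log here, but only because the block around the moved body preserves the lifetime of t.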

Modern compilers aren't very easy to trick into generating different code just by adding goto's.

--
Mats

I can get the source location of both the start and the end of the body of the while stmt, but there is no method for me to extract information by source location. I hope you can help me with the method and the ASTMatcher needed.

________________________________
From: mats.o.petersson at googlemail.com<mailto:mats.o.petersson at googlemail.com> <mats.o.petersson at googlemail.com<mailto:mats.o.petersson at googlemail.com>> on behalf of mats petersson <mats at planetcatfish.com<mailto:mats at planetcatfish.com>>
Sent: Friday, November 18, 2016 1:42:13 AM
To: John Tan
Cc: cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
Subject: Re: [cfe-dev] Getting out body of a while Statement

Really depends on what you want to achieve [in the big picture, not "I want a variable holding the content inside the while", but what you are actually planning to do beyond getting it into a variable - do you want to edit the source file to add or remove something, check that the body does/doesn't do something]

Something involving the ASTMatcher would be a starting point:
http://clang.llvm.org/docs/LibASTMatchersReference.html?

If you want the actual source-code, then you'll also need to get out the source location, and use sourcemanager to get the "section of source code within the body into a string", but consider that you can have really "interesting" code:
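In Clang itself, the usual route is probably something like Lexer::getSourceText(CharSourceRange::getTokenRange(Body->getSourceRange()), SM, LangOpts). One detail that is easy to miss is that the end of a SourceRange points at the start of the last token, not past it. The standalone sketch below mimics that with plain offsets; TokenRange and textOf are hypothetical names, not Clang types.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a token-based source range: both offsets point
// at the first character of a token.
struct TokenRange {
    std::size_t begin;           // offset of the first token's first character
    std::size_t lastTokenStart;  // offset of the last token's first character
};

// Returns the text covered by the range. The length of the last token has to
// be added, otherwise the final token (here the closing brace) is cut off.
std::string textOf(const std::string &buffer, TokenRange range,
                   std::size_t lastTokenLength) {
    const std::size_t end = range.lastTokenStart + lastTokenLength;
    return buffer.substr(range.begin, end - range.begin);
}
```

For the buffer "while (a > 3) { f(); }" with the range running from the opening to the closing brace, passing a last-token length of 1 returns the whole body including the closing brace, while passing 0 drops it.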

    while( a > 3 )
    {
    #include "mycode.h"
    }

or:
    while( a > 3 )
     #include "mycode.h"

[where the content in mycode.h contains not just the loop body, but also further code that continues AFTER the loop.]

or:

    #define SOME_MACRO(x) while(a < (x))

    SOME_MACRO(3)
    {
      ...
    }

or:
    while( a > 3 )
    {
        SOME_MACRO(foo);
    }

where SOME_MACRO expands to some rather large chunk of code - and knowing the "source" is not really helpful in any of these cases. And of course, like the #include sample, you can have the loop body end part way through the macro, so you probably don't really want to rely on the "string contains the body of the loop" if you want to do something with the content of the loop that is of any importance. These are of course simple examples of "unusual programming", but I guarantee that if you look at enough code, you'll find SOMETHING like that.
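A minimal sketch of the "body ends part way through the macro" case (the macro and function names are invented for illustration): the text written at the loop is a single macro name, yet only the first half of its expansion is the loop body.

```cpp
// Hypothetical macro: the loop body ends at the closing brace, but the
// expansion continues with a statement that runs once, after the loop.
#define BODY_AND_MORE { ++count; ++a; } count += 100;

int runLoop() {
    int a = 0;
    int count = 0;
    while (a < 3)
        BODY_AND_MORE
    return count;  // 3 iterations of the body, then the trailing += 100
}
```

A tool that copies the spelled source text of the "body" here would move the trailing statement along with it, which is one reason to work on the AST rather than on strings.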


So, depending on what you actually want to achieve, you may want to NOT try to deal with this as files/text strings, but as AST-code.

--
Mats

On 17 November 2016 at 16:04, John Tan via cfe-dev <cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>> wrote:

I need help getting the body out of a while statement.


while (a > 3) {


cout << "hello" << endl;   // I want to copy out this line and store it in a variable.

}


This is an example. I want to take out what is inside the while statement and, if possible, store it in a variable so I can print the result out somewhere.


Much appreciated

John Tan.

_______________________________________________
cfe-dev mailing list
cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


