[cfe-dev] A howto or template for adding a transformation pass

Wed Aug 13 21:00:45 PDT 2008

On Aug 9, 2008, at 12:14 PM, Hiren Patel wrote:

> My apologies for the vague question formulation. I'll explain my  
> objective with an example.
>
> My initial source is as follows:
> unsigned int fib(unsigned int m) {
>   unsigned int f0 = 0 ;
>   unsigned int f1 = 1, f2, i ;
>   unsigned int l1 = 0;
>   if (m <= 1)  {
>     MARKSTART();
>     printf("Return\n");
>     MARKEND();
>     return m;
>   } else {
>     for (i = 2; i <=m; i++) {
>       f2 = f0 + f1;
>       f0 = f1;
>       f1 = f2;
>       for (l1 = 0; l1 <5; l1++) {
>         printf("woof\n");
>         MARKSTART();
>       }
>       MARKEND();
>       MARKSTART();
>     }
>     MARKEND();
>     return f2;
>   }
> }
>
> The MARKSTART and MARKEND represent MACROs that implement a certain  
> instruction in a particular processor. The semantics of the MARKEND  
> and MARKSTART can be simply thought of as starting and stopping  
> cycle timers for segments of code between the MARKSTART and MARKEND.  
> The resulting code after the transformation is as follows:
> unsigned int fib(unsigned int m) {
>   unsigned int f0 = 0 ;
>   unsigned int f1 = 1, f2, i ;
>   unsigned int l1 = 0;
>   if (m <= 1)  {
>     MARKSTART0("fib_seq0");
>     printf("Return\n");
>     MARKEND0("fib_seq0");
>     return m;
>   } else {
>     for (i = 2; i <=m; i++) {
>       MARKSTARTCOUNT("bb_k");
>       f2 = f0 + f1;
>       f0 = f1;
>       f1 = f2;
>       MARKENDCOUNT("bb_k");
>       for (l1 = 0; l1 <5; l1++) {
>         printf("woof\n");
>         MARKSTART2("fib_loop1");
>       }
>       MARKEND2("fib_loop1");
>       MARKSTART1("fib_loop0");
>     }
>     MARKEND1("fib_loop0");
>     return f2;
>   }
> }
>
> The number after a MARKSTART and MARKEND denotes a timer in the  
> processor and the string, a unique label identifying blocks of  
> interest used during profiling this code. We have certain rules that  
> form a block, such as, a MARKSTART must always follow by a MARKEND  
> and use of timers must be correctly nested. Notice the MARK*COUNT as  
> only one example of labeling one basic block in the source. I would  
> like to add these for every BB in the CFG as well. So the tasks here  
> are:

Hi Hiren,

My apologies for the delay in my reply.  I'll address each of your  
points inline.  In general, I think there are two views you can take  
of the source:

1) The constructed ASTs.

2) A token stream, which literally consists of the stream of tokens  
passed to/returned from the preprocessor.  A good example of this is  
in RewriteMacros.cpp in Driver.

We have accurate SourceLocation information for both, but both are  
different views of the source code.  Since macros can expand to  
arbitrary amounts of code (including code that contains control flow),  
their is a tricky mapping from macros to their expanded statements, or  
even the basic blocks in the CFG.

I assume that by default MARKSTART and MARKEND expand to nothing.   
Using the token stream approach, you could record the SourceLocations  
for the approprate macros, and then using the AST approach you can  
conceivably search for the AST Stmt* that appears closest to those  
recorded locations (this may require two passes).  This might be  
really tricky.  An easier approach might be to have both macros expand  
(when using clang) to the "annotate" attribute, which can then be  
affixed to any AST node.  Your analysis could then look for Stmt* with  
the annotation that you are interested in.

Once you have computed what new MARKSTARTXXX and MARKENDXXX text you  
want to insert, however, it is easy to add these back to the source.   
Using the rewriter, you can splice in text at any arbitrary  
SourceLocation.

> - Identify MARKSTART and MARKEND, allocate a cycle timer such that  
> there are no conflicts between the START and END of a block.

I'm not quite certain what you mean by "conflicts", but I assume you  
are referring to the nesting property of MARKSTART and MARKEND above.   
Once you determine the locations of the MARKSTART and MARKEND macros,  
determine their association to the appropriate statements, the  
allocating a cycle timer, etc., may just result in doing a walk on the  
AST, since the AST is hierarchical and represents the nested structure  
of the code.

>
> - Identify MARKSTART and MARKEND, add a unique label used in  
> profiling. The labels much match the START and END that form a block.
> - For every basic block, add MARKSTARTCOUNT and MARKENDCOUNT  
> respectively with a unique BB label.

Note that Basic Blocks in the CFG include lots of control-flow  
constructs: if statements, loops, ternary operators (?), short circuit  
evaluation (&&, ||), gotos, etc.  If you just want to profile things  
like loops, you might not even need to go to the CFG at all unless you  
want to support gotos.  With gotos, you then have to identify the loop  
structure yourself.  I think we have analyses like this at the LLVM IR  
level, but nothing like that yet for source-level analysis.

Right now the CFG also has some oddities with DeclStmts.  If a Decl  
represents a multiple variable declaration (e.g., int x, y, z), in the  
CFG the single DeclStmt in the AST is transformed into three DeclStmts  
to explicitly represent their control-flow dependency.  These new  
DeclStmts don't appear in the original AST, but they should have the  
correct sourcelocation information.  This solution is likely to change  
soon.

That said, each basic block is uniquely numbered.  I don't know all  
the details of your analysis, but I think the hardest part of your  
task will be identifying the locations of the original macros and  
mapping those to the nearest AST Stmt*.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20080813/1e6865d5/attachment.html>