[cfe-dev] A howto or template for adding a transformation pass
Ted Kremenek
kremenek at apple.com
Wed Aug 13 21:00:45 PDT 2008
On Aug 9, 2008, at 12:14 PM, Hiren Patel wrote:
> My apologies for the vague question formulation. I'll explain my
> objective with an example.
>
> My initial source is as follows:
> unsigned int fib(unsigned int m) {
> unsigned int f0 = 0 ;
> unsigned int f1 = 1, f2, i ;
> unsigned int l1 = 0;
> if (m <= 1) {
> MARKSTART();
> printf("Return\n");
> MARKEND();
> return m;
> } else {
> for (i = 2; i <=m; i++) {
> f2 = f0 + f1;
> f0 = f1;
> f1 = f2;
> for (l1 = 0; l1 <5; l1++) {
> printf("woof\n");
> MARKSTART();
> }
> MARKEND();
> MARKSTART();
> }
> MARKEND();
> return f2;
> }
> }
>
> The MARKSTART and MARKEND represent MACROs that implement a certain
> instruction in a particular processor. The semantics of the MARKEND
> and MARKSTART can be simply thought of as starting and stopping
> cycle timers for segments of code between the MARKSTART and MARKEND.
> The resulting code after the transformation is as follows:
> unsigned int fib(unsigned int m) {
> unsigned int f0 = 0 ;
> unsigned int f1 = 1, f2, i ;
> unsigned int l1 = 0;
> if (m <= 1) {
> MARKSTART0("fib_seq0");
> printf("Return\n");
> MARKEND0("fib_seq0");
> return m;
> } else {
> for (i = 2; i <=m; i++) {
> MARKSTARTCOUNT("bb_k");
> f2 = f0 + f1;
> f0 = f1;
> f1 = f2;
> MARKENDCOUNT("bb_k");
> for (l1 = 0; l1 <5; l1++) {
> printf("woof\n");
> MARKSTART2("fib_loop1");
> }
> MARKEND2("fib_loop1");
> MARKSTART1("fib_loop0");
> }
> MARKEND1("fib_loop0");
> return f2;
> }
> }
>
> The number after a MARKSTART and MARKEND denotes a timer in the
> processor and the string, a unique label identifying blocks of
> interest used during profiling this code. We have certain rules that
> form a block, such as, a MARKSTART must always follow by a MARKEND
> and use of timers must be correctly nested. Notice the MARK*COUNT as
> only one example of labeling one basic block in the source. I would
> like to add these for every BB in the CFG as well. So the tasks here
> are:
Hi Hiren,
My apologies for the delay in my reply. I'll address each of your
points inline. In general, I think there are two views you can take
of the source:
1) The constructed ASTs.
2) A token stream, which literally consists of the stream of tokens
passed to/returned from the preprocessor. A good example of this is
in RewriteMacros.cpp in Driver.
We have accurate SourceLocation information for both, but both are
different views of the source code. Since macros can expand to
arbitrary amounts of code (including code that contains control flow),
their is a tricky mapping from macros to their expanded statements, or
even the basic blocks in the CFG.
I assume that by default MARKSTART and MARKEND expand to nothing.
Using the token stream approach, you could record the SourceLocations
for the approprate macros, and then using the AST approach you can
conceivably search for the AST Stmt* that appears closest to those
recorded locations (this may require two passes). This might be
really tricky. An easier approach might be to have both macros expand
(when using clang) to the "annotate" attribute, which can then be
affixed to any AST node. Your analysis could then look for Stmt* with
the annotation that you are interested in.
Once you have computed what new MARKSTARTXXX and MARKENDXXX text you
want to insert, however, it is easy to add these back to the source.
Using the rewriter, you can splice in text at any arbitrary
SourceLocation.
> - Identify MARKSTART and MARKEND, allocate a cycle timer such that
> there are no conflicts between the START and END of a block.
I'm not quite certain what you mean by "conflicts", but I assume you
are referring to the nesting property of MARKSTART and MARKEND above.
Once you determine the locations of the MARKSTART and MARKEND macros,
determine their association to the appropriate statements, the
allocating a cycle timer, etc., may just result in doing a walk on the
AST, since the AST is hierarchical and represents the nested structure
of the code.
>
> - Identify MARKSTART and MARKEND, add a unique label used in
> profiling. The labels much match the START and END that form a block.
> - For every basic block, add MARKSTARTCOUNT and MARKENDCOUNT
> respectively with a unique BB label.
Note that Basic Blocks in the CFG include lots of control-flow
constructs: if statements, loops, ternary operators (?), short circuit
evaluation (&&, ||), gotos, etc. If you just want to profile things
like loops, you might not even need to go to the CFG at all unless you
want to support gotos. With gotos, you then have to identify the loop
structure yourself. I think we have analyses like this at the LLVM IR
level, but nothing like that yet for source-level analysis.
Right now the CFG also has some oddities with DeclStmts. If a Decl
represents a multiple variable declaration (e.g., int x, y, z), in the
CFG the single DeclStmt in the AST is transformed into three DeclStmts
to explicitly represent their control-flow dependency. These new
DeclStmts don't appear in the original AST, but they should have the
correct sourcelocation information. This solution is likely to change
soon.
That said, each basic block is uniquely numbered. I don't know all
the details of your analysis, but I think the hardest part of your
task will be identifying the locations of the original macros and
mapping those to the nearest AST Stmt*.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20080813/1e6865d5/attachment.html>
More information about the cfe-dev
mailing list