[cfe-dev] Macro GURUs, I need advice: Is new class "MacroStmt" necessary?
ja_comp
ja_comp at yahoo.com
Wed Aug 29 02:19:47 PDT 2012
I believe there is a need for a "MacroStmt" class to help source to source
conversion.
*** PROBLEM ***
The post-processing of macros is difficult, especially when doing source to
source conversion by walking the AST. Take for example the following code:
#define OL_MACRO (99/2)
#define OL_MACRO_USEDINCODE OL_MACRO
#define FL_MACRO(a, b) a*b
#define FL_MACRO_USEDINCODE(a, b, c) FL_MACRO(a, 2^b)
int main() {
    int ii = FL_MACRO_USEDINCODE(OL_MACRO_USEDINCODE, 1, 2);
}
When converting source to source, it is desirable to keep as true to the
original source as possible. However, when the macros are expanded, it is
difficult to separate the macro expression from the non-macro code and to
maintain the mappings for each of the arguments in the macro call.
*** PROPOSED SOLUTION ***
I suggest a new Preprocessor flag "UseMacroStmts" and three new helper
classes to be used when macros can be represented by statements. I realize
that by the very nature of macros, THEY WILL NOT ALL BE WELL FORMED. It is
not my intent to handle fringe macro cases, and it should be possible to
ensure that fringe cases are not mishandled.
Proposed helper classes:
1) MacroStmt: AST object for a macro that can be represented by statements.
Structure should be something like:
class MacroStmt : public Stmt {
  MacroInfo *MI;       // the referenced MacroInfo
  unsigned NumTokens;
  /* If the macro is function-like, Args contains the arguments, which
     have been parsed in the context of the macro instance. */
  Expr **Args;
  unsigned NumArgs;
  /* WellFormed is set while examining the MacroParentStmt, if it is
     valid to parse the macro separately. */
  bool WellFormed;
  /* MacroBody is the body of the macro, parsed with placeholders
     (MacroParamExpr) for the parameters. */
  Stmt *MacroBody;
  ...
};
2) MacroParamExpr: AST object used to represent passed parameters when
parsing the MacroBody. Structure should be something like:
class MacroParamExpr : public Expr {
  MacroStmt *ParentStmt;
  unsigned ParentArgNum;
  ...
};
3) MacroParentStmt: AST object used to represent a "stand alone statement",
which is known to be independent of preceding or following statements, for
example, the statement "int ii = FL_MACRO_USEDINCODE(OL_MACRO_USEDINCODE, 1,
2);". The MacroParentStmt would have two internal representations of the
contained code, one using MacroStmts and one expanded as is currently done.
The MacroStmts can be independently expanded and made canonical and checked
against the conventional expansion as an assurance that the contained macros
can be represented in this manner. Structure should be something like:
class MacroParentStmt : public Stmt {
  // parsed keeping the macros in separate MacroStmts
  Stmt *BodyWithMacroStmts;
  // redundant: could recreate this by recursively expanding the above
  Stmt *BodyNormallyExpanded;
  ...
};
Although there are fringe cases, for the majority of real-world situations,
macros CAN be represented by statements. If the macro is not "well formed"
(i.e. it cannot be parsed meaningfully separately), this should be
identified when constructing the MacroParentStmt by iteratively leaving sub
MacroStmts out until a canonical match is found.
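
As it happens, the example from the PROBLEM section looks like a case this check would flag. The sketch below assumes the usual C/C++ rules, where '^' is bitwise XOR and binds less tightly than '*'. The function structuredReading is a hypothetical stand-in for "each argument parsed as its own sub-expression"; it is not part of the proposal or of Clang.

```cpp
#define OL_MACRO (99/2)
#define OL_MACRO_USEDINCODE OL_MACRO
#define FL_MACRO(a, b) a*b
#define FL_MACRO_USEDINCODE(a, b, c) FL_MACRO(a, 2^b)

// Conventional expansion: the token sequence is (99/2)*2^1. Because '*'
// binds tighter than '^', the parser reads it as ((99/2)*2)^1.
constexpr int conventionalExpansion() {
    return FL_MACRO_USEDINCODE(OL_MACRO_USEDINCODE, 1, 2);
}

// Hypothetical structured reading: the second argument of FL_MACRO, 2^b,
// is kept together as one sub-expression, as it would be if the macros
// were functions.
constexpr int structuredReading(int a, int b) {
    return a * (2 ^ b);
}

// The two readings disagree, so a comparison against the conventional
// expansion would mark FL_MACRO as not well formed at this call site.
static_assert(conventionalExpansion() == 99, "((99/2)*2)^1");
static_assert(structuredReading(OL_MACRO_USEDINCODE, 1) == 147, "(99/2)*(2^1)");
```

Writing the macro body as ((a)*(b)) would make the two readings agree, which is the kind of macro the proposal treats as well formed.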
The idea is that both 1) all of the macro arguments (if the macro is
function-like) and 2) the macro body are parsed in the calling context, and
the resulting Stmts are stored in the MacroStmt object.
So, the above example would look something like this in the AST:
Decl "main"
  DeclStmt (VarDecl) "ii"
    Init: MacroStmt "FL_MACRO_USEDINCODE"
      Args[0]: MacroStmt "OL_MACRO_USEDINCODE"
        MacroBody: MacroStmt "OL_MACRO"
          MacroBody: BinaryOperator "/"
            LHS: IntegerLiteral "99"
            RHS: IntegerLiteral "2"
      Args[1]: IntegerLiteral "1"
      Args[2]: IntegerLiteral "2"
      MacroBody: MacroStmt "FL_MACRO"
        Args[0]: MacroParamExpr "Param 0"
        Args[1]: BinaryOperator "^"
          LHS: IntegerLiteral "2"
          RHS: MacroParamExpr "Param 1"
        MacroBody: BinaryOperator "*"
          LHS: MacroParamExpr "Param 0"
          RHS: MacroParamExpr "Param 1"
*** METHOD OF IMPLEMENTATION ***
I'm not really sure of the best way to implement this. (My experience and
expertise at this low level is next to nil.)
It is important to note that given a MacroStmt, tokens could be created that
are identical to a fully expanded macro.
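
To make that claim concrete, here is a minimal, self-contained sketch of how the expanded spelling could be regenerated from such a tree. Everything in it is hypothetical: Node, make, expand and buildExample are toy stand-ins rather than Clang classes, the "tokens" are plain strings, and a Paren kind is added (the tree shown earlier omits it) so that the spelling of OL_MACRO comes out exactly.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the proposed nodes (not Clang classes).
struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
    enum Kind { Literal, Paren, Binary, MacroParam, Macro };
    Kind kind;
    std::string text;            // literal spelling, operator, or macro name
    unsigned paramIndex;         // MacroParam only: index into the enclosing macro's args
    std::vector<NodePtr> kids;   // Paren: {inner}; Binary: {lhs, rhs}; Macro: args..., body last
};

NodePtr make(Node::Kind k, std::string text, std::vector<NodePtr> kids = {}, unsigned idx = 0) {
    return std::make_shared<Node>(Node{k, std::move(text), idx, std::move(kids)});
}

// Re-create the expanded spelling. `ctx` holds the already expanded arguments
// of the macro whose body is currently being walked.
std::string expand(const NodePtr& n, const std::vector<std::string>& ctx) {
    switch (n->kind) {
    case Node::Literal:    return n->text;
    case Node::Paren:      return "(" + expand(n->kids[0], ctx) + ")";
    case Node::Binary:     return expand(n->kids[0], ctx) + n->text + expand(n->kids[1], ctx);
    case Node::MacroParam: return ctx.at(n->paramIndex);
    case Node::Macro: {
        std::vector<std::string> args;  // arguments are expanded in the caller's context
        for (std::size_t i = 0; i + 1 < n->kids.size(); ++i)
            args.push_back(expand(n->kids[i], ctx));
        return expand(n->kids.back(), args);  // the body sees only its own arguments
    }
    }
    return {};
}

// Build the tree for FL_MACRO_USEDINCODE(OL_MACRO_USEDINCODE, 1, 2).
NodePtr buildExample() {
    auto param = [](unsigned i) { return make(Node::MacroParam, "", {}, i); };
    auto lit = [](const char* s) { return make(Node::Literal, s); };

    NodePtr olMacro = make(Node::Macro, "OL_MACRO",
        {make(Node::Paren, "", {make(Node::Binary, "/", {lit("99"), lit("2")})})});
    NodePtr olUsed = make(Node::Macro, "OL_MACRO_USEDINCODE", {olMacro});
    NodePtr flMacro = make(Node::Macro, "FL_MACRO",
        {param(0), make(Node::Binary, "^", {lit("2"), param(1)}),
         make(Node::Binary, "*", {param(0), param(1)})});
    return make(Node::Macro, "FL_MACRO_USEDINCODE", {olUsed, lit("1"), lit("2"), flMacro});
}
```

Calling expand(buildExample(), {}) yields the string (99/2)*2^1, the same token sequence the preprocessor produces for the call. A real implementation would work on Clang tokens and source locations rather than strings.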
Perhaps during lexing, one could lex the fully expanded macro, then
backtrack and create the MacroStmt and replace the entire macro with a
special token that points to the MacroStmt? When this special token is
parsed it could:
1) supply tokens in expanded form (which would be no different than it
is now); and
2) recursively parse the body of each MacroStmt encountered as though it
were a stand alone statement.
a) When parsing inside a MacroStmt, other MacroStmts should be
directly inserted into the AST.
b) If a MacroStmt body gives errors when parsing, it should be
marked as WellFormed = false;
3) When the end of the current independent statement in code is reached,
the AST should be modified to encapsulate the current independent statement
inside a MacroParentStmt. At this time, a canonical representation of the
entire statement should be made and compared to the result of iterating
through the sub-macros to determine whether or not they meet the
"well-formed" criteria.
I'm open to suggestions and direction.
Thoughts? Comments? Alternatives? Not-so-subtle hints that I'm off my
rocker?
--
View this message in context: http://clang-developers.42468.n3.nabble.com/Macro-GURUs-I-need-advice-Is-new-class-MacroStmt-necessary-tp4026429.html
Sent from the Clang Developers mailing list archive at Nabble.com.