[cfe-commits] r161703 - in /cfe/trunk: include/clang/AST/Stmt.h lib/AST/Stmt.cpp

Mon Aug 13 02:37:52 PDT 2012

On Mon, Aug 13, 2012 at 2:24 AM, Enea Zaffanella <zaffanella at cs.unipr.it> wrote:
> On 08/10/2012 11:36 PM, Chad Rosier wrote:
>> Author: mcrosier
>> Date: Fri Aug 10 16:36:25 2012
>> New Revision: 161703
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=161703&view=rev
>> Log:
>> [ms-inline asm] Avoid extra allocations by making this an array of StringRefs.
> [...]
>
>
> Hello.
>
> Since you are working on the ms-inline asm statements,
> we would like to propose a change to the AST representation.
>
> Currently, consecutive MS asm statements are merged into a single
> MSAsmStmt node.
>
> This choice negatively affects source code fidelity.
> For instance, there is a single source location for just the first of
> the possibly many consecutive __asm keywords and there is no faithful
> tracking of braces.
> Moreover, it seems to also complicate later code trying to reconstruct
> the spacing and line-breaking around the asm tokens (namely, the use of
> array LineEnds and of static function needSpaceAsmToken).
>
> As things are now, when parsing the following:
>
>    __asm {
>      mov  eax, dword ptr [0]
>      mov  edx, dword ptr [4]
>    }
>
> we obtain the string
>    "mov eax, dword ptr[0]mov edx, dword ptr[4]"
> with no line break.
>
>
> Wouldn't thing be more simple if each __asm keyword is mapped onto its
> own MSAsmStmt node?
> (Side note: afawct, this is what happens for non-MS AsmStmt nodes.)
>
>  From the source code fidelity perspective, we would get a source
> location for each __asm keyword and we will be able to properly tell if
> it is using braces or not.
> Moreover, the logic for producing the asm string would be simpler too:
> there will be no need to use an array LineEnds, because you can query
> each asm token with isAtStartOfLine().
> As a matter of fact, token method hasLeadingSpace() could replace static
> function needSpaceAsmToken() to add a white space if there was one in
> the original sources.
>
> For the sample above, we would thus obtain the string
>    "mov eax, dword ptr [0]\nmov edx, dword ptr [4]"
> with both the line break and the extra space before the "[".

Semantically, there's only one statement; CodeGen, for example, must
produce a single LLVM IR inline asm.  A series of __asm lines and a
single __asm{} block have the same meaning; they're basically just
spelled differently in the source code.  I think it would
substantially complicate the semantic analysis and CodeGen to be
forced to handle a series of AST nodes as if they were one.

Granted, if the current AST nodes don't provide enough source location
information, we should do something, but I don't think splitting them
is the right answer.

-Eli