[cfe-commits] r161703 - in /cfe/trunk: include/clang/AST/Stmt.h lib/AST/Stmt.cpp

Mon Aug 13 03:36:46 PDT 2012

Il 13/08/2012 11:37, Eli Friedman ha scritto:
> On Mon, Aug 13, 2012 at 2:24 AM, Enea Zaffanella <zaffanella at cs.unipr.it> wrote:
>> On 08/10/2012 11:36 PM, Chad Rosier wrote:
>>> Author: mcrosier
>>> Date: Fri Aug 10 16:36:25 2012
>>> New Revision: 161703
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=161703&view=rev
>>> Log:
>>> [ms-inline asm] Avoid extra allocations by making this an array of StringRefs.
>> [...]
>>
>>
>> Hello.
>>
>> Since you are working on the ms-inline asm statements,
>> we would like to propose a change to the AST representation.
>>
>> Currently, consecutive MS asm statements are merged into a single
>> MSAsmStmt node.
>>
>> This choice negatively affects source code fidelity.
>> For instance, there is a single source location for just the first of
>> the possibly many consecutive __asm keywords and there is no faithful
>> tracking of braces.
>> Moreover, it seems to also complicate later code trying to reconstruct
>> the spacing and line-breaking around the asm tokens (namely, the use of
>> array LineEnds and of static function needSpaceAsmToken).
>>
>> As things are now, when parsing the following:
>>
>>    __asm {
>>      mov  eax, dword ptr [0]
>>      mov  edx, dword ptr [4]
>>    }
>>
>> we obtain the string
>>    "mov eax, dword ptr[0]mov edx, dword ptr[4]"
>> with no line break.
>>
>>
>> Wouldn't thing be more simple if each __asm keyword is mapped onto its
>> own MSAsmStmt node?
>> (Side note: afawct, this is what happens for non-MS AsmStmt nodes.)
>>
>>  From the source code fidelity perspective, we would get a source
>> location for each __asm keyword and we will be able to properly tell if
>> it is using braces or not.
>> Moreover, the logic for producing the asm string would be simpler too:
>> there will be no need to use an array LineEnds, because you can query
>> each asm token with isAtStartOfLine().
>> As a matter of fact, token method hasLeadingSpace() could replace static
>> function needSpaceAsmToken() to add a white space if there was one in
>> the original sources.
>>
>> For the sample above, we would thus obtain the string
>>    "mov eax, dword ptr [0]\nmov edx, dword ptr [4]"
>> with both the line break and the extra space before the "[".
> 
> Semantically, there's only one statement; CodeGen, for example, must
> produce a single LLVM IR inline asm.  A series of __asm lines and a
> single __asm{} block have the same meaning; they're basically just
> spelled differently in the source code.  I think it would
> substantially complicate the semantic analysis and CodeGen to be
> forced to handle a series of AST nodes as if they were one.
> 
> Granted, if the current AST nodes don't provide enough source location
> information, we should do something, but I don't think splitting them
> is the right answer.

IMHO to have every *syntactic* asm statement in separate nodes
simplifies a lot their handling, location storing and token
saving/retrieval.

Further Visual Studio compilers accept an optional semicolon at end of
asm stmt and this leads to interpret every __asm as syntactically
separated statements.

Once said that, if for semantic handling we have benefits aggregating
semantic data in a single place this can be handled otherwise (e.g.
putting a pointer in MSAsmStmt that refers the aggregated semantic data).