[PATCH] D82862: [ThinLTO] Always parse module level inline asm with At&t dialect

Fri Jan 15 03:51:49 PST 2021

uweigand added a comment.

In D82862#2500038 <https://reviews.llvm.org/D82862#2500038>, @pengfei wrote:

>> What is the reason for treating this differently in LLVM?
>
> I'm not sure if it is related to this. I think one difference is that LLVM is supporting MS inline assembly. Although it uses Intel dialect, it has different representation in input, output, clobber etc. with GCC'.

That is indeed another complication.  On the one hand, we have two different inline asm formats, GNU assembly and MS assembly.  These differ completely in how asm arguments (passed in from surrounding C/C++ code) are handled and therefore need to be parsed quite differently.  On the other hand, we have the different assembler dialects (AT&T vs. Intel on x86), which affect how the single asm instructions are to be parsed.

Unfortunately, the distinction between the two seems to be somewhat muddled in LLVM at this point.  In particular, MS assembly statements are represented at the LLVM IR level via the "inteldialect" token.  This is a bit confusing because while MS asm indeed always uses the Intel dialect, we can also have GNU asm with the Intel dialect -- at least this is the case in GCC when using -masm=intel.

What I think we should have is:

- the LLVM IR level distinction between GNU and MS asm statements; and in addition
- for GNU asm statements, a target-specific selection of assembler variants, which may depend on the target triple and/or command line options like -masm=

If you look at AsmPrinter::emitInlineAsm, this is actually **partially** implemented already:

  // The variant of the current asmprinter.
  int AsmPrinterVariant = MAI->getAssemblerDialect();
  AsmPrinter *AP = const_cast<AsmPrinter*>(this);
  if (MI->getInlineAsmDialect() == InlineAsm::AD_ATT)
    EmitGCCInlineAsmStr(AsmStr, MI, MMI, AsmPrinterVariant, AP, LocCookie, OS);
  else
    EmitMSInlineAsmStr(AsmStr, MI, MMI, AP, LocCookie, OS);

So here the LLVM IR marker is used to select between GCC and MS inline asm instructions, and for the GCC inline asm statements, the target is queried as to the proper asm dialect variant to use.

However, later on we have a

  Parser->setAssemblerDialect(Dialect);

call, where the Dialect is again taken from the LLVM IR setting -- so for GNU asm, this gets now always set back to AT&T.   It seems to me that this is simply wrong.

Then, we finally have **module** level inline asm statements, which are always treated as GNU style (although this may not really make any difference since module level statements do not have arguments anyway).  Again because of the above setAssemblerDialect call they would therefore be also treated as AT&T dialect, which I guess is what @hans originally noticed.

In D82862#2498785 <https://reviews.llvm.org/D82862#2498785>, @hans wrote:

> The motivation for my change was really just to make ThinLTO compiles work the same as non-ThinLTO ones.
>
> Maybe the fact that -x86-asm-syntax=intel doesn't affect inline asm is a bug. I wasn't aware that Clang and GCC's -masm= flags behaved differently in that way, but that certainly suggests there's a problem here.

So I'm wondering, if I remove the above setAssemblerDialect line **and** revert this commit, we should have inline asm (both module-level and GNU function-leve) respect the target-selected asm dialect variant both for ThinLTO and non-ThinLTO, so they should match again.   Would that also solve the problem you were originally tracking?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82862/new/

https://reviews.llvm.org/D82862