[PATCH] D95444: Allow GNU inline asm using target-specific dialect variant

Ulrich Weigand via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 8 03:45:30 PST 2021


uweigand added a comment.

In D95444#2546235 <https://reviews.llvm.org/D95444#2546235>, @rnk wrote:

> Yes, sorry, I meant the assembly dialect that the GNU binutils assembler parses. What I'm trying to get at is that you might want to support separate blobs that use different dialects:
>
>   void foo() { asm volatile ("gnu style asm"); }
>   void bar() { asm volatile ("HLASM style asm"); }
>
> If there is one global setting for the inline asm dialect, this won't work. You could create a flag to control the setting, but then if you want to use LTO on two objects that use different assembly dialects, the command line flag isn't sufficient. To fully address this use case, you would want to put the dialect on each inline assembly blob, similar to the way we handle the existing intel/gnu dialect flag.

At least for our use case, this is not relevant.  We want to use the "HLASM style asm" whenever compiling for the z/OS target, and the "GCC style asm" whenever compiling for the Linux target.  These are incompatible anyway (different ABI, different target OS), so they cannot be combined using LTO.  We simply need to support two asm dialects because both targets are implemented in a single SystemZ back-end.  (While the OS and ABI are different, the **ISA** is of course the same, so it does not make sense to use separate back-ends.)

For this use case to work correctly, the back-end must be able to set the asm variant to be used for **everything**: writing asm source output, assembling a full asm source file, function-level inline asm, and module-level inline asm.  There's really not much point in the front-end getting involved here, since the asm variant is already fully determined by the target triple.

Originally, I thought this would be a simple matter of the back-end setting the AssemblerDialect variable in its MCAsmInfo.  However, it turned out this variable is actually only used when writing asm source output and when assembling a full asm source file.  For function-level inline asm, the dialect is taken from the IR, and for module-level inline asm, the dialect is hard-coded to 0 no matter what.  This is what this patch is intended to fix.
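For illustration, here's roughly what I had originally expected to be sufficient (the class name below is made up for this example, not something the patch actually adds; AssemblerDialect itself is the existing MCAsmInfo member):

  // Hypothetical MCAsmInfo subclass for an HLASM-flavoured SystemZ target.
  // Setting AssemblerDialect in the constructor already affects asm source
  // output and stand-alone .s assembly, but today it is ignored for both
  // function-level and module-level inline asm.
  class SystemZMCAsmInfoZOS : public MCAsmInfo {
  public:
    SystemZMCAsmInfoZOS() {
      AssemblerDialect = 1;   // e.g. 0 = GNU variant, 1 = HLASM variant
    }
  };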

Now, I fully agree that there are open questions on what the best implementation would be.   Specifically, as Hubert points out above, the IR-level "dialect" field is currently overloaded with two semantics: it indicates **both** that the inline asm should be parsed using the MS-asm input/output syntax **and** that the Intel-style mnemonic/operand syntax should be used.
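To make that concrete (this is standard clang/X86 behaviour today, nothing specific to this patch):

  // A GNU-style statement ends up in IR with dialect 0 and, on X86, is
  // parsed and printed using AT&T mnemonic/operand syntax:
  void gnu_blob() {
    asm volatile("movl %%ebx, %%eax" ::: "eax");
  }
  // An MS-style block (clang-cl or -fasm-blocks) ends up in IR as
  // "asm inteldialect ...", and that single bit currently selects *both*
  // the MS input/output parsing rules *and* the Intel output syntax:
  //   void ms_blob() { __asm { mov eax, ebx } }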

The idea of this patch is to explicitly decouple those two semantics by having the IR-level "dialect" refer **only** to the GCC vs. MS input/output syntax, and letting the mnemonic/operand syntax always be chosen by the back-end, with no influence from the front-end / IR.  To allow the X86 back-end to preserve its existing behaviour, the back-end must be able to choose different mnemonic/operand dialects depending on whether an inline asm statement uses GCC or MS input/output syntax, which is why this is not a simple variable but a function.
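Concretely, the X86 back-end can then keep its current behaviour with an override along these lines (simplified sketch, not the literal patch code):

  // Map the IR-level input/output dialect to the X86 output syntax:
  // GNU-style blobs keep AT&T mnemonics, MS-style blobs get Intel mnemonics.
  unsigned getInlineAsmDialect(InlineAsm::AsmDialect Dialect) const override {
    return Dialect == InlineAsm::AD_Intel ? Intel : ATT;
  }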

If you have any better suggestions on how to implement this, I'd be happy to try those out as well ...



================
Comment at: llvm/include/llvm/MC/MCAsmInfo.h:572
+  // Allow overriding default dialect for inline assembler.
+  virtual unsigned getInlineAsmDialect(InlineAsm::AsmDialect asmDialect) const {
+    return getAssemblerDialect();
----------------
rnk wrote:
> hubert.reinterpretcast wrote:
> > The parameter here needs to be explained more clearly. This function seems to be compensating for problems in the applicability/correctness of what the input argument represents. I think a function that just actually gives a default dialect for the inline assembly would be easier to understand. I guess that in cases where the input to this function matters, we should consider changing the code where the input argument was originally set.
> Can this be a field instead of a virtual method? Most MCAsmInfo customization points seem to be data instead of code, and I think it's usually easier to reason about data than code. I see that this method has a parameter and that the x86 implementations use it, and I'm not immediately sure how to replace that, but it seems like there might be a way to do it.
The intent is not really to "compensate for problems", but to cleanly separate the two notions of "GNU vs. MS input/output syntax" and "AT&T vs. Intel mnemonic/operand syntax".  This function maps a value from the first domain into the second, allowing the back-end to provide the appropriate default mnemonic/operand syntax depending on whether the inline asm statement uses GNU or MS syntax.


================
Comment at: llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp:21-22
 enum AsmWriterFlavorTy {
-  // Note: This numbering has to match the GCC assembler dialects for inline
-  // asm alternatives to work right.
   ATT = 0, Intel = 1
----------------
hubert.reinterpretcast wrote:
> Why is this comment removed?
Well, the whole reason the numbering had to match was that the value was overloaded between the "GNU vs. MS" values and the "AT&T vs. Intel" values.  With the patch, that overload is no longer present (the correspondence is instead provided explicitly via a function), so the numbers no longer have to match.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95444/new/

https://reviews.llvm.org/D95444


