[PATCH] D47545: [MC][X86] Allow assembler variable assignment to register name.

Mon Jun 4 13:01:46 PDT 2018

rnk accepted this revision.
rnk added a comment.
This revision is now accepted and ready to land.

lgtm

================
Comment at: llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp:2197
+    SMLoc StartLoc = Parser.getTok().getLoc();
+    // Normal Expression parse fails, check if it could be a register.
+    unsigned RegNo;
----------------
niravd wrote:
> void wrote:
> > rnk wrote:
> > > niravd wrote:
> > > > void wrote:
> > > > > nickdesaulniers wrote:
> > > > > > rnk wrote:
> > > > > > > Does this generalize to more than just registers? Could it include things like `foo = $42`, or `foo = 0x40(%rcx)` as some kind of alias for an accessor? We might want to allow these things so that we don't have to do this fire drill again the next time a Linux developer writes some creative GNU as.
> > > > > > or tries to do this with another ISA, like arm64.  I guess I'm curious if this is needs to be implemented in the other parsers as well?  We only see the issue on x86 in the kernel currently, but I'd think this kind of functionality would be ISA independent?
> > > > > Something like `foo = 0x40(%rcx)` should be an illegal expression according to their own documentation....but then again so should using a register as a primary symbol. :-/
> > > > Extending to generic instruction operands seems relatively straightforward, but given that it's illegal and there's really only a few cases we need to support, I'd prefer to keep this as simple as possible and deal with it as it comes along.
> > > > 
> > > > Hopefully, once we have clang consistently building the linux kernel with the integrated assembler (which I believe is now just this) additional things we need to workaround in this we should get quick push back and that'll be enough social pressure to prevent new violations needing compiler support vs. a code rewrite.
> > > > 
> > > I'm still curious to know what gas actually supports in practice before we commit to limiting ourselves to things that look like registers. If gas really treats this as a textual macro, that's probably the way we should go.
> > > 
> > > They may document that `foo = 0x40(%rcx)` is illegal, but documentation has been known to be wrong.
> > In an ideal world, we could do that because it would be documented. But as you mentioned their documentation seems faulty. In order to know exactly what gas supports, we would need to unravel its code. That's a fairly complex task, which would take a lot of time and may not be useful.
> > 
> > In this case, it used to be an acceptable practice to allow registers in assembly macros, but it was uncommon. It's possible that its support in gas was unknown to the documentation writers.
> > 
> > IMO, we should go with their documentation and change only when we detect a deviation from it.
> From testing, it's not just textual replacement. It looks like we always interpret assigned variables either as registers or memory references. 
> 
> foo = <Register>  -> valid. 
> 
> foo = ($<immediate expression>)
> foo = $<Immediate expression>
> foo = <immediate expression>
> 
> Both valid. Always interpreted as memory access i.e. xorl $eax, foo translates to xorl $eax, (<immediate>)
> 
> foo = 40(%rcx)
> foo = 40 + (%rcx)
> foo = 40 + %rcx 
> 
> All invalid. 
> 
> That means, the only things we're missing from gas is the parsing of memory references off of immediate (4 + $100), but this isn't parsed currently in straight line code, e.g., "xor %eax,  ($40)". 
> 
> 
> 
> 
@void I think my perspective is informed by doing lots of MS compatibility work and triaging user complaints about incompatibility. I lean more towards doing all the experiments we can to try to understand the principle of what gas (or GCC, or MSVC) is really trying to do. It usually saves time in the long run if we can discover the underlying principle and implement it if it is not too far from our existing model. Knowing what's a bridge too far is tough, though.

@niravd Thanks for doing the experiments! The examples convince me that gas users aren't likely to rely on this extension for much more than register names. `($imm)` is just not that useful, and is easy to rewrite as an absolute symbol definition. Actually, isn't that just an absolute symbol? Do we not already handle that? Whatever, it's not relevant to this change.

Repository:
  rL LLVM

https://reviews.llvm.org/D47545