[LLVMdev] Problems with 64-bit register operands of inline asm on ARM

Wed Mar 27 17:35:46 PDT 2013

On Mar 27, 2013, at 5:21 PM, Måns Rullgård <mans at mansr.com> wrote:

>>> GAS is not *wrong*, strictly speaking.  It is not forbidden for an
>>> assembler to accept syntax beyond that described in the ARM ARM.  In
>>> fact, this is even encouraged in some places.
>>> 
>>> Since this syntax is non-standard, you are correct that there is no
>>> _need_ to follow this.  However, if the aim is for clang/llvm to compile
>>> existing source code unmodified, this does become a requirement, like it
>>> or not.
>> 
>> We want to do the right thing, not blindly accept whatever language
>> extensions other toolchains have adopted. Sometimes that means telling
>> people they need to modify their source code to conform to our
>> stricter adherence to documentation. This is one of those places. This
>> syntax is a relic of the old divided syntax, which LLVM's assembler
>> does not support. We should be actively deprecating what support there
>> is for divided syntax (i.e., adding warnings when it's used, then
>> removing it entirely in following releases), not expanding support for
>> it.
> 
> That approach may work for a closed outfit like Apple.  In the open
> world, all it will accomplish is further cementing the dominance of
> gcc.  If strict conformance to standards is more important to you than
> uptake of your product, fine with me.  Just don't expect developers to
> be falling over themselves to adapt to your stricter requirements when
> gcc, in their eyes, works perfectly well.

You're right, but this is a careful dance.  As a general policy, we try not to implement too many weird things, particularly if the implementation is complex.

On the X86 assembler, we started out really strict, but then added more and more compatibility hacks as it came in contact with more code.  The line in the sand that we ended up with is that we accept reasonable things that are unambiguous even if they don't make sense.  For example, the in/out instructions in X86 take register operands, but GAS buggily accepts a memory operand with one register, and this is widely used.
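To make the "unambiguous even if it doesn't make sense" case concrete, here is a rough sketch of that kind of compatibility alias.  It is a toy model only: the names PORT_IO_MNEMONICS and normalize_port_operand are made up for illustration, and this is not how the LLVM or GAS parsers are actually written.  The point is just that "(%dx)" on an in/out instruction can only mean one thing, so mapping it to the plain register is harmless.

```python
import re

# Hypothetical toy model, not the real LLVM or GAS implementation.
# Mnemonics for the x86 port I/O instructions (AT&T spelling).
PORT_IO_MNEMONICS = {"in", "inb", "inw", "inl", "out", "outb", "outw", "outl"}

def normalize_port_operand(mnemonic, operands):
    """Rewrite the GAS-style "(%dx)" operand of in/out to the register "%dx".

    The parenthesized form has exactly one possible meaning on these
    instructions, so accepting it as an alias introduces no ambiguity.
    Operands of every other mnemonic are returned unchanged.
    """
    if mnemonic not in PORT_IO_MNEMONICS:
        return list(operands)
    return [re.sub(r"^\((%dx)\)$", r"\1", op) for op in operands]
```

With this sketch, normalize_port_operand("inb", ["(%dx)", "%al"]) yields the same operand list as the strict spelling, while a real memory operand on some other instruction is left alone.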

On the other hand, we do not accept things that are ambiguous and are frequently bugs.  An example from X86 is that GAS (and LLVM) infer the width suffix of an instruction when it is not specified on the mnemonic (e.g. "add $1, %bx", where the 16-bit register fixes the width).  GAS has a bug where it still picks a suffix when the width is ambiguous (e.g. "add $1, (%eax)", where nothing says how wide the memory operand is), resolving to an inconsistent and arbitrary answer.  We choose to reject this, forcing people to be explicit in their code.
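The suffix rule can be sketched in a few lines.  Again this is only a toy model under stated assumptions: the register table is a small hand-picked subset, and infer_suffix and AmbiguousWidth are hypothetical names, not anything in the LLVM assembler.  It infers a suffix only when a bare register operand pins down the width, and refuses otherwise instead of guessing.

```python
# Hypothetical toy model, not the real LLVM or GAS implementation.
# Width suffix implied by a bare register operand (small illustrative subset).
REGISTER_SUFFIX = {
    "%al": "b", "%bl": "b",
    "%ax": "w", "%bx": "w",
    "%eax": "l", "%ebx": "l",
}

class AmbiguousWidth(ValueError):
    """Raised when the operands do not determine a unique operand width."""

def infer_suffix(mnemonic, operands):
    """Return the mnemonic with a width suffix inferred from register operands.

    Immediates ("$1") and memory operands ("(%eax)") carry no width: the
    register inside a memory operand is an address, not the data size.
    """
    suffixes = {REGISTER_SUFFIX[op] for op in operands if op in REGISTER_SUFFIX}
    if len(suffixes) != 1:
        raise AmbiguousWidth(
            "cannot infer width for '%s %s'; write the suffix explicitly"
            % (mnemonic, ", ".join(operands))
        )
    return mnemonic + suffixes.pop()
```

Here infer_suffix("add", ["$1", "%bx"]) gives "addw", while infer_suffix("add", ["$1", "(%eax)"]) raises AmbiguousWidth rather than silently choosing a width.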

> Besides, clang/llvm already supports a large number of gcc extensions.
> Who decides, and how, which ones are accepted and which get short
> shrift?

Ultimately the code owner, who we hope is reasonable.  Before it gets to that, general discussion in the community is the right way to escalate issues like this.  A lot of it comes down to the "squishy" cost/benefit tradeoff of how complicated it is to implement something vs how much code uses it.

-Chris