[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

Sun Jul 14 10:19:20 PDT 2013

On Sun, Jul 14, 2013 at 5:56 AM, Ramkumar Ramachandra
<artagnon at gmail.com> wrote:
> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
>   The inline assembly for the bit operations has been changed to remove
>   explicit sizing hints on the instructions, so the assembler will pick
>   the appropriate instruction forms depending on the architecture and
>   the context.
>
> Unfortunately, GNU as does no such thing

Yes it does.

>   btrl  $1, 0
>   btr   $1, 0
>   btsl  $1, 0
>   bts   $1, 0

What the heck is that supposed to show? It shows nothing at all. With
an argument of '1', *of*course* gas will use "btsl", since that's the
short form. Using the rex-prefix and a btsq would be *stupid*.

So gas will pick the appropriate form, exactly as claimed.

Try some actual relevant test instead:

   bt %eax,mem
   bt %rax,mem

and notice how they are actually fundamentally different. Test-case:

int main(int argc, char **argv)
{
  asm("bt %1,%0":"=m" (**argv): "a" (argc));
  asm("bt %1,%0":"=m" (**argv): "a" ((unsigned long)(argc)));
}

and I get

   0f a3 02              bt     %eax,(%rdx)
   48 0f a3 02           bt     %rax,(%rdx)

exactly as expected and wanted.
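
To make the one-byte difference between those two encodings concrete,
here is a rough sketch of an encoder for just this instruction form.
The helper name encode_bt_reg_mem and its interface are made up for
illustration (this is not how gas is structured), and it only handles
a plain register-indirect memory operand with the eight legacy
registers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from gas or the kernel: encode
 * "bt %reg,(%base)". Register numbers are the usual 0..7
 * (0=ax, 1=cx, 2=dx, 3=bx, 4=sp, 5=bp, 6=si, 7=di).
 * A base of 4 (sp) or 5 (bp) needs a SIB byte or a displacement, so
 * those are rejected here. Returns the number of bytes written, or 0.
 */
static size_t encode_bt_reg_mem(unsigned reg, unsigned base, int wide,
                                uint8_t out[4])
{
    size_t n = 0;

    assert(out != NULL);
    if (reg > 7 || base > 7 || base == 4 || base == 5)
        return 0;

    if (wide)
        out[n++] = 0x48;                     /* REX.W: 64-bit operand size */
    out[n++] = 0x0f;                         /* two-byte opcode escape */
    out[n++] = 0xa3;                         /* BT r/m, r */
    out[n++] = (uint8_t)((reg << 3) | base); /* ModRM with mod=00 */
    return n;
}
```

With reg=0 (ax) and base=2 (dx) this yields 0f a3 02 for the 32-bit
form and 48 0f a3 02 for the 64-bit form, matching the disassembly
above: the operand size lives entirely in the REX.W prefix, which is
why the register operand alone is enough for the assembler to pick
the form.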

Now, there are possible cases where you want to make the size explicit
because you are mixing memory operand sizes and there can be nasty
performance implications of doing a 32-bit write and then doing a
64-bit read of the result. I'm not actually aware of us having ever
worried/cared about it, but it's a possible source of trouble: mixing
bitop instructions with non-bitop instructions can have some subtle
interactions, and you need to be careful, since the size of the
operand affects both the offset *and* the memory access size. The
access size generally is meaningless from a semantic standpoint
(little-endian being the only sane model), but the access size *can*
have performance implications for the write queue forwarding.
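
The "offset *and* access size" point can be modelled in a few lines
of portable C. This is only a sketch of the addressing rule for
"bt %reg,mem" as I understand it from the architecture manuals (the
bit number is split into a word index and a bit within that word,
with the word size equal to the operand size); the struct and the
function name bt_locate are invented for the example, and the bit
number is taken as a 64-bit value for both forms to keep things
simple:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of where a register-offset "bt" lands. */
struct bt_access {
    int64_t  byte_offset;   /* offset of the accessed word from the base */
    unsigned bit;           /* bit tested within that word */
    unsigned access_bytes;  /* width of the memory access */
};

/* op_bits is the operand size: 32 for the "l" form, 64 for the "q" form. */
static struct bt_access bt_locate(int64_t bitnr, unsigned op_bits)
{
    struct bt_access a;
    int64_t bits = (int64_t)op_bits;
    int64_t word;

    assert(op_bits == 32 || op_bits == 64);

    /* Floor division, so negative bit numbers index words below the base. */
    word = bitnr / bits;
    if (bitnr % bits != 0 && bitnr < 0)
        word--;

    a.byte_offset  = word * (bits / 8);
    a.bit          = (unsigned)(bitnr - word * bits);
    a.access_bytes = op_bits / 8;
    return a;
}
```

For bit number 40, the 32-bit form touches 4 bytes at offset 4 and
tests bit 8 there, while the 64-bit form touches 8 bytes at offset 0
and tests bit 40. On little-endian that is the same bit either way,
but the memory access has a different address and width, which is
what matters for store forwarding.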

                      Linus