[llvm-commits] [llvm] r61424 - in /llvm/trunk/lib/Target/X86: X86Instr64bit.td X86InstrInfo.td

Wed Dec 24 22:01:25 PST 2008

On Wed, Dec 24, 2008 at 9:32 PM, Chris Lattner <clattner at apple.com> wrote:
> On Dec 24, 2008, at 9:21 PM, Eli Friedman wrote:
>>>> Also, even ignoring that, performance is hugely different: on a Core
>>>> 2, "bt %ebx, %eax" is one uop, but "bt %ebx, (%esp)" is 10 uops. The
>>>> difference isn't quite as severe on other processors, but the reg-reg
>>>> form is still significantly faster even if a load from memory is
>>>> necessary.
>>>
>>> Are you sure you aren't thinking of btc/bts?  bt doesn't modify any
>>> operands.
>>
>> Oh, oops, s/modifies/tests/.  The rest is correct.
>
> Do you have a benchmark to show this?  If it shows that it is slower
> in practice, I think it would make sense to have a "has slow bt from
> memory" subtarget flag that would be a nice predicate for the memory
> form of the instructions.

I'm going by the instruction timing tables at http://www.agner.org/optimize/.

If you want a benchmark, try the following; it's a completely silly
benchmark, but it shows the issue at hand.

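#include <stdlib.h>

/* Build as 32-bit x86 (e.g. gcc -m32): the #else branch indexes with
   %eax, which cannot be paired with a 64-bit base register. */
int main() {
  unsigned testlen = 1000000000;
  int* a = malloc(testlen/8);  /* one bit per iteration */
  unsigned i;
  if (!a) return 1;
#if 1
  /* Memory form: bt selects the word and the bit itself. */
  for (i = 0; i < testlen; i++) {
    asm volatile ("btl %0, (%1)" : : "r"(i), "r"(a) : "cc");
  }
#else
  /* Load the word containing bit i, then use the reg-reg bt. */
  for (i = 0; i < testlen; i++) {
    asm volatile ("mov %0, %%eax;"
                  "shrl $5, %%eax;"
                  "mov (%1,%%eax,4), %%eax;"
                  "btl %0, %%eax" : : "r"(i), "r"(a) : "eax", "cc");
  }
#endif
  free(a);
  return 0;
}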

The two branches do approximately the same thing; the second version
is almost twice as fast as the first on my computer (a Core Duo).
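
To spell out what the load-plus-reg-reg sequence computes, here is a
minimal C sketch, assuming a non-negative bit index. The name test_bit
is only for illustration and is not part of the benchmark. The shift
selects the 32-bit word holding bit i (the shrl $5 and the scaled load),
and the reg-reg bt then looks only at the low 5 bits of the index.

```c
#include <stdint.h>

/* Hypothetical helper: models the "shrl $5; load; bt reg,reg" sequence.
   Assumes i is a valid, non-negative bit index into the array a. */
static int test_bit(const uint32_t *a, uint32_t i) {
  uint32_t word = a[i >> 5];      /* word containing bit i */
  return (word >> (i & 31)) & 1;  /* reg-reg bt uses the index mod 32 */
}
```

The memory form of bt does the word selection itself and treats the
register offset as signed, which is one reason the two sequences are
only approximately equivalent.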

-Eli