[llvm-commits] [llvm] r61424 - in /llvm/trunk/lib/Target/X86: X86Instr64bit.td X86InstrInfo.td
Eli Friedman
eli.friedman at gmail.com
Wed Dec 24 22:01:25 PST 2008
On Wed, Dec 24, 2008 at 9:32 PM, Chris Lattner <clattner at apple.com> wrote:
> On Dec 24, 2008, at 9:21 PM, Eli Friedman wrote:
>>>> Also, even ignoring that, performance is hugely different: on a Core
>>>> 2, "bt %ebx, %eax" is one uop, but "bt %ebx, (%esp)" is 10 uops. The
>>>> difference isn't quite as severe on other processors, but the reg-reg
>>>> form is still significantly faster even if a load from memory is
>>>> necessary.
>>>
>>> Are you sure you aren't thinking of btc/bts? bt doesn't modify any
>>> operands.
>>
>> Oh, oops, s/modifies/tests/. The rest is correct.
>
> Do you have a benchmark to show this? If it shows that it is slower
> in practice, I think it would make sense to have a "has slow bt from
> memory" subtarget flag that would be a nice predicate for the memory
> form of the instructions.

I'm going by timings from http://www.agner.org/optimize/.

If you want a benchmark, try the following; it's a completely silly
benchmark, but it shows the issue at hand.

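#include <stdlib.h>

/* Assumes a 32-bit x86 build: the #else branch uses %eax as an index
   register next to the pointer in %1. */
int main() {
  unsigned testlen = 1000000000;
  /* One bit per iteration, so testlen/8 bytes; contents don't matter. */
  int* a = malloc(testlen/8);
  unsigned i;
#if 1
  /* Memory form: bt indexes the bit string at (a) directly. */
  for (i = 0; i < testlen; i++) {
    asm volatile ("btl %0, (%1)" : : "r"(i), "r"(a) : "cc");
  }
#else
  /* Register form: load the containing 32-bit word, then bt on it. */
  for (i = 0; i < testlen; i++) {
    asm volatile ("mov %0, %%eax;"
                  "shrl $5, %%eax;"
                  "mov (%1,%%eax,4), %%eax;"
                  "btl %0, %%eax" : : "r"(i), "r"(a) : "eax", "cc");
  }
#endif
  free(a);
  return 0;
}
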
The two branches do approximately the same thing; the second version
is almost twice as fast as the first on my computer (a Core Duo).
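
For reference, a rough plain-C sketch of what the second branch computes
for each i. The helper name test_bit_split is made up for illustration and
is not part of the benchmark; it only mirrors the shrl/mov/btl sequence,
assuming 32-bit words as in the asm.

```c
#include <stdint.h>

/* Hypothetical helper (not in the benchmark above): test bit `idx` of the
   bit string starting at `bits`, split into a word load and a bit test. */
static int test_bit_split(const uint32_t *bits, uint32_t idx) {
    /* shrl $5 picks the 32-bit word; the scaled mov loads it. */
    uint32_t word = bits[idx >> 5];
    /* The register form of bt only looks at idx mod 32. */
    return (int)((word >> (idx & 31)) & 1u);
}
```

With bits = {0x5, 0x80000000}, bits 0, 2 and 63 test as set and the rest
as clear, which is what the memory form of bt would report for the same
bit string.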
-Eli