[LLVMdev] Tight overlapping loops and performance
Eli Friedman
eli.friedman at gmail.com
Mon Mar 2 15:30:55 PST 2009
On Mon, Mar 2, 2009 at 2:45 PM, Jonathan Turner <probata at hotmail.com> wrote:
> For which version of gcc? I should mention I'm on OS X and using the LLVM
> SVN.
gcc 4.3. It's also possible this is processor-sensitive.
>> First, try looking at the generated code... the code LLVM generates is
>> probably not what you're expecting. I'm getting the following for the
>> main loop:
>
> I was seeing the same thing, but wasn't sure what to make of it. It looks
> like values are being swapped into and out of memory and not holding them in
> registers.
You're misreading the asm... nothing is touching memory. (BTW, "leal
-1(%eax), %eax" isn't a memory operation; it's just subtracting one
from %eax.) You might want to try reading the LLVM IR (which you can
generate with llvm-gcc -S -emit-llvm); it tends to be easier to read.
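To illustrate the point about leal, here is a minimal C sketch of a countdown whose decrement a compiler may lower to "leal -1(%eax), %eax" on x86. The function names are made up for the example, and the instruction actually chosen depends on the compiler, version, and flags, so treat this as an assumption to check against your own -S output rather than a guarantee.

```c
/* Illustrative only: decrement() and count_iterations() are hypothetical
   names, not taken from the code discussed in this thread. */

int decrement(int x)
{
    /* On x86 a compiler may emit "leal -1(%eax), %eax" here. lea only
       does address-style arithmetic in registers; it performs no load
       or store, so this is a plain subtraction of one. */
    return x - 1;
}

int count_iterations(int n)
{
    int steps = 0;

    /* Count how many decrements it takes to bring n down to zero. */
    while (n > 0) {
        n = decrement(n);
        steps++;
    }
    return steps;
}
```

Compiling something like this with -S and comparing the output against the -emit-llvm IR is a quick way to see which instructions actually reference memory.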
> My current implementation uses something very similar, but if you'll notice
> the difference between this example and my examples is that the branch for
> checking 'timeout' is taken in the majority case where in mine it isn't. It
> can be checked separately for less cost, assuming the variables stay in
> registers.
A taken branch and a not-taken branch cost roughly the same on any
remotely recent x86 processor.
-Eli