[LLVMdev] Tight overlapping loops and performance

Mon Mar 2 15:30:55 PST 2009

On Mon, Mar 2, 2009 at 2:45 PM, Jonathan Turner <probata at hotmail.com> wrote:
> For which version of gcc?  I should mention I'm on OS X and using the LLVM
> SVN.

gcc 4.3.  It's also possible this is processor-sensitive.

>> First, try looking at the generated code... the code LLVM generates is
>> probably not what you're expecting. I'm getting the following for the
>> main loop:
>
> I was seeing the same thing, but wasn't sure what to make of it.  It looks
> like values are being swapped into and out of memory rather than being held
> in registers.

You're misreading the asm... nothing is touching memory.  (BTW, "leal
-1(%eax), %eax" isn't a memory operation: lea only computes the
address and never dereferences it, so here it's just subtracting one
from %eax.)  You might want to try reading the LLVM IR (which you can
generate with llvm-gcc -S -emit-llvm); it tends to be easier to read.
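
To make the lea point concrete, here is a minimal C sketch of what a
32-bit "leal disp(%base), %dst" computes.  The function name lea32 and
its parameters are made up for illustration; this only models the
arithmetic and is not code from the thread or from LLVM.

```c
#include <stdint.h>

/* Hypothetical model of the 32-bit form "leal disp(%base), %dst".
 * The instruction forms the effective address base + disp and writes
 * that number into the destination register.  No load or store
 * happens, which is why there is no pointer dereference here. */
static uint32_t lea32(uint32_t base, int32_t disp)
{
    /* Unsigned arithmetic wraps modulo 2^32, matching a 32-bit register. */
    return base + (uint32_t)disp;
}

/* "leal -1(%eax), %eax" then corresponds to eax = lea32(eax, -1). */
static uint32_t decrement_like_lea(uint32_t eax)
{
    return lea32(eax, -1);
}
```

Under this model, decrement_like_lea(5) gives 4, the same result a
plain decrement would give, with no memory traffic involved.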

> My current implementation uses something very similar, but if you'll notice
> the difference between this example and my examples is that the branch for
> checking 'timeout' is taken in the majority case where in mine it isn't.  It
> can be checked separately for less cost, assuming the variables stay in
> registers.

A taken branch and a not-taken branch cost roughly the same on any
remotely recent x86 processor, as long as the branch is predicted
correctly.

-Eli