[LLVMdev] Tight overlapping loops and performance

Mon Mar 2 13:41:45 PST 2009

On Mon, Mar 2, 2009 at 11:38 AM, Jonathan Turner <probata at hotmail.com> wrote:
> With gcc -O3 4.2 and 4.4 we match 1.0s.   The LLVM, after running it through
> opt -std-compile-opts, is around 1.7s.

Hmm, on my computer, I get around 2.5 seconds with both gcc -O3 and
llvm-gcc -O3 (using llvm-gcc from svn).  Not sure what you're doing
differently; I wouldn't be surprised if it's sensitive to the version
of LLVM.

> Should I be looking at any particular optimization passes that aren't in
> -std-compile-opts to match the gcc speeds?

First, take a look at the generated code; what LLVM produces is
probably not what you're expecting.  I'm getting the following for the
main loop:

.LBB1_1:	# loopto
	cmpl	$1, %eax
	leal	-1(%eax), %eax
	cmove	%edx, %eax
	incl	%ecx
	cmpl	$999999999, %ecx
	jne	.LBB1_1	# loopto

LLVM is optimizing your oddly nested loops into a single loop which
does some extra computation to keep track of the timeout variable.
Since you'd normally be doing something non-trivial in the timeout
portion of the loop, LLVM couldn't flatten the loops like this in real
code, so the timings from this contrived testcase are irrelevant to
your actual issue.
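For reference, here is one reading of that flattened loop written back as C. This is a hand translation of the assembly, not anything LLVM emits, and it assumes %eax holds timeout, %edx holds the reset value (presumably 2000, but that isn't visible in the listing), and %ecx is the outer iteration counter. The function name and its parameters are made up for illustration; the limit is a parameter so it can be tried with small numbers instead of the hard-coded 999999999.

```c
#include <assert.h>

/* Hypothetical C equivalent of the single loop in the assembly above.
   timeout ~ %eax, reset ~ %edx, i ~ %ecx. Assumes start < limit. */
int flattened_loop(int timeout, int reset, int start, int limit) {
    assert(reset >= 1);
    for (int i = start; i != limit; i++) {
        /* cmpl $1 / leal -1 / cmove: decrement the countdown, but
           reload the reset value when the old value was 1, i.e. when
           the timeout has just expired. */
        timeout = (timeout == 1) ? reset : timeout - 1;
    }
    return timeout;
}
```

The point is that the whole nest reduces to a counter plus a conditional reset; there is no separate inner loop left for the loop optimizers to work on.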

In general, you'll probably get better results from LLVM with properly
nested loops; LLVM's loop optimizers don't know how to deal with
overlapping loops.  I'd suggest writing it more like the
following:

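int loopcond;
do {
    timeoutwork();                   /* the periodic timeout work */
    int timeout = 2000;              /* restart the countdown each pass */
    do {
        timeout--;
        loopcond = computationresult();
    } while (loopcond && timeout);   /* stop when done or timed out */
} while (loopcond);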
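To see the nested version behave, it can be wrapped in a function with stand-in definitions. In this sketch timeoutwork() and computationresult() are hypothetical stubs: one counts how often the timeout work runs, the other reports "not done yet" for a fixed number of calls. The names run_nested, total_iterations and budget are also invented for the example, and the countdown is restarted on every pass of the outer loop so the timeout work fires once per budget's worth of iterations.

```c
#include <assert.h>

/* Hypothetical state standing in for real work. */
static int remaining;      /* calls left before computationresult() reports done */
static int timeout_calls;  /* how many times timeoutwork() has run */

static void timeoutwork(void) { timeout_calls++; }
static int computationresult(void) { return --remaining > 0; }

/* Runs the properly nested loop; returns how often timeoutwork() ran. */
int run_nested(int total_iterations, int budget) {
    assert(total_iterations >= 1 && budget >= 1);
    remaining = total_iterations;
    timeout_calls = 0;
    int loopcond;
    do {
        timeoutwork();
        int timeout = budget;            /* fresh countdown each pass */
        do {
            timeout--;
            loopcond = computationresult();
        } while (loopcond && timeout);   /* inner: the tight loop */
    } while (loopcond);                  /* outer: one pass per timeout */
    return timeout_calls;
}
```

With 5000 iterations of work and a budget of 2000, run_nested(5000, 2000) runs the timeout work three times: once at the start and once after each of the two expired countdowns.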

-Eli