[llvm-commits] [llvm-testresults] lab-mini-01__O3-plain__clang_DEV__x86_64 test results
Duncan Sands
baldrick at free.fr
Fri May 25 08:28:47 PDT 2012
This 44% performance regression was caused by my reassociate changes. The
reason is pretty interesting though. I could do with some suggestions.
> Performance Regressions - Execution Time
>   SingleSource/Benchmarks/BenchmarkGame/puzzle
>   <http://llvm.org/perf/db_default/v4/nts/789/graph?test.198=2>
>
>   Δ        Previous  Current  σ       Δ (B)   σ (B)
>   44.42%   0.4829    0.6974   0.0001  43.91%  0.0001
The change to the optimized IR was:
   %phi213.i = phi i32 [ %xor1.i, %for.body.i ], [ 0, %for.body.i.preheader ]
   %indvars.iv.next.i = add i64 %indvars.iv.i, 1
   %arrayidx.i = getelementptr inbounds i32* %call, i64 %indvars.iv.i
   %0 = load i32* %arrayidx.i, align 4, !tbaa !0
   %1 = trunc i64 %indvars.iv.next.i to i32
-  %xor.i = xor i32 %0, %phi213.i
-  %xor1.i = xor i32 %xor.i, %1
+  %xor.i = xor i32 %1, %phi213.i
+  %xor1.i = xor i32 %xor.i, %0
   %exitcond = icmp eq i32 %1, 500001
   br i1 %exitcond, label %findDuplicate.exit, label %for.body.i
The old code computes
%phi213.i ^ %0 ^ %1
while the new computes
%phi213.i ^ %1 ^ %0
Here %0 is a load and %1 is a truncation.
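To make the equivalence concrete, here is a small Python model of the loop
body in both orders. It is only a sketch: the function names and the sample
data are made up for illustration, and masking to 32 bits stands in for the
trunc. Since xor is associative and commutative, the two orders always agree.

```python
MASK32 = 0xFFFFFFFF  # models i32 arithmetic


def xor_old_order(values):
    """Old order: (%phi213.i ^ %0) ^ %1, i.e. the load is folded in first."""
    acc = 0  # %phi213.i starts at 0 on loop entry
    for iv, loaded in enumerate(values):
        truncated = (iv + 1) & MASK32          # %1 = trunc (iv + 1) to i32
        acc = (acc ^ (loaded & MASK32)) ^ truncated
    return acc


def xor_new_order(values):
    """New order: (%phi213.i ^ %1) ^ %0, i.e. the truncation is folded in first."""
    acc = 0
    for iv, loaded in enumerate(values):
        truncated = (iv + 1) & MASK32
        acc = (acc ^ truncated) ^ (loaded & MASK32)
    return acc


sample = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical array contents
assert xor_old_order(sample) == xor_new_order(sample)
```

So the regression is purely about how the two orders get scheduled and
register-allocated, not about what they compute.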
Since reassociate computes the same rank for %0 and %1, there is no reason to
prefer one to the other - it's just a matter of chance which one you get, and
the old code was luckier than the new.
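A toy model of why a rank tie makes the result arbitrary (this is not the
actual Reassociate code; order_operands and the rank numbers are invented for
illustration, and only the tie between %0 and %1 reflects the case above):

```python
def order_operands(operands, rank):
    """Sort operands by descending rank.

    Python's sort is stable, so operands with equal rank keep whatever
    order they arrived in. That incoming order is the "matter of chance".
    """
    return sorted(operands, key=lambda op: rank[op], reverse=True)


# Hypothetical ranks: %0 and %1 tie, the phi is given a different rank.
rank = {"%phi213.i": 0, "%0": 1, "%1": 1}

# Same operand set, two different incoming orders.
first = order_operands(["%phi213.i", "%0", "%1"], rank)
second = order_operands(["%phi213.i", "%1", "%0"], rank)

print(first)   # %0 ends up ahead of %1
print(second)  # %1 ends up ahead of %0
```

With a tie, anything upstream that changes the order in which operands are
collected flips the final expression, even though the ranks are unchanged.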
The reason for the big slowdown is in the different codegen:
   # phi213.i is in %ebx
+  leaq 1(%rdx), %rsi
+  xorl %esi, %ebx
   xorl (%rax,%rdx,4), %ebx
-  incq %rdx
-  xorl %edx, %ebx
-  cmpl $500001, %edx # imm = 0x7A121
+  cmpl $500001, %esi # imm = 0x7A121
+  movq %rsi, %rdx
I'm not sure why this codegen difference arises. Any suggestions?
If there is a fairly generic explanation for the different codegen, maybe the
rank function can be tweaked to force the more effective order.
Ciao, Duncan.