[llvm-commits] [llvm-testresults] lab-mini-01__O3-plain__clang_DEV__x86_64 test results

Fri May 25 18:17:36 PDT 2012

On May 25, 2012, at 8:28 AM, Duncan Sands <baldrick at free.fr> wrote:

> This 44% performance regression was caused by my reassociate changes.  The
> reason is pretty interesting though.  I could do with some suggestions.
> 
>> Performance Regressions - Execution Time      Δ       Previous  Current  σ       Δ (B)   σ (B)
>> SingleSource/Benchmarks/BenchmarkGame/puzzle  44.42%  0.4829    0.6974   0.0001  43.91%  0.0001
>> <http://llvm.org/perf/db_default/v4/nts/789/graph?test.198=2>
> 
> The change to the optimized IR was:
> 
>    %phi213.i = phi i32 [ %xor1.i, %for.body.i ], [ 0, %for.body.i.preheader ]
>    %indvars.iv.next.i = add i64 %indvars.iv.i, 1
>    %arrayidx.i = getelementptr inbounds i32* %call, i64 %indvars.iv.i
>    %0 = load i32* %arrayidx.i, align 4, !tbaa !0
>    %1 = trunc i64 %indvars.iv.next.i to i32
> -  %xor.i = xor i32 %0, %phi213.i
> -  %xor1.i = xor i32 %xor.i, %1
> +  %xor.i = xor i32 %1, %phi213.i
> +  %xor1.i = xor i32 %xor.i, %0
>    %exitcond = icmp eq i32 %1, 500001
>    br i1 %exitcond, label %findDuplicate.exit, label %for.body.i
> 
> The old code computes
>   %phi213.i ^ %0 ^ %1
> while the new computes
>   %phi213.i ^ %1 ^ %0
> Here %0 is a load and %1 is a truncation.
> 
> Since reassociate computes the same rank for %0 and %1, there is no reason to
> prefer one to the other - it's just a matter of chance which one you get, and
> the old code was luckier than the new.
> 
> The reason for the big slowdown is in the different codegen:
> 
> # phi213.i is in %ebx
> 
> +       leaq    1(%rdx), %rsi
> +       xorl    %esi, %ebx
>         xorl    (%rax,%rdx,4), %ebx
> -       incq    %rdx
> -       xorl    %edx, %ebx
> -       cmpl    $500001, %edx           # imm = 0x7A121
> +       cmpl    $500001, %esi           # imm = 0x7A121
> +       movq    %rsi, %rdx
> 
> I'm not sure why this codegen difference arises.  Any suggestions?
> 
> If there is a fairly generic explanation for the different codegen, maybe the
> rank function can be tweaked to force the more effective order.
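
The loop in the quoted IR can be read back as the following C++ sketch. It is a hypothetical reconstruction from the IR, not the benchmark's actual source, and the names `xorOld` and `xorNew` are illustrative. It shows that the two operand orders compute the same value, because xor is associative and commutative, so the difference between the old and new IR can only show up in code generation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reconstruction of the quoted loop (not the benchmark source):
// acc plays the role of %phi213.i, data[i] is the load %0, and
// next (the truncated i + 1) is %1.

// Old order: (%phi213.i ^ %0) ^ %1
std::int32_t xorOld(const std::vector<std::int32_t>& data) {
  std::int32_t acc = 0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    std::int32_t next = static_cast<std::int32_t>(i + 1);
    acc = (acc ^ data[i]) ^ next;
  }
  return acc;
}

// New order: (%phi213.i ^ %1) ^ %0
std::int32_t xorNew(const std::vector<std::int32_t>& data) {
  std::int32_t acc = 0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    std::int32_t next = static_cast<std::int32_t>(i + 1);
    acc = (acc ^ next) ^ data[i];
  }
  return acc;
}
```

The two functions agree on every input; only the dependence chain differs, which is what the register allocator and scheduler then see.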

Hello Duncan,

This is a micro-architectural glass jaw that we should not attempt to compensate for in the optimizer. For example, moving the loaded value upward in the dependence chain is not generally a good thing. If this particular case mattered enough, we would need to target the problem specifically in codegen, ideally between coalescing and scheduling, where we know how many cycles the loop will take and which resources are available in those cycles. At that point we could reassociate the xors and unfold the load to expose a coalescing opportunity. Or we could simply sink the copy across the loop backedge to allow the cmp+jne to fuse.

But on to my real point. I think it's important not to arbitrarily reassociate, or otherwise canonicalize, unless the canonical form is clearly superior at exposing real optimizations. You say that you've made an arbitrary decision to select one form over another. In that situation, we should try hard to preserve the original expression. Two reasons for this:

(1) We lose information about intermediate values. This means we have to throw away any value-specific annotations: NSW/NUW flags, debug information, and things like value profiles, if we had them. We already have a serious problem with the Reassociate pass dropping NSW flags, which inhibits important optimizations.

(2) We introduce arbitrary performance variations, as you just noticed, which take a lot of time to track down. It becomes harder to provide hints to the compiler to guide codegen.
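To make reason (1) concrete: an `nsw` flag is a promise about one particular intermediate value, and reordering the operands creates a new intermediate value that the promise does not cover. The sketch below models the no-signed-wrap condition with 8-bit signed arithmetic; `addOverflows8` and `nswHolds` are hypothetical helpers for illustration, not LLVM APIs.

```cpp
#include <cstdint>

// Returns true if a + b does not fit in a signed 8-bit integer, i.e. an
// "add nsw i8" of these operands would produce poison.
bool addOverflows8(int a, int b) {
  int sum = a + b;  // computed in int, so this addition cannot overflow
  return sum < INT8_MIN || sum > INT8_MAX;
}

// True if both adds in ((x + y) + z) stay in the i8 range, which is what
// nsw on both adds asserts.
bool nswHolds(int x, int y, int z) {
  if (addOverflows8(x, y)) return false;
  return !addOverflows8(x + y, z);
}
```

With a = 100, b = -100, c = 100, the original (a + b) + c satisfies nsw on both adds, but the reassociated (a + c) + b does not, because a + c = 200 overflows i8. Both orders produce the same final value, 100, so a pass that rebuilds the expression in a new order must drop the flags unless it can re-prove them.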

A while back, I was planning to rewrite Reassociate to preserve flags when possible. That fell by the wayside, but I'm becoming concerned that it will be harder to fix now that the pass is growing more sophisticated on top of the old design, which throws away the original expression. If there's any way you can think of to have Reassociate bias expressions toward their original form, that would be helpful.
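One possible reading of that request: when operands tie on rank, break the tie by their position in the original expression instead of leaving it to chance. The sketch below is not LLVM's Reassociate code; `Operand`, its `rank` and `originalPos` fields, `orderOperands`, and the rank values in the usage example are all hypothetical, and the descending-rank order is an arbitrary choice for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical operand record: a name, a rank as a reassociation pass might
// compute it, and the operand's position in the original expression.
struct Operand {
  std::string name;
  unsigned rank;
  unsigned originalPos;
};

// Orders operands by descending rank. Equal ranks fall back to the original
// expression order, so the result never depends on the order in which the
// operands happened to be collected.
std::vector<Operand> orderOperands(std::vector<Operand> ops) {
  std::sort(ops.begin(), ops.end(), [](const Operand& a, const Operand& b) {
    if (a.rank != b.rank) return a.rank > b.rank;  // higher rank first
    return a.originalPos < b.originalPos;          // tie: keep original order
  });
  return ops;
}
```

For the puzzle loop, giving `%0` and `%1` equal (illustrative) rank means the tie always resolves to the original `%0`-before-`%1` order, whatever order the operands arrive in.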

-Andy
