[LLVMdev] XOR Optimization

Tue Jul 26 17:21:00 PDT 2011

After a few more tests, I found out that if we set -unroll-threshold to a
value large enough, and run "opt -std-compile-opts" or "opt -O3" 3 times,
the unroll will be able to unroll the original loop 32 times, and when you
have it unrolled for at least 32 times a optimization is triggered, folding
it to a single "%xor.3.3.1 = xor i32 %tmp6, -1" (dont know why it does not
transform it into a NOT though).

Therefore, the optimization that I want is already there somewhere, but is
not triggered when llvm unrolls the loop less than 32 times.

My goal now is to make it work for smaller unrolled loops, like for 4, 8,
and 16.

Does anyone knows which function gives the information that for 32 unrolled
steps it is possible to optimize but not for smaller numbers?

I also would like to see why the "XOR  A,  -1" is not turned into a NOT, any
hints on that are welcome.

Thanks,

Daniel Nicacio

2011/7/26 Daniel Nicácio <dnicacios at gmail.com>

> Hi Duncan,
>
> when I run "opt -std-compile-opts" on the original source code it has the
> same output of O3.
> when I run "opt -std-compile-opts" on the -O3 optimized code, things get
> even more weird, it outputs the following code:
>
> while.body:                                       ; preds = %while.body,
> %entry
>   %indvar = phi i32 [ 0, %entry ], [ %indvar.next.3, %while.body ]
>   %tmp = shl i32 %indvar, 2
>   %0 = lshr i32 %indvar, 3
>   %shr = and i32 %0, 134217727
>   %rem = and i32 %tmp, 16
>   %shl = shl i32 1, %rem
>   %arrayidx = getelementptr inbounds i32* %bitmap, i32 %shr
>    %tmp6 = load i32* %arrayidx, align 4
>   %rem.1 = or i32 %rem, 1
>   %shl.1 = shl i32 1, %rem.1
>   %rem.2 = or i32 %rem, 2
>   %shl.2 = shl i32 1, %rem.2
>   %rem.3 = or i32 %rem, 3
>   %shl.3 = shl i32 1, %rem.3
>   %xor = xor i32 %shl, %tmp6
>   %xor.1 = xor i32 %xor, %shl.3
>   %xor.2 = xor i32 %xor.1, %shl.2
>   %xor.3 = xor i32 %xor.2, %shl.1
>   %rem.11 = or i32 %rem, 4
>   %shl.12 = shl i32 1, %rem.11
>   %rem.1.1 = or i32 %rem, 5
>   %shl.1.1 = shl i32 1, %rem.1.1
>   %rem.2.1 = or i32 %rem, 6
>   %shl.2.1 = shl i32 1, %rem.2.1
>   %rem.3.1 = or i32 %rem, 7
>   %shl.3.1 = shl i32 1, %rem.3.1
>   %xor.13 = xor i32 %shl.12, %xor.3
>   %xor.1.1 = xor i32 %xor.13, %shl.3.1
>   %xor.2.1 = xor i32 %xor.1.1, %shl.2.1
>   %xor.3.1 = xor i32 %xor.2.1, %shl.1.1
>   %rem.24 = or i32 %rem, 8
>   %shl.25 = shl i32 1, %rem.24
>   %rem.1.2 = or i32 %rem, 9
>   %shl.1.2 = shl i32 1, %rem.1.2
>   %rem.2.2 = or i32 %rem, 10
>   %shl.2.2 = shl i32 1, %rem.2.2
>   %rem.3.2 = or i32 %rem, 11
>   %shl.3.2 = shl i32 1, %rem.3.2
>   %xor.26 = xor i32 %shl.25, %xor.3.1
>   %xor.1.2 = xor i32 %xor.26, %shl.3.2
>   %xor.2.2 = xor i32 %xor.1.2, %shl.2.2
>   %xor.3.2 = xor i32 %xor.2.2, %shl.1.2
>   %rem.37 = or i32 %rem, 12
>   %shl.38 = shl i32 1, %rem.37
>   %rem.1.3 = or i32 %rem, 13
>   %shl.1.3 = shl i32 1, %rem.1.3
>   %rem.2.3 = or i32 %rem, 14
>   %shl.2.3 = shl i32 1, %rem.2.3
>   %rem.3.3 = or i32 %rem, 15
>   %shl.3.3 = shl i32 1, %rem.3.3
>   %xor.39 = xor i32 %shl.38, %xor.3.2
>   %xor.1.3 = xor i32 %xor.39, %shl.3.3
>   %xor.2.3 = xor i32 %xor.1.3, %shl.2.3
>   %xor.3.3 = xor i32 %xor.2.3, %shl.1.3
>   store i32 %xor.3.3, i32* %arrayidx, align 4
>   %indvar.next.3 = add i32 %indvar, 4
>   %exitcond.3 = icmp eq i32 %indvar.next.3, 32
>    br i1 %exitcond.3, label %while.end, label %while.body
>
> while.end:                                        ; preds = %while.body
>   ret void
>
>
> Thanks for the reply,
>
> Daniel
>
>
> On Tue, Jul 26, 2011 at 12:50 PM, Duncan Sands <baldrick at free.fr> wrote:
>
>> Hi Daniel,
>>
>> > Precisely. The code generated by unrolling can be folded into a single
>> XOR and
>> > SHL. And even if it was not inside a loop, it can still be optimized.
>> What I
>> > want to know is:  is there any optimization supposed to optimize this
>> code, but
>> > for some reason it thinks it is not possible, or  there is no
>> optimization for
>> > that situation at all?
>>
>> it could be a phase ordering problem.  If you run "opt -std-compile-opts"
>> on the
>> unsatisfactory bitcode, does it clean up all the bit fiddling?
>>
>> Ciao, Duncan.
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110726/bf925be4/attachment.html>