[LLVMdev] XOR Optimization
Daniel Nicácio
dnicacios at gmail.com
Tue Jul 26 14:32:47 PDT 2011
Hi Duncan,
when I run "opt -std-compile-opts" on the original source code it has the
same output of O3.
when I run "opt -std-compile-opts" on the -O3 optimized code, things get
even more weird, it outputs the following code:
while.body: ; preds = %while.body,
%entry
%indvar = phi i32 [ 0, %entry ], [ %indvar.next.3, %while.body ]
%tmp = shl i32 %indvar, 2
%0 = lshr i32 %indvar, 3
%shr = and i32 %0, 134217727
%rem = and i32 %tmp, 16
%shl = shl i32 1, %rem
%arrayidx = getelementptr inbounds i32* %bitmap, i32 %shr
%tmp6 = load i32* %arrayidx, align 4
%rem.1 = or i32 %rem, 1
%shl.1 = shl i32 1, %rem.1
%rem.2 = or i32 %rem, 2
%shl.2 = shl i32 1, %rem.2
%rem.3 = or i32 %rem, 3
%shl.3 = shl i32 1, %rem.3
%xor = xor i32 %shl, %tmp6
%xor.1 = xor i32 %xor, %shl.3
%xor.2 = xor i32 %xor.1, %shl.2
%xor.3 = xor i32 %xor.2, %shl.1
%rem.11 = or i32 %rem, 4
%shl.12 = shl i32 1, %rem.11
%rem.1.1 = or i32 %rem, 5
%shl.1.1 = shl i32 1, %rem.1.1
%rem.2.1 = or i32 %rem, 6
%shl.2.1 = shl i32 1, %rem.2.1
%rem.3.1 = or i32 %rem, 7
%shl.3.1 = shl i32 1, %rem.3.1
%xor.13 = xor i32 %shl.12, %xor.3
%xor.1.1 = xor i32 %xor.13, %shl.3.1
%xor.2.1 = xor i32 %xor.1.1, %shl.2.1
%xor.3.1 = xor i32 %xor.2.1, %shl.1.1
%rem.24 = or i32 %rem, 8
%shl.25 = shl i32 1, %rem.24
%rem.1.2 = or i32 %rem, 9
%shl.1.2 = shl i32 1, %rem.1.2
%rem.2.2 = or i32 %rem, 10
%shl.2.2 = shl i32 1, %rem.2.2
%rem.3.2 = or i32 %rem, 11
%shl.3.2 = shl i32 1, %rem.3.2
%xor.26 = xor i32 %shl.25, %xor.3.1
%xor.1.2 = xor i32 %xor.26, %shl.3.2
%xor.2.2 = xor i32 %xor.1.2, %shl.2.2
%xor.3.2 = xor i32 %xor.2.2, %shl.1.2
%rem.37 = or i32 %rem, 12
%shl.38 = shl i32 1, %rem.37
%rem.1.3 = or i32 %rem, 13
%shl.1.3 = shl i32 1, %rem.1.3
%rem.2.3 = or i32 %rem, 14
%shl.2.3 = shl i32 1, %rem.2.3
%rem.3.3 = or i32 %rem, 15
%shl.3.3 = shl i32 1, %rem.3.3
%xor.39 = xor i32 %shl.38, %xor.3.2
%xor.1.3 = xor i32 %xor.39, %shl.3.3
%xor.2.3 = xor i32 %xor.1.3, %shl.2.3
%xor.3.3 = xor i32 %xor.2.3, %shl.1.3
store i32 %xor.3.3, i32* %arrayidx, align 4
%indvar.next.3 = add i32 %indvar, 4
%exitcond.3 = icmp eq i32 %indvar.next.3, 32
br i1 %exitcond.3, label %while.end, label %while.body
while.end: ; preds = %while.body
ret void
Thanks for the reply,
Daniel
On Tue, Jul 26, 2011 at 12:50 PM, Duncan Sands <baldrick at free.fr> wrote:
> Hi Daniel,
>
> > Precisely. The code generated by unrolling can be folded into a single
> XOR and
> > SHL. And even if it was not inside a loop, it can still be optimized.
> What I
> > want to know is: is there any optimization supposed to optimize this
> code, but
> > for some reason it thinks it is not possible, or there is no
> optimization for
> > that situation at all?
>
> it could be a phase ordering problem. If you run "opt -std-compile-opts"
> on the
> unsatisfactory bitcode, does it clean up all the bit fiddling?
>
> Ciao, Duncan.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110726/8f798245/attachment.html>
More information about the llvm-dev
mailing list