<span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px; ">"The fact that the loop is unrolled explains why the XORs, SHLs, and ORs are not </span><span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px; ">folded into 1."</span><div>


<font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></font></div><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;">I dont see why the unrolling explains it.<br>


</span></font><span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px; "><div><span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px; "><br>


</span></div>"I think he is trying to say this expression generated by unrolling by a factor of 4 can indeed be folded into a single XOR, SHL and OR. "</span><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;"><br>


</span></font></div><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;">Precisely. The code generated by unrolling can be folded into a single XOR and SHL. And even if it was not inside a loop, it can still be optimized. What I want to know is:  is there any optimization supposed to optimize this code, but for some reason it thinks it is not possible, or  there is no optimization for that situation at all?</span></font></div>


<div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse; "><br></span></font></div><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse; ">Thanks for the help guys</span></font></div>


<div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></font><br><div class="gmail_quote">On Tue, Jul 26, 2011 at 9:55 AM, Alistair Lynn <span dir="ltr"><<a href="mailto:arplynn@gmail.com">arplynn@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi-<br>

<div class="im"><br>

> I haven't seen a machine in which OR is faster than ADD nor more energy-efficient. They're all done by the same ALU circuitry which delays the pipeline by its worstcase path timing. So, for modern processor hardware purposes, OR is exactly equal ADD. Transforming ADD to OR isn't strenght reduction at all. Maybe this is benefical only if you have a backend generating circuitry (programming FPGAs).<br>


<br>

</div>I believe that in cases where ADD and OR are equivalent, LLVM prefers the latter because it's easier to reason about the bits in the result of an OR in complex cases. The x86 backend, for instance, transforms ORs in such cases back into adds, presumably in case it may be matched to an lea where that's beneficial.<br>


<font color="#888888"><br>

Alistair<br>

</font><div><div></div><div class="h5"><br>

<br>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

</div></div></blockquote></div><br></div></div>