<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 13, 2014 at 9:37 AM, suyog sarda <span dir="ltr"><<a href="mailto:sardask01@gmail.com" target="_blank">sardask01@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks Sean for the reference.<div><br></div><div>I will go through it and see if i can implement it for generic boolean expression minimization.</div>

</div></blockquote><div><br></div><div>Hi Suyog,</div><div><br></div><div>One thing to keep in mind though is what you are trying to get out of all this work, as my suggestion might be a lot of work and there could be simpler solutions that meet your immediate needs.</div>

<div><br></div><div>What do you see as the end-result for instcombine's capability with boolean expression simplification? Is there a particular goal that you are working towards? The same also some of the other folks (coworkers of yours?) that have been contributing similar instcombines lately.</div>

<div><br></div><div><br></div><div>Also remember that basically anything that can be called a boolean minimization algorithm is going to have a worst-case exponential run time in the number of variables. An easy way to see this is that if your algorithm can decide "this expression simplifies to just False", then you can use it to solve SAT: if the expression simplifies to just False, then it is unsatisfiable, otherwise it is satisfiable.</div>

<div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>Regards,</div><div>Suyog</div></div>


<div class="gmail_extra"><div><div class="h5"><br><br><div class="gmail_quote">On Wed, Aug 13, 2014 at 2:30 AM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Re-adding the mailing list (remember to hit "reply all")<br><div class="gmail_extra"><br><br>

<div class="gmail_quote">

<div><div>On Tue, Aug 12, 2014 at 9:36 AM, suyog sarda <span dir="ltr"><<a href="mailto:sardask01@gmail.com" target="_blank">sardask01@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Thanks Sean for the reply.<div class="gmail_extra">


<br><br><div class="gmail_quote"><div><div>

On Mon, Aug 11, 2014 at 11:49 PM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">


<div>On Fri, Aug 8, 2014 at 1:34 PM, suyog sarda <span dir="ltr"><<a href="mailto:sardask01@gmail.com" target="_blank">sardask01@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div><div><div><div>Hi Duncan, David, Sean.<br>


<br></div>Thanks for your reply.<div><br><br>> It'd be interesting if you could find a design that also treated these<br>> the same:<br>

><br></div>>   (B ^ A) | ((A ^ B) ^ C) -> (A ^ B) | C<div><br>

<div>>   (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C<br>

</div></div><div>>   (B ^ A) | ((C ^ A) ^ B) -> (A ^ B) | C<br>

><br>

> I.e., `^` is also associative.<br><br></div></div>Agree with Duncan on including associative operation too. <br><div><br>> Can we handle this by just having a canonical ordering? Or is that too difficult to maintain through various instcombines?<br>


<br></div></div>Yes, its the easiest way to do that. If i am not wrong, what Sean is suggesting is that if we get <br><br></div>something like  <br><div><br>(B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C</div><div><div>

<br></div><div>and we have written pass for pattern <br><div>

<br>(A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C<br><br></div></div><div>we could just swap 'B' and 'A' before matching the pattern i.e. we do canonical re-ordering <br>(Sean, Correct me if i am wrong on my understanding of what you meant by canonical ordering).<br>


</div></div></div></blockquote><div><br></div></div><div>You understood correctly.</div><div><div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div dir="ltr"><div><div>

<br></div><div>I have seen in the "instcombine" module that many times std::swap was used for the same.<br><br></div><div>But it becomes too much repetitive and somewhat irritating to keep doing this for every pattern.<br>


</div><div class="gmail_extra">Shouldn't we do this in more generic way where we do not have to worry about swapping operands? <br>(How to do it - i don't know for now. I looked at the matchers implementation for modification/improvement, <br>


but didn't find any clue on it).<div><br><br>> It seems to me that we could rejigger Reassociate to reduce the number of possibilities that InstCombine could expect.<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">


</blockquote></div><br></div></div><div class="gmail_extra">Agree with David that infact it should be reassociate pass which should handle this.<br></div><div class="gmail_extra"><br>But following is something interesting and rather worrying things i have found :<br>


</div><div class="gmail_extra">(I have omitted unimportant code and highlighted important one in following example)<br></div><div class="gmail_extra"><u>e.x. </u><br><br>((~A & B) | A)  -->  (A | B) ; Code is implemented for this already<br>


<br></div><div class="gmail_extra"><u>C code :</u><br><br>suyog@suyog-Inspiron-N5010:~$ cat 1.c<br>#include<stdio.h><br>int cal(int a, int b) {<br><b>return ((~a & b) | a);</b><br>}<br><br>int main(){<br>int a, b;<br>


scanf("%d %d", &a, &b);<br>return cal(a,b);<br>}<br><br></div><div class="gmail_extra">LLVM IR at O0 :<br><br>suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O0 -emit-llvm 1.c <br><br>; Function Attrs: nounwind<br>


<b>define i32 @cal(i32 %a, i32 %b) #0 {<br>  %1 = xor i32 %a, -1<br>  %2 = and i32 %1, %b<br>  %3 = or i32 %2, %a<br>  ret i32 %3</b><br>}<br><br>; Function Attrs: nounwind<br>define i32 @main() #0 {<br>  ..<br>  ..<br>  %4 = call i32 @cal(i32 %2, i32 %3)<br>


  ret i32 %4<br>}<br><br></div><div class="gmail_extra"><u>LLVM IR at instcombine :</u><br><br>suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/opt -S -instcombine 1.ll<br><br><b>; Function Attrs: nounwind<br>define i32 @cal(i32 %a, i32 %b) #0 {<br>


  %1 = or i32 %a, %b<br>  ret i32 %1<br>}</b><br>..<br>..<br>..<br><br></div><div class="gmail_extra">which is OK as per expected transform.<br></div><div class="gmail_extra"><u><br></u></div><div class="gmail_extra"><u>LLVM IR at reassociate and instcombine :<br>


<br></u>suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/opt -S -reassociate -instcombine 2.ll<u><br><br></u><b>; Function Attrs: nounwind<br>define i32 @cal(i32 %a, i32 %b) #0 {<br>  %1 = xor i32 %a, -1<br>  %2 = and i32 %b, %1<br>


  %3 = or i32 %2, %a<br>  ret i32 %3<br>}</b><br><br>..<br>..<br><br></div><div class="gmail_extra">Reassociate converted (~a&b) to (b&~a) and instruction combine failed to optimize.<br><br></div><div class="gmail_extra">


As evident, sometimes, reassociate pass actually harms the optimization opportunity !!<br><br></div><div class="gmail_extra">Some more interesting findings on how can this affect the final machine code:<br><br></div><div class="gmail_extra">


<u>Test case :</u><br></div><div class="gmail_extra">1.c :<br></div><div class="gmail_extra"><br>#include<stdio.h><br>int cal(int a, int b) {<br><b>return ((~a & b) | a);</b><br>}<br><br>int main(){<br>int a, b;<br>


scanf("%d %d", &a, &b);<br>return cal(a,b);<br>}<br><br></div><div class="gmail_extra"><u>X86 .s file with clang at O2 for above program :</u><br><br>suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c<br>


<br>main:                                   # @main<br># BB#0:<br>    subl    $28, %esp<br>    leal    20(%esp), %eax<br>    movl    %eax, 8(%esp)<br>    leal    24(%esp), %eax<br>    movl    %eax, 4(%esp)<br>    movl    $.L.str, (%esp)<br>


    calll    __isoc99_scanf<br>    movl    20(%esp), %eax<br><b>    orl    24(%esp), %eax</b><br>    addl    $28, %esp<br>    retl<br><br></div><div class="gmail_extra">As seen, optimization happened at IR level itself reflected in .s file.<br>


<br></div><div class="gmail_extra"><u>GCC output for the same:</u><br><br></div><div class="gmail_extra">suyog@suyog-Inspiron-N5010:~$ gcc -S -O2 1.c<br><br>main:<br>.LFB23:<br>    .cfi_startproc<br>    pushl    %ebp<br>


    .cfi_def_cfa_offset 8<br>

    .cfi_offset 5, -8<br>    movl    %esp, %ebp<br>    .cfi_def_cfa_register 5<br>    andl    $-16, %esp<br>    subl    $32, %esp<br>    leal    28(%esp), %eax<br>    movl    %eax, 8(%esp)<br>    leal    24(%esp), %eax<br>


    movl    %eax, 4(%esp)<br>    movl    $.LC0, (%esp)<br>    call    __isoc99_scanf<br>    movl    24(%esp), %eax<br><b>    orl    28(%esp), %eax</b><br>    leave<br>    .cfi_restore 5<br>    .cfi_def_cfa 4, 4<br>    ret<br>


    .cfi_endproc<br></div><div class="gmail_extra"><br>GCC also did the optimization.<br><br></div><div class="gmail_extra">Now we just <b>slightly flip</b> the test case :<br></div><div class="gmail_extra"><u><br></u></div>


<div class="gmail_extra"><u>1.c Test case:<br><br></u>#include<stdio.h><br>int cal(int a, int b) {<br><b>return ((b & ~a) | a);</b><br>}<br><br>int main(){<br>int a, b;<br>scanf("%d %d", &a, &b);<br>


return cal(a,b);<br>}<u><br></u><br><u>X86 .s file with clang at O2 for above program :<br><br></u>suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c<br><br>main:                                   # @main<br>


# BB#0:<br>

    subl    $28, %esp<br>    leal    20(%esp), %eax<br>    movl    %eax, 8(%esp)<br>    leal    24(%esp), %eax<br>    movl    %eax, 4(%esp)<br>    movl    $.L.str, (%esp)<br>    calll    __isoc99_scanf<br>    movl    24(%esp), %ecx<br>


    movl    %ecx, %eax<br>    <b>notl    %eax<br>    andl    20(%esp), %eax<br>    orl    %ecx, %eax</b><br>    addl    $28, %esp<br>    retl<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">We lost the Optimization opportunity here !!<br>


<br></div><div class="gmail_extra"><u>GCC output for the same:</u><br><br>suyog@suyog-Inspiron-N5010:~$ gcc -S -O2 1.c<br><br>main:<br>.LFB23:<br>    .cfi_startproc<br>    pushl    %ebp<br>    .cfi_def_cfa_offset 8<br>    .cfi_offset 5, -8<br>


    movl    %esp, %ebp<br>    .cfi_def_cfa_register 5<br>    andl    $-16, %esp<br>    subl    $32, %esp<br>    leal    28(%esp), %eax<br>    movl    %eax, 8(%esp)<br>    leal    24(%esp), %eax<br>    movl    %eax, 4(%esp)<br>


    movl    $.LC0, (%esp)<br>    call    __isoc99_scanf<br>    movl    24(%esp), %eax<br><b>    orl    28(%esp), %eax</b><br>    leave<br>    .cfi_restore 5<br>    .cfi_def_cfa 4, 4<br>    ret<br>    .cfi_endproc<br><br>


</div>

<div class="gmail_extra">GCC still optimized it !!<br><br></div><div class="gmail_extra">Clearly evident that llvm is losing because of naive approach of pattern matching.<br><br></div><div class="gmail_extra">How can we improve this so that LLVM also recognizes commutative/associative <br>


pattern  efficiently with single 'matchers' call? I am not having any idea so far.<br></div><div class="gmail_extra">I will think more on it. Any help would be really helpful.<br></div><div class="gmail_extra"><br>


</div><div class="gmail_extra">Any suggestions/comments/rectification are most awaited :)<br></div></div></div></blockquote><div><br></div><div><br></div></div></div><div><div>I'm actually sort of wondering if instead of hardcoding all these bitwise simplifications, we can just use a general boolean minimization algorithm.</div>


<div><br></div><div>Is there a reason we don't? Maybe it's just compile time? But eventually (maybe already?) adding enough special case combines will be slower than using the general optimization.</div></div></div>


</div></div></blockquote><div><br></div></div></div><div>Can you please elaborate more on "general boolean minimization algorithm" ? Is it a specific algorithm? I couldn't find any proper </div><div>reference to it. If you can provide a reference for it, i will try to see if i can implement it. </div>


</div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra">


<div class="gmail_quote">


<div><br></div><div>Can K-Map be an option to generalize the logic?</div><div> (I could think of K-Map for now for simplification from my college days :)) </div></div></div></div></blockquote><div><br></div></div></div><div>


Yeah, K-Maps are one way to do it, however K-Maps are really most useful for when your target operations are all NAND/NOR, which isn't the case here (and you can also do Reed-Muller for XOR/XNOR); instcombine has a very different set of "primitive operations" and their relative costs (i.e. XOR, AND, OR, and NOT are usually the base operations and cost the same amount of resources to execute).</div>


<div><br></div><div>Since optimization of binary expressions is of such enormous importance to the electronics industry, this topic is extremely well researched, so there should be a lot to draw from. There may be stuff targeted specifically at software compilers, but I haven't looked in depth (maybe try Muchnick section 12.3 for references? I don't have the book handy).</div>


<div><br></div><div>For starters, you may want to check out Knuth volume 4A, section 7.1. I'm working from memory here, but IIRC there is a whole part about minimization/optimization of binary expressions. Knuth as usual provides tons of references to the literature in case the text there is not enough. You can see preprint copies of volume 4A here: <a href="http://www.cs.utsa.edu/~wagner/knuth/" target="_blank">http://www.cs.utsa.edu/~wagner/knuth/</a> (although Knuth is the sort of book that any programmer should have on their bookshelf; in fact, you can probably justify it as a work expense).</div>


<div><div>

<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra">


<div class="gmail_quote"><div><br></div><div>Also few more things i will like to understand.</div>


<div>From above example i have shown,</div><div><br></div><div>when we have test case as:</div><div><div><br></div><div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">

#include<stdio.h><br>

int cal(int a, int b) {<br><b>return ((~a & b) | a);</b><br>}<br><br>int main(){<br>int a, b;<br>scanf("%d %d", &a, &b);<br>return cal(a,b);<br>}<br> </div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">


<u>X86 .s file with clang at O2 for above program :</u><br></div></div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px"><u><br></u></div></div><div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">


main:                                   # @main<br># BB#0:<br>    subl    $28, %esp<br>    leal    20(%esp), %eax<br>    movl    %eax, 8(%esp)<br>    leal    24(%esp), %eax<br>    movl    %eax, 4(%esp)<br>    movl    $.L.str, (%esp)<br>


    calll    __isoc99_scanf<br>    movl    20(%esp), %eax<br><b>    orl    24(%esp), %eax</b><br>    addl    $28, %esp<br>    retl<u><br></u></div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">


<br></div></div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">As seen, we optimize the pattern.</div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">


<br></div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">but when we have test case as :</div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">


<br></div><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px"><div><div><div class="gmail_extra">#include<stdio.h><br>int cal(int a, int b) {<br><b>return ((b & ~a) | a);</b><br>


}<br><br>int main(){<br>int a, b;<br>scanf("%d %d", &a, &b);<br>return cal(a,b);<br>}<u><br></u><br><u>X86 .s file with clang at O2 for above program :<br><br></u>suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c<br>


<br>main:                                   # @main<br># BB#0:<br>    subl    $28, %esp<br>    leal    20(%esp), %eax<br>    movl    %eax, 8(%esp)<br>    leal    24(%esp), %eax<br>    movl    %eax, 4(%esp)<br>    movl    $.L.str, (%esp)<br>


    calll    __isoc99_scanf<br>    movl    24(%esp), %ecx<br>    movl    %ecx, %eax<br>    <b>notl    %eax<br>    andl    20(%esp), %eax<br>    orl    %ecx, %eax</b><br>    addl    $28, %esp<br>    retl<br></div><div class="gmail_extra">


<br></div></div></div><div> we fail  to optimize.</div><div><br></div><div>Does, 'reassociate' pass always run before 'instcombine' ? </div><div>Theoretically, it should in my opinion. But from the output in first case, it seems it doesn't.</div>


<div>If 'reassociate' would have run before 'instcombine', it would have converted (~a&b) to (b&~a) </div><div>and the pattern would have not been optimized. </div><div><br></div><div>Is there any logic which determines when to run reassociate and when not to run?</div>


</div></div></div></div></blockquote><div><br></div></div></div><div>I don't know off the top of my head. Try running opt with -debug-pass=Arguments to see what the order is (or I think you should be able to pass -debug-pass=Arguments to clang's -mllvm option).</div>


<span><font color="#888888">

<div><br></div><div>-- Sean Silva</div></font></span><div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div dir="ltr">

<div class="gmail_extra"><div class="gmail_quote"><div class="gmail_extra" style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">


<div><br></div><div>Duncan, David, Sean - any comment on this are most welcomed :)</div><div><br></div><div>Regards,</div><div>Suyog</div><div><br></div></div></div>

</div></div>

</blockquote></div></div><br></div></div>

</blockquote></div><br><br clear="all"><div><br></div></div></div><div class="">-- <br>With regards,<br>Suyog Sarda<br>

</div></div>

</blockquote></div><br></div></div>