<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Sanjay,<div class=""><br class=""></div><div class="">Please see my comments below</div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="">For any of the examples shown below, if the logical equivalent using cmp + other IR instructions is no more than the number of IR instructions as the variant that uses shift, we should consider reversing the canonicalization.</div></div></blockquote><div class=""><br class=""></div><div class="">I understand that by “reversing a canonicalisation” you mean replacing it by a different one in InstCombine. Is this what you mean?.</div><div class="">Only in a very few cases this is possible. The one that you showed seem to be one of them, but they are not the majority. If the criteria for “canonicalization” is minimal number of IR instructions, then it’s not possible to reverse all the transforms to shifts in InstCombine. My suggestion would be to define a set of concrete rules what would define what’s a “canonical” form, other than minimal number or IR.  </div><br class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="">To make that happen, you would need to show that at least the minimal cases have codegen that is equal or better using the cmp form for at least a few in-tree targets. My guess is that we already have DAGCombine code that handles some of these. </div></div></blockquote><div class=""><br class=""></div><div class="">Unfortunately, my tests showed that DAGcombine does not currently handle this well. I found that there’s a number of InstCombine transforms that are not available in DAGCombine. This means that it is necessary to add new code to handle them in DAGCombine.</div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="">Then, you would need to reverse the transform in instcombine and see what happens in the regression tests there (it will probably expose missing transforms in instcombine).</div></div></blockquote><br class=""></div><div class="">I don’t understand this statement. I think it’s DAGCombine, not InstCombine, what suffers from missing transforms. Please can you clarify?</div><div class=""><br class=""></div><div class="">Finally, what you suggest is about what I have been stating as my preferred approach for long, and I searched support for it. Unfortunately, I can’t do this alone, I’m now disabled and this demands a lot more work than I can reasonably do. Test cases are implemented in ways that are tight and fragile, even simple changes will break them, which will add even more work. My intermediate solution would be ready and working in a couple of days, but if it can’t be accepted, then I don’t have another choice than kindly ask you to try to obtain some support from community members who would be willing to work on this. I will help with what I can, as I’m convinced that this would be a great improvement for LLVM and I have a strong interest on it, but I just can't do it all.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class=""><br class=""></div><div class="">John.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 14 Nov 2019, at 21:46, Sanjay Patel <<a href="mailto:spatel@rotateright.com" class="">spatel@rotateright.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">For any of the examples shown below, if the logical equivalent using cmp + other IR instructions is no more than the number of IR instructions as the variant that uses shift, we should consider reversing the canonicalization.</div><div class="">To make that happen, you would need to show that at least the minimal cases have codegen that is equal or better using the cmp form for at least a few in-tree targets. My guess is that we already have DAGCombine code that handles some of these. Then, you would need to reverse the transform in instcombine and see what happens in the regression tests there (it will probably expose missing transforms in instcombine).<br class=""></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Nov 14, 2019 at 1:31 PM Joan Lluch <<a href="mailto:joan.lluch@icloud.com" class="">joan.lluch@icloud.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div dir="auto" style="overflow-wrap: break-word;" class=""><div dir="auto" style="overflow-wrap: break-word;" class=""><div dir="auto" style="overflow-wrap: break-word;" class="">Hi Spatel,<div class=""><br class=""></div><div class="">Thanks for that.</div><div class=""><br class=""></div><div class="">Well, ultimately, my preferred approach would be that the canonical form was the icmp, not the shift. There are already cases where C code shifts are converted into icmp, such as this one:</div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><span style="color:rgb(186,45,162)" class="">void</span> isNotNegativeUsingBigShift_r(<span style="color:rgb(186,45,162)" class="">short</span> x, <span style="color:rgb(186,45,162)" class="">short</span> *r) {</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  <span style="color:rgb(186,45,162)" class="">if</span> ((<span style="color:rgb(186,45,162)" class="">unsigned</span> <span style="color:rgb(186,45,162)" class="">short</span>)(~x) >> <span style="color:rgb(39,42,216)" class="">15</span> ) *r = <span style="color:rgb(39,42,216)" class="">1</span>;</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">}</div></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><br class=""></div><div class="">this gets compiled into this:</div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">define void @isNotNegativeUsingBigShift_r(i16 signext %x, i16* nocapture %r) {</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">entry:</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  %tobool = icmp sgt i16 %x, -1</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  br i1 %tobool, label %if.then, label %if.end</div><div style="margin:0px;line-height:normal;background-color:rgb(255,255,255);min-height:14px" class=""><br class=""></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">if.then:                                          ; preds = %entry</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  store i16 1, i16* %r, align 2, !tbaa !2</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  br label %if.end</div><div style="margin:0px;line-height:normal;background-color:rgb(255,255,255);min-height:14px" class=""><br class=""></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">if.end:                                           ; preds = %if.then, %entry</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  ret void</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">}</div></div><div class=""><br class=""></div><div class="">I think that icmps and selects are more “target independent” than shifts. Instead of in InstCombine, transforms into shifts would be created in DAGCombine and TargetLowering, which already incorporate some quite aggressive transforms.</div><div class=""><br class=""></div><div class="">As an 'experiment', I ran a small number of cases to see how well DAGCombine deals with code that is normally converted into shifts by InstCombine. I mean, I disabled InstCombine for these cases and watched what the DAGCombine is able to do with the resulting IR. Well, although DAGCombine is able to create some shifts by itself, it does not currently handle some of the cases normally handled by InstCombine, so the reliance on InstCombine to get some shifts emitted is still strong.  </div><div class=""><br class=""></div><div class="">I understand the objection to an instCombine option. I truly understand that, seriously. And I kind of dislike that option too, which is why I am trying to openly expose my reasons for that.</div><div class=""><br class=""></div><div class="">As per your question, the following are the ‘undesirable' InstCombine transforms that I identified:</div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;color:rgb(0,132,0);background-color:rgb(255,255,255)" class=""><div style="margin:0px;line-height:normal" class=""><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> testSimplifySetCC_0( </span><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> x )  </span>// 904 (InstCombineCasts::transformZExtICmp)</div><div style="margin: 0px; line-height: normal;" class="">{</div><div style="margin: 0px; line-height: normal;" class="">  <span style="color:rgb(186,45,162)" class="">return</span> (x & <span style="color:rgb(39,42,216)" class="">32</span>) != <span style="color:rgb(39,42,216)" class="">0</span>;</div><div style="margin: 0px; line-height: normal;" class="">}</div></div></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><div style="margin:0px;line-height:normal" class="">define i16 @testSimplifySetCC_0(i16 %x) {</div><div style="margin:0px;line-height:normal" class="">entry:</div><div style="margin:0px;line-height:normal" class="">  %and = lshr i16 %x, 5</div><div style="margin:0px;line-height:normal" class="">  %and.lobit = and i16 %and, 1</div><div style="margin:0px;line-height:normal" class="">  ret i16 %and.lobit</div><div style="margin:0px;line-height:normal" class="">}</div></div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;color:rgb(0,132,0);background-color:rgb(255,255,255)" class=""><div style="margin:0px;line-height:normal" class=""><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> testSExtICmp_0( </span><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> x )  </span>// 1274 (InstCombineCasts:transformSExtICmp)</div><div style="margin: 0px; line-height: normal;" class="">{</div><div style="margin: 0px; line-height: normal;" class="">  <span style="color:rgb(186,45,162)" class="">return</span> (x & <span style="color:rgb(39,42,216)" class="">32</span>) ? -<span style="color:rgb(39,42,216)" class="">1</span> : <span style="color:rgb(39,42,216)" class="">0</span>;</div><div style="margin: 0px; line-height: normal;" class="">}</div></div></div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><div style="margin:0px;line-height:normal" class="">define i16 @testSExtICmp_0(i16 %x) {</div><div style="margin:0px;line-height:normal" class="">entry:</div><div style="margin:0px;line-height:normal" class="">  %0 = shl i16 %x, 10</div><div style="margin:0px;line-height:normal" class="">  %sext = ashr i16 %0, 15</div><div style="margin:0px;line-height:normal" class="">  ret i16 %sext</div><div style="margin:0px;line-height:normal" class="">}</div></div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;color:rgb(0,132,0);background-color:rgb(255,255,255)" class=""><div style="margin:0px;line-height:normal" class=""><div style="margin:0px;line-height:normal" class=""><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> testExtendSignBit_0( </span><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> x )  </span>// 1239 (InstCombineCasts::transformSExtICmp)</div><div style="margin: 0px; line-height: normal;" class="">{</div><div style="margin: 0px; line-height: normal;" class="">  <span style="color:rgb(186,45,162)" class="">return</span> x<<span style="color:rgb(39,42,216)" class="">0</span> ? <span style="color:rgb(39,42,216)" class="">0</span> : -<span style="color:rgb(39,42,216)" class="">1</span>;</div><div style="margin: 0px; line-height: normal;" class="">}</div></div></div></div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><div style="margin:0px;line-height:normal" class="">define i16 @testExtendSignBit_0(i16 %x) {</div><div style="margin:0px;line-height:normal" class="">entry:</div><div style="margin:0px;line-height:normal" class="">  %x.lobit = ashr i16 %x, 15</div><div style="margin:0px;line-height:normal" class="">  %x.lobit.not = xor i16 %x.lobit, -1</div><div style="margin:0px;line-height:normal" class="">  ret i16 %x.lobit.not</div><div style="margin:0px;line-height:normal" class="">}</div></div></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><br class=""></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><br class=""></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><div style="margin:0px;line-height:normal;color:rgb(0,132,0)" class=""><div style="margin:0px;line-height:normal" class=""><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> testExtendSignBit_1( </span><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> x )  </span>// 861 (InstCombineCasts::transformZExtICmp)</div><div style="margin: 0px; line-height: normal;" class="">{</div><div style="margin: 0px; line-height: normal;" class="">  <span style="color:rgb(186,45,162)" class="">return</span> x>-<span style="color:rgb(39,42,216)" class="">1</span> ? <span style="color:rgb(39,42,216)" class="">1</span> : <span style="color:rgb(39,42,216)" class="">0</span>;</div><div style="margin: 0px; line-height: normal;" class="">}</div></div><div style="margin:0px;line-height:normal" class=""><br class=""></div><div style="margin:0px;line-height:normal" class=""><div style="margin:0px;line-height:normal" class=""><div style="margin:0px;line-height:normal" class="">define i16 @testExtendSignBit_1(i16 %x) {</div><div style="margin:0px;line-height:normal" class="">entry:</div><div style="margin:0px;line-height:normal" class="">  %x.lobit = lshr i16 %x, 15</div><div style="margin:0px;line-height:normal" class="">  %x.lobit.not = xor i16 %x.lobit, 1</div><div style="margin:0px;line-height:normal" class="">  ret i16 %x.lobit.not</div><div style="margin:0px;line-height:normal" class="">}</div></div></div></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><br class=""></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><br class=""></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><div style="margin:0px;line-height:normal;color:rgb(0,132,0)" class=""><div style="margin:0px;line-height:normal" class=""><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> testShiftAnd_1( </span><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> x )   </span>// 132 (InstCombineSelect foldSelectICmpAnd)</div><div style="margin: 0px; line-height: normal;" class="">{</div><div style="margin: 0px; line-height: normal;" class="">  <span style="color:rgb(186,45,162)" class="">return</span> x<<span style="color:rgb(39,42,216)" class="">0</span> ? <span style="color:rgb(39,42,216)" class="">2</span> : <span style="color:rgb(39,42,216)" class="">0</span>;</div><div style="margin: 0px; line-height: normal;" class="">}</div></div></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><br class=""></div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class=""><div style="margin:0px;line-height:normal" class=""><div style="margin:0px;line-height:normal" class="">define i16 @testShiftAnd_1(i16 %x) {</div><div style="margin:0px;line-height:normal" class="">entry:</div><div style="margin:0px;line-height:normal" class="">  %0 = lshr i16 %x, 14</div><div style="margin:0px;line-height:normal" class="">  %1 = and i16 %0, 2</div><div style="margin:0px;line-height:normal" class="">  ret i16 %1</div><div style="margin:0px;line-height:normal" class="">}</div></div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;color:rgb(0,132,0);background-color:rgb(255,255,255)" class=""><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> foldSelectICmpAndOr( </span><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> x, </span><span style="color:rgb(186,45,162)" class="">int</span><span style="" class=""> y )    </span>// 600 (InstCombineSelect foldSelectICmpAndOr)</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">{</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  <span style="color:rgb(186,45,162)" class="">return</span> (x & <span style="color:rgb(39,42,216)" class="">2048</span>) ? (y | <span style="color:rgb(39,42,216)" class="">2</span>) : y;</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">}</div></div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">define i16 @foldSelectICmpAndOr(i16 %x, i16 %y) {</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">entry:</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  %and = lshr i16 %x, 10</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  %0 = and i16 %and, 2</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  %1 = or i16 %0, %y</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">  ret i16 %1</div><div style="margin:0px;font-size:11px;line-height:normal;font-family:Monaco;background-color:rgb(255,255,255)" class="">}</div></div><div class=""><br class=""></div><div class="">There are possibly a few more, and there are also variations of the same, but the above should be the most common ones.  </div><div class=""><br class=""></div><div class="">Thanks,</div><div class=""><br class=""></div><div class="">John</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""> <br class=""><div class=""><blockquote type="cite" class=""><div class="">On 14 Nov 2019, at 00:05, Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank" class="">spatel@rotateright.com</a>> wrote:</div><br class=""><div class=""><div dir="ltr" class=""><div class="">As before, I'm not convinced that we want to allow target-based enable/disable in instcombine for performance. That undermines having a target-independent canonical form in the 1st place.<br class=""></div><div class=""><br class=""></div><div class="">It's not clear to me what the remaining motivating cases look like. If you could post those here or as bugs, I think you'd have a better chance of finding an answer.<br class=""></div><div class=""><br class=""></div><div class="">Let's take a minimal example starting in C and compiling for MSP430 since that's what we have used as a public approximation of your target:<br class=""></div><div class=""><br class=""></div><div class="">short isNotNegativeUsingBigShift(short x) {<br class="">  return (unsigned short)(~x) >> 15;<br class="">}<br class=""><br class="">short isNotNegativeUsingCmp(short x) {<br class="">  return x > -1;<br class="">}<br class=""></div><div class=""><br class=""></div><div class="">Currently, we will canonicalize these to the shift form (but you could argue that is backwards). <br class=""></div><div class="">Alive proof for logical equivalence:</div><div class=""><a href="https://rise4fun.com/Alive/uGH" target="_blank" class="">https://rise4fun.com/Alive/uGH</a></div><div class=""><br class=""></div><div class="">If we disable the instcombine for this, we would have IR like this:</div><div class=""><br class=""></div><div class="">define signext i16 @isNotNegativeUsingShift(i16 signext %x) {<br class="">  %signbit = lshr i16 %x, 15<br class="">  %r = xor i16 %signbit, 1<br class="">  ret i16 %r<br class="">}<br class=""><br class="">define signext i16 @isNotNegativeUsingCmp(i16 signext %x) {<br class="">  %cmp = icmp sgt i16 %x, -1<br class="">  %r = zext i1 %cmp to i16<br class="">  ret i16 %r<br class="">}<br class=""><br class=""></div><div class="">And compile that for MSP430:</div><div class="">$ ./llc -o - -mtriple=msp430 shift.ll<br class="">isNotNegativeUsingShift:                ; @isNotNegativeUsingShift<br class="">; %bb.0:<br class=""> inv     r12<br class="">        swpb    r12<br class="">        mov.b   r12, r12<br class="">   clrc<br class="">       rrc     r12<br class="">        rra     r12<br class="">        rra     r12<br class="">        rra     r12<br class="">        rra     r12<br class="">        rra     r12<br class="">        rra     r12<br class="">        ret<br class=""><br class=""></div><div class="">isNotNegativeUsingCmp:                  ; @isNotNegativeUsingCmp<br class="">; %bb.0:<br class="">  mov     r12, r13<br class="">   mov     #1, r12<br class="">    tst     r13<br class="">        jge     .LBB1_2<br class="">; %bb.1:<br class="">       clr     r12<br class="">.LBB1_2:<br class="">   ret<br class=""></div><div class=""><br class=""></div><div class="">How do you intend to optimize code that is written in the 1st form? Or is that not allowed in some way?<br class=""> </div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 13, 2019 at 5:52 AM Joan Lluch via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">Hi Roman,<div class=""><br class=""></div><div class="">Thanks for your input.</div><div class=""><br class=""></div><div class="">The subject of reverse transformations was discussed before (it’s even mentioned on my reference message below) and I think there was a general agreement that it’s best to avoid reversals if the issue can be dealt with in a better way from the origins. I also understood that there was a general support about /moving/ as much transformations as possible from InstCombine to DAGCombine, although this is a major goal and not the subject of this.</div><div class=""><br class=""></div><div class="">This proposal does not aim to remove all shifts, this is just not possible or even desirable. All targets have shifts, however large amount shifts can be particularly expensive for some targets, as they require a large sequence of instructions to complete them.</div><div class=""><br class=""></div><div class="">We only want to prevent NEW shifts that are emitted as a consequence of transformations. LLVM tends to act too easily at creating new shifts in circumstances where they are not desirable for some targets. We just want to improve on this.</div><div class=""><br class=""></div><div class="">Finally, this does not aim at all to create a different canonical representation. This was also mentioned on the reference message.</div><div class=""><br class=""></div><div class="">I understood from the very beginning that this proposal could be controversial, and I still think that the ultimate solution would be to move a lot of InstCombine into DAGCombine. However, the latter is a major goal with strong impacts on all targets that would require really strong support and hard work from many community members. I’m advocating for something in the middle that would solve the problem for the affected targets with ZERO impact for the non-affected ones.</div><div class=""><br class=""></div><div class="">I hope this helps to clarify it.</div><div class=""><br class=""></div><div class="">Thanks again,</div><div class=""><br class=""></div><div class="">John</div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On 13 Nov 2019, at 10:39, Roman Lebedev <<a href="mailto:lebedev.ri@gmail.com" target="_blank" class="">lebedev.ri@gmail.com</a>> wrote:</div><br class=""><div class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">On Wed, Nov 13, 2019 at 12:26 PM Joan Lluch via llvm-dev</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class=""><</span><a href="mailto:llvm-dev@lists.llvm.org" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank" class="">llvm-dev@lists.llvm.org</a><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">> wrote:</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><br class="">Hi All,<br class=""><br class="">In relation to the subject of this message I got my first round of patches successfully reviewed and committed. As a matter of reference, they are the following:<br class=""><br class=""><a href="https://reviews.llvm.org/D69116" target="_blank" class="">https://reviews.llvm.org/D69116</a><br class=""><a href="https://reviews.llvm.org/D69120" target="_blank" class="">https://reviews.llvm.org/D69120</a><br class=""><a href="https://reviews.llvm.org/D69326" target="_blank" class="">https://reviews.llvm.org/D69326</a><br class=""><a href="https://reviews.llvm.org/D70042" target="_blank" class="">https://reviews.llvm.org/D70042</a><br class=""><br class="">They provided hooks in TargetLowering and DAGCombine that enable interested targets to implement a filter for expensive shift operations. The patches work by preventing certain transformations that would result in expensive code for these targets.<br class=""><br class="">I want to express my gratitude to the LLVM community and particularly to members @spatel and @asl who have directly followed, helped with, and reviewed these patches.<br class=""></blockquote><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class="">This is half of what’s required to get the full benefits. As I exposed before, in order to get this fully functional, we need to do some work on InstCombine. This is because some of the transformations that we want to avoid, are created earlier in InstCombine, thus deactivating the patches above.<br class=""><br class="">My general proposal when I started this (quoted below for reference), was to implement a command line option that would act on InstCombine by bypassing (preventing) certain transformations. I still think that this is the easier and safer way to obtain the desired goals, but I want to subject that to the consideration of the community again to make sure I am on the right track.<br class=""><br class="">My current concrete proposal is to add a command line option (boolean) that I would name “enable-shift-relaxation” or just  “relax-shifts”. This option would act in several places in InstCombineCasts and in InstCombineSelect with the described effects.<br class=""></blockquote><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">I'm not really sold on this part, for the reasons previously discussed.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">This is only going to avoid creating such shifts, in passes that will</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">be adjusted.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">This will not completely ban such shifts, meaning they still can exist.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">Which means this will only partially prevent 'degrading' existing IR.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">What about the ones that were already present in the original input</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">(from C code, e.g.)?</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">I think you just want to add an inverse set of DAGCombine transforms,</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">also guarded with that target hook you added. That way there's no chance</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">to still end up with unfavorable shifts on your target, and no middle-end</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">impact from having more than one canonical representation.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class="">I also need to ask about the best way to present tests cases for that. I learned how to create test files for codegen transforms (IR to Assembly), but now I will be working on the “target Independent” side. For my internal work, I have manually been testing C-code to IR generation, but I do not know how to create proper test cases for the llvm project. Any help on this would be appreciated.<br class=""><br class="">Thanks in advance<br class=""><br class="">John<br class=""></blockquote><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">Roman</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><blockquote type="cite" class="">On 8 Oct 2019, at 00:22, Joan Lluch <<a href="mailto:joan.lluch@icloud.com" target="_blank" class="">joan.lluch@icloud.com</a>> wrote:<br class=""><br class="">Hi All,<br class=""><br class="">While implementing a custom 16 bit target for academical and demonstration purposes, I unexpectedly found that LLVM was not really ready for 8 bit and 16 bit targets. Let me expose why.<br class=""><br class="">Target backends can be divided into two major categories, with essentially nothing in between:<br class=""><br class="">Type 1: The big 32 or 64 bit targets. Heavily pipelined with expensive branches, running at clock frequencies up to the GHZ range. Aimed at workstations, computers or smartphones. For example PowerPC, x86 and ARM.<br class=""><br class="">Type 2: The 8 or 16 bit targets. Non-pipelined processors, running at frequencies on the MHz range, generally fast access to memory, aimed at the embedded marked or low consumption applications (they are virtually everywhere). LLVM currently implements an experimental AVR target and the MSP430.<br class=""><br class="">LLVM does a great for Type 1 targets, but it can be improved for Type 2 targets.<br class=""><br class="">The essential target feature that makes one way of code generation better for either type 1 or type 2 targets, is pipelining. For type 1 we want branching to be avoided for as much as possible. Turning branching code into sequential instructions with the execution of speculative code is advantageous. These targets have instruction sets that help with that goal, in particular cheap ‘shifts’ and ‘cmove' type instructions.<br class=""><br class="">Type 2 targets, on the contrary, have cheap branching. Their instruction set is not particularly designed to assist branching avoidance because that’s not required. In fact, branching on these targets is often desirable, as opposed to transforms creating expensive speculative execution. ‘Shifts’ are only one-single-bit, and conditional execution instructions other than branches are not available.<br class=""><br class="">The current situation is that some LLVM target-independent optimisations are not really that ‘independent' when we bring type 2 targets into the mix. Unfortunately, LLVM was apparently designed with type 1 targets in mind alone, which causes degraded code for type 2 targets. In relation to this, I posted a couple of bug reports that show some of these issues:<br class=""><br class=""><a href="https://bugs.llvm.org/show_bug.cgi?id=43542" target="_blank" class="">https://bugs.llvm.org/show_bug.cgi?id=43542</a><br class=""><a href="https://bugs.llvm.org/show_bug.cgi?id=43559" target="_blank" class="">https://bugs.llvm.org/show_bug.cgi?id=43559</a><br class=""><br class="">The first bug is already fixed by somebody who also suggested me to raise this subject on the llvm-dev mailing list, which I’m doing now.<br class=""><br class="">Incidentally, most code degradations happen on the DAGCombine code. It’s a bug because LLVM may create transforms into instructions that are not Legal for some targets. Such transforms are detrimental on those targets. This bug won't show for most targets, but it is nonetheless particularly affecting targets with no native shifts support. The bug consists on the transformation of already relatively cheap code to expensive one. The fix prevents that.<br class=""><br class="">Still, although the above DAGCombine code gets fixed, the poor code generation issue will REMAIN. In fact, the same kind of transformations are performed earlier as part of the IR optimisations, in the InstCombine pass. The result is that the IR /already/ incorporates the undesirable transformations for type 2 targets, which DAGCombine can't do anything about.<br class=""><br class="">At this point, reverse pattern matching looks as the obvious solution, but I think it’s not the right one because that would need to be implemented on every single current or future (type 2) target. It is also difficult to get rid of undesired transforms when they carry complexity, or are the result or consecutive combinations. Delegating the whole solution to only reverse pattern matching code, will just perpetuate the overall problem, which will continue affecting future target developments. Some reverse pattern matching is acceptable and desirable to deal with very specific target features, but not as a global solution to this problem.<br class=""><br class="">On a previous email, a statement was posted that in recent years attempts have been made to remove code from InstCombine and port it to DAGCombiner. I agree that this is a good thing to do, but it was reportedly difficult and associated with potential problems or unanticipated regressions. I understand those concerns and I acknowledge the involved work as challenging. However, in order to solve the presented problem, some work is still required in InstCombine.<br class=""><br class="">Therefore, I wondered if something in between could still be done, so this is my proposal: There are already many command line compiler options that modify IR output in several ways. Some options are even target dependent, and some targets even explicitly set them (In RenderTargetOptions). The InstCombine code, has itself its own small set of options, for example "instcombine-maxarray-size” or "instcombine-code-sinking”. Command line compiler options produce functionally equivalent IR output, while respecting stablished canonicalizations. In all cases, the output is just valid IR code in a proper form that depends on the selected options. As an example -O0 produces a very different output than -O3, or -Os, all of them are valid as the input to any target backend. My suggestion would be to incorporate a compiler option acting on the InstCombine pass. The option would improve IR code aimed at Type 2 targets. Of course, this option would not be enabled by default so the IR output would remain exactly as it is today if not explicitly enabled.<br class=""><br class="">What this option would need to do in practice is really easy and straightforward. Just bypassing (avoiding) certain transformations that might be considered harmful for targets benefiting from it. I performed some simple tests, specially directed at the InstCombineSelect transformations, and I found them to work great and generating greatly improved code for both the MSP430 and AVR targets.<br class=""><br class="">Now, I am aware that this proposal might come a bit unexpected and even regarded as inelegant or undesirable, but maybe after some careful balancing of pros and cons, it is just what we need to do, if we really care about LLVM as a viable platform for 8 and 16 bit targets. As stated earlier, It’s easy to implement, it’s just an optional compiler setting not affecting major targets at all, and the future extend of it can be gradually defined or agreed upon as it is put into operation.  Any views would be appreciated.<br class=""><br class="">John.<br class=""><br class=""><br class=""><br class=""></blockquote><br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a><br class=""><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank" class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></blockquote></div></blockquote></div><br class=""></div></div>_______________________________________________<br class="">

LLVM Developers mailing list<br class="">

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a><br class="">

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank" class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br class="">

</blockquote></div>

</div></blockquote></div><br class=""></div></div></div></div></div></blockquote></div></div>

</div></blockquote></div><br class=""></div></div></body></html>