<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Apr 16, 2012, at 3:30 PM, Chandler Carruth &lt;<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div class="gmail_quote">On Tue, Apr 17, 2012 at 12:23 AM, Jakob Stoklund Olesen <span dir="ltr">&lt;<a href="mailto:stoklund@2pi.dk">stoklund@2pi.dk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto; ">

I am not sure how best to fix this. If possible, InstCombine's canonicalization shouldn't hide arithmetic progressions behind bit masks.</blockquote></div><br><div>The entire concept of cleverly converting arithmetic to bit masks seems like the perfect domain for DAGCombine instead of InstCombine:</div>

<div><br></div><div>1) We know the architecture, so we can make intelligent decisions about what masks are cheap or expensive.</div><div>2) We know the addressing modes, so we can fold arithmetic into them.</div><div>3) There are no more high-level optimizations we're trying to enable: GVN, SCEV, loop opts, and other deep math optimizations have all already had their shot at the code.</div>

<div><br></div><div>Does sinking these into the DAGCombine layer help? How much does it break?</div></blockquote><div><br></div><div>I don't know what would break, but DAGCombine already has these tricks:</div><br><div><div>$ cat small2.c </div><div>unsigned f(unsigned x) {</div><div>  return x &gt;&gt; 2 &lt;&lt; 3 &gt;&gt; 2 &lt;&lt; 5;</div><div>}</div><div><br></div><div>With the shift transforms disabled, we get:</div><div><br></div><div><div>define i32 @f(i32 %x) nounwind uwtable readnone ssp {</div><div>entry:</div><div>  %shr = lshr i32 %x, 2</div><div>  %shl = shl i32 %shr, 3</div><div>  %shr1 = lshr exact i32 %shl, 2</div><div>  %shl2 = shl i32 %shr1, 5</div><div>  ret i32 %shl2</div><div>}</div></div><div><br></div><div>But DAGCombine goes:</div><div><br></div></div><div><div><span class="Apple-tab-span" style="white-space:pre">    </span>shll<span class="Apple-tab-span" style="white-space:pre">        </span>$4, %edi</div><div><span class="Apple-tab-span" style="white-space:pre">     </span>andl<span class="Apple-tab-span" style="white-space:pre">        </span>$-64, %edi</div><div><span class="Apple-tab-span" style="white-space:pre">   </span>movl<span class="Apple-tab-span" style="white-space:pre">        </span>%edi, %eax</div><div><span class="Apple-tab-span" style="white-space:pre">   </span>ret</div><div><br></div><div>And you are right, we only get the bit masks when it is worthwhile.</div><div><br></div><div>/jakob</div><div><br></div></div></div></body></html>