<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On May 5, 2013, at 7:50 PM, Hal Finkel <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">Alternatively, we could use TTI so that only inexpensive shuffles are formed by these optimizations.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"></blockquote></div><br><div>We can't estimate which shuffles are inexpensive. We have 1000 lines of c++ logic to figure this out during the x86 lowering. The second problem is that InstCombine is a canonicalization pass and not a lowering pass. Only lowering passes should use TTI. If we had a target independent way of estimating the cost of shuffles then DAGCombine would be the right place. </div><div><br></div><div>Thanks,</div><div>Nadav </div></body></html>