<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>Hi James, </div><div><br></div><div>We don’t generate new shuffles because we don’t have a good cost model for shuffles.  The last time we discussed it was in this thread:</div><div><br></div><div><a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130429/173217.html">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130429/173217.html</a></div><div><br></div><div>Also, InstCombine should canonicalize, not optimize.  </div><div><br></div><div><blockquote type="cite"><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div dir="ltr"><div><div>Now, there are many clever things InstCombine could do with shufflevector. The two I have in my queue at the moment are:</div><div>  1) Where there's an insertelement into vector B that comes direct from an extractelement of vector A, and vector A's length is less than vector B's, create a shuffle to extend A then another shuffle to perform the equivalent of extract/insertelement.</div></div></div></div></blockquote><div><br></div><div>This won’t work for x86 because it has vector registers of different sizes (512, 256 and 128 bits).  If this is profitable it should be done per-target in SelectionDAG where the target information is available. 
</div><br><blockquote type="cite"><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div dir="ltr"><div><div>  2) Where two shuffles' masks could combine to make a monotonically increasing sequence, perform the combination.</div><div><br></div></div></div></div></blockquote><div><br></div><div>This is okay, assuming that:</div><div><br></div><div>1.  There are no additional users of the shuffles.</div><div>2.  The new shuffle is a NOP, and can be deleted. </div><div><br></div><br><blockquote type="cite"><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div dir="ltr"><div><div>Both of the above have caveats that can't be said in one sentence, but they're basically rewriting common front-end patterns to make shuffles that correspond to vector extension (VEXT instructions in ARM) or concatenation of subvectors.</div><div><br></div><div>Now, I think these would both be of use to any architecture that has decent shufflevector support, and InstCombine seems like the right place for it. But if InstCombine is supposed to be conservative, where should these optimizations go?</div><div><br></div></div></div></div></blockquote><div><br></div><div>DAGCombine. </div><div><br></div><div>Thanks,</div><div>Nadav</div></div><br></body></html>