<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 16, 2015, at 2:20 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=windows-1252" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">+Chandler</div><br class=""><div class=""><div class="">On Jan 16, 2015, at 1:57 PM, Fiona Glaser <<a href="mailto:fglaser@apple.com" class="">fglaser@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite" class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">> 1. Run your patch through clang-format please. The patch does not follow the LLVM formatting guidelines.</div><div class=""><br class=""></div><div class="">Done, and changed per Mehdi’s suggestions.</div><div class=""><br class=""></div><div class="">> 2. What is the impact of this on arm64 and armv7s generated code? Although the approach makes sense to me, I want to be sure we do not degrade other targets. Note that I do not expect you to run tests if you cannot :).</div><div class=""><br class=""></div><div class="">I don’t think it should even affect any target that doesn’t have canonical vector sizes of both N and 2*N, for a data type of N/2 or smaller. 
Otherwise the case this patch targets can’t come up.</div></div></blockquote><div class=""><br class=""></div><div class="">I do not quite follow the condition on the data type, but regarding the vector sizes, for instance, both v2i32 and v4i32 are legal IIRC on ARM, which would indicate that the optimization can kick in there, unless I am missing something of course.</div></div></div></div></blockquote><div><br class=""></div><div>The case comes up when you have a concat of two vectors of size N/2, creating a vector of size N, which is then concatenated with undef to create a 2*N vector.</div><div><br class=""></div><div>I figure it -should- at least help if it somehow did come up on ARM, since it effectively converts 128-bit shuffles to 64-bit shuffles in that case.</div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""><blockquote type="cite" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">3. 
What is the runtime performance impact on x86_64, with and without -mavx2?<br class=""></blockquote></div></blockquote><div class=""><br class=""></div><div class="">I’m not sure in general; this affects a few very specific vector constructs that were being pessimized.</div></div></div></blockquote><div class=""><br class=""></div><div class="">Right, but I would have liked some empirical evidence. Sometimes our lowering surprises us even when the IR/DAG is supposed to be better :).</div></div></div></div></blockquote><br class=""></div><div>I’m new to this; what’s the typical way of demonstrating this? I tried the LLVM external test suite, but the noise is far too high to draw solid conclusions about performance.</div><div><br class=""></div><div>Fiona</div></body></html>