<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Volkan,<div class=""><br class=""></div><div class="">> Did you try to remove that part and run the LoadStoreVectorizer tests?</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Yes, it looks like SCEV still doesn't handle all the cases, primarily due to the fact that ScalarEvolution::isSCEVExprNeverPoison strips off quite a few of the nsw/nuw flags.</div><div class=""><br class=""></div><div class=""><div class="">Here's the explantion for the behavior:</div><div class=""><br class=""></div><div class="">+    /// Return true if the SCEV corresponding to \p I is never poison.  Proving</div><div class="">+    /// this is more complex than proving that just \p I is never poison, since</div><div class="">+    /// SCEV commons expressions across control flow, and you can have cases</div><div class="">+    /// like:</div><div class="">+    ///</div><div class="">+    ///   idx0 = a + b;</div><div class="">+    ///   ptr[idx0] = 100;</div><div class="">+    ///   if (<condition>) {</div><div class="">+    ///     idx1 = a +nsw b;</div><div class="">+    ///     ptr[idx1] = 200;</div><div class="">+    ///   }</div><div class="">+    ///</div><div class="">+    /// where the SCEV expression (+ a b) is guaranteed to not be poison (and</div><div class="">+    /// hence not sign-overflow) only if "<condition>" is true.  Since both</div><div class="">+    /// `idx0` and `idx1` will be mapped to the same SCEV expression, (+ a b),</div><div class="">+    /// it is not okay to annotate (+ a b) with <nsw> in the above example.</div><div class="">+    bool isSCEVExprNeverPoison(const Instruction *I);</div><div class="">(<a href="https://github.com/llvm-mirror/llvm/blob/cfc86d498749291230353638b8336901db3f1b40/include/llvm/Analysis/ScalarEvolution.h#L1764-L1780" class="">https://github.com/llvm-mirror/llvm/blob/cfc86d498749291230353638b8336901db3f1b40/include/llvm/Analysis/ScalarEvolution.h#L1764-L1780</a>)</div><div class=""><br class=""></div><div class="">I'm looking into this to see what we can do.</div><div class=""><br class=""></div><div class="">Roman</div><div><br class=""><blockquote type="cite" class=""><div class="">On Jul 3, 2018, at 2:02 PM, Volkan Keles <<a href="mailto:vkeles@apple.com" class="">vkeles@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""></div></blockquote><blockquote type="cite" class="">Looks like that part looks through only a sext/zext, so it should be okay to remove. Did you try to remove that part and run the LoadStoreVectorizer tests?<br class=""></blockquote><blockquote type="cite" class=""><br class="">Volkan</blockquote><blockquote type="cite" class=""><div class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">On Jul 2, 2018, at 2:53 PM, Roman Tereshin via Phabricator <<a href="mailto:reviews@reviews.llvm.org" class="">reviews@reviews.llvm.org</a>> wrote:<br class=""><br class="">rtereshin created this revision.<br class="">rtereshin added reviewers: sanjoy, mzolotukhin, volkan.<br class="">Herald added subscribers: javed.absar, tpr.<br class=""><br class="">if D + (C-D + x + ...) could be proven to not unsigned wrap<br class="">where D is the maximum such D that D <= C (unsigned)<br class=""><br class="">using KnownBits to prove that it's safe to do so (there is no wrapping).<br class=""><br class="">This enables better canonicalization of expressions like<br class=""><br class="">1 + zext(5 + 20 * %x + 24 * %y)  and<br class="">    zext(6 + 20 * %x + 24 * %y)<br class=""><br class="">which get both transformed to<br class=""><br class="">2 + zext(4 + 20 * %x + 24 * %y)<br class=""><br class="">This pattern is common in address arithmetics and the transformation<br class="">makes it easier for passes like LoadStoreVectorizer to prove that 2 or<br class="">more memory accesses are consecutive and optimize (vectorize) them.<br class=""><br class="">I found this change similar to a number of other changes to Scalar Evolution, namely:<br class=""><br class="">commit 63c52aea76b530d155ec6913d5c3bbe1ecd82ad8<br class="">Author: Sanjoy Das <<a href="mailto:sanjoy@playingwithpointers.com" class="">sanjoy@playingwithpointers.com</a>><br class="">Date:   Thu Oct 22 19:57:38 2015 +0000<br class=""><br class="">    [SCEV] Commute zero extends through <nuw> additions<br class=""><br class="">    git-svn-id: <a href="https://llvm.org/svn/llvm-project/llvm/trunk@251052" class="">https://llvm.org/svn/llvm-project/llvm/trunk@251052</a> 91177308-0d34-0410-b5e6-96231b3b80d8<br class=""><br class="">commit 3edd5bf90828613bacfdc2ce047d3776363123e5<br class="">Author: Justin Lebar <<a href="mailto:jlebar@google.com" class="">jlebar@google.com</a>><br class="">Date:   Thu Jun 14 17:13:48 2018 +0000<br class=""><br class="">    [SCEV] Simplify zext/trunc idiom that appears when handling bitmasks.<br class=""><br class="">    Summary:<br class="">    Specifically, we transform<br class=""><br class="">      zext(2^K * (trunc X to iN)) to iM -><br class="">      2^K * (zext(trunc X to i{N-K}) to iM)<nuw><br class=""><br class="">    This is helpful because pulling the 2^K out of the zext allows further<br class="">    optimizations.<br class=""><br class="">    Reviewers: sanjoy<br class=""><br class="">    Subscribers: hiraditya, llvm-commits, timshen<br class=""><br class="">    Differential Revision: <a href="https://reviews.llvm.org/D48158" class="">https://reviews.llvm.org/D48158</a><br class=""><br class="">and the most relevant<br class=""><br class="">commit 45788be6e2603ecfc149f43df1a6d5e04c5734d8<br class="">Author: Michael Zolotukhin <<a href="mailto:mzolotukhin@apple.com" class="">mzolotukhin@apple.com</a>><br class="">Date:   Sat May 24 08:09:57 2014 +0000<br class=""><br class="">    Implement sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and<br class="">    sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation in Scalar<br class="">    Evolution.<br class=""><br class="">    That helps SLP-vectorizer to recognize consecutive loads/stores.<br class=""><br class="">    <<a href="rdar://problem/14860614" class="">rdar://problem/14860614</a>><br class=""><br class="">    git-svn-id: <a href="https://llvm.org/svn/llvm-project/llvm/trunk@209568" class="">https://llvm.org/svn/llvm-project/llvm/trunk@209568</a> 91177308-0d34-0410-b5e6-96231b3b80d8<br class=""><br class="">The latter seems to be achieving almost the same effect but for sign extensions and in a different way,<br class="">also looking mostly at SLP-vectorizer as the main client instead of LoadStoreVectorizer.<br class=""><br class="">I'm not sure if these 2 solutions could be generalized. I think using `KnownBits` is a bit more more generic as<br class=""><br class="">`ext(C1 + C2*X) -> ext(C1) + ext(C2*X), where C2 is a power of 2 and C1 < C2` seems to be a specific sub-case of<br class="">`ext(C1 + x + y + ...) -> ext(C2) + ext(C1-C2 + x + y + ...), where (x + y + ...) could be proven to have n least significant bits to be zeros, C2 <= C1 and C2 < 2^n`<br class=""><br class="">In most cases "(x + y + ...) could be proven to have n least significant bits to be zeros" is possible because the expression has a form of<br class="">`D1 * 2^n * X  +  D2 * 2^n * Y + ...`, but it also could be an aligned pointer for example (see the tests included).<br class=""><br class="">@mkazantsev Hi, I'm not sure I'm using `Depth` parameters here the best way possible, could you please take a look at that?<br class=""><br class="">@rampitec, @arsenm Hi, could you please see how this affects AMDGPU target? As well as<br class="">@volkan - I’m not sure we really need all the custom logic in `Vectorizer::isConsecutiveAccess` anymore - everything that goes after<br class=""></blockquote><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">Looks like that part looks through only a sext/zext, so it should be okay to remove. Did you try to remove that part and run the LoadStoreVectorizer tests?</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">Volkan</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class="">343   // Sometimes even this doesn't work, because SCEV can't always see through<br class="">344   // patterns that look like (gep (ext (add (shl X, C1), C2))). Try checking<br class="">345   // things the hard way.<br class=""><br class="">looks like SCEV will handle just that now and in a more generic way: the current custom logic in `isConsecutiveAccess` for instance doesn't expect nested GEPs (a GEP of a GEP),<br class="">as well as it doesn't expect any additional math between a GEP and an extend so it's fairly fragile, the exact reason why the LoadStoreVectorize tests included fail w/o this patch.<br class=""><br class=""><br class="">Repository:<br class="">rL LLVM<br class=""><br class=""><a href="https://reviews.llvm.org/D48853" class="">https://reviews.llvm.org/D48853</a><br class=""><br class="">Files:<br class="">include/llvm/Analysis/ScalarEvolution.h<br class="">lib/Analysis/ScalarEvolution.cpp<br class="">test/Analysis/ScalarEvolution/no-wrap-add-exprs.ll<br class="">test/Transforms/LoadStoreVectorizer/X86/codegenprepare-produced-address-math.ll<br class=""><br class=""><D48853.153796.patch></blockquote></div></blockquote></div><br class=""></div></body></html>