[PATCH] D48853: [SCEV] Add zext(C + x + ...) -> D + zext(C-D + x + ...)<nuw> transform

Mon Jul 2 14:53:35 PDT 2018

rtereshin created this revision.
rtereshin added reviewers: sanjoy, mzolotukhin, volkan.
Herald added subscribers: javed.absar, tpr.

if D + (C-D + x + ...) could be proven to not unsigned wrap
where D is the maximum such D that D <= C (unsigned)

using KnownBits to prove that it's safe to do so (there is no wrapping).

This enables better canonicalization of expressions like

  1 + zext(5 + 20 * %x + 24 * %y)  and
      zext(6 + 20 * %x + 24 * %y)

which get both transformed to

  2 + zext(4 + 20 * %x + 24 * %y)

This pattern is common in address arithmetics and the transformation
makes it easier for passes like LoadStoreVectorizer to prove that 2 or
more memory accesses are consecutive and optimize (vectorize) them.

I found this change similar to a number of other changes to Scalar Evolution, namely:

  commit 63c52aea76b530d155ec6913d5c3bbe1ecd82ad8
  Author: Sanjoy Das <sanjoy at playingwithpointers.com>
  Date:   Thu Oct 22 19:57:38 2015 +0000

      [SCEV] Commute zero extends through <nuw> additions

      git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251052 91177308-0d34-0410-b5e6-96231b3b80d8

  commit 3edd5bf90828613bacfdc2ce047d3776363123e5
  Author: Justin Lebar <jlebar at google.com>
  Date:   Thu Jun 14 17:13:48 2018 +0000

      [SCEV] Simplify zext/trunc idiom that appears when handling bitmasks.

      Summary:
      Specifically, we transform

        zext(2^K * (trunc X to iN)) to iM ->
        2^K * (zext(trunc X to i{N-K}) to iM)<nuw>

      This is helpful because pulling the 2^K out of the zext allows further
      optimizations.

      Reviewers: sanjoy

      Subscribers: hiraditya, llvm-commits, timshen

      Differential Revision: https://reviews.llvm.org/D48158

and the most relevant

  commit 45788be6e2603ecfc149f43df1a6d5e04c5734d8
  Author: Michael Zolotukhin <mzolotukhin at apple.com>
  Date:   Sat May 24 08:09:57 2014 +0000

      Implement sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
      sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation in Scalar
      Evolution.

      That helps SLP-vectorizer to recognize consecutive loads/stores.

      <rdar://problem/14860614>

      git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209568 91177308-0d34-0410-b5e6-96231b3b80d8

The latter seems to be achieving almost the same effect but for sign extensions and in a different way,
also looking mostly at SLP-vectorizer as the main client instead of LoadStoreVectorizer.

I'm not sure if these 2 solutions could be generalized. I think using `KnownBits` is a bit more more generic as

`ext(C1 + C2*X) -> ext(C1) + ext(C2*X), where C2 is a power of 2 and C1 < C2` seems to be a specific sub-case of
`ext(C1 + x + y + ...) -> ext(C2) + ext(C1-C2 + x + y + ...), where (x + y + ...) could be proven to have n least significant bits to be zeros, C2 <= C1 and C2 < 2^n`

In most cases "(x + y + ...) could be proven to have n least significant bits to be zeros" is possible because the expression has a form of
`D1 * 2^n * X  +  D2 * 2^n * Y + ...`, but it also could be an aligned pointer for example (see the tests included).

@mkazantsev Hi, I'm not sure I'm using `Depth` parameters here the best way possible, could you please take a look at that?

@rampitec, @arsenm Hi, could you please see how this affects AMDGPU target? As well as
@volkan - I'm not sure we really need all the custom logic in `Vectorizer::isConsecutiveAccess` anymore - everything that goes after

  343   // Sometimes even this doesn't work, because SCEV can't always see through
  344   // patterns that look like (gep (ext (add (shl X, C1), C2))). Try checking
  345   // things the hard way.

looks like SCEV will handle just that now and in a more generic way: the current custom logic in `isConsecutiveAccess` for instance doesn't expect nested GEPs (a GEP of a GEP),
as well as it doesn't expect any additional math between a GEP and an extend so it's fairly fragile, the exact reason why the LoadStoreVectorize tests included fail w/o this patch.

Repository:
  rL LLVM

https://reviews.llvm.org/D48853

Files:
  include/llvm/Analysis/ScalarEvolution.h
  lib/Analysis/ScalarEvolution.cpp
  test/Analysis/ScalarEvolution/no-wrap-add-exprs.ll
  test/Transforms/LoadStoreVectorizer/X86/codegenprepare-produced-address-math.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D48853.153796.patch
Type: text/x-patch
Size: 11523 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180702/f61522f2/attachment.bin>