[PATCH] D116276: [IndVarS] Keep the nsw/nuw flags after simplifyAndExtend

guopeilin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 29 02:04:15 PST 2021


guopeilin added a comment.

In D116276#3209662 <https://reviews.llvm.org/D116276#3209662>, @nikic wrote:

> Can you explain why SCEVExpander does not preserve the nowrap flag in this case? Assuming it is present on the wide IV, I would have expected it to also get expanded as such.

Sorry for the late reply. The SCEVExpander (Rewriter) is used to create the widened IV and in its widenIVUse(). For the IV increment, we just use the following code:

  WideInc =
      cast<Instruction>(WidePhi->getIncomingValueForBlock(LatchBlock));

Of course, this way we cannot preserve the nowrap flag, because neither the `WidePhi` nor the value returned by `getIncomingValueForBlock` carries the flag.
I guess the reason we didn't need to preserve this flag previously is that the `AddRec` is computed from the `OrigPhi` and looks like the following:

  {((sext i32 %arg1 to i64) + (sext i32 %arg2 to i64)),+,(sext i32 %arg2 to i64)}<nsw><%body>

So at this point it does not matter whether the increment instruction carries the flag, because the SCEV is already correct.
However, later in the optimization pipeline we may call SE->forgetLoop(), which drops the cached values so that they are recomputed from scratch. At that point, since the `NSW` flag has been lost, the `BackedgeTakenCount` becomes `CouldNotCompute`, which prevents vectorization.
With this patch, functions `s122` and `s172` in TSVC can now be vectorized. The source code of s122 follows:

  real_t s122(struct args_t * func_args)
  {
  //    induction variable recognition
  //    variable lower and upper bound, and stride
  //    reverse data access and jump in data access
      struct{int a;int b;} * x = func_args->arg_info;
      int n1 = x->a;
      int n3 = x->b;
      initialise_arrays(__func__);
      int j, k;
  #pragma clang loop vectorize(assume_safety)
      for (int nl = 0; nl < iterations; nl++) {
          j = 1;
          k = 0;
          for (int i = n1-1; i < LEN_1D; i += n3) {
              k += j;
              a[i] += b[LEN_1D - k];
          }
      }
  }
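
To illustrate why the lost flag matters for a loop shaped like the inner loop of s122, here is a minimal standalone C++ sketch. It is not ScalarEvolution's actual algorithm; `tripCountIfNoWrap` and `referenceTripCount` are hypothetical helpers that model a loop `for (i = start; i < bound; i += step)` on 32-bit signed integers. The closed-form count is only trusted when the increment provably never wraps, which is roughly the guarantee the `nsw` flag provides.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not an LLVM API. Models
//   for (int32_t i = start; i < bound; i += step)
// Returns true and stores the number of body executions in `trips` when the
// count can be derived in closed form. Returns false ("could not compute")
// when the step is not positive or when the final increment would wrap in i32.
bool tripCountIfNoWrap(int32_t start, int32_t bound, int32_t step,
                       uint64_t &trips) {
  if (step <= 0)
    return false;
  if (start >= bound) {
    trips = 0;
    return true;
  }
  // Do the arithmetic in 64 bits so the check itself cannot overflow.
  const int64_t s = start, b = bound, st = step;
  const int64_t n = (b - s + st - 1) / st; // ceil((bound - start) / step)
  const int64_t last = s + n * st;         // value of i when the loop exits
  if (last > std::numeric_limits<int32_t>::max())
    return false; // the exiting increment would wrap in 32 bits
  trips = static_cast<uint64_t>(n);
  return true;
}

// Reference count using a 64-bit counter, i.e. assuming no wrap ever happens.
uint64_t referenceTripCount(int32_t start, int32_t bound, int32_t step) {
  assert(step > 0 && "reference loop only terminates for positive steps");
  uint64_t count = 0;
  for (int64_t i = start; i < bound; i += step)
    ++count;
  return count;
}
```

For instance, with `start = 0`, `bound = INT32_MAX` and `step = 2` the helper declines to answer, because the last increment would overflow a 32-bit integer. Knowing that the increment cannot wrap is what makes the closed form usable.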

Also, I think it is fine for the widened IV to carry the same nowrap flag as its corresponding i32 IV.
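
The intuition behind this can be checked with a small standalone C++ sketch (an illustration only, not part of the patch; `addOverflowsI32` and `wideIVTracksNarrowIV` are hypothetical names). As long as every i32 increment is free of signed wrap, an i64 IV that starts at the sign-extended start value and adds the sign-extended step stays equal to the sign extension of the narrow IV, so its own additions cannot wrap either.

```cpp
#include <cstdint>
#include <limits>

// Returns true if a + b does not fit in a signed 32-bit integer.
bool addOverflowsI32(int32_t a, int32_t b) {
  const int64_t wide = static_cast<int64_t>(a) + static_cast<int64_t>(b);
  return wide < std::numeric_limits<int32_t>::min() ||
         wide > std::numeric_limits<int32_t>::max();
}

// Steps a narrow IV {start,+,step} and its widened counterpart
// {sext(start),+,sext(step)} side by side for `trips` increments.
// Returns false if a narrow increment would wrap (the nsw precondition is
// violated, so nothing is claimed) or if the two IVs ever disagree.
bool wideIVTracksNarrowIV(int32_t start, int32_t step, unsigned trips) {
  int32_t narrow = start;
  int64_t wide = start;          // sext i32 -> i64
  const int64_t wideStep = step; // sext i32 -> i64
  for (unsigned i = 0; i < trips; ++i) {
    if (addOverflowsI32(narrow, step))
      return false;
    narrow = static_cast<int32_t>(narrow + step);
    // Both operands lie in the i32 range, so this i64 add cannot wrap.
    wide += wideStep;
    if (wide != static_cast<int64_t>(narrow))
      return false;
  }
  return true;
}
```

A call such as `wideIVTracksNarrowIV(0, 1, 1000)` holds, while `wideIVTracksNarrowIV(INT32_MAX - 1, 1, 2)` reports false because the second narrow increment would wrap.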


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116276/new/

https://reviews.llvm.org/D116276


