[PATCH] D158724: [AArch64][LoopVectorize] Add truncated store values to list of types for widening

Thu Aug 24 05:07:41 PDT 2023

david-arm added a comment.

Thanks @Rin - this looks like a useful improvement! In truncate-type-widening.ll we're clearly now choosing a more natural VF that we can generate code for more efficiently, in particular the step vector intrinsic call in the preheader.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5727
       // Examine the stored values.
-      if (auto *ST = dyn_cast<StoreInst>(&I))
+      if (auto *ST = dyn_cast<StoreInst>(&I)) {
         T = ST->getValueOperand()->getType();
----------------
I wonder if we should also be asking if the store operand is loop invariant too? This would avoid tests changing such as lvm/test/Transforms/LoopVectorize/vplan-stress-test-no-explict-vf.ll. If the input is loop invariant then it's not really participating in the vector loop.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5729
         T = ST->getValueOperand()->getType();
+        if (isa<TruncInst>(ST->getOperand(0))) {
+          auto *castTrunc = dyn_cast<CastInst>(ST->getOperand(0));
----------------
I think you can write this more simply as

```if (auto *Trunc = dyn_cast<TruncInst>(ST->getOperand(0)))
  T = Trunc->getSrcTy();
```

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5730
+        if (isa<TruncInst>(ST->getOperand(0))) {
+          auto *castTrunc = dyn_cast<CastInst>(ST->getOperand(0));
+          T = castTrunc->getSrcTy();
----------------
nit: Normally variables in LLVM start with a capital, i.e. `CastTrunc`

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/truncate-type-widening.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -S < %s -passes=loop-vectorize -force-vector-interleave=1 -mtriple aarch64-linux-gnu -mattr=+sve -sve-tail-folding=disabled 2>&1 | FileCheck %s
+
----------------
It looks like we're still using tail-folding despite passing in `-sve-tail-folding=disabled`, which I think is because the vectoriser knows the trip count is low. Perhaps you can just remove the flag?

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/truncate-type-widening.ll:96
+
+for.cond.cleanup.loopexit:                        ; preds = %for.body
+  br label %for.cond.cleanup
----------------
I think you can delete this block and just let everything jump directly to `%for.cond.cleanup`

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/truncate-type-widening.ll:102
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
----------------
Could you rewrite the blocks in a more natural order, i.e. entry, for.body.preheader, for.body, for.cond.cleanup?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158724/new/

https://reviews.llvm.org/D158724