[PATCH] D66688: [LoopVectorize] Leverage speculation safety to avoid masked.loads

Mon Aug 26 04:55:32 PDT 2019

Ayal added a comment.

> There's some obvious room for API cleanup in the patch, but before I iterate on that, I want to make sure I'm approaching the problem the "right way" and that my attempt at clarifying comments are actually correct. :)

Yes, this approach of refining SafePointe[r]s, originally designed to express derefereceability, LGTM.

It may be better to place isSafeToLoadUnconditionallyInLoop() in Load.{h,cpp} rather than in LoopVectorizationLegality.cpp; this needs to also work with FoldTail / IsAnnotatedParallel; and not sure about extending the load reasoning to stores etc., but all those can be part of "API cleanup".

Added some minor comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:20
 #include "llvm/Analysis/VectorUtils.h"
+#include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/IntrinsicInst.h"
----------------
nit: lex ordering (the above admittedly sets an unordered example..)

================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:950
+  // We have to adjust the size down by one iteration as the Loads.cpp adds one
+  // back.  TODO: cleanup
+  APInt Size(DL.getIndexTypeSizeInBits(Ptr->getType()),
----------------
I'm a bit confused by Loads.cpp's comment, wrt `Size`:

```
/// This uses the pointee type to determine how many bytes need to be safe to
/// load from the pointer.
```

================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:976
+  // unconditionally within the loop loop header without introducing a new
+  // fault. 
   SmallPtrSet<Value *, 8> SafePointes;
----------------
Definitely better explanation, committable irrespective of this patch as NFC.

================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:981
   for (BasicBlock *BB : TheLoop->blocks()) {
-    if (blockNeedsPredication(BB))
+    if (blockNeedsPredication(BB)) {
+      // For a block which requires predication, a address may be safe to
----------------
The original if blockNeedsPredication(BB) condition was an early-continue; now more suitable that its complementary condition be an early-continue instead.
OTOH, can fuse the two "for (Instruction &I : *BB) if (auto *Ptr = ...)" loops together, with an early-continue if (!blockNeedsPredication(BB)) inside - or w/o it - in case isSafeToSpeculativelyExecute() will return true just as fast(?).
(Can check if Ptr is already in SafePointe[r]s to retain an early-continue ;-).

================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:982
+    if (blockNeedsPredication(BB)) {
+      // For a block which requires predication, a address may be safe to
+      // access in the loop w/o predication if we can prove dereferenceability
----------------
typo: a[n] address

================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:990
+          // The choice of header vs preheader is deliberate here to allow loop
+          // varying addresses which can still be proven.
+          Instruction *HeaderFirstNonPHI =
----------------
Sure, we're not trying to hoist/LICM the load out of the loop.

================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:993
+            TheLoop->getHeader()->getFirstNonPHI();
+          if (isSafeToSpeculativelyExecute(&I, HeaderFirstNonPHI, DT)) {
+            SafePointes.insert(Ptr);
----------------
Does this "generic version" add anything, given that it returns false for stores, and does it account for the addresses accessed along all iterations of the loop?

================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1008
+              LI->getFunction()->hasFnAttribute(Attribute::SanitizeHWAddress))
+            continue;
+          auto &DL = LI->getModule()->getDataLayout();
----------------
Fold these conditions (of the generic version) into isSafeToLoadUnconditionallyInLoop()?

================
Comment at: test/Transforms/LoopVectorize/load-deref-pred.ll:17
+; CHECK-NEXT:    call void @init(i32* [[BASE]])
+; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:       vector.ph:
----------------
xbolva00 wrote:
> br i1 false
> 
> Vectorizer should cleanup it a bit..
Vectorizer leaves it and others like it for subsequent passes to clean up.

================
Comment at: test/Transforms/LoopVectorize/load-deref-pred.ll:89
+; CHECK-NEXT:    ret i32 [[ACCUM_NEXT_LCSSA]]
+;
+entry:
----------------
This "for (i=0; i<4096; ++i) {val = 0; if (i < len) val = a[i]; acc += val}" loop is indeed probably a minimal example of a speculative load.
(It does deserve a better optimization though - that of trimming the loop bound ;-)

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66688/new/

https://reviews.llvm.org/D66688