[PATCH] D66688: [LoopVectorize] Leverage speculation safety to avoid masked.loads
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 26 04:55:32 PDT 2019
Ayal added a comment.
> There's some obvious room for API cleanup in the patch, but before I iterate on that, I want to make sure I'm approaching the problem the "right way" and that my attempt at clarifying comments are actually correct. :)
Yes, this approach of refining SafePointe[r]s, originally designed to express derefereceability, LGTM.
It may be better to place isSafeToLoadUnconditionallyInLoop() in Load.{h,cpp} rather than in LoopVectorizationLegality.cpp; this needs to also work with FoldTail / IsAnnotatedParallel; and not sure about extending the load reasoning to stores etc., but all those can be part of "API cleanup".
Added some minor comments.
================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:20
#include "llvm/Analysis/VectorUtils.h"
+#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/IntrinsicInst.h"
----------------
nit: lex ordering (the above admittedly sets an unordered example..)
================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:950
+ // We have to adjust the size down by one iteration as the Loads.cpp adds one
+ // back. TODO: cleanup
+ APInt Size(DL.getIndexTypeSizeInBits(Ptr->getType()),
----------------
I'm a bit confused by Loads.cpp's comment, wrt `Size`:
```
/// This uses the pointee type to determine how many bytes need to be safe to
/// load from the pointer.
```
================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:976
+ // unconditionally within the loop loop header without introducing a new
+ // fault.
SmallPtrSet<Value *, 8> SafePointes;
----------------
Definitely better explanation, committable irrespective of this patch as NFC.
================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:981
for (BasicBlock *BB : TheLoop->blocks()) {
- if (blockNeedsPredication(BB))
+ if (blockNeedsPredication(BB)) {
+ // For a block which requires predication, a address may be safe to
----------------
The original if blockNeedsPredication(BB) condition was an early-continue; now more suitable that its complementary condition be an early-continue instead.
OTOH, can fuse the two "for (Instruction &I : *BB) if (auto *Ptr = ...)" loops together, with an early-continue if (!blockNeedsPredication(BB)) inside - or w/o it - in case isSafeToSpeculativelyExecute() will return true just as fast(?).
(Can check if Ptr is already in SafePointe[r]s to retain an early-continue ;-).
================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:982
+ if (blockNeedsPredication(BB)) {
+ // For a block which requires predication, a address may be safe to
+ // access in the loop w/o predication if we can prove dereferenceability
----------------
typo: a[n] address
================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:990
+ // The choice of header vs preheader is deliberate here to allow loop
+ // varying addresses which can still be proven.
+ Instruction *HeaderFirstNonPHI =
----------------
Sure, we're not trying to hoist/LICM the load out of the loop.
================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:993
+ TheLoop->getHeader()->getFirstNonPHI();
+ if (isSafeToSpeculativelyExecute(&I, HeaderFirstNonPHI, DT)) {
+ SafePointes.insert(Ptr);
----------------
Does this "generic version" add anything, given that it returns false for stores, and does it account for the addresses accessed along all iterations of the loop?
================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1008
+ LI->getFunction()->hasFnAttribute(Attribute::SanitizeHWAddress))
+ continue;
+ auto &DL = LI->getModule()->getDataLayout();
----------------
Fold these conditions (of the generic version) into isSafeToLoadUnconditionallyInLoop()?
================
Comment at: test/Transforms/LoopVectorize/load-deref-pred.ll:17
+; CHECK-NEXT: call void @init(i32* [[BASE]])
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK: vector.ph:
----------------
xbolva00 wrote:
> br i1 false
>
> Vectorizer should cleanup it a bit..
Vectorizer leaves it and others like it for subsequent passes to clean up.
================
Comment at: test/Transforms/LoopVectorize/load-deref-pred.ll:89
+; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]
+;
+entry:
----------------
This "for (i=0; i<4096; ++i) {val = 0; if (i < len) val = a[i]; acc += val}" loop is indeed probably a minimal example of a speculative load.
(It does deserve a better optimization though - that of trimming the loop bound ;-)
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D66688/new/
https://reviews.llvm.org/D66688
More information about the llvm-commits
mailing list