[PATCH] D59703: Optimize masked.loads and masked.gathers with a single active lane

Philip Reames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 22 10:18:28 PDT 2019


reames planned changes to this revision.
reames marked 3 inline comments as done.
reames added a comment.

In D59703#1473037 <https://reviews.llvm.org/D59703#1473037>, @spatel wrote:

> These are 2 independent transforms, right? It would be better to split that into 2 smaller reviews/commits, so we are not missing anything (and easier to diagnose if there's trouble post-commit). The tests should show minimal IR patterns to exercise those 2 independent transforms.


You can view them that way, or not.  I was looking at this as "what was needed to decompose a single element gather from a dereferenceable address", but I can see your point.  I'll split.



================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1195
 // TODO, Obvious Missing Transforms:
-// * Dereferenceable address -> speculative load/select
+// * Single active lane to scalar masked load
 // * Narrow width by halfs excluding zero/undef lanes
----------------
jdoerfert wrote:
> Isn't that implemented below (in this patch)?
Nope, that's the gather case.  Analogous, but distinct intrinsics.


================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1275
+    unsigned Idx = DemandedElts.countTrailingZeros();
+    auto &B = IC.Builder;
+    auto *PtrLane = B.CreateExtractElement(II.getArgOperand(0), Idx);
----------------
jdoerfert wrote:
> I would have preferred an actual type here (and in one or two other places below where it is not obvious). However, I don't know if this is the "standard" in this part of the code base.
I'm happy to change to explicit types, which did you find non-obvious?


================
Comment at: test/Transforms/InstCombine/masked_intrinsics.ll:181
+; CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x double*> [[PTRS]] to <2 x <1 x double>*>
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x <1 x double>*> [[BC]], i64 0
+; CHECK-NEXT:    [[UNMASKEDLOAD:%.*]] = load <1 x double>, <1 x double>* [[TMP1]], align 4
----------------
jdoerfert wrote:
> The zero here is the lane, correct? It looks "like a special case", even if it's not.
> 
> Could we copy this test, change the width to 8 and gather lane 5? That should make sure it works "in general" with minimal effort on your part.
> 
> Idk if there is, or if we need, a negative test as well, e.g., two active lanes in a constant mask.
Will do.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59703/new/

https://reviews.llvm.org/D59703





More information about the llvm-commits mailing list