[PATCH] D59703: Optimize masked.loads and masked.gathers with a single active lane

Johannes Doerfert via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 19 13:34:06 PDT 2019


jdoerfert added a comment.

I'm not really an expert in InstCombine but it looks good to me. Someone else should probably take a look as well. I added some minor comments though.



================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1195
 // TODO, Obvious Missing Transforms:
-// * Dereferenceable address -> speculative load/select
+// * Single active lane to scalar masked load
 // * Narrow width by halfs excluding zero/undef lanes
----------------
Isn't that implemented below (in this patch)?


================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1275
+    unsigned Idx = DemandedElts.countTrailingZeros();
+    auto &B = IC.Builder;
+    auto *PtrLane = B.CreateExtractElement(II.getArgOperand(0), Idx);
----------------
I would have preferred an actual type here (and in one or two other places below where it is not obvious). However, I don't know if this is the "standard" in this part of the code base.


================
Comment at: test/Transforms/InstCombine/masked_intrinsics.ll:181
+; CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x double*> [[PTRS]] to <2 x <1 x double>*>
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x <1 x double>*> [[BC]], i64 0
+; CHECK-NEXT:    [[UNMASKEDLOAD:%.*]] = load <1 x double>, <1 x double>* [[TMP1]], align 4
----------------
The zero here is the lane, correct? It looks "like a special case", even if it's not.

Could we copy this test, change the width to 8 and gather lane 5? That should make sure it works "in general" with minimal effort on your part.

Idk if there is, or if we need, a negative test as well, e.g., two active lanes in a constant mask.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59703/new/

https://reviews.llvm.org/D59703





More information about the llvm-commits mailing list