[PATCH] D60975: Convert a masked.gather of at most one element to a masked.load

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 23 12:35:57 PDT 2019


spatel added a subscriber: efriedma.
spatel added a comment.

In the generic case, we're creating a masked load of a scalar (although as noted inline, there's apparently no support or test for that pattern). But I don't know what the codegen for a <1 x X> masked load would be (do we cmp/br around the load since it's not speculatable?). Someone else with masked load lowering knowledge (@craig.topper @efriedma ?) should also review this.

If the motivating case really is a constant mask, then I think it's best to limit this patch to that pattern alone and create the scalar load directly.



================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1234
+  // then the masked.load combines will convert it to a simple load.
+  if (DemandedElts.isPowerOf2()) {
+    // Note: APInt indexes the bit vector from LSB to MSB, thus
----------------
'countPopulation() == 1'  - more direct translation of the code comment?


================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1251-1252
+                                  B.CreateVectorSplat(1, PTLane));
+    auto *E = B.CreateExtractElement(ML, (uint64_t)0);
+    auto *Res = B.CreateInsertElement(PassThrough, E, Idx);
+    return IC.replaceInstUsesWith(II, Res);
----------------
Change 'auto' to 'Value' for all of these Builder calls to avoid ambiguity. 


================
Comment at: test/Transforms/InstCombine/masked_intrinsics.ll:204-206
 define <4 x double> @gather_lane2(double* %base, double %pt)  {
 ; CHECK-LABEL: @gather_lane2(
 ; CHECK-NEXT:    [[PTRS:%.*]] = getelementptr double, double* [[BASE:%.*]], <4 x i64> <i64 undef, i64 undef, i64 2, i64 undef>
----------------
It's fine to show the expected follow-on transform, but we should have a minimal test for this transform alone:


```
define <4 x double> @gather_lane0(<4 x double*> %ptrs, <4 x i1> %mask, <4 x double> %passthru) {
  %mask_with_at_most_1_bit_set = and <4 x i1> %mask, <i1 true, i1 false, i1 false, i1 false>
  %res = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*> %ptrs, i32 4, <4 x i1> %mask_with_at_most_1_bit_set, <4 x double> %passthru)
  ret <4 x double> %res
}

```
But there's not enough analysis power in possiblyDemandedEltsInMask() to get this?

Could at least reduce the test to this?

```
define <4 x double> @gather_lane1(<4 x double*> %ptrs, <4 x double> %passthru)  {
  %res = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*> %ptrs, i32 4, <4 x i1> <i1 false, i1 true, i1 false, i1 false>, <4 x double> %passthru)
 ret <4 x double> %res
}

```


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60975/new/

https://reviews.llvm.org/D60975





More information about the llvm-commits mailing list