[PATCH] D60975: Convert a masked.gather of at most one element to a masked.load
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 23 12:35:57 PDT 2019
spatel added a subscriber: efriedma.
spatel added a comment.
In the generic case, we're creating a masked load of a scalar (although as noted inline, there's apparently no support or test for that pattern). But I don't know what the codegen for a <1 x X> masked load would be (do we cmp/br around the load since it's not speculatable?). Someone else with masked load lowering knowledge (@craig.topper @efriedma ?) should also review this.
If the motivating case really is a constant mask, then I think it's best to limit this patch to that pattern alone and create the scalar load directly.
================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1234
+ // then the masked.load combines will convert it to a simple load.
+ if (DemandedElts.isPowerOf2()) {
+ // Note: APInt indexes the bit vector from LSB to MSB, thus
----------------
'countPopulation() == 1' - more direct translation of the code comment?
================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1251-1252
+ B.CreateVectorSplat(1, PTLane));
+ auto *E = B.CreateExtractElement(ML, (uint64_t)0);
+ auto *Res = B.CreateInsertElement(PassThrough, E, Idx);
+ return IC.replaceInstUsesWith(II, Res);
----------------
Change 'auto' to 'Value' for all of these Builder calls to avoid ambiguity.
================
Comment at: test/Transforms/InstCombine/masked_intrinsics.ll:204-206
define <4 x double> @gather_lane2(double* %base, double %pt) {
; CHECK-LABEL: @gather_lane2(
; CHECK-NEXT: [[PTRS:%.*]] = getelementptr double, double* [[BASE:%.*]], <4 x i64> <i64 undef, i64 undef, i64 2, i64 undef>
----------------
It's fine to show the expected follow-on transform, but we should have a minimal test for this transform alone:
```
define <4 x double> @gather_lane0(<4 x double*> %ptrs, <4 x i1> %mask, <4 x double> %passthru) {
%mask_with_at_most_1_bit_set = and <4 x i1> %mask, <i1 true, i1 false, i1 false, i1 false>
%res = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*> %ptrs, i32 4, <4 x i1> %mask_with_at_most_1_bit_set, <4 x double> %passthru)
ret <4 x double> %res
}
```
But there's not enough analysis power in possiblyDemandedEltsInMask() to get this?
Could at least reduce the test to this?
```
define <4 x double> @gather_lane1(<4 x double*> %ptrs, <4 x double> %passthru) {
%res = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*> %ptrs, i32 4, <4 x i1> <i1 false, i1 true, i1 false, i1 false>, <4 x double> %passthru)
ret <4 x double> %res
}
```
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D60975/new/
https://reviews.llvm.org/D60975
More information about the llvm-commits
mailing list