[PATCH] D58123: GlobalISel: Implement moreElementsVector for bit ops
Roman Tereshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Feb 16 14:03:01 PST 2019
rtereshin added a comment.
Hi Matt,
thank you for looking into this!
Please take a look at a few comments below, the ones starting with "A nitpick:" are just notes, we don't have to discuss or fix them. The others are more interesting.
Thanks,
Roman
================
Comment at: lib/CodeGen/GlobalISel/LegalizerHelper.cpp:767
+void LegalizerHelper::moreElementsVectorSrc(MachineInstr &MI, LLT WideTy,
+ unsigned OpIdx) {
----------------
A nitpick: at the callee side `MoreTy` name is used, which seems fitting. Maybe sync the implementation with it?
================
Comment at: lib/CodeGen/GlobalISel/LegalizerHelper.cpp:792
+ unsigned WideReg = MRI.createGenericVirtualRegister(WideTy);
+ unsigned ImpDef = MIRBuilder.buildUndef(WideTy)->getOperand(0).getReg();
+ MIRBuilder.buildInsert(WideReg, ImpDef, MO.getReg(), 0);
----------------
A nitpick: `.getReg(0)` here as well may be?
================
Comment at: lib/CodeGen/GlobalISel/LegalizerHelper.cpp:793
+ unsigned ImpDef = MIRBuilder.buildUndef(WideTy)->getOperand(0).getReg();
+ MIRBuilder.buildInsert(WideReg, ImpDef, MO.getReg(), 0);
+ MO.setReg(WideReg);
----------------
I'm looking at `test_and_v3s8` below, and thinking, maybe if we do this via an unmerge / merge pair, the results would be better, as these ops appear to be more likely to disappear as artifacts than such an insert, that splices a vector into another vector.
The intermediate "scalar" size doesn't have to be scalar, but could be a vector with a number of lanes being GCD of the original and target number of lines, like
v2s16 = G_IMPLICIT_DEF
v2s16, v2s16, v2s16 = G_UNMERGE_VALUES v6s16
v8s16 = G_CONCAT_VECTORS v2s16, v2s16, v2s16, v2s16
If it is a scalar, the "merge" component will be a G_BUILD_VECTOR, of course.
What do you think?
================
Comment at: lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:155
+ .fewerElementsIf(vectorWiderThan(0, 32), halfSizeVector(0))
+ .moreElementsIf(isSmallOddVector(0), oneMoreElement(0))
.scalarize(0);
----------------
The largest scalar size of a vector this rule could possibly fire on is 10 bits, or if we stick to the powers of 2, 8 bits (1 vectors aren't supported, 2 vector is even => 3 is the smallest number of lanes. To be small, has to be 31 bits wide or less => 10 bit is the largest scalar size).
So it would turn V3S8 to V4S8 for instance. Which is not legal, not scalar, not wider than 32 bits, not odd -> it will be scalarized, and then the scalars widened to S32.
So, the resulting ops if the type we start with is V3S8 are 4 32-bit ops (4 x S32), one of which is on undef, unmerge / merge pair, 4 32-bit exts, and 4 32-bit truncs. It's either worse or the same as it would be if we just remove this moreElementsIf rule completely, I suspect.
Maybe you have intended making V4S8 legal? After all, these are bitwise ops with lanes being independent.
================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir:223
; CHECK: [[DEF2:%[0-9]+]]:_(<8 x s32>) = G_IMPLICIT_DEF
- ; CHECK: [[INSERT:%[0-9]+]]:_(<8 x s32>) = G_INSERT [[DEF2]], [[BUILD_VECTOR]](<5 x s32>), 0
+ ; CHECK: [[INSERT:%[0-9]+]]:_(<8 x s32>) = G_INSERT [[DEF2]], [[AND]](<5 x s32>), 0
; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7 = COPY [[INSERT]](<8 x s32>)
----------------
This test wasn't legalized at all.
I suspect this is because `fewerElementsVectorBasic`, being called to narrow 5 vector to 3 vector here, does this:
```lang=c++
const unsigned NarrowSize = NarrowTy.getSizeInBits(); // 96
const unsigned Size = DstTy.getSizeInBits(); // 160
const int NumParts = Size / NarrowSize; // 1
const LLT EltTy = DstTy.getElementType(); // s32
const unsigned EltSize = EltTy.getSizeInBits(); // 32
const unsigned BitsForNumParts = NarrowSize * NumParts; // 96
// Check if we have any leftovers. If we do, then only handle the case where
// the leftover is one element.
if (BitsForNumParts != Size && BitsForNumParts + EltSize != Size) // 96 != 160 && 96 + 32 != 160
return UnableToLegalize;
```
The above works for v3s32, but not for any more elements of an odd number.
================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir:341
+ $vgpr0_vgpr1_vgpr2_vgpr3 = COPY %4
+...
+
----------------
Same, this is not legalized at all.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D58123/new/
https://reviews.llvm.org/D58123
More information about the llvm-commits
mailing list