[PATCH] D58123: GlobalISel: Implement moreElementsVector for bit ops

Sat Feb 16 14:03:01 PST 2019

rtereshin added a comment.

Hi Matt,

thank you for looking into this!

Please take a look at a few comments below, the ones starting with "A nitpick:" are just notes, we don't have to discuss or fix them. The others are more interesting.

Thanks,
Roman

================
Comment at: lib/CodeGen/GlobalISel/LegalizerHelper.cpp:767

+void LegalizerHelper::moreElementsVectorSrc(MachineInstr &MI, LLT WideTy,
+                                            unsigned OpIdx) {
----------------
A nitpick: at the callee side `MoreTy` name is used, which seems fitting. Maybe sync the implementation with it?

================
Comment at: lib/CodeGen/GlobalISel/LegalizerHelper.cpp:792
+  unsigned WideReg = MRI.createGenericVirtualRegister(WideTy);
+  unsigned ImpDef = MIRBuilder.buildUndef(WideTy)->getOperand(0).getReg();
+  MIRBuilder.buildInsert(WideReg, ImpDef, MO.getReg(), 0);
----------------
A nitpick: `.getReg(0)` here as well may be?

================
Comment at: lib/CodeGen/GlobalISel/LegalizerHelper.cpp:793
+  unsigned ImpDef = MIRBuilder.buildUndef(WideTy)->getOperand(0).getReg();
+  MIRBuilder.buildInsert(WideReg, ImpDef, MO.getReg(), 0);
+  MO.setReg(WideReg);
----------------
I'm looking at `test_and_v3s8` below, and thinking, maybe if we do this via an unmerge / merge pair, the results would be better, as these ops appear to be more likely to disappear as artifacts than such an insert, that splices a vector into another vector.

The intermediate "scalar" size doesn't have to be scalar, but could be a vector with a number of lanes being GCD of the original and target number of lines, like

v2s16 = G_IMPLICIT_DEF
v2s16, v2s16, v2s16 = G_UNMERGE_VALUES v6s16
v8s16 = G_CONCAT_VECTORS v2s16, v2s16, v2s16, v2s16

If it is a scalar, the "merge" component will be a G_BUILD_VECTOR, of course. 

What do you think?

================
Comment at: lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:155
+    .fewerElementsIf(vectorWiderThan(0, 32), halfSizeVector(0))
+    .moreElementsIf(isSmallOddVector(0), oneMoreElement(0))
     .scalarize(0);
----------------
The largest scalar size of a vector this rule could possibly fire on is 10 bits, or if we stick to the powers of 2, 8 bits (1 vectors aren't supported, 2 vector is even => 3 is the smallest number of lanes. To be small, has to be 31 bits wide or less => 10 bit is the largest scalar size).

So it would turn V3S8 to V4S8 for instance. Which is not legal, not scalar, not wider than 32 bits, not odd -> it will be scalarized,  and then the scalars widened to S32.

So, the resulting ops if the type we start with is V3S8 are 4 32-bit ops (4 x S32), one of which is on undef, unmerge / merge pair, 4 32-bit exts, and 4 32-bit truncs. It's either worse or the same as it would be if we just remove this moreElementsIf rule completely, I suspect.

Maybe you have intended making V4S8 legal? After all, these are bitwise ops with lanes being independent.

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir:223
     ; CHECK: [[DEF2:%[0-9]+]]:_(<8 x s32>) = G_IMPLICIT_DEF
-    ; CHECK: [[INSERT:%[0-9]+]]:_(<8 x s32>) = G_INSERT [[DEF2]], [[BUILD_VECTOR]](<5 x s32>), 0
+    ; CHECK: [[INSERT:%[0-9]+]]:_(<8 x s32>) = G_INSERT [[DEF2]], [[AND]](<5 x s32>), 0
     ; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7 = COPY [[INSERT]](<8 x s32>)
----------------
This test wasn't legalized at all.

I suspect this is because `fewerElementsVectorBasic`, being called to narrow 5 vector to 3 vector here, does this:
```lang=c++
  const unsigned NarrowSize = NarrowTy.getSizeInBits();  // 96
  const unsigned Size = DstTy.getSizeInBits();  // 160
  const int NumParts = Size / NarrowSize;  // 1
  const LLT EltTy = DstTy.getElementType();  // s32
  const unsigned EltSize = EltTy.getSizeInBits(); // 32
  const unsigned BitsForNumParts = NarrowSize * NumParts;  // 96
  // Check if we have any leftovers. If we do, then only handle the case where
  // the leftover is one element.
  if (BitsForNumParts != Size && BitsForNumParts + EltSize != Size)  // 96 != 160 && 96 + 32 != 160
    return UnableToLegalize;
```
The above works for v3s32, but not for any more elements of an odd number.

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir:341
+    $vgpr0_vgpr1_vgpr2_vgpr3 = COPY %4
+...
+
----------------
Same, this is not legalized at all.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58123/new/

https://reviews.llvm.org/D58123