[PATCH] D58123: GlobalISel: Implement moreElementsVector for bit ops

Sun Feb 17 11:38:29 PST 2019

arsenm added inline comments.

================
Comment at: lib/CodeGen/GlobalISel/LegalizerHelper.cpp:793
+  unsigned ImpDef = MIRBuilder.buildUndef(WideTy)->getOperand(0).getReg();
+  MIRBuilder.buildInsert(WideReg, ImpDef, MO.getReg(), 0);
+  MO.setReg(WideReg);
----------------
rtereshin wrote:
> I'm looking at `test_and_v3s8` below, and thinking, maybe if we do this via an unmerge / merge pair, the results would be better, as these ops appear to be more likely to disappear as artifacts than such an insert, that splices a vector into another vector.
> 
> The intermediate "scalar" size doesn't have to be scalar, but could be a vector with a number of lanes being GCD of the original and target number of lines, like
> 
> v2s16 = G_IMPLICIT_DEF
> v2s16, v2s16, v2s16 = G_UNMERGE_VALUES v6s16
> v8s16 = G_CONCAT_VECTORS v2s16, v2s16, v2s16, v2s16
> 
> If it is a scalar, the "merge" component will be a G_BUILD_VECTOR, of course. 
> 
> What do you think?
I think this requires more experimentation. It creates a bunch more intermediate operations and vregs. The common case should be 3->4 elements, with matching offsets which should fold away easily.

For now most of the legalizations for merge/unmerge are missing so I think this might not work around at the moment.

================
Comment at: lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:155
+    .fewerElementsIf(vectorWiderThan(0, 32), halfSizeVector(0))
+    .moreElementsIf(isSmallOddVector(0), oneMoreElement(0))
     .scalarize(0);
----------------
rtereshin wrote:
> The largest scalar size of a vector this rule could possibly fire on is 10 bits, or if we stick to the powers of 2, 8 bits (1 vectors aren't supported, 2 vector is even => 3 is the smallest number of lanes. To be small, has to be 31 bits wide or less => 10 bit is the largest scalar size).
> 
> So it would turn V3S8 to V4S8 for instance. Which is not legal, not scalar, not wider than 32 bits, not odd -> it will be scalarized,  and then the scalars widened to S32.
> 
> So, the resulting ops if the type we start with is V3S8 are 4 32-bit ops (4 x S32), one of which is on undef, unmerge / merge pair, 4 32-bit exts, and 4 32-bit truncs. It's either worse or the same as it would be if we just remove this moreElementsIf rule completely, I suspect.
> 
> Maybe you have intended making V4S8 legal? After all, these are bitwise ops with lanes being independent.
At this point I'm not expecting any of the legalization actions for sub-32 bit types to be perfectly accurate. Most of the legalizations are missing, and I'm not sure what the optimal end state looks like yet (such as whether it's better to promote elements first or split the vector). For now I've just been putting placeholders that will make use of what's implemented.

My intention was to legalize from 3->4 elements for this. When there are actual optimizations, the undef 4th component should be eliminated if this does end up promoted to 32-bits.

We have legal s16 and v2s16 ops on some subtargets, so it makes sense to treat those as just rounded to integer sized vectors. I'm less sure what should happen for 8-bit elements at this point, but for now I'm trying to keep them packed and follow along with the 16-bit vectors.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58123/new/

https://reviews.llvm.org/D58123