[PATCH] D58123: GlobalISel: Implement moreElementsVector for bit ops

Mon Feb 18 07:45:36 PST 2019

arsenm marked an inline comment as done.
arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:155
+    .fewerElementsIf(vectorWiderThan(0, 32), halfSizeVector(0))
+    .moreElementsIf(isSmallOddVector(0), oneMoreElement(0))
     .scalarize(0);
----------------
arsenm wrote:
> rtereshin wrote:
> > The largest scalar size of a vector this rule could possibly fire on is 10 bits, or if we stick to the powers of 2, 8 bits (1 vectors aren't supported, 2 vector is even => 3 is the smallest number of lanes. To be small, has to be 31 bits wide or less => 10 bit is the largest scalar size).
> > 
> > So it would turn V3S8 to V4S8 for instance. Which is not legal, not scalar, not wider than 32 bits, not odd -> it will be scalarized,  and then the scalars widened to S32.
> > 
> > So, the resulting ops if the type we start with is V3S8 are 4 32-bit ops (4 x S32), one of which is on undef, unmerge / merge pair, 4 32-bit exts, and 4 32-bit truncs. It's either worse or the same as it would be if we just remove this moreElementsIf rule completely, I suspect.
> > 
> > Maybe you have intended making V4S8 legal? After all, these are bitwise ops with lanes being independent.
> At this point I'm not expecting any of the legalization actions for sub-32 bit types to be perfectly accurate. Most of the legalizations are missing, and I'm not sure what the optimal end state looks like yet (such as whether it's better to promote elements first or split the vector). For now I've just been putting placeholders that will make use of what's implemented.
> 
> My intention was to legalize from 3->4 elements for this. When there are actual optimizations, the undef 4th component should be eliminated if this does end up promoted to 32-bits.
> 
> We have legal s16 and v2s16 ops on some subtargets, so it makes sense to treat those as just rounded to integer sized vectors. I'm less sure what should happen for 8-bit elements at this point, but for now I'm trying to keep them packed and follow along with the 16-bit vectors.
We also have an encoding on some subtargets that allows free access to any byte offset with an implicit sext/zext and I haven't considered if we should make more s8 operations legal as a result.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58123/new/

https://reviews.llvm.org/D58123