[PATCH] D65440: GlobalISel: Add widenScalar for G_UNMERGE_VALUES sources

Sun Jan 19 17:41:40 PST 2020

arsenm marked an inline comment as done.
arsenm added inline comments.
Herald added a subscriber: kerbowa.

================
Comment at: lib/CodeGen/GlobalISel/LegalizerHelper.cpp:1084-1098
+  if (TypeIdx == 1) {
+    Register Cast = MIRBuilder.buildAnyExt(WideTy, SrcReg).getReg(0);
+    MIRBuilder.buildTrunc(MI.getOperand(0), Cast);
+
+    const unsigned DstSize = DstTy.getSizeInBits();
+    unsigned Offset = DstSize;
+    for (unsigned I = 1; I != NumDst; ++I, Offset += DstSize) {
----------------
dsanders wrote:
> arsenm wrote:
> > dsanders wrote:
> > > arsenm wrote:
> > > > dsanders wrote:
> > > > > This feels more like lower than widenScalar as it replaces the G_UNMERGE_VALUES with G_LSHR+G_TRUNC. I would have expected widenScalar to change something like:
> > > > >     %1, %2 = G_UNMERGE_VALUES %0
> > > > > into:
> > > > >     %5 = G_ANYEXT %0
> > > > >     %1, %2, %3, %4 = G_UNMERGE_VALUES %5
> > > > > where %3 and %4 are unused. The main limitation with that is that sizeof(%5) would have to be a multiple of sizeof(%1). If that's not true then it would have to lower instead as G_UNMERGE_VALUES wouldn't be able to handle it.
> > > > > 
> > > > > What was the reason for emitting G_LSHR+G_TRUNC?
> > > > I’m not quite sure where to handle the various steps for merge/unmerge. D65447 is essentially the same, except it bitcasts from a vector which seems more like a lower action. Lower however does not have a specified type for it. I thought using the type hint for these primitives is more relevant than the operation itself.
> > > > 
> > > > I’m also having trouble deciding whether to handle the source or destination type for some of these. The fact that widen/narrow are distinct actions is something of a complication. Sometimes a wider type needs to be introduced when narrowing, and it’s not clear to me how it should be handled. This is sometimes avoidable by doing something resembling the opposite on the other type index.
> > > > 
> > > > I’m also not sure in cases where the types do not evenly divide whether a single legalize step should attempt to deal with it, or if the target legalization rules should have the burden of specifying actions to intermediate types which would be simpler to handle
> > > > I’m not quite sure where to handle the various steps for merge/unmerge. D65447 is essentially the same, except it bitcasts from a vector which seems more like a lower action. Lower however does not have a specified type for it. I thought using the type hint for these primitives is more relevant than the operation itself.
> > > 
> > > I agree it's rather fuzzy. My understanding is that widenScalar/narrowScalar expect to emit the same core operations but on intermediates of the given type plus some glue MIR to make it all fit. Similarly for moreElements/fewerElements. Lower/Libcall (and Bitcast if we were to add one) expect to change the operations but stick to the same type. Custom doesn't really have any expectations and is free to do whatever.
> > > 
> > > FWIW, I'd like to push the legalizer more towards only having two real actions: Legal and ChangeIt. Where ChangeIt is given a function that makes the desired change (those functions could be from the widenScalar/lower/etc. family). Everything beyond that like widenScalar/lower/etc. is just sugar. The main reason I want to move in this direction is because it eliminates the Custom action. At the moment if you use Custom then one function has to be capable of every change you might want to make even if different cases are completely different.
> > > 
> > > > I’m also having trouble deciding whether to handle the source or destination type for some of these. The fact that widen/narrow are distinct actions is something of a complication. Sometimes a wider type needs to be introduced when narrowing, and it’s not clear to me how it should be handled. This is sometimes avoidable by doing something resembling the opposite on the other type index.
> > > 
> > > Could you show me an example of needing to widen a type as part of narrowing?
> > > 
> > > > I’m also not sure in cases where the types do not evenly divide whether a single legalize step should attempt to deal with it, or if the target legalization rules should have the burden of specifying actions to intermediate types which would be simpler to handle
> > > 
> > > You mean for things like widening/narrowing the sources of a G_MERGE_VALUES or the results of a G_UNMERGE_VALUES? Changing the other side of these two is fairly simple (you just add undefs or unused results and G_ANYEXT/G_TRUNC the one that has a remainder) but changing them from this side is tricky.
> > > 
> > > I think the intent is that a single legalize step is supposed to handle it but it's certainly possible that we haven't given that enough thought.
> > > Could you show me an example of needing to widen a type as part of narrowing?
> > 
> > We currently have a mix of different strategies to deal with cases that don't evenly divide, although it would probably be better to have one consistent one. There is a tradeoff in complexity between LegalizerHelper and the target's LegalizerInfo rules.
> > 
> > For AMDGPU, to a 1st order approximation, every type should be broken up into 32-bit pieces. One example I've been looking at in the past couple of days is:
> > 
> > ```
> > ---
> > name: test_unmerge_s16_s48
> > body: |
> >   bb.0:
> >     liveins: $vgpr0_vgpr1
> >     %0:_(s64) = COPY $vgpr0_vgpr1
> >     %1:_(s48) = G_TRUNC %0
> >     %2:_(s16), %3:_(s16), %4:_(s16) = G_UNMERGE_VALUES %1
> >     %5:_(s32) = G_ANYEXT %2
> >     %6:_(s32) = G_ANYEXT %3
> >     %7:_(s32) = G_ANYEXT %4
> >     $vgpr0 = COPY %5
> >     $vgpr1 = COPY %6
> >     $vgpr2 = COPY %7
> > ...
> > ```
> > 
> > I want this to end up with s32, s32 = G_UNMERGE_VALUES s64, with shifts on the 32-bit elements to get the used 16-bit parts. I'm not sure if this logically is a widening of the result s16, or a narrowing of the source s48. If I try to treat it as a narrowing of the source, it is convenient to widen the original source so that a sub-unmerge can be created to s32s, before unmerging those to the original output registers (with an extra unused output)
> > 
> > The current strategy widens the result type to the requested s32, and multiplies the source type to s96, and produces some ugly s96 shifts which get expanded which is what I'm now trying to avoid. This strategy is fine when the result source is at most s32.
> > 
> > I see a few too many options for handling this:
> > 
> >   # Legalize action to widen s48 source to s64. Separate legalization action to narrow s64 to s32, and create sub-unmerges. This requires the least code in LegalizerHelper, but requires more work in the target legalization rules and increases the importance of the order the legalize rules were specified.
> > 
> >   # Narrow source type by widening %1 to s64, create sub-unmerge to s32, and unmerge to original types. This gets more complicated when the types don't nicely divide like this, and more intermediate merge/unmerges are needed.
> > 
> >   # Widen source type, but this mostly has to emit the same sequence as narrowing the source type.
> >  
> > 
> > These all involve some guesswork for when it makes sense to try to use a shift in the original source type. At some point, I think any non-legal register width unmerge will need to emit some kind of shift. So far I've been trying to make a reasonable guess when the requested type ends up matching the source, and making this a lower would require more target defined legalization rules to switch to lowering at some point.
> > 
> > I can also imagine a system where the legalize action specifies the types for all type indexes, although that probably makes things more complicated for the legalizer.
> > 
> > 
> > 
> > 
> > 
> > 
> Sorry, I've been extremely slow to get back to this.
> 
> > I want this to end up with s32, s32 = G_UNMERGE_VALUES s64, with shifts on the 32-bit elements to get the used 16-bit parts. I'm not sure if this logically is a widening of the result s16, or a narrowing of the source s48.
> 
> Calling it a narrowing of the source s48 doesn't make sense to me as you're widening it to s64. Between those two, I'd call it a widening of the result s16's. Given that the distinction between lowering and widening is a bit fuzzy here, let's stick with the widening.
> 
> Doesn't this path produce an illegal unmerge? In the example you gave I'd expect to end up with `s32, s32 = G_UNMERGE_VALUES s48` which isn't valid because there's more result bits than source bits. I think there needs to be code to insert the anyext on the source.
> 
> > I can also imagine a system where the legalize action specifies the types for all type indexes, although that probably makes things more complicated for the legalizer.
> 
> The legalizer algorithm should be ok with it but it would make it easier to get into infinite loops. You'd have to be quite careful with what the action does.
The extension is added on line 1085 here. See also the follow up patches D65441 and D65473

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65440/new/

https://reviews.llvm.org/D65440