[PATCH] D61289: [globalisel] Add G_SEXT_INREG

Fri May 10 18:20:05 PDT 2019

Petar.Avramovic added inline comments.

================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:698
     return Legalized;
+  case TargetOpcode::G_SEXT_INREG: {
+    if (TypeIdx != 0)
----------------
dsanders wrote:
> Petar.Avramovic wrote:
> > dsanders wrote:
> > > Petar.Avramovic wrote:
> > > > NarrowScalar is good candidate for separate patch. Reason is that artifact combiner currently can not handle test that would use  G_SEXT_INREG since it will have chained artifacts.  G_SEXT will be  G_UNMERGE-d, G_UNMERGE has to wait for: 
> > > >  - G_SEXT to be combined into G_SEXT_INREG,
> > > >  - and then for G_SEXT_INREG to be narrowScalared into sequence of instructions that will end with G_MERGE_VALUES. 
> > > > Only at this point G_UNMERGE can be combined.
> > > > 
> > > > 
> > > There's a test for narrowScalar of G_SEXT_INREG in this patch. Could you elaborate on what kind of test you're looking for? I think you're looking for a test with neighbouring instructions which are also narrowScalar'd
> > Yes. For example with this test on aarch64, store is narrowScalared:
> > 
> > ```
> >   define void @func(i8 %x, i128* %p) {
> >   entry:
> >     %conv = sext i8 %x to i128
> >     store i128 %conv, i128* %p
> >     ret void
> >   }
> > ```
> > Legalizer is not able to produce something like this
> > ```
> >     %2:_(s32) = COPY $w0
> >     %1:_(p0) = COPY $x1
> >     %11:_(s64) = G_CONSTANT i64 63
> >     %12:_(s64) = G_SEXT_INREG %2, 8
> >     %13:_(s64) = G_ASHR %12, %11(s64)
> >     G_STORE %12(s64), %1(p0) :: (store 8 into %ir.p, align 16)
> >     %7:_(s64) = G_CONSTANT i64 8
> >     %6:_(p0) = G_GEP %1, %7(s64)
> >     G_STORE %13(s64), %6(p0) :: (store 8 into %ir.p + 8, align 16)
> >     RET_ReallyLR
> > ```
> > which I assume is a desired output.
> Ah ok. The issue there is that G_STORE hasn't implemented narrowScalar yet. I don't think that we should prevent any opcode from getting a narrowScalar implementation until all opcodes have one. I think we should be implementing them as we need them and gradually build up the full set over time.
The problem (if we use narrow scalar with G_MERGE/G_UNMERGE) is the order in which we attempt to combine artifacts in Legalizer. D61787 and narrow scalar for G_ANYEXT(with G_MERGE/G_UNMERGE) will be able to give output like this.

================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:708
+    // don't need to break it apart.
+    if (NarrowTy.getScalarSizeInBits() >= SizeInBits) {
+      Observer.changingInstr(MI);
----------------
dsanders wrote:
> Petar.Avramovic wrote:
> > dsanders wrote:
> > > Petar.Avramovic wrote:
> > > > What is the idea behind this block, we generate two instructions with types larger then NarrowTy? G_TRUNC and G_SEXT that are generated have to be narrowScalared and then we have a few more merge/unmerge combines to do.  Also LegalizerHelper does not know how to narrow scalar G_TRUNC and G_SEXT at the moment.
> > > It's not possible to eliminate types larger than NarrowTy with a single narrowScalar. narrowScalar's job is to replace this instruction with one or more that work on NarrowTy sized types and some legalization artifacts (G_SEXT and G_TRUNC in this case) which are required to maintain type correctness. Those legalization artifacts are either combined away or further legalized according to their legalization rules.
> > > 
> > > This block aims to optimize the case where all but one of the narrowed components is a wholly a copy of the sign bit. It expects the G_TRUNC and G_SEXT to be either combined away by the artifact combiner or further legalized. For example: narrowing `%0:_(s32) = G_SEXT %1:_(s32), 8` down to s16 only requires us to preserve the component containing bits 0-15. We can forget about bits 16-31 and reconstruct them with the G_SEXT. This will help chains of instructions to be deleted as they are dead.
> > I did not consider situation when G_SEXT(G_SEXT_INREG) could combine away. 
> > Currently G_SEXT can only combine into G_TRUNC.  
> > The use of this G_SEXT is something legal or G_MERGE that is waiting for G_UNMERGE. 
> > We know this since uses are legalized before defs (G_SEXT is def in this context).
> > Yet again if use is legal, then G_SEXT_INREG was legal or something else like lower but not narrow scalar.
> > Are we talking about artifact combiner, or something that is not in tree?
> > Could you provide a test where G_SEXT and G_TRUNC produced  with this narrowScalar combine away?
> > 
> It does look like the artifact combiner is missing some combines, for the `NarrowTy.getScalarSizeInBits() < SizeInBits` case the resulting:
>     %6:_(s128) = G_MERGE_VALUES %0(s64), %12(s64)
>     %7:_(s32) = G_TRUNC %6(s128)
> ought to be:
>     %7:_(s32) = G_TRUNC %0(s64)
> 
> and for the `NarrowTy.getScalarSizeInBits() >= SizeInBits` there's a:
>     %10:_(s64) = G_SEXT_INREG %9, 32
>     %6:_(s128) = G_SEXT %10(s64)
>     %7:_(s32) = G_TRUNC %6(s128)
> which firstly ought to be:
>     %10:_(s64) = G_SEXT_INREG %9, 32
>     %7:_(s32) = G_TRUNC %10(s64)
> and secondly:
>     %10:_(s64) = G_SEXT_INREG %9, 32
>     %7:_(s32) = %9(s32)
> the former of those two isn't really related to this patch (the G_SEXT_INREG isn't involved in the combine) but the latter is.
Sorry, I still don't understand. In this fragment %9 should be s64 not s32. An IR function that results in situation where it is better to narrow scalar with SEXT+TRUNC where they combine with something would be helpful.
Maybe it is possible to perform other combines before we make G_SEXT_INREG  that would benefit from narrow scalar with G_TRUNC/G_SEXT? 
That is to not create G_SEXT_INREG that would benefit from narrow scalar with G_TRUNC/G_SEXT and solve everything in combiner with for example some pattern that combines multiple artifacts instead of only two.

================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:756
+
+    // Do the operation on each small part.
+    for (int i = 0; i < NumParts; ++i) {
----------------
dsanders wrote:
> Petar.Avramovic wrote:
> > dsanders wrote:
> > > Petar.Avramovic wrote:
> > > > This loop works for `NarrowTy.getScalarSizeInBits() >= SizeInBits` as well.
> > > > `NarrowTy.getScalarSizeInBits()` -> `NarrowSize`
> > > > 
> > > It does but it potentially generates more instructions to do it. Consider `%0:_(s32) = G_SEXT %1:_(s32), 4` narrowScalar'd to s8. This method will break it into four components and will emit one G_SEXT_INREG and one G_ASHR. The other method will emit one G_SEXT_INREG and one G_SEXT which may be eliminated entirely or at worst, lower to one G_ASHR
> > Could we then add a possibility for targets to chose how they want to narrow scalar that is to have:
> > narrowScalar - always creates G_UNMERGE+...+G_MERGE
> > narrowScalarExt - creates G_TRUNC+...+G_{S|Z|ANY}EXT
> Assuming we fix the missing artifact combines, what would be the case where G_UNMERGE/G_MERGE is the better option?
%0:_(s128) = G_ANYEXT .....
%1:_(s128) = G_SEXT_INREG %0, 8
%2:_(s64), %3:_(s64) = G_UNMERGE_VALUES %1:_(s128)

At the moment most(if not all?) of narrow scalars make sequence of instructions that starts with G_UNMERGE and ends with G_MERGE. In case we emit G_SEXT as end of sequence it won't be able to combine with G_UNMERGE_VALUES. We then have to perform narrow scalar of this G_SEXT (that will end with G_MERGE) then combine this with %2:_(s64), %3:_(s64) = G_UNMERGE_VALUES %1:_(s128)
Same for G_TRUNC at the start. Not to mention that this complicates order in which we attempt to combine artifacts.
And if G_SEXT_INREG is to be narrow scalared I would expect that it is surrounded with G_MERGE and G_UNMERGE, on mips at least.
Considering llvm test-suite and mips I would say that it is always better to emit G_UNMERGE/G_MERGE. I cannot think of example where G_SEXT would be able to combine with something.
And for general answer I would say that it is better to emit G_UNMERGE/G_MERGE if def of G_SEXT_INREG  is used in G_UNMERGE_VALUES. This is problematic since it requires legalizer to decide how to narrow scalar based on surrounding instructions. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61289/new/

https://reviews.llvm.org/D61289