[PATCH] D61289: [globalisel] Add G_SEXT_INREG

Fri May 10 03:30:26 PDT 2019

Petar.Avramovic added a comment.

> Most targets I know would select G_ZEXT_INREG to an AND using either an inline or materialized immediate at which point we haven't really gained anything by protecting it against harmful 'optimization'.

My only consideration was that is faster to narrow scalar G_ZEXT_INREG, then to narrow scalar G_AND and G_CONSTANT. On the other hand AND has simple narrow scalar unlike G_SHL and G_ASHR so it is not that big performance/code size improvement compared to G_SEXT_INREG.  Also as sign and zero extend are most of the time mentioned together, I thought that we could add G_ZEXT_INREG alongside with G_SEXT_INREG.

================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:698
     return Legalized;
+  case TargetOpcode::G_SEXT_INREG: {
+    if (TypeIdx != 0)
----------------
dsanders wrote:
> Petar.Avramovic wrote:
> > NarrowScalar is good candidate for separate patch. Reason is that artifact combiner currently can not handle test that would use  G_SEXT_INREG since it will have chained artifacts.  G_SEXT will be  G_UNMERGE-d, G_UNMERGE has to wait for: 
> >  - G_SEXT to be combined into G_SEXT_INREG,
> >  - and then for G_SEXT_INREG to be narrowScalared into sequence of instructions that will end with G_MERGE_VALUES. 
> > Only at this point G_UNMERGE can be combined.
> > 
> > 
> There's a test for narrowScalar of G_SEXT_INREG in this patch. Could you elaborate on what kind of test you're looking for? I think you're looking for a test with neighbouring instructions which are also narrowScalar'd
Yes. For example with this test on aarch64, store is narrowScalared:

```
  define void @func(i8 %x, i128* %p) {
  entry:
    %conv = sext i8 %x to i128
    store i128 %conv, i128* %p
    ret void
  }
```
Legalizer is not able to produce something like this
```
    %2:_(s32) = COPY $w0
    %1:_(p0) = COPY $x1
    %11:_(s64) = G_CONSTANT i64 63
    %12:_(s64) = G_SEXT_INREG %2, 8
    %13:_(s64) = G_ASHR %12, %11(s64)
    G_STORE %12(s64), %1(p0) :: (store 8 into %ir.p, align 16)
    %7:_(s64) = G_CONSTANT i64 8
    %6:_(p0) = G_GEP %1, %7(s64)
    G_STORE %13(s64), %6(p0) :: (store 8 into %ir.p + 8, align 16)
    RET_ReallyLR
```
which I assume is a desired output.

================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:708
+    // don't need to break it apart.
+    if (NarrowTy.getScalarSizeInBits() >= SizeInBits) {
+      Observer.changingInstr(MI);
----------------
dsanders wrote:
> Petar.Avramovic wrote:
> > What is the idea behind this block, we generate two instructions with types larger then NarrowTy? G_TRUNC and G_SEXT that are generated have to be narrowScalared and then we have a few more merge/unmerge combines to do.  Also LegalizerHelper does not know how to narrow scalar G_TRUNC and G_SEXT at the moment.
> It's not possible to eliminate types larger than NarrowTy with a single narrowScalar. narrowScalar's job is to replace this instruction with one or more that work on NarrowTy sized types and some legalization artifacts (G_SEXT and G_TRUNC in this case) which are required to maintain type correctness. Those legalization artifacts are either combined away or further legalized according to their legalization rules.
> 
> This block aims to optimize the case where all but one of the narrowed components is a wholly a copy of the sign bit. It expects the G_TRUNC and G_SEXT to be either combined away by the artifact combiner or further legalized. For example: narrowing `%0:_(s32) = G_SEXT %1:_(s32), 8` down to s16 only requires us to preserve the component containing bits 0-15. We can forget about bits 16-31 and reconstruct them with the G_SEXT. This will help chains of instructions to be deleted as they are dead.
I did not consider situation when G_SEXT(G_SEXT_INREG) could combine away. 
Currently G_SEXT can only combine into G_TRUNC.  
The use of this G_SEXT is something legal or G_MERGE that is waiting for G_UNMERGE. 
We know this since uses are legalized before defs (G_SEXT is def in this context).
Yet again if use is legal, then G_SEXT_INREG was legal or something else like lower but not narrow scalar.
Are we talking about artifact combiner, or something that is not in tree?
Could you provide a test where G_SEXT and G_TRUNC produced  with this narrowScalar combine away?

================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:756
+
+    // Do the operation on each small part.
+    for (int i = 0; i < NumParts; ++i) {
----------------
dsanders wrote:
> Petar.Avramovic wrote:
> > This loop works for `NarrowTy.getScalarSizeInBits() >= SizeInBits` as well.
> > `NarrowTy.getScalarSizeInBits()` -> `NarrowSize`
> > 
> It does but it potentially generates more instructions to do it. Consider `%0:_(s32) = G_SEXT %1:_(s32), 4` narrowScalar'd to s8. This method will break it into four components and will emit one G_SEXT_INREG and one G_ASHR. The other method will emit one G_SEXT_INREG and one G_SEXT which may be eliminated entirely or at worst, lower to one G_ASHR
Could we then add a possibility for targets to chose how they want to narrow scalar that is to have:
narrowScalar - always creates G_UNMERGE+...+G_MERGE
narrowScalarExt - creates G_TRUNC+...+G_{S|Z|ANY}EXT

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61289/new/

https://reviews.llvm.org/D61289