[PATCH] D61289: [globalisel] Add G_SEXT_INREG

Daniel Sanders via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 9 16:21:15 PDT 2019


dsanders marked 4 inline comments as done.
dsanders added a comment.

In D61289#1493205 <https://reviews.llvm.org/D61289#1493205>, @Petar.Avramovic wrote:

> Perfect. What about G_ZEXT_INREG, since it is more efficient to narrowScalar a G_ZEXT_INREG than a bitwise instruction with a bit mask?


Some of the arguments work for a G_ZEXT_INREG, but I don't think it's as widely applicable. It would essentially be a G_AND with the constraint that the mask is a contiguous series of bits starting at position 0, and I'm not aware of a target that can improve performance by preserving that constraint. Most targets I know would select G_ZEXT_INREG to an AND using either an inline or materialized immediate, at which point we haven't really gained anything by protecting it against harmful 'optimization'. On MIPS, for example, `andi r1, r2, #0xff` and `andi r1, r2, #0xf0` have the same performance.

I can see some cases where certain immediates have better performance, such as 0xffffffff on a 64-bit target that can access a zero-extended 32-bit subregister in other instructions, or 0xffff and zero-extended 16-bit subregs. MIPS isn't one of those targets, since it sign-extends all operands and results to infinite bits, but I think ARM and X86 can benefit for some sizes. I don't know the performance characteristics compared to an AND though; it could be the same cycle-count/code-size/etc. either way. Do you (or others) know of targets that would benefit from G_ZEXT_INREG?



================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:698
     return Legalized;
+  case TargetOpcode::G_SEXT_INREG: {
+    if (TypeIdx != 0)
----------------
Petar.Avramovic wrote:
> NarrowScalar is good candidate for separate patch. Reason is that artifact combiner currently can not handle test that would use  G_SEXT_INREG since it will have chained artifacts.  G_SEXT will be  G_UNMERGE-d, G_UNMERGE has to wait for: 
>  - G_SEXT to be combined into G_SEXT_INREG,
>  - and then for G_SEXT_INREG to be narrowScalared into sequence of instructions that will end with G_MERGE_VALUES. 
> Only at this point G_UNMERGE can be combined.
> 
> 
There's a test for narrowScalar of G_SEXT_INREG in this patch. Could you elaborate on what kind of test you're looking for? I think you're looking for a test with neighbouring instructions which are also narrowScalar'd.


================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:708
+    // don't need to break it apart.
+    if (NarrowTy.getScalarSizeInBits() >= SizeInBits) {
+      Observer.changingInstr(MI);
----------------
Petar.Avramovic wrote:
> What is the idea behind this block? We generate two instructions with types larger than NarrowTy. The G_TRUNC and G_SEXT that are generated have to be narrowScalar'd, and then we have a few more merge/unmerge combines to do. Also, LegalizerHelper does not know how to narrow scalar G_TRUNC and G_SEXT at the moment.
It's not possible to eliminate types larger than NarrowTy with a single narrowScalar. narrowScalar's job is to replace this instruction with one or more instructions that work on NarrowTy-sized types, plus some legalization artifacts (G_SEXT and G_TRUNC in this case) which are required to maintain type correctness. Those legalization artifacts are either combined away or further legalized according to their legalization rules.

This block aims to optimize the case where all but one of the narrowed components is wholly a copy of the sign bit. It expects the G_TRUNC and G_SEXT to be either combined away by the artifact combiner or further legalized. For example, narrowing `%0:_(s32) = G_SEXT_INREG %1:_(s32), 8` down to s16 only requires us to preserve the component containing bits 0-15. We can forget about bits 16-31 and reconstruct them with the G_SEXT. This will help chains of instructions to be deleted as they are dead.


================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:710
+      Observer.changingInstr(MI);
+      MI.getOperand(2).setImm(SizeInBits);
+      // We don't lose any non-extension bits by truncating the src and
----------------
This statement is redundant, though: the immediate is already SizeInBits because we just read it from there.


================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:756
+
+    // Do the operation on each small part.
+    for (int i = 0; i < NumParts; ++i) {
----------------
Petar.Avramovic wrote:
> This loop works for `NarrowTy.getScalarSizeInBits() >= SizeInBits` as well.
> `NarrowTy.getScalarSizeInBits()` -> `NarrowSize`
> 
It does, but it potentially generates more instructions to do it. Consider `%0:_(s32) = G_SEXT_INREG %1:_(s32), 4` narrowScalar'd to s8. This method will break it into four components and will emit one G_SEXT_INREG and one G_ASHR. The other method will emit one G_SEXT_INREG and one G_SEXT, which may be eliminated entirely or, at worst, lower to one G_ASHR.


================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:767
+        }
+        DstRegs.push_back(MIRBuilder
+                              .buildInstr(TargetOpcode::G_ASHR, {NarrowTy},
----------------
Petar.Avramovic wrote:
> Considering efficiency, it might be better to create only one G_ASHR that holds the sign of the G_SEXT_INREG's extension point and copy it to the remaining registers that hold the higher bits.
It already does that: we save the register in FullExtensionReg and re-use it if we see any further full extensions. The test for it is in LegalizerHelperTest.cpp on line 821 (the last two operands of the G_MERGE_VALUES are both [[T6]]).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61289/new/

https://reviews.llvm.org/D61289




