[PATCH] D61289: [globalisel] Add G_SEXT_INREG
Daniel Sanders via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 3 10:49:10 PDT 2019
dsanders marked an inline comment as done.
dsanders added inline comments.
================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:708
+ // don't need to break it apart.
+ if (NarrowTy.getScalarSizeInBits() >= SizeInBits) {
+ Observer.changingInstr(MI);
----------------
Petar.Avramovic wrote:
> dsanders wrote:
> > Petar.Avramovic wrote:
> > > dsanders wrote:
> > > > Petar.Avramovic wrote:
> > > > > dsanders wrote:
> > > > > > Petar.Avramovic wrote:
> > > > > > > What is the idea behind this block? We generate two instructions with types larger than NarrowTy; the G_TRUNC and G_SEXT that are generated have to be narrowScalared, and then we have a few more merge/unmerge combines to do. Also, LegalizerHelper does not know how to narrow scalar G_TRUNC and G_SEXT at the moment.
> > > > > > It's not possible to eliminate types larger than NarrowTy with a single narrowScalar. narrowScalar's job is to replace this instruction with one or more that work on NarrowTy sized types and some legalization artifacts (G_SEXT and G_TRUNC in this case) which are required to maintain type correctness. Those legalization artifacts are either combined away or further legalized according to their legalization rules.
> > > > > >
> > > > > > This block aims to optimize the case where all but one of the narrowed components is wholly a copy of the sign bit. It expects the G_TRUNC and G_SEXT to be either combined away by the artifact combiner or further legalized. For example: narrowing `%0:_(s32) = G_SEXT_INREG %1:_(s32), 8` down to s16 only requires us to preserve the component containing bits 0-15. We can forget about bits 16-31 and reconstruct them with the G_SEXT. This helps chains of instructions get deleted as they become dead.
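For illustration, a minimal sketch of what such an early exit in LegalizerHelper::narrowScalar could look like, using standard MachineIRBuilder calls (an assumption-laden sketch, not the exact code in the patch):

```
  // Sketch: if the sign-extension width fits inside NarrowTy, only the lowest
  // component needs real work. Wrap a narrow G_SEXT_INREG in a G_TRUNC/G_SEXT
  // pair and let the artifact combiner fold those artifacts away.
  if (NarrowTy.getScalarSizeInBits() >= SizeInBits) {
    Register DstReg = MI.getOperand(0).getReg();
    Register SrcReg = MI.getOperand(1).getReg();

    auto Trunc = MIRBuilder.buildTrunc(NarrowTy, SrcReg);
    auto InReg = MIRBuilder.buildInstr(TargetOpcode::G_SEXT_INREG, {NarrowTy},
                                       {Trunc})
                     .addImm(SizeInBits);
    MIRBuilder.buildSExt(DstReg, InReg);

    MI.eraseFromParent();
    return Legalized;
  }
```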
> > > > > I did not consider the situation where G_SEXT(G_SEXT_INREG) could combine away.
> > > > > Currently G_SEXT can only combine into G_TRUNC.
> > > > > The use of this G_SEXT is either something legal or a G_MERGE that is waiting for a G_UNMERGE.
> > > > > We know this since uses are legalized before defs (the G_SEXT is a def in this context).
> > > > > Then again, if the use is legal, then the G_SEXT_INREG was legal, or something else like lower, but not narrow scalar.
> > > > > Are we talking about the artifact combiner, or something that is not in tree?
> > > > > Could you provide a test where the G_SEXT and G_TRUNC produced by this narrowScalar combine away?
> > > > >
> > > > It does look like the artifact combiner is missing some combines. For the `NarrowTy.getScalarSizeInBits() < SizeInBits` case, the resulting:
> > > > %6:_(s128) = G_MERGE_VALUES %0(s64), %12(s64)
> > > > %7:_(s32) = G_TRUNC %6(s128)
> > > > ought to be:
> > > > %7:_(s32) = G_TRUNC %0(s64)
> > > >
> > > > and for the `NarrowTy.getScalarSizeInBits() >= SizeInBits` case there's a:
> > > > %10:_(s64) = G_SEXT_INREG %9, 32
> > > > %6:_(s128) = G_SEXT %10(s64)
> > > > %7:_(s32) = G_TRUNC %6(s128)
> > > > which firstly ought to be:
> > > > %10:_(s64) = G_SEXT_INREG %9, 32
> > > > %7:_(s32) = G_TRUNC %10(s64)
> > > > and secondly:
> > > > %10:_(s64) = G_SEXT_INREG %9, 32
> > > > %7:_(s32) = %9(s32)
> > > > the former of those two isn't really related to this patch (the G_SEXT_INREG isn't involved in the combine) but the latter is.
> > > Sorry, I still don't understand. In this fragment %9 should be s64, not s32. An IR function that results in a situation where it is better to narrowScalar with SEXT+TRUNC, where they combine with something, would be helpful.
> > > Maybe it is possible to perform other combines before we make G_SEXT_INREG that would benefit from narrow scalar with G_TRUNC/G_SEXT?
> > > That is to not create G_SEXT_INREG that would benefit from narrow scalar with G_TRUNC/G_SEXT and solve everything in combiner with for example some pattern that combines multiple artifacts instead of only two.
> > > Sorry, I still don't understand. In this fragment %9 should be s64, not s32.
> > You're right, that step isn't quite right. It should be `%7:_(s32) = G_TRUNC %9(s64)`. The point was to consume the input rather than the output as the lower 32-bits are unchanged by the G_SEXT_INREG. Climbing the dependency chain like this allows the uses of %7 to start sooner.
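As a hypothetical sketch of that fold in artifact-combiner style (the surrounding plumbing and observer notification are assumed; this is not an existing LegalizationArtifactCombiner method):

```
// %dst = G_TRUNC (G_SEXT_INREG %x, K) can read %x directly whenever the
// truncated width is no larger than K, since those low bits are unchanged.
if (MI.getOpcode() == TargetOpcode::G_TRUNC) {
  MachineInstr *Def = MRI.getVRegDef(MI.getOperand(1).getReg());
  if (Def && Def->getOpcode() == TargetOpcode::G_SEXT_INREG) {
    LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
    if (DstTy.getScalarSizeInBits() <= (uint64_t)Def->getOperand(2).getImm())
      MI.getOperand(1).setReg(Def->getOperand(1).getReg());
  }
}
```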
> >
> > > An IR function that results in a situation where it is better to narrowScalar with SEXT+TRUNC, where they combine with something, would be helpful.
> >
> > Any IR where many operations are too large for your target would provide good examples. Consider:
> > %2:_(s64) = G_ADD %0:_(s64), %1:_(s64)
> > %3:_(s64) = G_SEXT_INREG %2:_(s64), 16
> > and that both have narrowScalar(/*typeidx=*/0, s32). The legalization would proceed to:
> > %2:_(s64) = G_ADD %0:_(s64), %1:_(s64)
> > %4:_(s32), %5:_(s32) = G_UNMERGE_VALUES %2:_(s64)
> > %6:_(s32) = G_SEXT_INREG %4:_(s32), 16
> > %3:_(s64) = G_MERGE_VALUES %6:_(s32), %5:_(s32)
> > and then to:
> > %9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
> > %11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
> > %7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
> > %8:_(s32) = G_ADD %10:_(s32), %12:_(s32)
> > %2:_(s64) = G_MERGE_VALUES %7:_(s32), %8:_(s32)
> > %4:_(s32), %5:_(s32) = G_UNMERGE_VALUES %2:_(s64)
> > %6:_(s32) = G_SEXT_INREG %4:_(s32), 16
> > %3:_(s64) = G_MERGE_VALUES %6:_(s32), %5:_(s32)
> > then the artifact combiner would fold the middle merge/unmerge to:
> > %9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
> > %11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
> > %7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
> > %8:_(s32) = G_ADD %10:_(s32), %12:_(s32)
> > %6:_(s32) = G_SEXT_INREG %7:_(s32), 16
> > %3:_(s64) = G_MERGE_VALUES %6:_(s32), %8:_(s32)
> > Notice that we still have the `%8:_(s32) = G_ADD %10:_(s32), %12:_(s32)` at this point even though we're about to overwrite it. We're not going to be able to improve on this until a post-legalize combiner.
> >
> > Now consider the same case with this optimization:
> > %2:_(s64) = G_ADD %0:_(s64), %1:_(s64)
> > %3:_(s64) = G_SEXT_INREG %2:_(s64), 16
> > becomes:
> > %2:_(s64) = G_ADD %0:_(s64), %1:_(s64)
> > %4:_(s32) = G_TRUNC %2:_(s64)
> > %6:_(s32) = G_SEXT_INREG %4:_(s32), 16
> > %3:_(s64) = G_SEXT %6:_(s32)
> > then:
> > %9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
> > %11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
> > %7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
> > %8:_(s32) = G_ADD %10:_(s32), %12:_(s32)
> > %2:_(s64) = G_MERGE_VALUES %7:_(s32), %8:_(s32)
> > %4:_(s32) = G_TRUNC %2:_(s64)
> > %6:_(s32) = G_SEXT_INREG %4:_(s32), 16
> > %3:_(s64) = G_SEXT %6:_(s32)
> > which the artifact combiner (should) simplify to:
> > %9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
> > %11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
> > %7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
> > %8:_(s32) = G_ADD %10:_(s32), %12:_(s32)
> > %6:_(s32) = G_SEXT_INREG %7:_(s32), 16
> > %3:_(s64) = G_SEXT %6:_(s32)
> > and then to:
> > %9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
> > %11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
> > %7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
> > %6:_(s32) = G_SEXT_INREG %7:_(s32), 16
> > %3:_(s64) = G_SEXT %6:_(s32)
> > and then to:
> > %9:_(s32) = G_TRUNC %0:_(s64)
> > %11:_(s32) = G_TRUNC %1:_(s64)
> > %7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
> > %6:_(s32) = G_SEXT_INREG %7:_(s32), 16
> > %3:_(s64) = G_SEXT %6:_(s32)
> > which is simpler. The second G_ADD was correctly recognized as being dead and removed. As a result fewer instructions were emitted by the legalizer and subsequent passes have less work to do.
> >
> > > Maybe it is possible to perform other combines before we make G_SEXT_INREG that would benefit from narrow scalar with G_TRUNC/G_SEXT?
> > > That is to not create G_SEXT_INREG that would benefit from narrow scalar with G_TRUNC/G_SEXT and solve everything in combiner with for
> > > example some pattern that combines multiple artifacts instead of only two.
> >
> > It's not entirely clear what you're suggesting here. I'm particularly confused by the 'that would benefit from narrow scalar' bit since it's not a choice to narrow scalar or not. An operation is either too wide for the target and must be narrowScalar'd or it's not.
> >
> > If your suggestion is to try to form G_SEXT_INREG instructions in a combine pass after the legalizer, then I'd say that's too late in the pipeline. There's no guarantee that the combine to form a G_SEXT_INREG will run before a combine that makes the pattern unrecognizable. Ideally we want to make a best effort to form them in a pre-legalizer combiner in addition to the legalizer.
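For example, a pre-legalizer combine of that kind might pattern-match the shift pair that G_SEXT_INREG replaces. The sketch below uses the getConstantVRegVal helper from GlobalISel/Utils; the entry point and checks are illustrative assumptions rather than an existing combiner:

```
// Fold %d = G_ASHR (G_SHL %x, C), C into %d = G_SEXT_INREG %x, Width - C.
bool tryFormSExtInReg(MachineInstr &MI, MachineRegisterInfo &MRI,
                      MachineIRBuilder &B) {
  if (MI.getOpcode() != TargetOpcode::G_ASHR)
    return false;
  MachineInstr *Shl = MRI.getVRegDef(MI.getOperand(1).getReg());
  if (!Shl || Shl->getOpcode() != TargetOpcode::G_SHL)
    return false;
  auto OuterAmt = getConstantVRegVal(MI.getOperand(2).getReg(), MRI);
  auto InnerAmt = getConstantVRegVal(Shl->getOperand(2).getReg(), MRI);
  if (!OuterAmt || !InnerAmt || *OuterAmt != *InnerAmt)
    return false;
  LLT Ty = MRI.getType(MI.getOperand(0).getReg());
  // A real combine would also check that the G_SHL has a single use, that the
  // shift amount is in range, and would notify the change observer.
  B.setInstr(MI);
  B.buildInstr(TargetOpcode::G_SEXT_INREG, {MI.getOperand(0).getReg()},
               {Shl->getOperand(1).getReg()})
      .addImm(Ty.getScalarSizeInBits() - *OuterAmt);
  MI.eraseFromParent();
  return true;
}
```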
>
> > The point was to consume the input rather than the output as the lower 32-bits are unchanged by the G_SEXT_INREG. Climbing the dependency chain like this allows the uses of %7 to start sooner.
> Something like look-through-copy, but here it is looking through the G_SEXT_INREG depending on the immediate (operand 2)?
>
>
> Assuming the artifact combiner is able to perform all the mentioned combines, it gets the same output as narrowScalar using G_UNMERGE/G_MERGE + D61787.
>
> The high bits in
> > `%3:_(s64) = G_MERGE_VALUES %6:_(s32), %5:_(s32)`
> should be the sign bit of %6.
>
> `%3:_(s64) = G_SEXT_INREG %2:_(s64), 16`
> narrowScalars to:
> ```
> %4:_(s32), %5:_(s32) = G_UNMERGE_VALUES %2:_(s64)
> %8:_(s32) = G_CONSTANT i32 31
> %6:_(s32) = G_SEXT_INREG %4:_, 16
> %7:_(s32) = G_ASHR %6:_, %8:_(s32)
> %3:_(s64) = G_MERGE_VALUES %6:_(s32), %7:_(s32)
> ```
> Also, `%3:_(s64) = G_SEXT %6:_(s32)` has to be narrowScalared or combined away by a G_UNMERGE when one is available.
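A rough sketch of that G_UNMERGE/G_MERGE flavour of narrowScalar, for the two-part case shown above and assuming standard MachineIRBuilder helpers (the actual D61787 code may differ):

```
  // Split the source, sign-extend within the low part, then rebuild the high
  // part from the sign bit, matching the MIR sequence quoted above.
  Register DstReg = MI.getOperand(0).getReg();
  Register SrcReg = MI.getOperand(1).getReg();
  int64_t SizeInBits = MI.getOperand(2).getImm();

  auto Unmerge = MIRBuilder.buildUnmerge(NarrowTy, SrcReg);
  auto Lo = MIRBuilder.buildInstr(TargetOpcode::G_SEXT_INREG, {NarrowTy},
                                  {Unmerge.getReg(0)})
                .addImm(SizeInBits);
  auto ShiftAmt =
      MIRBuilder.buildConstant(NarrowTy, NarrowTy.getScalarSizeInBits() - 1);
  auto Hi = MIRBuilder.buildAShr(NarrowTy, Lo, ShiftAmt);
  MIRBuilder.buildMerge(DstReg, {Lo.getReg(0), Hi.getReg(0)});
  MI.eraseFromParent();
```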
>
>
>
> I ran the mentioned example and got the following results:
> The setup for the test is: applying D61289 and D61787, and in MipsLegalizerInfo changing the Mips G_SEXT_INREG rules to:
> ```
> getActionDefinitionsBuilder(G_SEXT_INREG)
>     .legalForTypeWithImm({{s32, 8}, {s32, 16}})
>     .maxScalar(0, s32)
>     .lower();
> ```
> and changing G_SEXT_INREG to always do narrow scalar with G_UNMERGE/G_MERGE.
>
> Running the following test with -march=mipsel -global-isel -O0 -stop-after=legalizer
> ```
> define i64 @f(i64 signext %a, i64 signext %b) {
> entry:
>   %add = add i64 %a, %b
>   %conv = trunc i64 %add to i16
>   %conv1 = sext i16 %conv to i64
>   ret i64 %conv1
> }
> ```
>
> gives the following result:
> ```
> %2:_(s32) = COPY $a0
> %3:_(s32) = COPY $a1
> %4:_(s32) = COPY $a2
> %5:_(s32) = COPY $a3
> %25:_(s32) = G_CONSTANT i32 0
> %22:_(s32) = G_ADD %2, %4
> %26:_(s32) = G_CONSTANT i32 1
> %27:_(s32) = COPY %25(s32)
> %23:_(s32) = G_AND %27, %26
> %16:_(s32) = G_ADD %22, %23
> %24:_(s32) = G_ICMP intpred(ult), %16(s32), %2
> %20:_(s32) = G_ADD %3, %5
> %28:_(s32) = COPY %24(s32)
> %21:_(s32) = G_AND %28, %26
> %18:_(s32) = G_ADD %20, %21
> %32:_(s32) = G_CONSTANT i32 31
> %33:_(s32) = G_SEXT_INREG %16, 16
> %34:_(s32) = G_ASHR %33, %32(s32)
> $v0 = COPY %33(s32)
> $v1 = COPY %34(s32)
> RetRA implicit $v0, implicit $v1
> ```
>
> There are a few places for improvement.
> First, let's remove dead instructions (this could take place after the main do/while loop in Legalizer.cpp):
> ```
> %2:_(s32) = COPY $a0
>
> %4:_(s32) = COPY $a2
>
> %25:_(s32) = G_CONSTANT i32 0
> %22:_(s32) = G_ADD %2, %4
> %26:_(s32) = G_CONSTANT i32 1
> %27:_(s32) = COPY %25(s32)
> %23:_(s32) = G_AND %27, %26
> %16:_(s32) = G_ADD %22, %23
>
>
>
>
>
> %32:_(s32) = G_CONSTANT i32 31
> %33:_(s32) = G_SEXT_INREG %16, 16
> %34:_(s32) = G_ASHR %33, %32(s32)
> $v0 = COPY %33(s32)
> $v1 = COPY %34(s32)
> RetRA implicit $v0, implicit $v1
> ```
> The fragment
> ```
> %25:_(s32) = G_CONSTANT i32 0
> %22:_(s32) = G_ADD %2, %4
> %26:_(s32) = G_CONSTANT i32 1
> %27:_(s32) = COPY %25(s32)
> %23:_(s32) = G_AND %27, %26
> %16:_(s32) = G_ADD %22, %23
> ```
> comes from the lower() of
> ```
> %15:_(s1) = G_CONSTANT i1 false
> %16:_(s32), %17:_(s1) = G_UADDE %11:_, %13:_, %15:_
> ```
> which itself came from the narrowScalar of G_ADD. If we used
> ```
> %16:_(s32), %17:_(s1) = G_UADDO %11:_, %13:_
> ```
> for the low bits, we would get
> ```
> %2:_(s32) = COPY $a0
> %4:_(s32) = COPY $a2
> %16:_(s32) = G_ADD %2, %4
> %32:_(s32) = G_CONSTANT i32 31
> %33:_(s32) = G_SEXT_INREG %16, 16
> %34:_(s32) = G_ASHR %33, %32(s32)
> $v0 = COPY %33(s32)
> $v1 = COPY %34(s32)
> RetRA implicit $v0, implicit $v1
> ```
> This should be equivalent to the mentioned desired output when G_TRUNC/G_SEXT is used in narrowScalar.
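A sketch of that suggestion, with the part-splitting variables (NumParts, Src1Parts, Src2Parts, DstReg) assumed to be set up by the surrounding narrowScalar code:

```
  // Use G_UADDO for the lowest component (there is no incoming carry) and
  // G_UADDE for the rest, so no zero constant or G_AND carry materialization
  // is ever emitted.
  SmallVector<Register, 4> DstParts;
  Register CarryIn;
  for (unsigned i = 0; i != NumParts; ++i) {
    Register DstPart = MRI.createGenericVirtualRegister(NarrowTy);
    Register CarryOut = MRI.createGenericVirtualRegister(LLT::scalar(1));
    if (i == 0)
      MIRBuilder.buildInstr(TargetOpcode::G_UADDO, {DstPart, CarryOut},
                            {Src1Parts[i], Src2Parts[i]});
    else
      MIRBuilder.buildInstr(TargetOpcode::G_UADDE, {DstPart, CarryOut},
                            {Src1Parts[i], Src2Parts[i], CarryIn});
    DstParts.push_back(DstPart);
    CarryIn = CarryOut;
  }
  MIRBuilder.buildMerge(DstReg, DstParts);
```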
>
>
> >It's not entirely clear what you're suggesting here
> ```
> %10:_(s64) = G_SEXT_INREG %9, 32
> %6:_(s128) = G_SEXT %10(s64)
> %7:_(s32) = G_TRUNC %6(s128)
> ```
>
> This was most likely generated from something like:
> ```
> %11:_(s32) = G_TRUNC %9(s64)
> %10:_(s64) = G_SEXT %11:_(s32)
> %6:_(s128) = G_SEXT %10(s64)
> %7:_(s32) = G_TRUNC %6(s128)
> ```
> I meant that we might be able to figure out that these 4 instructions are equivalent to `%7:_(s32) = G_TRUNC %9(s64)` (combine more than 2 artifacts in the artifact combiner) instead of combining the first two into G_SEXT_INREG.
>
> >that would benefit from narrow scalar
> Is there a test that produces better/smaller code when G_SEXT_INREG is narrowScalar'd with G_TRUNC/G_SEXT instead of G_UNMERGE/G_MERGE? From the discussion so far I am convinced that both approaches generate (or should generate) the same output.
> > The point was to consume the input rather than the output as the lower 32-bits are unchanged by the
> > G_SEXT_INREG. Climbing the dependency chain like this allows the uses of %7 to start sooner.
> Something like look-through-copy, but here it is looking through the G_SEXT_INREG depending on the immediate (operand 2)?
I suppose it can be viewed that way, as both end up simplifying the MIR. I think of them as separate things: this is a pattern-match-and-replace, whereas look-through-copy is something that's done to help the pattern-match part by ignoring instructions that aren't relevant to whether the pattern should match.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D61289/new/
https://reviews.llvm.org/D61289