[PATCH] D61787: [GlobalISel Legalizer] Improve artifact combiner

Wed Jul 3 17:17:19 PDT 2019

dsanders added a comment.

In D61787#1515685 <https://reviews.llvm.org/D61787#1515685>, @Petar.Avramovic wrote:

> 'crash' = unable-to-legalize error/warning/nothing depending on -global-isel-abort option.
>
> > It looks like a lot of these test changes result in illegal MIR getting out of the legalizer pass.
>
> That is already the case. Mentioned tests already used to crash but now they always crash on instruction that is root of problem (here it is most of the times G_TRUNC for vector) .
>  About mentioned legalization rules violations:
>  Tests that finish legalization successfully are unchanged (except one where we perform additional G_UNMERGE/G_MERGE combine). 
>  "Violations" happen for Artifacts in tests where mir output was illegal anyway.
>  That it Artifacts are either waiting for it uses to be resolved first or are about to be legalized. It just happened that there was some "artifact"/"auxiliary artifact"/instruction that was not legal.
>
>   @arsenm Do you have any comments?

I see your point that those tests to fail to legalize before and after this patch. Because the legalization order is made much more complicated by this patch you end up seeing the problematic instruction (usually G_TRUNC) before legalizing the other instructions which is confusing but isn't functionally different. I don't think anything has really changed in terms of what they fail to legalize on though, it just comes down to which problem you see first.

>> what would be the main benefit to making this change? I get the impression it's about avoiding implementing all the legalization rules but while I can see that it postpones the need to implement some, I'm not convinced it reduces the amount rules that need defining.
> 
> The primary goal of this patch is adding support for combining of chained artifacts.

This answer appears to say that the primary goal of making the change is to make the change :-)

> About side effects/benefits:
>  Remove some of the rules for artifacts. The ones that say artifact is legal for a type but we know it is not selectable for that type.

You shouldn't have any rules in this category. An instruction is legal if and only if it is guaranteed to be selectable. (Note: that doesn't necessarily mean that the instruction selector handles it, just that something after the legalizer is guaranteed to handle it).

> Suppose that target cannot select `s128 = G_MERGE s32 s32 s32 s32` nor  `s32 s32 s32 s32 = G_UNMERGE s128`. With current artifact combiner we could encounter following situations:
> 
> 1. G_UNMERGE combines with G_MERGE, that was easy.

Yep, the majority of G_MERGE/G_UNMERGE's are eliminated this way.

> 2. There was no G_UNMERGE, that could combine with G_MERGE. Then let's legalize G_MERGE, and hope for the best. G_MERGE is now legal, removed from all GlobalISel observer lists and waits for legalization to finish so that it can be regbankselected. During legalization of some other instruction G_UNMERGE appears and combines away G_MERGE. This was fortunate sequence of events as we couldn't select this G_MERGE if it "escaped" from Legalizer. But in LegalizerInfo it says that G_MERGE is legal for {s128, s32} despite the fact that we cannot select it. Why? Because that is how current Artifact combiner works.

It's wrong that your LegalizerInfo says G_MERGE_VALUES is legal for {s128, s32} if you cannot select it.

The main question to ask here is: How did you get a G_MERGE_VALUES whose result isn't consumed by either a legal operation or a G_UNMERGE_VALUES (or G_TRUNC) that breaks it up into something a legal operation can consume? This is supposed to be an impossible situation. While typing up an explanation of why it's impossible, I actually ran into the root problem I think you're trying to describe which is that if the LLVM-IR contains types that are bigger than the biggest register in the backend, then there's some gaps in the combine rules that leave behind some dangling G_MERGE/G_UNMERGE's.  Essentially, the narrower type used by the return value isn't propagating into the earlier code and causing excess bits to die.

Suppose we have the following AArch64 MIR:

  %0:_(s64) = COPY $x0
  %1:_(s64) = COPY $x1
  %2:_(s1024) = G_SEXT %0(s64)
  %3:_(s1024) = G_SEXT %1(s64)
  %4:_(s1024) = G_OR %2, %3
  %5:_(s64) = G_TRUNC %4(s1024)
  $x0 = COPY %5(s64)
  RET_ReallyLR implicit $x0

(yes, you really do need s1024 to trigger this on AArch64 thanks to 512-bit operations :-D)
and our legalization rules are such that s64 is the only type we can use in operations. 960-bits of the intermediates are redundant as the function only returns 64-bits. A first pass of legalization over the non-artifacts gives:

  %0:_(s64) = COPY $x0
  %1:_(s64) = COPY $x1
  %2:_(s1024) = G_SEXT %0:_(s64)
  %3:_(s1024) = G_SEXT %1:_(s64)
  %6:_(s64), %7:_(s64), %8:_(s64), %9:_(s64), %10:_(s64), %11:_(s64), %12:_(s64), %13:_(s64), %14:_(s64), %15:_(s64), %16:_(s64), %17:_(s64), %18:_(s64), %19:_(s64), %20:_(s64), %21:_(s64) = G_UNMERGE_VALUES %2:_(s1024)
  %22:_(s64), %23:_(s64), %24:_(s64), %25:_(s64), %26:_(s64), %27:_(s64), %28:_(s64), %29:_(s64), %30:_(s64), %31:_(s64), %32:_(s64), %33:_(s64), %34:_(s64), %35:_(s64), %36:_(s64), %37:_(s64) = G_UNMERGE_VALUES %3:_(s1024)
  %38:_(s64) = G_OR %6:_, %22:_
  %39:_(s64) = G_OR %7:_, %23:_
  %40:_(s64) = G_OR %8:_, %24:_
  %41:_(s64) = G_OR %9:_, %25:_
  %42:_(s64) = G_OR %10:_, %26:_
  %43:_(s64) = G_OR %11:_, %27:_
  %44:_(s64) = G_OR %12:_, %28:_
  %45:_(s64) = G_OR %13:_, %29:_
  %46:_(s64) = G_OR %14:_, %30:_
  %47:_(s64) = G_OR %15:_, %31:_
  %48:_(s64) = G_OR %16:_, %32:_
  %49:_(s64) = G_OR %17:_, %33:_
  %50:_(s64) = G_OR %18:_, %34:_
  %51:_(s64) = G_OR %19:_, %35:_
  %52:_(s64) = G_OR %20:_, %36:_
  %53:_(s64) = G_OR %21:_, %37:_
  %4:_(s1024) = G_MERGE_VALUES %38:_(s64), %39:_(s64), %40:_(s64), %41:_(s64), %42:_(s64), %43:_(s64), %44:_(s64), %45:_(s64), %46:_(s64), %47:_(s64), %48:_(s64), %49:_(s64), %50:_(s64), %51:_(s64), %52:_(s64), %53:_(s64)
  %5:_(s64) = G_TRUNC %4:_(s1024)
  $x0 = COPY %5:_(s64)
  RET_ReallyLR implicit $x0

then a pass over the artifacts should give this but actually does nothing (it might take a couple iterations to get rid of the dead code, I've deleted it early for brevity):

  %0:_(s64) = COPY $x0
  %1:_(s64) = COPY $x1
  %57:_(s64) = G_ASHR %0:_(s64), i64 63
  %2:_(s1024) = G_MERGE_VALUES %0:_(s64), %57:_(s64), %57:_(s64), ...
  %56:_(s64) = G_ASHR %1:_(s64), i64 63
  %3:_(s1024) = G_MERGE_VALUES %1:_(s64), %56:_(s64), %56:_(s64), ...
  %58:_(s512), %59:_(s512) = G_UNMERGE_VALUES %2:_(s1024)
  %6:_(s64), %7:_(s64), %8:_(s64), %9:_(s64), %10:_(s64), %11:_(s64), %12:_(s64), %13:_(s64) = G_UNMERGE_VALUES %58:_(s512)
  %60:_(s512), %61:_(s512) = G_UNMERGE_VALUES %3:_(s1024)
  %22:_(s64), %23:_(s64), %24:_(s64), %25:_(s64), %26:_(s64), %27:_(s64), %28:_(s64), %29:_(s64) = G_UNMERGE_VALUES %60:_(s512)
  %38:_(s64) = G_OR %6:_, %22:_
  $x0 = COPY %38:_(s64) // %5 -> %38
  RET_ReallyLR implicit $x0

then another pass over the changed artifacts should give:

  %0:_(s64) = COPY $x0
  %1:_(s64) = COPY $x1
  %57:_(s64) = G_ASHR %0:_(s64), i64 63
  %65:_(s512) = G_MERGE_VALUES %0:_(s64), %57:_(s64), %57:_(s64), ... // fewer %56's than before
  %56:_(s64) = G_ASHR %1:_(s64), i64 63
  %66:_(s512) = G_MERGE_VALUES %1:_(s64), %56:_(s64), %56:_(s64), ... // fewer %56's than before
  %6:_(s64), %7:_(s64), %8:_(s64), %9:_(s64), %10:_(s64), %11:_(s64), %12:_(s64), %13:_(s64) = G_UNMERGE_VALUES %65:_(s512)
  %22:_(s64), %23:_(s64), %24:_(s64), %25:_(s64), %26:_(s64), %27:_(s64), %28:_(s64), %29:_(s64) = G_UNMERGE_VALUES %66:_(s512)
  %38:_(s64) = G_OR %6:_, %22:_
  $x0 = COPY %38:_(s64) // %5 -> %38
  RET_ReallyLR implicit $x0

and one more

  %0:_(s64) = COPY $x0
  %1:_(s64) = COPY $x1
  %57:_(s64) = G_ASHR %0:_(s64), i64 63 // This is dead but hasn't been noticed yet
  %56:_(s64) = G_ASHR %1:_(s64), i64 63 // This is dead but hasn't been noticed yet
  %38:_(s64) = G_OR %0:_, %1:_
  $x0 = COPY %38:_(s64) // %5 -> %38
  RET_ReallyLR implicit $x0

The main bit that's missing is a combine that changes:

  %1 = G_MERGE_VALUES %0, ...
  %2 = G_TRUNC %1

to this when the types are ok:

  %2 = %0

or with a G_TRUNC or a smaller G_MERGE_VALUES when they differ.

There is also another means of doing the right thing. If G_TRUNC correctly specified the min/max types with clampScalar() and a G_TRUNC narrowScalar were implemented as G_UNMERGE_VALUES/G_MERGE_VALUES then the MERGE/UNMERGE would pair up and use the existing combines. Similarly for G_SEXT. The snag there is the one I think you're trying to solve which is that the MERGE/UNMERGE have already left the artifact list and won't retry the combines again.

I think adding the missing combine(s) is my first preference at the moment but I ought to give it a bit more thought and weigh up the different options. I'm hoping a small number of combines prevent us from getting into the situations that lead us to complicating the processing order. One other thought that I haven't completely gone through is that artifacts should only be artifacts if the legalizer produced them. In other words, on the first pass things like G_TRUNC are treated just like every other instruction and only those that are created by the legalizer would be artifacts. This would ensure that the first artifact combiner pass can see all the merge/unmerge pairs caused by the first pass instead of some of them turning up on the second pass.

> 3. G_UNMERGE couldn't find G_MERGE to combine with. G_UNMERGE is "Artifact" nothing can combine it away, then let's make it legal. G_UNMERGE is now a legalized instruction that is no more in ArtifactList and waits legalizer to end so that it can crash in Instruction Select. But there is more: other instruction get legalized (e.g. G_SEXT_INREG get narrowscalar-ed with G_UNMERGE/G_MERGE) and the G_MERGE appears. G_UNMERGE could combine with this G_MERGE but it is not in ArtifactList and we end up with uncombined Artifacts.

I believe this is the same problem as 2. It's wrong for G_UNMERGE_VALUE to be legal for types you can't handle later in the backend and the same gap in the artifact combiners is the root cause of illegal type combinations surviving the combiner.

> Artifact combiner at the moment does 1 and 2. With this change Artifact combiner does 1,2 and 3(meaning it can perform this combine) + combines of type 2. are performed without requirement to have Artifacts legal for types that are not Instr Selectable ( in example above: {s128, s32} for G_MERGE and {s32, s128} for G_UNMERGE). 
>  By the way 3. is the reason D61289 <https://reviews.llvm.org/D61289> can't have mir test for narrow scalar of G_SEXT_INREG, with expected optimal output.

That patch has unit tests that check narrow scalar of G_SEXT_INREG works correctly. The reason you can't have an MIR test for it in that patch is that no target does anything other than lower(). D61290 <https://reviews.llvm.org/D61290> makes narrowScalar a possibility for AArch64. The reason there isn't an MIR test for it there is that it's already covered by the unit tests.

> With this change
>  Target that has only s32 as legal type (everything else is narrow/widen scalar-ed to s32) is free to declare all artifacts as unsupported as a last rule. This is it could have to narrowScalar G_ANYEXT to s32 but there is no need to declare G_ANYEXT as legal for any types.

At minimum you need G_TRUNC/G_ANYEXT between the types produced by legal operations to types consumed by legal operations. They're the last means of changing the size of a type that legalization can rely on.

> We could argue that this is not that beneficial by itself, but since it is a consequence of allowing combine of chained artifacts, I would say it makes legalization rules for artifacts much more straightforward.
> 
>> Legalization should be able to happen in any order (i.e. processing order is purely optimization and not correctness)
> 
> Instruction legalization can already be done in any order, but artifact combine has to wait for all instructions to be legalized first, right?

We do a pass over the non-artifacts, then a pass over the artifacts and repeat that cycle until we're finished. There's no actual requirement to do distinct passes like this, it just made it easier to understand, implement, and be sure we were combining away all the problematic artifacts.

> Now we allow for chained artifacts to wait for another artifact/instruction to be handled first by introducing TryLaterArtifactList. To me, this feels like a natural extension of existing algorithm.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61787/new/

https://reviews.llvm.org/D61787