[llvm-dev] GlobalISel legalization guarantees

Wed Jul 19 14:40:18 PDT 2017

Hi Tim,

Thanks for writing that up.

The way I initially envisioned it could be summarized in the following pseudo algorithm:
List IllegalOps = gatherAllIllegalOps;
List LegalizationArtifacts;
do {
  Assert(LegalizationArtifacts.empty());
  // Do the bulk of legalization.
  While (IllegalOps.empty()) {
    Instr I = IllegalOps.pop_back_val();
    // Legalizing an instruction may create some (less) illegal instructions and
    // LegalizationArtifact like trunc, anyext, merge, unmerge.
    Legalize(I, &IllegalOps, &LegalizationArtifacts);
  }
  // Clean-ups the artifacts.
  While (LegalizationArtifacts.empty()) {
     Instr I = LegalizationArtifacts.pop_back_val();
     // An artifact may not be cleaned up, in that case, it needs to be legalize
     CleanUpArtifact(I, &LegalizationArtifacts, &IllegalOps);
  }
} while(!IllegalOps.empty());

So roughly, legalize, then clean-up (via combine), and legalize again what’s left.
The clean-up part should move the artifacts (unmerge, anyext, and so on) past PHIs.
In the end, it should remain only things that are fed via legal instructions or fed legal instructions. Thus the set of things to legalize (via loads and stores, if the target wants something fancier, it custom lowers it) is much narrower and somewhat expected by the target.

> On Jul 19, 2017, at 2:08 PM, Tim Northover via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi all,
> 
> I've been thinking about what guarantees the legalization pass needs
> to make in GlobalISel and how it should do it. Turns out it's really
> quite difficult so I thought I'd send a message out to the list with
> the current status and my thoughts for the future.
> 
> It got really quite long unfortunately, so the important bit is
> probably the second section "High level solutions" on scary changes
> that I can't see a way around.
> 
> Does anyone else either see issues I've missed with guaranteeing
> legalization, or see simpler solutions (please please please)?
> 
> Cheers.
> 
> Tim.
> 
> ==========
> Background
> ==========
> 
> By some means we need the legalizer to be able to completely remove all illegal
> types (a nebulous concept, but it should be obvious that you don't want to try
> RegBankSelect or ISel on a s1234). However, each stepwise change of the
> legalizer has the input and output VRegs of an instruction as its interface so
> it cannot remove these types.
> 
> So, the bulk of legalization for larger types is expressed in terms of
> G_MERGE_VALUES and G_UNMERGE_VALUES instructions, with the intent that a final
> combining pass will be run after the main legalization to remove matching pairs
> and leave the MachineFunction legal.
> 
> However, it does an incomplete job right now.
> 
> In ~630 cases (in the test-suite at -O0) we have a G_UNMERGE_VALUES but it can't
> be matched up with G_MERGE_VALUES and removed:
> 
> 600 G_UNMERGE hits a PHI

That’s fine, we should push them around.

> 30  G_UNMERGE hits an AArch64 intrinsic (e.g. @llvm.aarch64.ld2)

What does this mean for the lowering?

> 
> On the other hand about 8000 G_MERGE_VALUES remain after the combining that
> don't match a G_UNMERGE_VALUES:
> 
> 5300 used by a G_STORE
> 3090 used by a COPY (to physregs or dead?).
> 60 used by a G_EXTRACT
> 15 are dead

What is the total #instrs?
I’d like to have a sense of how big that is.

Also, how much of that could be avoided by splitting the structures up-front? (I expect this is what the G_STOREs come from.)
Why can’t we push the unmerge to the G_STORE/COPY by splitting them?

> 
> There are also more or less theoretical issues that don't arise in the
> test-suite but need to be resolved. In particular even when a G_MERGE_VALUES is
> consumed by a G_UNMERGE_VALUES the input types of the merge may not be the same
> as the outputs of the unmerge. The only way to resolve this conflict is by
> inserting explicit extract/insert instructions which may or may not themselves
> be legal (at least in principle).
> 
> ====================
> High level solutions
> ====================
> 
> Unless anyone comes up with more bright ideas, I think we can see the following
> complications before we get to the state where the legalizer has a hope of
> replacing SDAG with a straight face:
> 
> 1. G_MERGE_VALUES and G_UNMERGE_VALUES will not always be eliminated and must be
>   mappable+selectable to complex multi-instruction, multi-register
>   copy/inserts.

That’s expected, and after the combines/clean-ups are done, the generic code should go with loads and stores.

We need to understand why we get so many though.

> 2. We must have a complex enough combiner in the legalizer pass alone to perform
>   multi-step legalization intermixed with combines in the hope that we reach a
>   fixed point.

Yup, that was the plan.
We should already support multi-step legalization, right?

> 
> Below is a bit of expanded detail on the issues found by the test-suite
> above. I'm not quite sure how coherent it is, but it made sense to me so I can
> probably try to reword if anything is unclear.
> 
> =========================================
> Expanded description of test-suite issues
> =========================================
> 
> G_STORE and G_EXTRACT
> ---------------------
> 
> These are instances of a merge feeding legal instructions. I believe this has to
> be permitted but it means that selection is going to contend with a wide range
> of possible different merge instructions. At best all legal input and output
> register classes must be handled.
> 
> On AArch64 the likely list is:
> 
>  + "mov vD.2d[0], dN; mov vD.2d[1], dM" (and s variants with 4 insts).
>  + "mov vD.4s[0], wN, ..." (and d variants).
>  + "ubfi xD, xN, #0, #32; ubfi xD, xM, #32, #32"
>  + "COPY; COPY; COPY" or multiples of the above when stitching together a QQQ.
> 
> Machine intrinsics
> ------------------
> 
> The versions of this problem I've seen come from code like this::
> 
>  {<2 x i32>, <2 x i32>, <2 x i32>} %val = call {...} @llvm.aarch64.ld3(%ptr)
>  %vec0 = extractelement {...} %val, 1
> 
> which becomes::
> 
>  s192 = G_INTRINSIC_W_SIDE_EFFECTS …

Isn’t this lower into something like:
Vreg1, vreg2, vreg3 by call lowering.

>  s64, s64, s64 = G_UNMERGE_VALUES s192
> 
> Because intrinsics have to have sane types (they're not really legalized
> currently, even in SDag) this is very much like the issue with G_STORE and
> G_EXTRACT above. We probably have to support RegBankSelect and ISel on this kind
> of G_UNMERGE_VALUES, which becomes a tractable but complex sequence of copies.

If that’s legal, it should indeed by supported by RBS and ISel. If not, then the legalizer should lower them.

> 
> PHIs
> ----
> 
> For example:
> 
> ::
>   s128 = PHI [%v0, BB#0], [%v1, BB#1]
>   s64, s64 = G_UNMERGE_VALUES s128
> 
> I believe we have to split the PHI into sizes corresponding to the
> G_UNMERGE_VALUES. If these correspond to the sizes of the (expected) merges on
> the other side that's not too bad::
> 
>  s64 = PHI [%v0.0, BB#0], [%v1.0, BB#1]
>  s64 = PHI [%v0.1, BB#0], [%v1.1, BB#1]

Yep.

> 
> On the other hand we also have the possibility of mismatching types here, with
> the added fun that PHI instructions have to be the first in their block.

What’s the problem here.

> 
> I believe we'll have to insert messy extract/insert code into a separate,
> fallthrough block just before this one. The main question is whether it should
> be done as part of the main legalization loop or in the post-processing pass.

Clean-up part.

> 
> Since we have to deal with illegal extracts & inserts anyway, delaying PHIs
> until everything else is decided seems reasonable to me.
> 
> Mismatching types in paired merge/unmerge
> -----------------------------------------
> 
> Friendly example::
> 
>  s384 = G_MERGE_VALUES s192, s192
>  s128, s128, s128 = G_UNMERGE_VALUES s384
> 
> Both of these instructions have to go (s384 is likely horribly illegal), but
> they can't simply be combined away. The first thing to try is converting to
> extracts & inserts::
> 
>  s128 = G_EXTRACT s192, 0
> 
>  s64 = G_EXTRACT s192, 128
>  s64 = G_EXTRACT s192, 0
>  s128 = G_IMPLICIT_DEF
>  s128 = G_INSERT s128, s64, 0
>  s128 = G_INSERT s128, s64, 64
> 
>  s128 = G_EXTRACT s192, 64

I would go with just loads and stores:
Store s192
Store s192
s128 = load
s128 = load
s128 = load

> 
> The good side of this is that I believe it is capable of eliminating the bad
> s384 type. The bad side is that we have no reason to expect the actual extract
> and insert operations to be legal. Each one potentially corresponds to some
> reasonably complex shifting and masking across multiple registers.
> 
> To get around this we could either demand that extract and insert **are** legal
> whenever a type has an available register class, or attempt to legalize them in
> turn (leading to more merges and a fixed-point at the end if we're lucky).

The only way I can see we end up in this situation is when the input IR has already weird (struct) bit cast. At this point, I’d say garbage in, garbage out.

Hope this helps.

Cheers,
-Quentin

> 
> Used by COPY
> ------------
> 
> Too many to be sure what's going on, but a sampling suggests most are either
> ending up in PhysRegs (fine) or dead.
> 
> A merge being used by dead COPYs could be a problem.
> 
> 
> Dead instructions
> -----------------
> 
> Not a concern. Delete them if we want, or maybe ignore them.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev