[PATCH] D61787: [GlobalISel Legalizer] Improve artifact combiner

Thu May 23 15:58:12 PDT 2019

dsanders requested changes to this revision.
dsanders added a comment.
This revision now requires changes to proceed.

Hi Petar,

I haven't gone through the code for this but I've gone through the test changes. It looks like a lot of these test changes result in illegal MIR getting out of the legalizer pass. The most common case was neglecting to scalarize G_ANYEXT in the AMDGPU tests when the legalization rules would have called `.scalarize(0)` but there were a couple others.

At the high level, I'm not very keen on complicating the legalizer further by splitting the artifacts into multiple kinds and specifying a processing order. Personally I'd prefer not to have artifacts at all but they were a necessary concession to resolve two (somewhat conflicting) requirements:

- Legalization should be able to happen in any order (i.e. processing order is purely optimization and not correctness)
- The MIR must be valid after each legalization step

That said, I could be persuaded to complicate things if it buys a significant win. At the moment it looks like a correctness regression (going by the tests) but assuming the correctness issues were fixed, what would be the main benefit to making this change? I get the impression it's about avoiding implementing all the legalization rules but while I can see that it postpones the need to implement some, I'm not convinced it reduces the amount rules that need defining.

================
Comment at: test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir:5
-# ERROR: unable to legalize instruction: %3:_(s64) = G_ANYEXT %1:_(s4) (in function: test_unmerge_s4)
+# RUN: llc -march=aarch64 -O0 -run-pass=legalizer %s -o -  | FileCheck %s

 ---
----------------
Petar.Avramovic wrote:
> Originally: 
>  We crash on attempt to legalize `%3:_(s64) = G_ANYEXT %1:_(s4)`.
>  When we attempt to combine Artifact 
>  `%3:_(s64) = G_ANYEXT %1:_(s4)` 
>  no combine is available and this gets moved to InstList even though it had unresolved use 
>  `%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)`
>  Also,
>  `%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)`
>  is the next artifact that had nothing to combine with and is moved to InstList.
>  In InstList 
>  `%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)`
>  is legalized with widen scalar to s8 and in the process 
>  `%1:_(s4) = G_TRUNC %9:_(s8)` is created.
>  Next,  
>  `%3:_(s64) = G_ANYEXT %1:_(s4)`
>  is legalized like an instruction, and we crash.
>  However, we could have combined this with 
>  `%1:_(s4) = G_TRUNC %9:_(s8)`
>  (if we were patient).
> 
> Now: 
>  `%3:_(s64) = G_ANYEXT %1:_(s4)`
>  has unresolved use and is moved to TryLaterArtifactList. 
>  After 
>  `%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)`
>  is legalized (it did not have unresolved use)
>  `%1:_(s4) = G_TRUNC %9:_(s8)` 
>  is created. Later 
>  `%3:_(s64) = G_ANYEXT %1:_(s4)` 
>  performs combine. Legalization completes successfully.
>  I removed global-isel-abort=0 since we don't crash any more. 
>  `%9:_(s8), %10:_(s8) = G_UNMERGE_VALUES %8(s16)` 
> will crash in instruction select. 
>  Since this UNMERGE was declared as legal for pair `{s8, s16}` Artifact Combiner did good job.
Just to clarify terminology here. When you say 'crash' you don't appear to mean a crash but rather an unable-to-legalize error. Is that right?

This test is artificial and should probably be moved to the unit tests. The intent is to test the expansion of widenScalar on G_UNMERGE_VALUES and is making use of an impossible input to ensure that the legalizer picks widenScalar for it. There should be no way to produce `s4,s4 = G_UNMERGE_VALUES s8` in the current AArch64 target since the IRTranslator doesn't create G_UNMERGE_VALUES and the legalizer would only create them if an s8 operation narrowScalar'd to s4 which isn't the case for AArch64. The closest you can get from valid input is:
  %1:_(s4) = G_TRUNC %0
  %2 = G_ANYEXT %1:_(s4)

If we pretend that the input was valid, then AArch64 is breaking one of the few legalization rule requirements which is that G_ANYEXT must be legal for all type combinations that can occur (and makes sense there must be a legal G_ANYEXT that can consume the legal output of any other instruction that produces a scalar output and must add a `.legalFor({X, s4})` to its G_ANYEXT rules

It looks like we never actually wrote down the minimum legalization rules. I'll look into that

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir:56

 ---
 name: test_sext_trunc_v2s32_to_v2s8_to_v2s16
----------------
Petar.Avramovic wrote:
>  We start with following MIR:
> ```
>   %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
>   %1:_(<2 x s8>) = G_TRUNC %0
>   %2:_(<2 x s16>) = G_SEXT %1
>   $vgpr0 = COPY %2
> ```
>  `%2:_(<2 x s16>) = G_SEXT %1` 
>  is combined, as a result we are left with:
> ```
>   %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
>   %3:_(s16) = G_CONSTANT i16 8
>   %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
>   %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
>   %6:_(<2 x s16>) = G_SHL %5:_, %4:_(<2 x s16>)
>   %2:_(<2 x s16>) = G_ASHR %6:_, %4:_(<2 x s16>)
>   $vgpr0 = COPY %2:_(<2 x s16>)
> ```
> Originally: 
>  Two Artifacts failed to combine and ended up in InstList:
> ```
>   ...
>   %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
>   %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
> ```
>  `%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)` 
>  was legal, 
>  We have crash on `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`
> 	
> Now:
> ```
>   %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
>   %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
> ```
>  are Auxiliary Artifacts and will have to wait for something to combine with them.
>  Eventually `%11:_(s16), %12:_(s16) = G_UNMERGE_VALUES %4:_(<2 x s16>)`
>  from legalization of `%2:_(<2 x s16>) = G_ASHR %6:_, %4:_(<2 x s16>)`
>  will combine with `%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)`
> 
>  Combine attempt of `%19:_(s16), %20:_(s16) = G_UNMERGE_VALUES %5:_(<2 x s16>)`
>  will move `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)` to InstList
>  Once we finish with artifacts, we crash on attempt to legalize Instruction 
>  `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`
AMDGPU hasn't marked any G_TRUNC's legal. I think this test is behaving correctly until it adds the minimum set of legal truncates for their target. The minimum set is the same as for G_ANYEXT but with the type indices swapped.

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-zext.mir:46

 ---
 name: test_zext_trunc_v2s32_to_v2s8_to_v2s16
----------------
Petar.Avramovic wrote:
> In:
> ```
>   %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
>   %1:_(<2 x s8>) = G_TRUNC %0
>   %2:_(<2 x s16>) = G_ZEXT %1
>   $vgpr0 = COPY %2
> ```
>  `%2:_(<2 x s16>) = G_ZEXT %1` 
>  gets combined first into:
> ```
>   %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
>   %3:_(s16) = G_CONSTANT i16 255
>   %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
>   %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
>   %2:_(<2 x s16>) = G_AND %5:_, %4:_
>   $vgpr0 = COPY %2:_(<2 x s16>)
> ```
> Originally: 
> All new instructions (created in process of G_ZEXT combine) fail to combine and get moved to InstList:
> ```
>   %3:_(s16) = G_CONSTANT i16 255
>   %2:_(<2 x s16>) = G_AND %5:_, %4:_
>   %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
>   %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
> ```
>  `%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)`
>  was legal and we crash on:
>  `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`
> 
> Now: 
>  `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`
>  is left to be processed as the last one, since it cannot perform combine (Auxiliary Artifact)
>  `%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)`
>  G_BUILD_VECTOR is also Auxiliary Artifact. 
>  InstList has
> ```
>   %3:_(s16) = G_CONSTANT i16 255
>   %2:_(<2 x s16>) = G_AND %5:_, %4:_
> ```
>  they get legalized and we get:
> ```
>   %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
>   %6:_(s32) = G_CONSTANT i32 255
>   %3:_(s16) = G_TRUNC %6:_(s32)
>   %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
>   %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
>   %2:_(<2 x s16>) = G_AND %5:_, %4:_
>   $vgpr0 = COPY %2:_(<2 x s16>)
> ```
>  This is an interesting situation, there are no artifacts in ArtifactList or Instructions in InstList, so we are left with Auxiliary Artifacts. 
>  None of real artifacts was able to combine with them and they are all moved to InstList:
> ```
>    %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
>    %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
>    %3:_(s16) = G_TRUNC %6:_(s32) 
> ```
>   `%3:_(s16) = G_TRUNC %6:_(s32)` 
>  was legal. Crash on `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`
> 
>  Difference in diff comes from legalization of  `%3:_(s16) = G_CONSTANT i16 255`
> 
This one comes down to AMDGPU not defining any legal G_TRUNC's too

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir:514
     ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s8>) = G_BUILD_VECTOR [[TRUNC2]](s8), [[TRUNC3]](s8), [[TRUNC4]](s8), [[TRUNC5]](s8)
-    ; CHECK: [[UV8:%[0-9]+]]:_(s8), [[UV9:%[0-9]+]]:_(s8), [[UV10:%[0-9]+]]:_(s8), [[UV11:%[0-9]+]]:_(s8) = G_UNMERGE_VALUES [[BUILD_VECTOR]](<4 x s8>)
-    ; CHECK: [[ANYEXT8:%[0-9]+]]:_(s32) = G_ANYEXT [[UV8]](s8)
-    ; CHECK: [[ANYEXT9:%[0-9]+]]:_(s32) = G_ANYEXT [[UV9]](s8)
-    ; CHECK: [[ANYEXT10:%[0-9]+]]:_(s32) = G_ANYEXT [[UV10]](s8)
-    ; CHECK: [[ANYEXT11:%[0-9]+]]:_(s32) = G_ANYEXT [[UV11]](s8)
-    ; CHECK: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[ANYEXT8]](s32), [[ANYEXT9]](s32), [[ANYEXT10]](s32), [[ANYEXT11]](s32)
-    ; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[BUILD_VECTOR1]](<4 x s32>)
+    ; CHECK: [[ANYEXT8:%[0-9]+]]:_(<4 x s32>) = G_ANYEXT [[BUILD_VECTOR]](<4 x s8>)
+    ; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[ANYEXT8]](<4 x s32>)
----------------
This change is violating AMDGPU's legalization rules. G_ANYEXT is defined as:
  getActionDefinitionsBuilder({G_SEXT, G_ZEXT, G_ANYEXT})
    .legalFor({{S64, S32}, {S32, S16}, {S64, S16},
               {S32, S1}, {S64, S1}, {S16, S1},
               // FIXME: Hack
               {S64, LLT::scalar(33)},
               {S32, S8}, {S128, S32}, {S128, S64}, {S32, LLT::scalar(24)}})
    .scalarize(0);
which is instructing the legalizer to scalarize all vector types.

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir:444

 ---
 name: extract_s8_v4s8_offset0
----------------
Petar.Avramovic wrote:
> In test function extract_s8_v4s8_offset0
> ```
>   %3:_(<4 x s32>) = G_IMPLICIT_DEF
>   %0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
>   %1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0
>   %2:_(s32) = G_ANYEXT %1:_(s8)
>   $vgpr0 = COPY %2:_(s32)
>  ```
> Originally: 
> ```
>   %1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0 was legalized first into: 
>   %3:_(<4 x s32>) = G_IMPLICIT_DEF
>   %0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
>   %4:_(<4 x s16>) = G_ANYEXT %0:_(<4 x s8>)
>   %5:_(s16) = G_EXTRACT %4:_(<4 x s16>), 0
>   %1:_(s8) = G_TRUNC %5:_(s16)
>   %2:_(s32) = G_ANYEXT %1:_(s8)
>   $vgpr0 = COPY %2:_(s32)
> ```
>  then  
>  `%2:_(s32) = G_ANYEXT %1:_(s8)` 
>  is legalized (!), it turned out to be legal
>  but it should have been combined with 
>  `%1:_(s8) = G_TRUNC %5:_(s16)`
>  
>  `%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)`
>  is next to legalize and we crash.
> 
> Now:
> ```
>  %1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0
>  %2:_(s32) = G_ANYEXT %1:_(s8)
> ```
>  are chained to 
>  `%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)`
>  G_TRUNC gets legalized first (and we crash)
> 
>  same story for test functions:
>  extract_s8_v4s8_offset8
>  extract_s8_v4s8_offset16
>  extract_s8_v4s8_offset24
>  extract_s8_v3s8_offset16
>  extract_s8_v5s1_offset4
This one boils down to AMDGPU not declaring any legalization rules for G_TRUNC

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir:911
 ---
 name: extract_v2s16_v5s16_offset0
 body: |
----------------
Petar.Avramovic wrote:
> in test function extract_v2s16_v5s16_offset0
> ```
>   %3:_(<6 x s32>) = G_IMPLICIT_DEF
>   %2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
>   %0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
>   %1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
>   $vgpr0 = COPY %1:_(<2 x s16>)
> ```
> Originally:
>  All Artifacts:
> ``` 
>   %1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
>   %0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
>   %2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
> ```
>  fail to combine and are moved to InstList that now contains (processed bottom up):
> ```
>   %2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
>   %0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
>   %1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
> ```
>  `%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0`
>  is legalized with moreElements, 
>  `%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0`
>  is legal, and we crash on 
>  `%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)`
> 
> Now: 
>  Since we have chained artifacts 
>  `%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)`
>  is legalized first and we crash. Remaining two wait for this legalization to finish.
This one boils down to AMDGPU not declaring any legalization rules for G_TRUNC

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir:907
     ; CHECK: [[EXTRACT:%[0-9]+]]:_(<5 x s16>) = G_EXTRACT [[TRUNC]](<6 x s16>), 0
-    ; CHECK: [[DEF1:%[0-9]+]]:_(<6 x s32>) = G_IMPLICIT_DEF
-    ; CHECK: [[TRUNC1:%[0-9]+]]:_(<6 x s16>) = G_TRUNC [[DEF1]](<6 x s32>)
-    ; CHECK: [[INSERT:%[0-9]+]]:_(<6 x s16>) = G_INSERT [[TRUNC1]], [[EXTRACT]](<5 x s16>), 0
-    ; CHECK: [[EXTRACT1:%[0-9]+]]:_(<2 x s16>) = G_EXTRACT [[INSERT]](<6 x s16>), 0
+    ; CHECK: [[EXTRACT1:%[0-9]+]]:_(<2 x s16>) = G_EXTRACT [[EXTRACT]](<5 x s16>), 0
     ; CHECK: $vgpr0 = COPY [[EXTRACT1]](<2 x s16>)
----------------
This change is violating AMDGPU's legalization rules. The relevant one for the G_EXTRACT is:
        .moreElementsIf(isSmallOddVector(BigTyIdx), oneMoreElement(BigTyIdx))
which is supposed to add an extra element if the number of elements is odd and the size of the elements is <32. The legalIf that preceeds it only matches if the bit type is a multiple of 32 in size but 5*16 isn't

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-implicit-def.mir:324
     ; CHECK: [[EXTRACT:%[0-9]+]]:_(<3 x s1>) = G_EXTRACT [[TRUNC]](<4 x s1>), 0
-    ; CHECK: [[UV:%[0-9]+]]:_(s1), [[UV1:%[0-9]+]]:_(s1), [[UV2:%[0-9]+]]:_(s1) = G_UNMERGE_VALUES [[EXTRACT]](<3 x s1>)
-    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[UV]](s1)
-    ; CHECK: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[UV1]](s1)
-    ; CHECK: [[ANYEXT2:%[0-9]+]]:_(s32) = G_ANYEXT [[UV2]](s1)
-    ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<3 x s32>) = G_BUILD_VECTOR [[ANYEXT]](s32), [[ANYEXT1]](s32), [[ANYEXT2]](s32)
-    ; CHECK: $vgpr0_vgpr1_vgpr2 = COPY [[BUILD_VECTOR]](<3 x s32>)
+    ; CHECK: [[ANYEXT:%[0-9]+]]:_(<3 x s32>) = G_ANYEXT [[EXTRACT]](<3 x s1>)
+    ; CHECK: $vgpr0_vgpr1_vgpr2 = COPY [[ANYEXT]](<3 x s32>)
----------------
This is another case of violating AMDGPU's G_ANYEXT legalization rules by not scalarizing vector types

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-or.mir:514
     ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s8>) = G_BUILD_VECTOR [[TRUNC2]](s8), [[TRUNC3]](s8), [[TRUNC4]](s8), [[TRUNC5]](s8)
-    ; CHECK: [[UV8:%[0-9]+]]:_(s8), [[UV9:%[0-9]+]]:_(s8), [[UV10:%[0-9]+]]:_(s8), [[UV11:%[0-9]+]]:_(s8) = G_UNMERGE_VALUES [[BUILD_VECTOR]](<4 x s8>)
-    ; CHECK: [[ANYEXT8:%[0-9]+]]:_(s32) = G_ANYEXT [[UV8]](s8)
-    ; CHECK: [[ANYEXT9:%[0-9]+]]:_(s32) = G_ANYEXT [[UV9]](s8)
-    ; CHECK: [[ANYEXT10:%[0-9]+]]:_(s32) = G_ANYEXT [[UV10]](s8)
-    ; CHECK: [[ANYEXT11:%[0-9]+]]:_(s32) = G_ANYEXT [[UV11]](s8)
-    ; CHECK: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[ANYEXT8]](s32), [[ANYEXT9]](s32), [[ANYEXT10]](s32), [[ANYEXT11]](s32)
-    ; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[BUILD_VECTOR1]](<4 x s32>)
+    ; CHECK: [[ANYEXT8:%[0-9]+]]:_(<4 x s32>) = G_ANYEXT [[BUILD_VECTOR]](<4 x s8>)
+    ; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[ANYEXT8]](<4 x s32>)
----------------
This is another case of violating AMDGPU's G_ANYEXT legalization rules by not scalarizing vector types

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values-xfail.mir:2
 # RUN: not llc -mtriple=amdgcn-- -O0 -run-pass=legalizer -o - %s 2>&1 | FileCheck %s

-# CHECK: LLVM ERROR: unable to legalize instruction: %1:_(s1), %2:_(s1) = G_UNMERGE_VALUES %0:_(<2 x s1>) (in function: test_unmerge_v2s1)
----------------
Petar.Avramovic wrote:
> Originally: 
> We used to crash on:
>  `%1:_(s1), %2:_(s1) = G_UNMERGE_VALUES %0`
> 
> Now: 
> We noticed that this artifact has use 
>  `%0:_(<2 x s1>) = G_TRUNC %3:_(<2 x s32>)`, this is where we crash now.
This one is interesting. AMDGPU's legalization rules say to scalarize this (assuming I'm reading it right, this opcode has very complicated legalization rules), but how do you scalarize G_UNMERGE_VALUES. The normal answer is to emit G_UNMERGE_VALUES but that's not going to work here as we'll just infinitely loop replacing an instruction with the same instruction. I guess G_EXTRACT_VECTOR_ELT is probably the answer here but suppose we want to widenScalar that. We'd probably want to G_UNMERGE_VALUES (bring us back into the loop) and G_BUILD_VECTOR. I'll have to give this one some more thought

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir:239
 ...

 ---
----------------
Petar.Avramovic wrote:
> In test function test_unmerge_s1_s8:
>  We have a combine for G_MERGE + G_UNMERGE that did not happen before.
>  Remaining changes are variable name changes by the script, because of combine, 
>  first merge {MV} and first unmerge{UV} are now missing, and MV1 is now MV, MV2 is MV1 and so on. 
>  There was no crash here so I removed -global-isel-abort=0
I believe this is the test where I mentioned to Matt that the lack of G_TRUNC legalization rules caused it to stop processing before the combines could happen. It looks like this patch makes the combines happen a bit earlier and before the G_TRUNC stops further processing

================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-xor.mir:514
     ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s8>) = G_BUILD_VECTOR [[TRUNC2]](s8), [[TRUNC3]](s8), [[TRUNC4]](s8), [[TRUNC5]](s8)
-    ; CHECK: [[UV8:%[0-9]+]]:_(s8), [[UV9:%[0-9]+]]:_(s8), [[UV10:%[0-9]+]]:_(s8), [[UV11:%[0-9]+]]:_(s8) = G_UNMERGE_VALUES [[BUILD_VECTOR]](<4 x s8>)
-    ; CHECK: [[ANYEXT8:%[0-9]+]]:_(s32) = G_ANYEXT [[UV8]](s8)
-    ; CHECK: [[ANYEXT9:%[0-9]+]]:_(s32) = G_ANYEXT [[UV9]](s8)
-    ; CHECK: [[ANYEXT10:%[0-9]+]]:_(s32) = G_ANYEXT [[UV10]](s8)
-    ; CHECK: [[ANYEXT11:%[0-9]+]]:_(s32) = G_ANYEXT [[UV11]](s8)
-    ; CHECK: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[ANYEXT8]](s32), [[ANYEXT9]](s32), [[ANYEXT10]](s32), [[ANYEXT11]](s32)
-    ; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[BUILD_VECTOR1]](<4 x s32>)
+    ; CHECK: [[ANYEXT8:%[0-9]+]]:_(<4 x s32>) = G_ANYEXT [[BUILD_VECTOR]](<4 x s8>)
+    ; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[ANYEXT8]](<4 x s32>)
----------------
This one violates AMDGPU's G_ANYEXT rules by not scalarizing

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61787/new/

https://reviews.llvm.org/D61787