[PATCH] D61787: [GlobalISel Legalizer] Improve artifact combiner

Petar Avramovic via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 10 08:38:10 PDT 2019


Petar.Avramovic marked 10 inline comments as done.
Petar.Avramovic added a comment.

@volkan @aemerson @arsenm Here is brief explanation of changes in tests.



================
Comment at: test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir:5
-# ERROR: unable to legalize instruction: %3:_(s64) = G_ANYEXT %1:_(s4) (in function: test_unmerge_s4)
+# RUN: llc -march=aarch64 -O0 -run-pass=legalizer %s -o -  | FileCheck %s
 
 ---
----------------
Originally: 
 We crash on attempt to legalize `%3:_(s64) = G_ANYEXT %1:_(s4)`.
 When we attempt to combine Artifact 
 `%3:_(s64) = G_ANYEXT %1:_(s4)` 
 no combine is available and this gets moved to InstList even though it had unresolved use 
 `%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)`
 Also,
 `%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)`
 is the next artifact that had nothing to combine with and is moved to InstList.
 In InstList 
 `%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)`
 is legalized with widen scalar to s8 and in the process 
 `%1:_(s4) = G_TRUNC %9:_(s8)` is created.
 Next,  
 `%3:_(s64) = G_ANYEXT %1:_(s4)`
 is legalized like an instruction, and we crash.
 However, we could have combined this with 
 `%1:_(s4) = G_TRUNC %9:_(s8)`
 (if we were patient).

Now: 
 `%3:_(s64) = G_ANYEXT %1:_(s4)`
 has unresolved use and is moved to TryLaterArtifactList. 
 After 
 `%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)`
 is legalized (it did not have unresolved use)
 `%1:_(s4) = G_TRUNC %9:_(s8)` 
 is created. Later 
 `%3:_(s64) = G_ANYEXT %1:_(s4)` 
 performs combine. Legalization completes successfully.
 I removed global-isel-abort=0 since we don't crash any more. 
 `%9:_(s8), %10:_(s8) = G_UNMERGE_VALUES %8(s16)` 
will crash in instruction select. 
 Since this UNMERGE was declared as legal for pair `{s8, s16}` Artifact Combiner did good job.


================
Comment at: test/CodeGen/AArch64/GlobalISel/legalizer-combiner-zext-trunc-crash.mir:5
 # This test checks we don't crash when doing zext(trunc) legalizer combines.
 ---
 name:            zext_trunc_dead_inst_crash
----------------
Difference here comes from different order in which MachineInstructions are processed.

Originally: 
InstList has
```
  %20:_(s32) = G_ZEXT %4:_(s8)
  ...
  %31:_(s32) = G_ZEXT %6:_(s8)
  ...
  %6:_(s8) = G_TRUNC %35:_(s16)
  %4:_(s8) = G_TRUNC %36:_(s32)
  ...
```
 `%4:_(s8) = G_TRUNC %36:_(s32)`
 jumps to middle of InstList to 
 `%20:_(s32) = G_ZEXT %4:_(s8)` 
 and combines two of them.
 `%6:_(s8) = G_TRUNC %35:_(s16)` 
 will jump to 
 `%31:_(s32) = G_ZEXT %6:_(s8)`
 In summary: 
 `%20:_(s32) = G_ZEXT %4:_(s8)` is combined first
 `%31:_(s32) = G_ZEXT %6:_(s8)` is combined second

Now: 
InstList will not have G_TRUNCs and will avoid jumping around the list
 Content of InstList is: 
```
  %20:_(s32) = G_ZEXT %4:_(s8)
  ...
  %31:_(s32) = G_ZEXT %6:_(s8)
  ...
```
 `%31:_(s32) = G_ZEXT %6:_(s8)` 
 will be combined first, 
 `%20:_(s32) = G_ZEXT %4:_(s8)` 
 will be combined second.
 That is why script renamed some variables in diff.


================
Comment at: test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir:56
 
 ---
 name: test_sext_trunc_v2s32_to_v2s8_to_v2s16
----------------
 We start with following MIR:
```
  %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
  %1:_(<2 x s8>) = G_TRUNC %0
  %2:_(<2 x s16>) = G_SEXT %1
  $vgpr0 = COPY %2
```
 `%2:_(<2 x s16>) = G_SEXT %1` 
 is combined, as a result we are left with:
```
  %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
  %3:_(s16) = G_CONSTANT i16 8
  %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
  %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
  %6:_(<2 x s16>) = G_SHL %5:_, %4:_(<2 x s16>)
  %2:_(<2 x s16>) = G_ASHR %6:_, %4:_(<2 x s16>)
  $vgpr0 = COPY %2:_(<2 x s16>)
```
Originally: 
 Two Artifacts failed to combine and ended up in InstList:
```
  ...
  %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
  %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
```
 `%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)` 
 was legal, 
 We have crash on `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`
	
Now:
```
  %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
  %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
```
 are Auxiliary Artifacts and will have to wait for something to combine with them.
 Eventually `%11:_(s16), %12:_(s16) = G_UNMERGE_VALUES %4:_(<2 x s16>)`
 from legalization of `%2:_(<2 x s16>) = G_ASHR %6:_, %4:_(<2 x s16>)`
 will combine with `%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)`

 Combine attempt of `%19:_(s16), %20:_(s16) = G_UNMERGE_VALUES %5:_(<2 x s16>)`
 will move `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)` to InstList
 Once we finish with artifacts, we crash on attempt to legalize Instruction 
 `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`


================
Comment at: test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-zext.mir:46
 
 ---
 name: test_zext_trunc_v2s32_to_v2s8_to_v2s16
----------------
In:
```
  %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
  %1:_(<2 x s8>) = G_TRUNC %0
  %2:_(<2 x s16>) = G_ZEXT %1
  $vgpr0 = COPY %2
```
 `%2:_(<2 x s16>) = G_ZEXT %1` 
 gets combined first into:
```
  %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
  %3:_(s16) = G_CONSTANT i16 255
  %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
  %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
  %2:_(<2 x s16>) = G_AND %5:_, %4:_
  $vgpr0 = COPY %2:_(<2 x s16>)
```
Originally: 
All new instructions (created in process of G_ZEXT combine) fail to combine and get moved to InstList:
```
  %3:_(s16) = G_CONSTANT i16 255
  %2:_(<2 x s16>) = G_AND %5:_, %4:_
  %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
  %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
```
 `%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)`
 was legal and we crash on:
 `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`

Now: 
 `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`
 is left to be processed as the last one, since it cannot perform combine (Auxiliary Artifact)
 `%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)`
 G_BUILD_VECTOR is also Auxiliary Artifact. 
 InstList has
```
  %3:_(s16) = G_CONSTANT i16 255
  %2:_(<2 x s16>) = G_AND %5:_, %4:_
```
 they get legalized and we get:
```
  %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
  %6:_(s32) = G_CONSTANT i32 255
  %3:_(s16) = G_TRUNC %6:_(s32)
  %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
  %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
  %2:_(<2 x s16>) = G_AND %5:_, %4:_
  $vgpr0 = COPY %2:_(<2 x s16>)
```
 This is an interesting situation, there are no artifacts in ArtifactList or Instructions in InstList, so we are left with Auxiliary Artifacts. 
 None of real artifacts was able to combine with them and they are all moved to InstList:
```
   %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
   %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
   %3:_(s16) = G_TRUNC %6:_(s32) 
```
  `%3:_(s16) = G_TRUNC %6:_(s32)` 
 was legal. Crash on `%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)`

 Difference in diff comes from legalization of  `%3:_(s16) = G_CONSTANT i16 255`



================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir:444
 
 ---
 name: extract_s8_v4s8_offset0
----------------
In test function extract_s8_v4s8_offset0
```
  %3:_(<4 x s32>) = G_IMPLICIT_DEF
  %0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
  %1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0
  %2:_(s32) = G_ANYEXT %1:_(s8)
  $vgpr0 = COPY %2:_(s32)
 ```
Originally: 
```
  %1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0 was legalized first into: 
  %3:_(<4 x s32>) = G_IMPLICIT_DEF
  %0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
  %4:_(<4 x s16>) = G_ANYEXT %0:_(<4 x s8>)
  %5:_(s16) = G_EXTRACT %4:_(<4 x s16>), 0
  %1:_(s8) = G_TRUNC %5:_(s16)
  %2:_(s32) = G_ANYEXT %1:_(s8)
  $vgpr0 = COPY %2:_(s32)
```
 then  
 `%2:_(s32) = G_ANYEXT %1:_(s8)` 
 is legalized (!), it turned out to be legal
 but it should have been combined with 
 `%1:_(s8) = G_TRUNC %5:_(s16)`
 
 `%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)`
 is next to legalize and we crash.

Now:
```
 %1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0
 %2:_(s32) = G_ANYEXT %1:_(s8)
```
 are chained to 
 `%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)`
 G_TRUNC gets legalized first (and we crash)

 same story for test functions:
 extract_s8_v4s8_offset8
 extract_s8_v4s8_offset16
 extract_s8_v4s8_offset24
 extract_s8_v3s8_offset16
 extract_s8_v5s1_offset4


================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir:911
 ---
 name: extract_v2s16_v5s16_offset0
 body: |
----------------
in test function extract_v2s16_v5s16_offset0
```
  %3:_(<6 x s32>) = G_IMPLICIT_DEF
  %2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
  %0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
  %1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
  $vgpr0 = COPY %1:_(<2 x s16>)
```
Originally:
 All Artifacts:
``` 
  %1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
  %0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
  %2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
```
 fail to combine and are moved to InstList that now contains (processed bottom up):
```
  %2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
  %0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
  %1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
```
 `%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0`
 is legalized with moreElements, 
 `%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0`
 is legal, and we crash on 
 `%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)`

Now: 
 Since we have chained artifacts 
 `%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)`
 is legalized first and we crash. Remaining two wait for this legalization to finish.


================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-implicit-def.mir:316
 ---
 name: test_implicit_def_v3s1
 body: |
----------------
In test function test_implicit_def_v3s1:
```
  %3:_(<4 x s32>) = G_IMPLICIT_DEF
  %2:_(<4 x s1>) = G_TRUNC %3:_(<4 x s32>)
  %0:_(<3 x s1>) = G_EXTRACT %2:_(<4 x s1>), 0
  %1:_(<3 x s32>) = G_ANYEXT %0:_(<3 x s1>)
  $vgpr0_vgpr1_vgpr2 = COPY %1:_(<3 x s32>)
```
 We used to crash on   `%0:_(<3 x s1>) = G_EXTRACT %2:_(<4 x s1>), 0`
 but since this artifact has unresolved artifact use `%2:_(<4 x s1>) = G_TRUNC %3:_(<4 x s32>)` this is where we crash now. 
 Also we did not try to legalize   `%1:_(<3 x s32>) = G_ANYEXT %0:_(<3 x s1>)` since it has unresolved artifact use, 
 difference in diff comes from legalization of `%1:_(<3 x s32>) = G_ANYEXT %0:_(<3 x s1>)`

 It is the same story in test_implicit_def_v3s8, only type is different (s8 instead of s1).



================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values-xfail.mir:2
 # RUN: not llc -mtriple=amdgcn-- -O0 -run-pass=legalizer -o - %s 2>&1 | FileCheck %s
 
-# CHECK: LLVM ERROR: unable to legalize instruction: %1:_(s1), %2:_(s1) = G_UNMERGE_VALUES %0:_(<2 x s1>) (in function: test_unmerge_v2s1)
----------------
Originally: 
We used to crash on:
 `%1:_(s1), %2:_(s1) = G_UNMERGE_VALUES %0`

Now: 
We noticed that this artifact has use 
 `%0:_(<2 x s1>) = G_TRUNC %3:_(<2 x s32>)`, this is where we crash now.


================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir:239
 ...
 
 ---
----------------
In test function test_unmerge_s1_s8:
 We have a combine for G_MERGE + G_UNMERGE that did not happen before.
 Remaining changes are variable name changes by the script, because of combine, 
 first merge {MV} and first unmerge{UV} are now missing, and MV1 is now MV, MV2 is MV1 and so on. 
 There was no crash here so I removed -global-isel-abort=0


================
Comment at: test/CodeGen/AMDGPU/GlobalISel/legalize-xor.mir:484
 ...
 
 ---
----------------
I will explain change in legalize-xor.mir, it is the same thing in:
I will explain change in legalize-and.mir and
I will explain change in legalize-or.mir, 
```
  %29:_(<4 x s32>) = G_IMPLICIT_DEF
  %0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>)
  %28:_(<4 x s32>) = G_IMPLICIT_DEF
  %1:_(<4 x s8>) = G_TRUNC %28:_(<4 x s32>)
  %4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)
  %8:_(s8), %9:_(s8), %10:_(s8), %11:_(s8) = G_UNMERGE_VALUES %1:_(<4 x s8>)
  %25:_(s32) = G_ANYEXT %4:_(s8)
  %26:_(s32) = G_ANYEXT %8:_(s8)
  %27:_(s32) = G_XOR %25:_, %26:_
  %12:_(s8) = G_TRUNC %27:_(s32)
  %22:_(s32) = G_ANYEXT %5:_(s8)
  %23:_(s32) = G_ANYEXT %9:_(s8)
  %24:_(s32) = G_XOR %22:_, %23:_
  %13:_(s8) = G_TRUNC %24:_(s32)
  %19:_(s32) = G_ANYEXT %6:_(s8)
  %20:_(s32) = G_ANYEXT %10:_(s8)
  %21:_(s32) = G_XOR %19:_, %20:_
  %14:_(s8) = G_TRUNC %21:_(s32)
  %16:_(s32) = G_ANYEXT %7:_(s8)
  %17:_(s32) = G_ANYEXT %11:_(s8)
  %18:_(s32) = G_XOR %16:_, %17:_
  %15:_(s8) = G_TRUNC %18:_(s32)
  %2:_(<4 x s8>) = G_BUILD_VECTOR %12:_(s8), %13:_(s8), %14:_(s8), %15:_(s8)
  %3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)
  $vgpr0_vgpr1_vgpr2_vgpr3 = COPY %3:_(<4 x s32>)
``` 
 Artifact List had in one moment (processed bottom up):
```
  %3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)
  %4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)
  %8:_(s8), %9:_(s8), %10:_(s8), %11:_(s8) = G_UNMERGE_VALUES %1:_(<4 x s8>)
  ...
```
Originally:
All artifacts fail to combine, and are moved to InstList:
```
  ...
  %4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)
  %3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)
```
 `%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)`
 is legalized with FewerElements.
 We crash on
 `%4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)`

Now:
 Artifacts failed to combine with available Auxiliary Artifacts: 
```
  %1:_(<4 x s8>) = G_TRUNC %28:_(<4 x s32>)
  %0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>)
  %2:_(<4 x s8>) = G_BUILD_VECTOR %12:_(s8), %13:_(s8), %14:_(s8), %15:_(s8)
```
 Because of that Auxiliary Artifacts are now in InstList.

 `%2:_(<4 x s8>) = G_BUILD_VECTOR %12:_(s8), %13:_(s8), %14:_(s8), %15:_(s8)`
 is legal
 `%0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>) `
 is not legal and we crash.
 Difference in diff comes from legalization of `%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)` because it failed to combine and was moved to InstList,
 new algorithm crashed before attempt to legalize this instruction.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61787/new/

https://reviews.llvm.org/D61787





More information about the llvm-commits mailing list