[PATCH] D113193: [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB
Guozhi Wei via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 15 12:04:52 PST 2021
Carrot added inline comments.
================
Comment at: llvm/test/CodeGen/X86/uadd_sat_vec.ll:1079
+; SSE-NEXT: por %xmm5, %xmm3
+; SSE-NEXT: por %xmm4, %xmm3
; SSE-NEXT: retq
----------------
pengfei wrote:
> The tests in this file all seem to get worse. Do you have any idea how to optimize them?
Let's take function v4i32 as an example.
With this patch:
```
liveins: $xmm0, $xmm1
%1:vr128 = COPY killed $xmm1
%0:vr128 = COPY killed $xmm0
%2:vr128 = MOVAPSrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
%3:vr128 = COPY %0:vr128
%3:vr128 = PXORrr %3:vr128(tied-def 0), %2:vr128
%4:vr128 = COPY killed %0:vr128
%4:vr128 = PADDDrr %4:vr128(tied-def 0), killed %1:vr128
%5:vr128 = COPY killed %2:vr128
%5:vr128 = PXORrr %5:vr128(tied-def 0), %4:vr128
%6:vr128 = COPY killed %3:vr128
%6:vr128 = PCMPGTDrr %6:vr128(tied-def 0), killed %5:vr128
%7:vr128 = COPY killed %4:vr128
%7:vr128 = PORrr %7:vr128(tied-def 0), killed %6:vr128
$xmm0 = COPY killed %7:vr128
RET 0, killed $xmm0
```
The final extra movdqa comes from
```
%3:vr128 = COPY %0:vr128
```
It can't be deleted because %0 is not killed at this point. All other COPY instructions can be coalesced and deleted.
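As a rough illustration of that observation, the sketch below scans the "with this patch" MIR for COPY instructions whose source operand lacks the `killed` flag. It is a toy text filter, not LLVM's coalescer: the function name `copies_with_live_source` and the regular expression are made up for this example, and the MOVAPSrm line is omitted because it is not a COPY.

```python
import re

# The "with this patch" MIR from above (MOVAPSrm and RET omitted for brevity).
MIR_WITH_PATCH = """\
%1:vr128 = COPY killed $xmm1
%0:vr128 = COPY killed $xmm0
%3:vr128 = COPY %0:vr128
%3:vr128 = PXORrr %3:vr128(tied-def 0), %2:vr128
%4:vr128 = COPY killed %0:vr128
%4:vr128 = PADDDrr %4:vr128(tied-def 0), killed %1:vr128
%5:vr128 = COPY killed %2:vr128
%5:vr128 = PXORrr %5:vr128(tied-def 0), %4:vr128
%6:vr128 = COPY killed %3:vr128
%6:vr128 = PCMPGTDrr %6:vr128(tied-def 0), killed %5:vr128
%7:vr128 = COPY killed %4:vr128
%7:vr128 = PORrr %7:vr128(tied-def 0), killed %6:vr128
$xmm0 = COPY killed %7:vr128
"""

# Hypothetical pattern: "<dst> = COPY [killed ]<src>".
COPY_RE = re.compile(r"^(\S+) = COPY (killed )?(\S+)$")


def copies_with_live_source(mir):
    """Return (dst, src) pairs for COPYs whose source is not marked killed."""
    result = []
    for line in mir.splitlines():
        m = COPY_RE.match(line.strip())
        if m and m.group(2) is None:
            # Drop the ":vr128" register-class suffix for readability.
            result.append((m.group(1).split(":")[0], m.group(3).split(":")[0]))
    return result


print(copies_with_live_source(MIR_WITH_PATCH))
```

Under these assumptions the only pair reported is the copy from %0 into %3, which matches the one COPY singled out above.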
Without this patch:
```
liveins: $xmm0, $xmm1
%1:vr128 = COPY killed $xmm1
%0:vr128 = COPY killed $xmm0
%2:vr128 = MOVAPSrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
%3:vr128 = COPY %0:vr128
%3:vr128 = PXORrr %3:vr128(tied-def 0), %2:vr128
%4:vr128 = COPY killed %1:vr128
%4:vr128 = PADDDrr %4:vr128(tied-def 0), killed %0:vr128
%5:vr128 = COPY killed %2:vr128
%5:vr128 = PXORrr %5:vr128(tied-def 0), %4:vr128
%6:vr128 = COPY killed %3:vr128
%6:vr128 = PCMPGTDrr %6:vr128(tied-def 0), killed %5:vr128
%7:vr128 = COPY killed %6:vr128
%7:vr128 = PORrr %7:vr128(tied-def 0), killed %4:vr128
$xmm0 = COPY killed %7:vr128
RET 0, killed $xmm0
```
It also contains
```
%3:vr128 = COPY %0:vr128
```
and %0 is not killed, so %3 is not xmm0 and this COPY is a real MOV at this point. The SrcRegMap contains %7 -> %6 -> %3, so %7 can't be coalesced with xmm0 either, and the last COPY is also a real MOV.
Both of these copies (at 64B and 224B) can be seen clearly after coalescing:
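To make the chain argument concrete, here is a minimal sketch that follows a source-register chain to its end. The dictionary `SRC_CHAIN` only encodes the %7 -> %6 -> %3 chain quoted above; it is a stand-in for illustration, not the actual contents or lookup logic of SrcRegMap in TwoAddressInstructionPass, and `chain_root` is a hypothetical helper.

```python
# Stand-in for the chain quoted above: each register maps to the register
# it was derived from. This is an assumption for illustration only.
SRC_CHAIN = {"%7": "%6", "%6": "%3"}


def chain_root(reg, src_map):
    """Follow src_map from reg until a register with no further mapping."""
    seen = set()
    while reg in src_map and reg not in seen:
        seen.add(reg)  # guard against accidental cycles in the toy map
        reg = src_map[reg]
    return reg


root = chain_root("%7", SRC_CHAIN)
print(root, root == "$xmm0")
```

In this toy model the walk from %7 ends at %3 rather than at xmm0, which is the situation described above: %3 holds a separate copy of %0, so nothing ties %7 to xmm0.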
```
0B bb.0 (%ir-block.0):
liveins: $xmm0, $xmm1
16B %4:vr128 = COPY $xmm1
32B %0:vr128 = COPY $xmm0
48B %5:vr128 = MOVAPSrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
64B %6:vr128 = COPY %0:vr128
80B %6:vr128 = PXORrr %6:vr128(tied-def 0), %5:vr128
112B %4:vr128 = PADDDrr %4:vr128(tied-def 0), %0:vr128
144B %5:vr128 = PXORrr %5:vr128(tied-def 0), %4:vr128
176B %6:vr128 = PCMPGTDrr %6:vr128(tied-def 0), %5:vr128
208B %6:vr128 = PORrr %6:vr128(tied-def 0), %4:vr128
224B $xmm0 = COPY %6:vr128
240B RET 0, $xmm0
```
So the two-address conversion result is actually better with my new patch.
But after scheduling, %0 in the COPY instruction becomes a killed operand, so %6, %0 and xmm0 can now all be coalesced, and both COPY instructions can be deleted.
```
0B bb.0 (%ir-block.0):
liveins: $xmm0, $xmm1
16B %4:vr128 = COPY $xmm1
32B %0:vr128 = COPY $xmm0
48B %5:vr128 = MOVAPSrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
112B %4:vr128 = PADDDrr %4:vr128(tied-def 0), %0:vr128
116B %6:vr128 = COPY %0:vr128
120B %6:vr128 = PXORrr %6:vr128(tied-def 0), %5:vr128
144B %5:vr128 = PXORrr %5:vr128(tied-def 0), %4:vr128
176B %6:vr128 = PCMPGTDrr %6:vr128(tied-def 0), %5:vr128
208B %6:vr128 = PORrr %6:vr128(tied-def 0), %4:vr128
224B $xmm0 = COPY %6:vr128
240B RET 0, $xmm0
```
So this is another pass ordering problem between scheduling and TwoAddressInstructionPass.
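The effect of instruction order on the kill flag can be sketched with a toy last-use computation. The tuples below are a hand-written reduction of the two dumps above (opcode, destination, sources), and `copy_kills_source` is a hypothetical helper; it assumes a single basic block with no live-out virtual registers and is not how LLVM computes liveness.

```python
def copy_kills_source(block):
    """Map each COPY's destination to True if the COPY is the last use of its source."""
    result = {}
    for i, (opcode, dst, srcs) in enumerate(block):
        if opcode != "COPY":
            continue
        src = srcs[0]
        used_later = any(src in later for _, _, later in block[i + 1:])
        result[dst] = not used_later
    return result


# Order after coalescing, before scheduling (from 64B onward).
BEFORE_SCHED = [
    ("COPY", "%6", ["%0"]),
    ("PXORrr", "%6", ["%6", "%5"]),
    ("PADDDrr", "%4", ["%4", "%0"]),
    ("PXORrr", "%5", ["%5", "%4"]),
    ("PCMPGTDrr", "%6", ["%6", "%5"]),
    ("PORrr", "%6", ["%6", "%4"]),
    ("COPY", "$xmm0", ["%6"]),
]

# Order after scheduling: PADDDrr has moved above the COPY of %0.
AFTER_SCHED = [BEFORE_SCHED[2], BEFORE_SCHED[0], BEFORE_SCHED[1]] + BEFORE_SCHED[3:]

print(copy_kills_source(BEFORE_SCHED))
print(copy_kills_source(AFTER_SCHED))
```

In the first order the COPY into %6 is followed by another use of %0 (the PADDDrr), so it does not kill its source; in the second order it is the last use, which is what allows %6, %0 and xmm0 to be coalesced.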
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D113193/new/
https://reviews.llvm.org/D113193