[PATCH] D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation
Guozhi Wei via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 3 18:30:24 PDT 2021
Carrot added inline comments.
================
Comment at: llvm/test/CodeGen/X86/vec_smulo.ll:118
+; SSE41-NEXT: pcmpeqd %xmm2, %xmm1
+; SSE41-NEXT: pcmpeqd %xmm0, %xmm0
; SSE41-NEXT: pxor %xmm1, %xmm0
----------------
xbolva00 wrote:
> https://reviews.llvm.org/D52109#inline-545876
With this patch, TwoAddressInstructionPass generates:
```
liveins: $xmm0, $xmm1, $rdi
%2:gr64 = COPY killed $rdi
%1:vr128 = COPY killed $xmm1
%0:vr128 = COPY killed $xmm0
%3:vr128 = PSHUFDri %1:vr128, -11
%4:vr128 = PSHUFDri %0:vr128, -11
%5:vr128 = COPY killed %4:vr128
%5:vr128 = PMULDQrr %5:vr128(tied-def 0), killed %3:vr128
%6:vr128 = COPY %0:vr128
%6:vr128 = PMULDQrr %6:vr128(tied-def 0), %1:vr128
%7:vr128 = PSHUFDri killed %6:vr128, -11
%8:vr128 = COPY killed %7:vr128
%8:vr128 = PBLENDWrri %8:vr128(tied-def 0), killed %5:vr128, -52
%9:vr128 = COPY killed %0:vr128
%9:vr128 = PMULLDrr %9:vr128(tied-def 0), killed %1:vr128
%10:vr128 = COPY %9:vr128
%10:vr128 = PSRADri %10:vr128(tied-def 0), 31
%11:vr128 = COPY killed %10:vr128
%11:vr128 = PCMPEQDrr %11:vr128(tied-def 0), killed %8:vr128
%12:vr128 = V_SETALLONES
%13:vr128 = COPY killed %12:vr128
%13:vr128 = PXORrr %13:vr128(tied-def 0), killed %11:vr128
MOVPQI2QImr killed %2:gr64, 1, $noreg, 0, $noreg, killed %9:vr128 :: (store (s64) into %ir.p2)
$xmm0 = COPY killed %13:vr128
RET 0, killed $xmm0
```
Without this patch, TwoAddressInstructionPass generates:
```
liveins: $xmm0, $xmm1, $rdi
%2:gr64 = COPY killed $rdi
%1:vr128 = COPY killed $xmm1
%0:vr128 = COPY killed $xmm0
%3:vr128 = PSHUFDri %1:vr128, -11
%4:vr128 = PSHUFDri %0:vr128, -11
%5:vr128 = COPY killed %4:vr128
%5:vr128 = PMULDQrr %5:vr128(tied-def 0), killed %3:vr128
%6:vr128 = COPY %0:vr128
%6:vr128 = PMULDQrr %6:vr128(tied-def 0), %1:vr128
%7:vr128 = PSHUFDri killed %6:vr128, -11
%8:vr128 = COPY killed %7:vr128
%8:vr128 = PBLENDWrri %8:vr128(tied-def 0), killed %5:vr128, -52
%9:vr128 = COPY killed %0:vr128
%9:vr128 = PMULLDrr %9:vr128(tied-def 0), killed %1:vr128
%10:vr128 = COPY %9:vr128
%10:vr128 = PSRADri %10:vr128(tied-def 0), 31
%11:vr128 = COPY killed %10:vr128
%11:vr128 = PCMPEQDrr %11:vr128(tied-def 0), killed %8:vr128
%12:vr128 = V_SETALLONES
%13:vr128 = COPY killed %11:vr128
%13:vr128 = PXORrr %13:vr128(tied-def 0), killed %12:vr128
MOVPQI2QImr killed %2:gr64, 1, $noreg, 0, $noreg, killed %9:vr128 :: (store (s64) into %ir.p2)
$xmm0 = COPY killed %13:vr128
RET 0, killed $xmm0
```
The only difference is the PXOR instruction and its related COPY. The operand order (commuting decision) of PXOR is actually impacted by the mapping SrcRegMap[%10] = %9. For this instruction sequence, the old result is worse. Here we have SrcRegMap[%9] = xmm0, and %9 lives until the memory store, so %10 must be assigned to a different physical register and the %9 -> %10 COPY is a real one. Later, %10 must be copied back to xmm0. In the new result, the %9 -> %10 COPY is also a real copy, but the last %13 -> xmm0 COPY can be removed because %13 can be assigned to xmm0.
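To make the mapping concrete, here is a minimal Python sketch of how a chain of entries such as SrcRegMap[%10] = %9 and SrcRegMap[%9] = xmm0 resolves to a physical register. It is a toy model, not the pass's code: the name `resolve_mapped_reg`, the string encoding of registers, and the use of a plain dict are assumptions made for illustration. Only the two map entries come from the discussion above.

```python
def resolve_mapped_reg(reg, reg_map):
    """Follow map entries from a virtual register ("%N") until a
    physical register name is reached. Returns None if the chain
    ends at an unmapped virtual register or loops."""
    seen = set()
    while reg.startswith("%"):          # "%N" stands for a virtual register
        if reg in seen or reg not in reg_map:
            return None
        seen.add(reg)
        reg = reg_map[reg]
    return reg                          # anything else is treated as physical


# Entries taken from the comment above (hypothetical encoding).
src_reg_map = {"%9": "xmm0", "%10": "%9"}

mapped_10 = resolve_mapped_reg("%10", src_reg_map)   # follows %10 -> %9 -> xmm0
mapped_12 = resolve_mapped_reg("%12", src_reg_map)   # %12 has no entry
print(mapped_10, mapped_12)
```

In this model %10 resolves to xmm0 through %9, which is why the mapping makes xmm0 look like the preferred register for %10 even though %9 still occupies it until the store.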
What makes the old result generate better final instructions? The answer is instruction scheduling. The memory store is moved before the %9 -> %10 COPY, so the COPY becomes the last use of %9. %9 can then be coalesced with %10 and assigned to xmm0, and both COPY instructions are removed. So the better old result is just luck.
This implies a pass ordering problem: different operands are killed in different instruction sequences, which affects the optimal commuting decision.
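The dependence on instruction order can be shown with a small Python sketch. It is a toy model under stated assumptions: each instruction is reduced to a label and the set of registers it reads, and the helper `is_last_use` is a hypothetical name, not an LLVM API. It only checks whether the %9 -> %10 COPY is the last reader of %9 when the store is placed after or before it.

```python
def is_last_use(seq, index, reg):
    """True if seq[index] reads reg and no later instruction in seq reads it."""
    _, uses = seq[index]
    read_later = any(reg in later_uses for _, later_uses in seq[index + 1:])
    return reg in uses and not read_later


# Simplified instructions: (label, registers read). Hypothetical encoding.
copy = ("%10 = COPY %9", {"%9"})
store = ("MOVPQI2QImr %2, %9", {"%2", "%9"})

unscheduled = [copy, store]   # order seen by TwoAddressInstructionPass
scheduled = [store, copy]     # order after the store is moved before the COPY

copy_kills_unscheduled = is_last_use(unscheduled, unscheduled.index(copy), "%9")
copy_kills_scheduled = is_last_use(scheduled, scheduled.index(copy), "%9")
print(copy_kills_unscheduled, copy_kills_scheduled)
```

With the store after the COPY, %9 is still live past the COPY, so the two registers cannot share xmm0. With the store first, the COPY is the last use of %9 and coalescing becomes possible.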
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D108731/new/
https://reviews.llvm.org/D108731