[PATCH] D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation

Guozhi Wei via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 3 18:30:24 PDT 2021


Carrot added inline comments.


================
Comment at: llvm/test/CodeGen/X86/vec_smulo.ll:118
+; SSE41-NEXT:    pcmpeqd %xmm2, %xmm1
+; SSE41-NEXT:    pcmpeqd %xmm0, %xmm0
 ; SSE41-NEXT:    pxor %xmm1, %xmm0
----------------
xbolva00 wrote:
> https://reviews.llvm.org/D52109#inline-545876
With this patch, TwoAddressInstructionPass generates:
```
  liveins: $xmm0, $xmm1, $rdi
  %2:gr64 = COPY killed $rdi
  %1:vr128 = COPY killed $xmm1
  %0:vr128 = COPY killed $xmm0
  %3:vr128 = PSHUFDri %1:vr128, -11
  %4:vr128 = PSHUFDri %0:vr128, -11
  %5:vr128 = COPY killed %4:vr128
  %5:vr128 = PMULDQrr %5:vr128(tied-def 0), killed %3:vr128
  %6:vr128 = COPY %0:vr128
  %6:vr128 = PMULDQrr %6:vr128(tied-def 0), %1:vr128
  %7:vr128 = PSHUFDri killed %6:vr128, -11
  %8:vr128 = COPY killed %7:vr128
  %8:vr128 = PBLENDWrri %8:vr128(tied-def 0), killed %5:vr128, -52
  %9:vr128 = COPY killed %0:vr128
  %9:vr128 = PMULLDrr %9:vr128(tied-def 0), killed %1:vr128
  %10:vr128 = COPY %9:vr128
  %10:vr128 = PSRADri %10:vr128(tied-def 0), 31
  %11:vr128 = COPY killed %10:vr128
  %11:vr128 = PCMPEQDrr %11:vr128(tied-def 0), killed %8:vr128
  %12:vr128 = V_SETALLONES
  %13:vr128 = COPY killed %12:vr128
  %13:vr128 = PXORrr %13:vr128(tied-def 0), killed %11:vr128
  MOVPQI2QImr killed %2:gr64, 1, $noreg, 0, $noreg, killed %9:vr128 :: (store (s64) into %ir.p2)
  $xmm0 = COPY killed %13:vr128
  RET 0, killed $xmm0
```

Without this patch, TwoAddressInstructionPass generates:
```
  liveins: $xmm0, $xmm1, $rdi 
  %2:gr64 = COPY killed $rdi 
  %1:vr128 = COPY killed $xmm1
  %0:vr128 = COPY killed $xmm0
  %3:vr128 = PSHUFDri %1:vr128, -11
  %4:vr128 = PSHUFDri %0:vr128, -11
  %5:vr128 = COPY killed %4:vr128
  %5:vr128 = PMULDQrr %5:vr128(tied-def 0), killed %3:vr128
  %6:vr128 = COPY %0:vr128
  %6:vr128 = PMULDQrr %6:vr128(tied-def 0), %1:vr128
  %7:vr128 = PSHUFDri killed %6:vr128, -11
  %8:vr128 = COPY killed %7:vr128
  %8:vr128 = PBLENDWrri %8:vr128(tied-def 0), killed %5:vr128, -52
  %9:vr128 = COPY killed %0:vr128
  %9:vr128 = PMULLDrr %9:vr128(tied-def 0), killed %1:vr128
  %10:vr128 = COPY %9:vr128
  %10:vr128 = PSRADri %10:vr128(tied-def 0), 31
  %11:vr128 = COPY killed %10:vr128
  %11:vr128 = PCMPEQDrr %11:vr128(tied-def 0), killed %8:vr128
  %12:vr128 = V_SETALLONES
  %13:vr128 = COPY killed %11:vr128
  %13:vr128 = PXORrr %13:vr128(tied-def 0), killed %12:vr128
  MOVPQI2QImr killed %2:gr64, 1, $noreg, 0, $noreg, killed %9:vr128 :: (store (s64) into %ir.p2)
  $xmm0 = COPY killed %13:vr128
  RET 0, killed $xmm0
```

The only difference is the PXOR instruction and its related COPY. The operand order (commuting decision) of PXOR is actually impacted by the mapping SrcRegMap[%10] = %9. For this instruction sequence, the old result is worse. Here we have SrcRegMap[%9] = xmm0, and %9 lives until the memory store, so %10 must be assigned to a different physical register and the %9 -> %10 COPY is a real one. Later, %10 must be copied back to xmm0. In the new result, the %9 -> %10 COPY is also a real copy, but the last %13 -> xmm0 COPY can be removed because %13 can be assigned to xmm0.
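
To make the role of that mapping concrete, here is a toy model of the commuting decision. It is only a sketch, not the code in TwoAddressInstructionPass: registers are plain integers, the names `followToPhys`, `shouldCommute` and `pxorCommutes` are made up for illustration, the entry SrcRegMap[%11] = %10 is assumed from the tied operands in the dumps, and the pre-pass operand order of PXOR (%12 tied, %11 second) is an assumption chosen so that the toy reproduces the two dumps above.

```cpp
#include <map>

// Toy encoding (assumption): positive = virtual register number,
// negative = physical register, 0 = "no register".
using Reg = int;
constexpr Reg NoReg = 0;
constexpr Reg XMM0 = -1;

// Follow map entries from R until a physical register is reached.
// Returns NoReg if the chain stops at a virtual register.
// The maps are assumed to be acyclic.
Reg followToPhys(Reg R, const std::map<Reg, Reg> &Map) {
  while (R > 0) {
    auto It = Map.find(R);
    if (It == Map.end())
      return NoReg;
    R = It->second;
  }
  return R;
}

// For "A = op B(tied), C": commute when C, but not B, traces back to the
// physical register that A is eventually copied into.
bool shouldCommute(Reg A, Reg B, Reg C, const std::map<Reg, Reg> &SrcRegMap,
                   const std::map<Reg, Reg> &DstRegMap) {
  Reg ToA = followToPhys(A, DstRegMap);
  if (ToA == NoReg)
    return false;
  bool CompB = followToPhys(B, SrcRegMap) == ToA;
  bool CompC = followToPhys(C, SrcRegMap) == ToA;
  return !CompB && CompC;
}

// Models "%13 = PXOR %12, %11" with and without SrcRegMap[%10] = %9.
bool pxorCommutes(bool HasCopyMapping) {
  std::map<Reg, Reg> Dst = {{13, XMM0}};          // $xmm0 = COPY %13
  std::map<Reg, Reg> Src = {{9, XMM0}, {11, 10}}; // %9 <- xmm0, %11 <- %10
  if (HasCopyMapping)
    Src[10] = 9;                                  // %10 <- %9
  return shouldCommute(13, 12, 11, Src, Dst);
}
```

Under these assumptions, `pxorCommutes(true)` ties %11 to the result (the old output) because %11 traces back to xmm0, while `pxorCommutes(false)` leaves %12 tied (the new output) because the chain from %11 stops at %10.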

What makes the old result generate better final instructions? The answer is instruction scheduling. The memory store is moved before the %9 -> %10 copy, so the COPY becomes the last use of %9. %9 can then be coalesced with %10 and assigned to xmm0, and both COPY instructions are removed. So the old result is better only by luck.
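
The effect of moving the store can be shown with a small sketch. It uses a deliberately simplified rule (a COPY is treated as removable when it is the last reader of its source), which is much cruder than what the real coalescer does, and all type and function names here are hypothetical.

```cpp
#include <string>
#include <vector>

struct Inst {
  std::string Def;               // register written, empty if none
  std::vector<std::string> Uses; // registers read
};

// Index of the last instruction that reads Reg, or -1 if there is none.
int lastUse(const std::vector<Inst> &Seq, const std::string &Reg) {
  int Last = -1;
  for (int I = 0; I < static_cast<int>(Seq.size()); ++I)
    for (const auto &U : Seq[I].Uses)
      if (U == Reg)
        Last = I;
  return Last;
}

// Simplified rule: the COPY at CopyIdx can be coalesced away only if it is
// the last instruction that reads its source register.
bool copyIsLastUse(const std::vector<Inst> &Seq, int CopyIdx) {
  const std::string &Src = Seq[CopyIdx].Uses.at(0);
  return lastUse(Seq, Src) == CopyIdx;
}

// Order as emitted by TwoAddressInstructionPass: the store comes last.
std::vector<Inst> beforeScheduling() {
  return {{"%10", {"%9"}},     // %10 = COPY %9
          {"%10", {"%10"}},    // %10 = PSRADri %10, 31
          {"", {"%2", "%9"}}}; // store %9
}

// Order after the scheduler hoists the store above the COPY.
std::vector<Inst> afterScheduling() {
  return {{"", {"%2", "%9"}},  // store %9
          {"%10", {"%9"}},     // %10 = COPY %9
          {"%10", {"%10"}}};   // %10 = PSRADri %10, 31
}
```

In the first order the store still reads %9 after the COPY, so `copyIsLastUse(beforeScheduling(), 0)` is false; in the second order the COPY is the last reader, so `copyIsLastUse(afterScheduling(), 1)` is true.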

This implies a pass ordering problem: different operands are killed in different instruction sequences, which affects the optimal commuting decisions.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108731/new/

https://reviews.llvm.org/D108731


