[PATCH] D22466: Avoid false dependencies of undef machine operands + fix bug in clearance calculation

Marina Yatsina via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 21 09:48:45 PDT 2016


myatsina marked 2 inline comments as done.

================
Comment at: lib/CodeGen/ExecutionDepsFix.cpp:486
@@ +485,3 @@
+
+  // Update only undef operands that are mapped to one register.
+  if (AliasMap[OriginalReg].size() != 1)
----------------
mkuper wrote:
> I'm a bit confused about this condition. Could you explain?
Commit 224447 by Matthias Braun added some ARM specific fix that allows one register to be mapped to several registers from the register class (for ARM a wider register like Q0 can be mapped to several consecutive D registers).
Before his commit the Q register was mapped to the last D register of the sequence. 
I want to avoid these cases where one register can be mapped to several registers because in these cases I cannot replace the original register in the instruction.

================
Comment at: lib/CodeGen/ExecutionDepsFix.cpp:503
@@ +502,3 @@
+
+  // Go over all registers in the register class and find the register with the
+  // max clearance.
----------------
mkuper wrote:
> This happens only when the real input operand is in memory, right?
> If so, is this better than undoing the load folding, and using the loaded register as undef? Or does that depend on the clearance / being able to xor the register? 
> 
> (This patch looks like a a net improvement regardless, I'm wondering about a possible TODO/follow-up patch)
This catches cases like this:
vcvtsi2ssl %eax, %xmm0, %xmm1
This instruction does not have a form with xmm as a source.
"CVTDQ2PS %xmm0, %xmm1" is the packed version of the instruction.
Are you suggesting that if I see something like:
vpextrl $1, %xmm0, %eax
vcvtsi2ssl %eax, %xmm1, %xmm1
I should replace it with
CVTDQ2PS %xmm0, %xmm1 ?

BTW, I think one possible (and probably rare) follow up is handling cases where the register with the highest clearance is alive (and therefore we cannot add an xor), while we have another register, with lower clearance, that is not alive. In this case we should make a smarter decision regarding which one to choose.

================
Comment at: lib/CodeGen/ExecutionDepsFix.cpp:511
@@ +510,3 @@
+      continue;
+    MaxClearance = Clearance;
+    MaxClearanceReg = RC->getRegister(rx);
----------------
mkuper wrote:
> Can we break if we've passed the "minimum required clearance" (that is, Pref)?
I've measured both options, and choosing a register with max clearance gives slightly better results than choosing a register with clearance > Pref.

================
Comment at: lib/CodeGen/ExecutionDepsFix.cpp:523
@@ -522,3 +567,1 @@
       continue;
-    if (MO.isImplicit())
-      break;
----------------
mkuper wrote:
> myatsina wrote:
> > This is a fix to bug - the clearance calculation did not take into account registers defined as outputs or clobbers in inline assembly machine instructions because these register defs are implicit.
> Any reason not to commit this separately?
https://reviews.llvm.org/D22580



Repository:
  rL LLVM

https://reviews.llvm.org/D22466





More information about the llvm-commits mailing list