[PATCH] D28915: [ExecutionDepsFix] Optimize instruction insertion

Fri Feb 17 04:36:24 PST 2017

myatsina added a comment.

I'm sorry, I'm getting too lost in this review with the interaction between all the changes and especially trying to check if all the corner cases were tested sufficiently.
I want to make sure we implement each of your changes the best possible way.

Let's do the separation to the 3 incremental reviews. I think the order should be:
Trying to add only true dependency to the undef regs
isDependencyBreak support (which is the "xor fusion" if I understand it correctly)
Register re-picking.
(and the call affect on clearance calculations can follow)

I'm emphasizing the fact that the "xor fusion" and "register re-picking" should have several unit tests dedicated only to these features, testing both simple and comlex cases.

Also, I think a cleaner implementation can be done for the register re-picking, but I need to stew on it a bit more until I have a good concrete suggestion.

================
Comment at: lib/CodeGen/ExecutionDepsFix.cpp:499
 /// is truly dependent on, or use a register with clearance higher than Pref.
 void ExeDepsFix::pickBestRegisterForUndef(MachineInstr *MI, unsigned OpIdx,
+                                          unsigned Pref, bool &TrueDependency) {
----------------
Why not make it return true/false (true dependency) instead of void?

================
Comment at: lib/CodeGen/ExecutionDepsFix.cpp:584
+      bool TrueDependency = false;
+      pickBestRegisterForUndef(MI, OpNum, Pref, TrueDependency);
+      // Don't bother adding true dependencies to UndefReads. All we'd find out
----------------
And then here you will have :
bool TrueDependency = pickBest...()

================
Comment at: lib/CodeGen/ExecutionDepsFix.cpp:628
+      // issues, reset the counter.
+      LiveRegs[rx].Def = -(1 << 20);
+    }
----------------
This feature should be heavily tested.
This is the "xor fusion" logic, right?

We're adding xors for dependency break in a late stage of the program, after clearance calculation of everyone was already done, including loops (i.e. you're previous patch that changes the calculation algorithm that you've comitted).
Now you are changing the clearance for the xor's we've added, if I understand this correctly.
So somehow you need to update your old clearance calculation, no?
Or is this code aimed to only catch "xor"s that were in the original code.

I would like to see a lot of tests for these feature (without the affect of the other features):
How does it affect simple one BB code?
What is the affect on simple loops with one xor to this register and no dependency?
What is the affect on complex loops that no xors and assigns to the same register:

loop:

cvt
xor xmm0
cvt
xmm0 = 
cvt
cvt
xor xmm0
cvt
xmm0 = 

================
Comment at: lib/CodeGen/ExecutionDepsFix.cpp:656
+    if (LiveRegSet.contains(Reg))
+      ChoosableRegs.erase(Reg);
+    else if (add) {
----------------
myatsina wrote:
> formatting issue - please surround the if with {} and thus be consistent with the style of the other conditions in this if- if - else if-else
There's something that doesn't make sense for me.
Sometimes we're adding registers to ChoosableRegs and sometimes we're erasing registers from it.
Why did we add something "bad" in the first place?
It seem like this function is trying to mix 2 different (but close) functionalities, and I don't think it's right.

================
Comment at: lib/CodeGen/ExecutionDepsFix.cpp:681
+  // Suppose we have some instruction `vrandom %in, %out` and the following code
+  //    vrandom %xmm0<undef>, %xmm0<def>
+  //    vrandom %xmm1<undef>, %xmm1<def>
----------------
Do you mean vroundss? In this case it's a bad example. It uses 3 xmms and you should hide the false dependency under the true dependency of the source xmm.

I think you should put here the full instruction with all the operands (vcvtsi2sd will probably be good enough for your point).

================
Comment at: lib/Target/X86/X86InstrInfo.cpp:7508
+    if (!MO.isReg() || (Reg != 0 && MO.getReg() != Reg))
+      return false;
+    Reg = MO.getReg();
----------------
myatsina wrote:
> I think this whole section of could would be a bit clearer if you write it in this spirit:
> 
> 
> for ()
Ignore the comment above :) I forgot to delete it.

================
Comment at: lib/Target/X86/X86InstrInfo.cpp:7511
+  }
+  if (OutReg)
+    *OutReg = Reg;
----------------
loladiro wrote:
> myatsina wrote:
> > I see that you write a lot of functions so that they could return 2 arguments and that you've used the same "trick" in other functions as well.
> > I don't thing it's a good approach in general. This need of a function to return 2 values generally indicates that you need to extract the common logic somehow or define a new type or do some other form of refactoring refactoring.
> > In this case I think "isDependencyBreak" should only return true/false (as the name indicates). If the module asking wants to know which register it is then it can find the dest reg on its own, or we can provide the module a "getDependencyBreakReg". In general it will behave similar to the already existing "isReg()" and "getReg()".
> Fair enough (maybe I've been writing too much C code). In general I dislike splitting such logic across two functions since it needs to be kept in sync (e.g. when adding additional dependency breaking instructions to be recognized). How about adding an `getDependencyBreakReg` that just returns 0 or an llvm::Optional, if it's not a dependency breaking instruction. 
The llvm:optional seems very cool :) I wasn't familiar with it :) I think it's the best for us. I like it better than reg == 0 (I don't think there's a 0 is "non reg" for all targets). 

Anyway, here are some additional options I thought of as well:

1. implement getDepRegBreak and isDepRegBreak when one uses the other
isDepsRegBreak() {
return getRegBreak() != X86::NON_REG
}
This way you don't have the duplication of the logic that you've talked about.
2. getDependencyBreak will return operand number (of dest reg) or -1 if it's not breaking

================
Comment at: test/CodeGen/X86/break-false-dep.ll:367
+  ret double %fadd3
+}
----------------
loladiro wrote:
> myatsina wrote:
> > where are the tests for all the other optimizations you've added?
> > The "isBreakingDependency", "register re-picking", etc?
> This is the test for register re-picking.
I think you need added additional more complex tests fore the re-picking.
What happens if we're in a loop? or in a loop with a carry on dependency from the previous iteration?
What happens if you have instructions between the converts that make some of you original potential registers "alive" and then you can no longer use them?

https://reviews.llvm.org/D28915