[PATCH] D34769: [X86] X86::CMOV to Branch heuristic based optimization

Sun Jul 9 13:57:57 PDT 2017

zvi added inline comments.

================
Comment at: lib/Target/X86/X86CmovConversion.cpp:343
+          auto &RDM = RegDefMaps[TargetRegisterInfo::isVirtualRegister(Reg)];
+          MachineInstr *DefMI = RDM.count(Reg) ? RDM[Reg] : nullptr;
+          OperandToDefMap[&MO] = DefMI;
----------------
Please avoid using operator[] for lookups - use DenseMap::lookup() instead.
And maybe mapping operands to null's is just inefficient? The non-existence of the mapping can be also used for choosing depth = 0?
So this code can be instead something like:

  if (MachineInstr *DefMI = RDM.lookup(Reg)) {
    OperandToDefMap[&MO] = DefMI;
    MIDepth = std::max(MIDepth, DepthMap.lookup(DefMI).Depth);
    if (!IsCMOV)
      MIDepthOpt = std::max(MIDepthOpt, DepthMap.lookup(DefMI).OptDepth);
   }

This also applies to more uses of operator[] below. If I'm not mistaken, only these two statements should remain using opertor[]:

  RegDefMaps[TargetRegisterInfo::isVirtualRegister(Reg)][Reg] = &MI;
  DepthMap[&MI] = {MIDepth += Latency, MIDepthOpt += Latency};

================
Comment at: lib/Target/X86/X86CmovConversion.cpp:131
+  TII = STI.getInstrInfo();
+  TSchedModel.init(STI.getSchedModel(), &STI, TII);
+
----------------
aaboud wrote:
> zvi wrote:
> > Could we save the overhead of initializing the resource tables from one visited function to another? If two visited functions share the same subtarget, could we reuse the same schedule model configuration?
> MachineCombiner, MachineLICM, and IfConverter initialize it in runOnMachineFunction, for every machine function.
> Do you think that this function is too expensive?
> Either way, it is out-of-scope for this patch.
I don't know what is the potential saving :)
I'm ok with keeping it as-is.

================
Comment at: lib/Target/X86/X86CmovConversion.cpp:211
+      // Check if we found a X86::CMOVrr instruction.
+      if (CC != X86::COND_INVALID && !I.mayLoad()) {
+        if (Group.empty()) {
----------------
aaboud wrote:
> zvi wrote:
> > If i did not miss it, can you please add a comment here or update the comments on the top why we skip cmovs with memory operands? My naive thought is that if we can speculate the load in the cmov, then we should also be able to put the load under a branch.
> The reason is implementation effort.
> There is a "TODO" at line 191, so this can be added in future patch.
> I do not see a need to add a comment, the "TODO" should be enough, do not you think so?
I agree. I missed the TODO

================
Comment at: test/CodeGen/X86/x86-cmov-converter.ll:321
+
+attributes #0 = { norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #1 = { norecurse nounwind readonly uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
----------------
Can these attributes be dropped or minimized if they don't add anything?

https://reviews.llvm.org/D34769