[PATCH] D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.

Thu Jul 30 01:32:21 PDT 2015

v_klochkov added a comment.

`Elena, Matthias, Quentin,

Thank you for the code-review and comments.
The interface of the functions findCommutedOpIndices() and areOpsCommutable() 
caused the biggest questions.

Combining Elena's idea with some of Quentin's comments regarding 
the magic consts can give this first alternative:

  const unsigned CommuteAnyOperandIndex = ~0U;
  bool findCommutedOpIndices(MachineInstr *MI, 
                             unsigned &OpIdx1,
                             unsigned &OpIdx2,
                             unsigned MustCommuteIdx1 = CommuteAnyOperandIndex,
                             unsigned MustCommuteIdx2 = CommuteAnyOperandIndex);

The second alternative is to have a method returning all pairs 
of commutable operands:

  bool findCommutedOpIndices(MachineInstr *MI, 
                             SmallVectorImpl<_some_type_containing_a_pair_of_unsigned_indices_>);

(I am not sure if I should define some new class/struct or reuse some existing type for <pair of indices>,
 but that ok, let this question wait until we can decide what alternative seems better).

I really like Elena's idea; Please let me explain why. 
The reason (a) below is the most important one.

  a) In many cases we only need to know if some known operand is commutable with others.
     For example, RegisterCoalescer.cpp wants to know if 'UseOpIdx' can be swapped with something else;
     TwoAddressInstructionPass.cpp wants to know if 'BaseOpIdx' is commutable with something else;
     In some other cases we may even know the operands that need to be commuted and do not want
     to know about other operands commutativity.
     
     So, the first alternative provides a flexible instrument that helps to know just what we need to know,
     while the second alternative makes findCommutaleOpIndices() to collect information that often is
     not needed later.
     
     For example, I want to know if 2nd and 3rd operands of FMA are commutable.
     The 1st alternative just gives me the answer:
       if (findCommutedOpIndices(MI, Idx1, Idx2, 2, 3)) {}
     The 2nd alternative would probably return 3 pairs: <1,2>, <1,3>, <2,3>.
     I would not only need to find the desired <2,3> in the set of 3 pairs, but I would
     also ask findCommutedOpIndices() to do potentially expensive analysis regarding commutativity of
     the 1st operand (if FMA is scalar, then 1st operand is commutable only if users use only the lowest 
     element of XMM).
     
  b) The 1st alternative helps to make the change-set a bit more compact than it is now,
     while the 2nd would require additional changes that would additionally complicate the places 
     where findCOmmutedOpIndices() is called now.
  
  c) It would be very simple to remove the method areOpsCommutable() if it seems redundant.
     The calls of that method could be easily replaced with the calls of findCommutedOpIndices().
     For example:
       if (areOpsCommutable(MI, 1, 2)) {}
     ->
       if (findCommutedOpIndices(MI, Idx1, Idx2, 1, 2)) {}


Please let me know if you agree with the reasoning and if you are OK with Elena's idea.

Thanks,
Slava`


================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:2931
@@ +2930,3 @@
+///
+/// Do not call this method for a non-commutable instruction.
+/// Even though the instruction is commutable, the method may still
----------------
delena wrote:
> I suggest to change from "Do not call" to "If you call"
This "Do not call" comment was moved to here from the old version of include/llvm/Target/TargetInstrInfo.h
The TargetInstrInfo::commuteInstruction() has assert verifying that MI is commutable.
After taking that assert into account this comment seems quite precise.

================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:2940
@@ -2930,1 +2939,3 @@
+
+
   switch (MI->getOpcode()) {
----------------
delena wrote:
> Please remove one empty line.
Ok, removed it, the updated version of the change-set will have this fix.

================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:3169
@@ +3168,3 @@
+bool X86InstrInfo::isFMA3(unsigned Opcode) const {
+  switch (Opcode) {
+    case X86::VFMADDSDr132r:     case X86::VFMADDSDr132m:
----------------
delena wrote:
> Looks huge. I'm not sure but may be these enums in ABC order and we can compare against first and last?
> Or auto-generate something?
Unfortunately it is huge, I agree.
Comparing against the first and last or having some assumptions about how and in which order the opcodes were defined 
seems a very risky approach causing unexpected effects/errors in future. I am pretty sure that we should not go this way.

I considered the idea of having a special bit for FMAs (something similar to the fields defined in llvm/include/llvm/Target/Target.td: 
isReturn,isBitcast,etc). Adding isFMA3 to there would be inappropriate as FMA3 is meaningful only for X86, while all other 32 1-bit fields defined there are quite generic and usable for all targets. Also, adding even 1 bit to there will increase the size of IR.
Unfortunately, I could not find anything similar but for X86 platform only.

================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:3378
@@ +3377,3 @@
+///
+unsigned X86InstrInfo::getFMA3OpcodeToCommuteOperands(MachineInstr *MI,
+                                                      unsigned SrcOpIdx1,
----------------
delena wrote:
> This method may be static. Right?
Yes, this method could be static and be similar to existing methods like "static bool isFrameLoadOpcode(int Opcode)", etc.

The reason why I passed 'MachineInstruction' argument instead of 'Opcode' to this function and why this method is not static now,
is that I wanted to reserve the opportunity to handle SCALAR FMAs and their 1st operand more optimistically later (when additional analysis of scalar FMA users would be implemented); please see the FIXME comment at the line 3487.

================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:3387
@@ +3386,3 @@
+  //
+  static const struct {
+    int Opc1;
----------------
delena wrote:
> I suggest to separate scalar from vector. You handle them separately, right?
> I also think that you can put all FMA tables in a separate header file.
There is one loop handling all vector and scalar FMAs below, I did not handle them separately.
The 'IsScalar' field was needed only to handle the 1st operand with extra carefulness as commuting 1st operand of scalar FMA requires some additional analysis.

Regarding the separating FMA tables into a separate header file...
    Separating it to a header file makes sense only when it would be used by something else, i.e. not only by one method. Otherwise, it is more convenient to have this array definition closer to the function/method using that table.
    Also, In my opinion the function local/static array OpcodeAlts is written using the same style that was used in several other places in this file (Please see the definition of MemoryFoldTable2Addr, MemoryFoldTable0, etc). Moving all similar static arrays of structures to a header file deserves a special/separate change-set.

================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:3546
@@ +3545,3 @@
+///
+bool X86InstrInfo::findCommutedOpIndices(MachineInstr *MI,
+                                         unsigned &SrcOpIdx1,
----------------
delena wrote:
> I think this interface is inconvenient. I suggest to separate input and output. You can put ~0U as default value of input.
Special thank you for this comment! Separating INPUT and OUTPUT arguments seems very reasonable. I like this idea.

In my opinion both approaches have right to live though. 
Before adding 2 additional arguments to findCommutedOpIndices() and fixing other places I would wait for more comments from reviewers.


http://reviews.llvm.org/D11370