[PATCH] D32684: [X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables.

Mon May 8 05:18:04 PDT 2017

aymanmus added inline comments.

================
Comment at: test/CodeGen/X86/stack-folding-fp-avx1.ll:1654

-define double @stack_fold_sqrtsd(double %a0) {
-  ;CHECK-LABEL: stack_fold_sqrtsd
----------------
filcab wrote:
> RKSimon wrote:
> > aymanmus wrote:
> > > RKSimon wrote:
> > > > aymanmus wrote:
> > > > > RKSimon wrote:
> > > > > > Please don't delete tests, especially if they've regressed - show the new codegen and add a todo comment
> > > > > The test checks that specific instructions are memory folded.
> > > > > In the removed test cases, the instructions were removed from the memory folding tables so testing that these instructions are NOT folded in this test, is not the best idea.
> > > > OK - so just leave TODO comments for the 4 cases:
> > > > ```
> > > > ; TODO stack_fold_sqrtsd
> > > > ; TODO stack_fold_sqrtsd_int
> > > > ; TODO stack_fold_sqrtss
> > > > ; TODO stack_fold_sqrtss_int
> > > > ```
> > > > 
> > > > 
> > > > 
> > > Sorry if i'm being too picky about this, but I cannot understand why to add TODOs for these test cases. what should be done with them later on? what would it indicate?
> > The SSE scalar math memory folding has always been tricky - I had wondered if these cases could be handled by X86InstrInfo::foldMemoryOperandCustom? The _Int variants will probably have to be treated quite differently but it'd be great to get pass-through of the upper bits from the other operand being properly supported some day.
> > 
> > I notice that you didn't have to alter the SSE versions, nor the rsqrt/rcp equivalents, so in the medium term I think we can get this fixed, which is why I say we keep the TODOs there to help keep track. You can strip the actual test cases, just leave the 4 TODO comments.
> > 
> Adding `TODO` tests and comments allows anyone reading the source code/tests to notice that there's an optimization that should be done but isn't being done.
> It's especially helpful in cases like this. From your comments (and other tests in this diff), it looks like we're not folding some `sqrts?` instructions after this patch.
> We should keep these tests and add a `TODO` mentioning that folding this instructions is not easy to do with tables and they should be folded elsewhere.
Fair enough. Thanks for the detailed answer.

================
Comment at: utils/TableGen/X86FoldTablesEmitter.cpp:118
+  // For manually mapping instructions that do not match by their encoding.
+  const std::vector<ManualMapEntry> ManualMapSet = {
+      { "ADD16ri_DB",       "ADD16mi",         NO_UNFOLD  },
----------------
filcab wrote:
> aymanmus wrote:
> > craig.topper wrote:
> > > aymanmus wrote:
> > > > craig.topper wrote:
> > > > > Why does this need to be vector? Why not a regular array or std::array?
> > > > Defining an array as a class member requires stating it's size (ManualMapSet[18]), while std::vector doesn't. I thought this table might be dynamic and changed from time to time, so this way can be more friendly for future tuning.
> > > > What do you think?
> > > The number is required even if you put the array contents with it?
> > Yes, ManualMapSet[] =...  would result in defining 0 sized array, and the compiler would give the following messages:
> > - warning: ISO C++ forbids zero-size array `ManualMapSet`
> > - error: too many initializers for `const {anonymous}::X86FoldTablesEmitter::ManualMapEntry [0]`
> Why? Not specifying the first (outermost) dimension is acceptable.
> Example: https://godbolt.org/g/HTnCIh
> Did I miss something?
In your example you defined the array to be global, in this case it works.
It doesn't work when the array is defined as a class member.
Thanks to your previous comment I moved this table outside the class and updated it accordingly.

================
Comment at: utils/TableGen/X86FoldTablesEmitter.cpp:160
+      // Memory folding is enabled only when optimizing for size by DAG
+      // patterns only. (issue detailed in D28744 review)
+      "VCVTSS2SDrm",            "VCVTSS2SDrr",
----------------
filcab wrote:
> D28744 was abandoned, and this comment made me think that we were doing something special when optimizing for size, which doesn't seem to be the case, after reading the patch.
> Is this a "this should be done" comment?
Yes, some of the listed instructions have DAG patterns for memory folding which are marked with OptForSize predicate.  A more "sophisticated" dealing should be added to VEX and EVEX instructions (which is detailed in the abandoned review comments).

================
Comment at: utils/TableGen/X86FoldTablesEmitter.cpp:310
+// Return the size of the register operand
+static inline unsigned int getRegOperandSize(const Record *RegRec) {
+  if (RegRec->isSubClassOf("RegisterClass"))
----------------
filcab wrote:
> You should probably comment that something ensures size == alignment.
Thanks for noticing, I planned to change that and somehow forgot. Updated with better approach.

================
Comment at: utils/TableGen/X86FoldTablesEmitter.cpp:714
+    std::vector<const CodeGenInstruction *> &OpcRegInsts =
+        RegInsts.find(Opc)->second;
+
----------------
filcab wrote:
> Maybe this?
> ```auto FoundRegInst = RegInsts.find(Opc);
> if (FoundRegInst == RegInsts.end())
>   continue;
> std::vector<const CodeGenInstruction *> &OpcRegInsts = FoundRegInst->second;```
The previous check:

```
if (RegInsts.count(Opc) == 0)
      continue;
```
makes sure RegInsts has an element with the required key.

https://reviews.llvm.org/D32684