[PATCH] D62908: [PowerPC] Improve float vector gather codegen

Tue Sep 17 10:27:46 PDT 2019

jsji added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3986
+                                   (f32 (load xoaddr:$D)))),
+              (v4f32 (XXPERMDI (XXMRGHW AlignValues.LD32D, AlignValues.LD32C),
+                               (XXMRGHW AlignValues.LD32B, AlignValues.LD32A), 3))>;
----------------
nemanjai wrote:
> jsji wrote:
> > What is the benefit of merging 'AB', 'CD', instead of original 'AC', 'BD' then `vmrgew`?
> > 
> > `vmrgew` is 2 cycle ALU instruction, should still be better than 3 cycler `xxpermdi` here.
> We should favour the larger register file available to `XXPERMDI` here rather than `VMRGEW`.
> Besides, where does the information about `XXPERMDI` taking 3 cycles come from? It is not listed in the UM and a similar instruction (`XXSEL` is a 2 cycle ALU instruction as well).
Yes, if the cycles is the same, then we should favor larger reg files.
But looks like they are not the same to me.

Unfortunately, UM is missing detail cycle information about `xxpermdi`, 
but since it is a permute instruction, all PM instructions are 3 cycles. 
`xxsel` is *ALU* instruction, hence it is 2 cycles.

And we are modeling that in our scheduling info as well.

```
$ grep XXPERMDI llvm/lib/Target/PowerPC/P9InstrResources.td -B 100|grep def -B 3
// Three Cycle PM operation. Only one PM unit per superslice so we use the whole
// superslice. That includes both exec pipelines (EXECO, EXECE) and one
// dispatch.
def : InstRW<[P9_PM_3C, IP_EXECO_1C, IP_EXECE_1C, DISP_1C],
```

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62908/new/

https://reviews.llvm.org/D62908