[PATCH] D62908: [PowerPC] Improve float vector gather codegen

Fri Aug 30 20:58:07 PDT 2019

jsji requested changes to this revision.
jsji added a comment.
This revision now requires changes to proceed.

My understanding is that the cycle savings come from avoiding the unnecessary SP->DP and then DP->SP conversions,
not from the different merge sequence.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:2391
   dag I32_TO_BE_WORD1 = (COPY_TO_REGCLASS (MTVSRWZ $B), VSRC);
+  dag LD32A = (COPY_TO_REGCLASS (LIWZX xoaddr:$A), VSRC);
+  dag LD32B = (COPY_TO_REGCLASS (LIWZX xoaddr:$B), VSRC);
----------------
Why do these dags belong to `AlignValues`? Why not `MrgFP`?

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3982
                         (COPY_TO_REGCLASS $A, VSRC), 0))>;
-
+    def : Pat<(v4f32 (build_vector (f32 (load xoaddr:$A)),
+                                   (f32 (load xoaddr:$B)),
----------------
What about BigEndian?

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3986
+                                   (f32 (load xoaddr:$D)))),
+              (v4f32 (XXPERMDI (XXMRGHW AlignValues.LD32D, AlignValues.LD32C),
+                               (XXMRGHW AlignValues.LD32B, AlignValues.LD32A), 3))>;
----------------
What is the benefit of merging 'AB' and 'CD', instead of the original 'AC' and 'BD' followed by `vmrgew`?

`vmrgew` is a 2-cycle ALU instruction, so it should still be better than the 3-cycle `xxpermdi` here.

================
Comment at: llvm/test/CodeGen/PowerPC/float-vector-gather.ll:3
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -ppc-vsr-nums-as-vr \
+; RUN: -ppc-asm-full-reg-names -mtriple=powerpc64le-unknown-linux-gnu < %s \
+; RUN: | FileCheck %s
----------------
Please add a test for big endian as well.
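
A possible shape for the extra RUN line, as a sketch only. The big-endian triple `powerpc64-unknown-linux-gnu` and the `CHECK-BE` prefix are assumptions for illustration and are not part of the current patch; the matching `CHECK-BE` lines would still need to be generated.

```llvm
; Hypothetical big-endian RUN line: same flags as the existing one, with an
; assumed big-endian triple and an assumed separate check prefix (CHECK-BE).
; RUN: llc -verify-machineinstrs -mcpu=pwr9 -ppc-vsr-nums-as-vr \
; RUN: -ppc-asm-full-reg-names -mtriple=powerpc64-unknown-linux-gnu < %s \
; RUN: | FileCheck %s --check-prefix=CHECK-BE
```

Using a separate prefix keeps the little-endian checks untouched while allowing the big-endian output to differ.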

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62908/new/

https://reviews.llvm.org/D62908