[PATCH] D106555: [PowerPC] handle more splat loads

Thu Oct 28 03:08:35 PDT 2021

shchenz added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:2835
+def : Pat<(v4f32 (PPCldsplat ForceXForm:$A)),
+          (v4f32 (XXSPLTW (SUBREG_TO_REG (i64 1), (LFIWZX ForceXForm:$A), sub_64), 1))>;
 def : Pat<(v2i64 (PPCldsplat ForceXForm:$A)),
----------------
jsji wrote:
> Do we need some conditional checking here? As `LFIWZX` is changing the type to int now, are there any precision change?
I think we don't need a conditional check. `LFIWZX` will not change the raw_data in the registers. For example, if the data stored in the memory is float type 1.5(0x000000003fc00000), after `LFIWZX`, the raw data stored in result FPR is still `0x000000003fc00000`, not integer 1?

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3556
+def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
+          (v8i16 (VSPLTHs 3, (MTVSRD (INSERT_SUBREG (i64 (IMPLICIT_DEF)), (LHZX ForceXForm:$A), sub_32))))>;
+def : Pat<(v16i8 (PPCldsplat ForceXForm:$A)),
----------------
jsji wrote:
> Do we need to predicate 64 bit before using `MTVSRD`?
Thanks, this is a very good question. Yes, normally, we need to predicate 64 bit if MTVSRD emits any value in the high 32bits of the first double world.

However, for this case, seems we don't need the predicate, because the useful bits for this case are the lowest 16/8 bits, so even on the 32 bit arch, we still can ensure the lowest 16/8 bits are having the right value.

I add a comment to indicate why we don't need to predicate 64 bit.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:4107
+def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
+          (v8i16 (VSPLTHs 3, (LXSIHZX ForceXForm:$A)))>;
+def : Pat<(v16i8 (PPCldsplat ForceXForm:$A)),
----------------
jsji wrote:
> Do we need alignment check here for `LXSIHZX`?
I think non-alignment load may introduce perf issue, but it will not have functionality issue? Seems load like `LDX` also not checks the alignment. Do you see any issues here? 

================
Comment at: llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll:646
+; CHECK-P8-NEXT:    lfiwzx f0, r3, r4
+; CHECK-P8-NEXT:    xxspltd v2, f0, 0
+; CHECK-P8-NEXT:    vmrglb v2, v3, v2
----------------
jsji wrote:
> `f0` here looks weird regarding register classes, `xxspltd` does need `vs0` to get the `element 0` of doubleword.  Maybe we should follow up to see why we are printing `f0` here in ASMPrinter.
Sure, I will check this later. `vs0` seems a more reasonable input.

================
Comment at: llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll:18
 ; P9LE-NEXT:    blr
-
+;
 ; P9BE-LABEL: s2v_test1:
----------------
jsji wrote:
> Unrelated changes?
It is also for the weird change: `xxspltd v2, f0`

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106555/new/

https://reviews.llvm.org/D106555