[PATCH] D106555: [PowerPC] handle more splat loads

Thu Nov 4 22:33:00 PDT 2021

shchenz marked 7 inline comments as done.
shchenz added a comment.

Thanks for your comments @nemanjai

I addressed most of the comment in commit 96950270669acd3c342a266562ff3a41464cc0a0 <https://reviews.llvm.org/rG96950270669acd3c342a266562ff3a41464cc0a0>

One comment related to the code gen improvement for the aligned load will be addressed in the follow-up Phabricator patch.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:5837
+      EVT Type = N->getValueType(0);
+      if (Type == MVT::v16i8 || Type == MVT::v8i16) {
+        // v16i8 LD_SPLAT addr
----------------
nemanjai wrote:
> Seems like all the code is also inside this block. Please flip the condition and early exit. You should really look for opportunities to do this and reduce the level of nesting.
> 
> I think it would also be useful to provide a much simpler code path for loads that are aligned at 16 bytes. In that case, all you need is `LVX + SPLAT` (where the splat index depends on endianness.
> 
> Example:
> ```
> typedef signed short __attribute__((aligned(16))) AlignedShort;
> vector signed short test(AlignedShort *Ptr) {
>   return (vector signed short)*(Ptr + 3);
> }
> ```
I'd like to follow up on the 16-bytes aligned splat load in another patch.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3556
+def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
+          (v8i16 (VSPLTHs 3, (MTVSRD (INSERT_SUBREG (i64 (IMPLICIT_DEF)), (LHZX ForceXForm:$A), sub_32))))>;
+def : Pat<(v16i8 (PPCldsplat ForceXForm:$A)),
----------------
nemanjai wrote:
> shchenz wrote:
> > jsji wrote:
> > > Do we need to predicate 64 bit before using `MTVSRD`?
> > Thanks, this is a very good question. Yes, normally, we need to predicate 64 bit if MTVSRD emits any value in the high 32bits of the first double world.
> > 
> > However, for this case, seems we don't need the predicate, because the useful bits for this case are the lowest 16/8 bits, so even on the 32 bit arch, we still can ensure the lowest 16/8 bits are having the right value.
> > 
> > I add a comment to indicate why we don't need to predicate 64 bit.
> I understand that we don't really care about the undefined bits because they'll be overwritten when we splat, but why not just use `MTVSRWZ` and not even need the comment?
Thanks for the good suggestion. This seems also resolve one crash related to `INSERT_SUBREG(x, x, sub_32)` on 32-bit AIX. I addressed this comment in D113236

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106555/new/

https://reviews.llvm.org/D106555