[llvm] [PPC] generate stxvw4x/lxvw4x on P7 (PR #87049)

Mon Apr 1 23:53:57 PDT 2024

================
@@ -17250,8 +17250,7 @@ bool PPCTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 EVT PPCTargetLowering::getOptimalMemOpType(
     const MemOp &Op, const AttributeList &FuncAttributes) const {
   if (getTargetMachine().getOptLevel() != CodeGenOptLevel::None) {
-    // We should use Altivec/VSX loads and stores when available. For unaligned
-    // addresses, unaligned VSX loads are only fast starting with the P8.
----------------
chenzheng1030 wrote:

I checked on a PWR7 AIX server. With alignments such as 1 or 2, AIX reports a misaligned load/store operation, but `lxvw4x` still executes successfully.
In terms of performance, `lxvw4x` is the same as the current `lxvd2x` (see `llvm/test/CodeGen/PowerPC/memcpy-vec.ll`), and `lxvw4x` performs better than the scalar fixed-point `ld` on AIX P7.
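
For illustration only, here is a minimal sketch of the kind of misaligned copy this affects. It is not the test that was run on the PWR7 machine. The helper name `copyMisaligned` and the buffer sizes are hypothetical, and which instructions the compiler emits for the `memcpy` depends on the target and optimization level.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of LLVM): copies `len` bytes between two
// buffers at addresses deliberately offset from a 16-byte boundary, then
// verifies that the destination matches the source.
bool copyMisaligned(std::size_t srcOff, std::size_t dstOff, std::size_t len) {
  alignas(16) unsigned char src[64];
  alignas(16) unsigned char dst[64];

  // Reject requests that would run past either buffer.
  if (srcOff + len > sizeof(src) || dstOff + len > sizeof(dst))
    return false;

  // Fill the source with a recognizable pattern and clear the destination.
  for (std::size_t i = 0; i < sizeof(src); ++i)
    src[i] = static_cast<unsigned char>(i + 1);
  std::memset(dst, 0, sizeof(dst));

  // With srcOff/dstOff of 1 or 2, both pointers are misaligned for a
  // 16-byte vector access; the lowering of this call is what
  // getOptimalMemOpType influences.
  std::memcpy(dst + dstOff, src + srcOff, len);

  return std::memcmp(dst + dstOff, src + srcOff, len) == 0;
}
```

Calling `copyMisaligned(1, 2, 32)` exercises a copy where neither pointer is 16-byte aligned.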

https://github.com/llvm/llvm-project/pull/87049