[llvm] [RISCV] Prefer whole register loads and stores when VL=VLMAX (PR #75531)

Wed Feb 28 08:43:43 PST 2024

================
@@ -9882,12 +9896,23 @@ RISCVTargetLowering::lowerFixedLengthVectorStoreToRVV(SDValue Op,
 
   MVT ContainerVT = getContainerForFixedLengthVector(VT);
 
-  SDValue VL = getVLOp(VT.getVectorNumElements(), ContainerVT, DL, DAG,
-                       Subtarget);
-
   SDValue NewValue =
       convertToScalableVector(ContainerVT, StoreVal, DAG, Subtarget);
 
+
+  // If we know the exact VLEN and our fixed length vector completely fills
+  // the container, use a whole register store instead.
+  const auto [MinVLMAX, MaxVLMAX] =
+      RISCVTargetLowering::computeVLMAXBounds(ContainerVT, Subtarget);
+  if (MinVLMAX == MaxVLMAX && MinVLMAX == VT.getVectorNumElements() &&
+      getLMUL1VT(ContainerVT).bitsLE(ContainerVT))
+    return DAG.getStore(Store->getChain(), DL, NewValue, Store->getBasePtr(),
----------------
lukel97 wrote:

I'm seeing a miscompile in SPEC CPU 2017 502.gcc_r when compiled with `-march=rv64gv -mrvv-vector-bits=zvl -O3`, which bisects back to this commit, e8a15eca92f1a10b3af4f4e52f54d9d2d7612bf5. It seems to be caused by the store lowering here rather than the load, and *only* with SEW=64 and LMUL=1.

I don't have a reduced test case to share at the moment, and nothing stands out to me in the code or the test diffs. Will keep investigating.
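
For reference, here is a rough standalone model of the condition in the hunk above, to make it easier to see which shapes take the whole register store path. It is only a sketch and not LLVM code: `vlmax`, `usesWholeRegisterStore`, and the `LMULx8` encoding (LMUL in eighths, so 8 means LMUL=1) are hypothetical names made up for illustration, and it assumes the usual VLMAX = LMUL * VLEN / SEW relation.

```cpp
// Hypothetical model of the check in lowerFixedLengthVectorStoreToRVV.
// LMUL is encoded in eighths: 1 = mf8, 4 = mf2, 8 = m1, 64 = m8.
constexpr unsigned vlmax(unsigned VLen, unsigned SEW, unsigned LMULx8) {
  // VLMAX = LMUL * VLEN / SEW, with LMUL = LMULx8 / 8.
  return (VLen * LMULx8) / (8 * SEW);
}

constexpr bool usesWholeRegisterStore(unsigned MinVLen, unsigned MaxVLen,
                                      unsigned SEW, unsigned LMULx8,
                                      unsigned NumElts) {
  const unsigned MinVLMAX = vlmax(MinVLen, SEW, LMULx8);
  const unsigned MaxVLMAX = vlmax(MaxVLen, SEW, LMULx8);
  // Exact VLEN known, the fixed vector fills the container, and the
  // container is at least one full vector register (LMUL >= 1).
  return MinVLMAX == MaxVLMAX && MinVLMAX == NumElts && LMULx8 >= 8;
}

// SEW=64, LMUL=1 with an exact VLEN of 128: a 2 x i64 store qualifies.
static_assert(usesWholeRegisterStore(128, 128, 64, 8, 2));
// Same shape, but VLEN only known to lie in [128, 65536]: not taken.
static_assert(!usesWholeRegisterStore(128, 65536, 64, 8, 2));
// Fractional LMUL (mf2) never qualifies, even if VL == VLMAX.
static_assert(!usesWholeRegisterStore(128, 128, 32, 4, 2));
```

Under these assumptions, the failing configuration (SEW=64, LMUL=1) with an exact VLEN of 128 would correspond to 2 x i64 stores, which may help narrow down a reduced test case.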

https://github.com/llvm/llvm-project/pull/75531