[PATCH] D133421: [AArch64] break non-temporal loads over 256 into 256-loads and a smaller load

Fri Sep 9 02:08:04 PDT 2022

fhahn added a comment.

Thanks for the update, this looks like a nice improvement and should avoid regressions for those cases with D132559 <https://reviews.llvm.org/D132559>. A few more mostly stylistic comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17810

+// Break up large nontermporal loads so LDNPQ instruction can be selected
+static SDValue performLOADCombine(SDNode *N,
----------------
It would be good to be more concrete what large means here (>256 bits) will be split into blocks of 256bits.

nit: comments should be full sentences ending with a period (`.`)

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17816
+  LoadSDNode *LD = cast<LoadSDNode>(N);
+  if (!LD->isNonTemporal())
+    return SDValue(N, 0);
----------------
Thinking about it a bit more, I think we should also not handle `volatile` and `atomic` loads here to be safe. Could you add tests and the extra conditions for early exit here? I think it would be fine to just add the tests directly in the patch.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17822
+      MemVT.getSizeInBits() % 256 == 0 ||
+      256 % MemVT.getScalarSizeInBits() != 0 || !LD->isNonTemporal())
+    return SDValue(N, 0);
----------------
`LD->isNonTemporal()` is already checked above

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17827
+  SDValue Chain = LD->getChain();
+  SDValue Ptr = LD->getBasePtr();
+  SDNodeFlags Flags = LD->getFlags();
----------------
nit: `BasePtr` would be more accurate.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17838
+  // Create all 256-bit loads starting from offset 0 and up to Num256Loads-1*32.
+  for (int I = 0; I < Num256Loads; I++) {
+    unsigned PtrOffset = I * 32;
----------------
nit: used as unsigned, so maybe make this unsigned too?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17848
+  }
+  // Process remaining bits in the load operation
+  // This is done by creating a null value vector to match the size of the
----------------
nit: comments should be full sentences ending with a period (`.`)

`bits in the load operation` -> `bits of the load operation`?

It might also help with readability if you add a newline before the comment.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17851
+  // 256-bit loads and inserting the remaining load to it. We extract the
+  // original load type at the end using EXTRACT_SUBVECTOR instruction
+  unsigned BitsRemaining = MemVT.getSizeInBits() % 256;
----------------
nit: comments should be full sentences ending with a period (.)

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17864
+                  LD->getMemOperand()->getFlags(), LD->getAAInfo());
+  SmallVector<SDValue, 4> MaskVector;
+  SDValue NullVector = DAG.getUNDEF(NewVT);
----------------
MaskVector not used?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17865
+  SmallVector<SDValue, 4> MaskVector;
+  SDValue NullVector = DAG.getUNDEF(NewVT);
+  SDValue InsertIdx = DAG.getVectorIdxConstant(0, DL);
----------------
`UndefVector` would be more accurate?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133421/new/

https://reviews.llvm.org/D133421