[PATCH] D133421: [AArch64] break non-temporal loads over 256 into 256-loads and a smaller load
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 9 02:08:04 PDT 2022
fhahn added a comment.
Thanks for the update, this looks like a nice improvement and should avoid regressions for those cases with D132559 <https://reviews.llvm.org/D132559>. A few more mostly stylistic comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17810
+// Break up large nontermporal loads so LDNPQ instruction can be selected
+static SDValue performLOADCombine(SDNode *N,
----------------
It would be good to be more concrete what large means here (>256 bits) will be split into blocks of 256bits.
nit: comments should be full sentences ending with a period (`.`)
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17816
+ LoadSDNode *LD = cast<LoadSDNode>(N);
+ if (!LD->isNonTemporal())
+ return SDValue(N, 0);
----------------
Thinking about it a bit more, I think we should also not handle `volatile` and `atomic` loads here to be safe. Could you add tests and the extra conditions for early exit here? I think it would be fine to just add the tests directly in the patch.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17822
+ MemVT.getSizeInBits() % 256 == 0 ||
+ 256 % MemVT.getScalarSizeInBits() != 0 || !LD->isNonTemporal())
+ return SDValue(N, 0);
----------------
`LD->isNonTemporal()` is already checked above
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17827
+ SDValue Chain = LD->getChain();
+ SDValue Ptr = LD->getBasePtr();
+ SDNodeFlags Flags = LD->getFlags();
----------------
nit: `BasePtr` would be more accurate.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17838
+ // Create all 256-bit loads starting from offset 0 and up to Num256Loads-1*32.
+ for (int I = 0; I < Num256Loads; I++) {
+ unsigned PtrOffset = I * 32;
----------------
nit: used as unsigned, so maybe make this unsigned too?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17848
+ }
+ // Process remaining bits in the load operation
+ // This is done by creating a null value vector to match the size of the
----------------
nit: comments should be full sentences ending with a period (`.`)
`bits in the load operation` -> `bits of the load operation`?
It might also help with readability if you add a newline before the comment.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17851
+ // 256-bit loads and inserting the remaining load to it. We extract the
+ // original load type at the end using EXTRACT_SUBVECTOR instruction
+ unsigned BitsRemaining = MemVT.getSizeInBits() % 256;
----------------
nit: comments should be full sentences ending with a period (.)
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17864
+ LD->getMemOperand()->getFlags(), LD->getAAInfo());
+ SmallVector<SDValue, 4> MaskVector;
+ SDValue NullVector = DAG.getUNDEF(NewVT);
----------------
MaskVector not used?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17865
+ SmallVector<SDValue, 4> MaskVector;
+ SDValue NullVector = DAG.getUNDEF(NewVT);
+ SDValue InsertIdx = DAG.getVectorIdxConstant(0, DL);
----------------
`UndefVector` would be more accurate?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D133421/new/
https://reviews.llvm.org/D133421
More information about the llvm-commits
mailing list