[PATCH] D133421: [AArch64] break non-temporal loads over 256 into 256-loads and a smaller load
Tim Northover via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 9 08:55:41 PDT 2022
t.p.northover added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17860
+ DAG.getMemBasePlusOffset(BasePtr, TypeSize::Fixed(PtrOffset), DL, Flags);
+ Align NewAlign = commonAlignment(LD->getAlign(), MemVT.getSizeInBits());
+ SDValue RemainingLoad =
----------------
Shouldn't the second operand of this call be `PtrOffset` again?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17867-17869
+ SDValue ExtendedReminingLoad =
+ DAG.getNode(ISD::INSERT_SUBVECTOR, DL, NewVT,
+ {UndefVector, RemainingLoad, InsertIdx});
----------------
Is this (and the implementation generally) big-endian correct? I don't know the answer here, I can never remember what's supposed to happen. But someone should definitely try it on an `aarch64_be` target and at least eyeball the assembly to check the offsets and so on.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17881
+ {ConcatVectors, DAG.getVectorIdxConstant(0, DL)});
+ return DAG.getMergeValues({ExtractSubVector, Chain}, DL);
+}
----------------
The `Chain` here is the input chain, I think you need to `TokenFactor` all the loads' output chains together to make sure nothing gets reordered with them.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19934
+ performTBISimplification(N->getOperand(1), DCI, DAG))
return SDValue(N, 0);
+ return performLOADCombine(N, DCI, DAG, Subtarget);
----------------
This looks like it takes precedence over `performLOADCombine` and disables `ldnp` formation if the TBI feature is enabled.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D133421/new/
https://reviews.llvm.org/D133421
More information about the llvm-commits
mailing list