[llvm] Fix/aarch64 memset dup optimization (PR #166030)
Osama Abdelkader via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 18 10:06:43 PST 2025
================
@@ -29702,6 +29708,31 @@ AArch64TargetLowering::EmitKCFICheck(MachineBasicBlock &MBB,
       .getInstr();
 }
 
+bool AArch64TargetLowering::shallExtractConstSplatVectorElementToStore(
+    Type *VectorTy, unsigned ElemSizeInBits, unsigned &Index) const {
+  // On AArch64, we can efficiently extract a scalar from a splat vector using
+  // str s/d/q0 which extracts 32/64/128 bits from the vector register.
+  // This is useful for memset where we generate a v16i8 splat and need to store
+  // a smaller scalar (e.g., i32 for a 4-byte memset).
+  if (FixedVectorType *VTy = dyn_cast<FixedVectorType>(VectorTy)) {
+    // Only handle v16i8 splat (128 bits total, 16 elements of 8 bits each)
+    if (VTy->getNumElements() == 16 && VTy->getElementType()->isIntegerTy(8)) {
----------------
osamakader wrote:
Done.
https://github.com/llvm/llvm-project/pull/166030
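
For readers following the hunk above: the comment relies on the fact that the low 4 or 8 bytes of a 16-byte splat are byte-for-byte what a 4- or 8-byte memset would write. A minimal sketch of that equivalence in plain C++ follows. It does not use any LLVM API; makeSplat, storeLowBytes and matchesMemset are hypothetical helpers invented for this illustration, and the byte array only models a vector register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of LLVM): the 16 bytes a v16i8 splat of
// Value would occupy in a 128-bit vector register.
std::array<uint8_t, 16> makeSplat(uint8_t Value) {
  std::array<uint8_t, 16> Splat;
  Splat.fill(Value);
  return Splat;
}

// Hypothetical helper: model a narrow store of the register's low bytes,
// e.g. NumBytes == 4 for "str s0" or NumBytes == 8 for "str d0".
void storeLowBytes(const std::array<uint8_t, 16> &Splat, void *Dst,
                   std::size_t NumBytes) {
  assert(NumBytes <= Splat.size() && "cannot store more than the register");
  std::memcpy(Dst, Splat.data(), NumBytes);
}

// Returns true if storing the low NumBytes of the splat leaves the buffer
// in the same state as memset(Buf, Value, NumBytes) would.
bool matchesMemset(uint8_t Value, std::size_t NumBytes) {
  uint8_t ViaSplat[16] = {0};
  uint8_t ViaMemset[16] = {0};
  storeLowBytes(makeSplat(Value), ViaSplat, NumBytes);
  std::memset(ViaMemset, Value, NumBytes);
  return std::memcmp(ViaSplat, ViaMemset, sizeof(ViaSplat)) == 0;
}
```

Because every byte of the splat is identical, it does not matter which lanes are stored, which is why a narrow store of the low part of the register is enough in this model.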