[llvm] Fix/aarch64 memset dup optimization (PR #166030)

Osama Abdelkader via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 18 10:06:43 PST 2025


================
@@ -29702,6 +29708,31 @@ AArch64TargetLowering::EmitKCFICheck(MachineBasicBlock &MBB,
       .getInstr();
 }
 
+bool AArch64TargetLowering::shallExtractConstSplatVectorElementToStore(
+    Type *VectorTy, unsigned ElemSizeInBits, unsigned &Index) const {
+  // On AArch64, we can efficiently extract a scalar from a splat vector using
+  // str s/d/q0 which extracts 32/64/128 bits from the vector register.
+  // This is useful for memset where we generate a v16i8 splat and need to store
+  // a smaller scalar (e.g., i32 for a 4-byte memset).
+  if (FixedVectorType *VTy = dyn_cast<FixedVectorType>(VectorTy)) {
+    // Only handle v16i8 splat (128 bits total, 16 elements of 8 bits each)
+    if (VTy->getNumElements() == 16 && VTy->getElementType()->isIntegerTy(8)) {
----------------
osamakader wrote:

Done.
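
For readers following the hunk above, here is a minimal sketch of the property the code comment relies on: every lane of a v16i8 splat holds the same byte, so the low 32 bits of the vector register already equal the scalar a 4-byte memset would store. This is a plain C++ model and not LLVM code; the names splatByte, storeLowBytes, and repeatByte32 are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of a v16i8 splat: the same byte in all 16 lanes.
std::array<std::uint8_t, 16> splatByte(std::uint8_t value) {
  std::array<std::uint8_t, 16> lanes;
  lanes.fill(value);
  return lanes;
}

// Hypothetical model of a narrow store from a vector register: copy only
// the low numBytes bytes (4 for an "s" store, 8 for "d", 16 for "q").
void storeLowBytes(const std::array<std::uint8_t, 16> &reg, void *dst,
                   std::size_t numBytes) {
  assert(numBytes <= reg.size());
  std::memcpy(dst, reg.data(), numBytes);
}

// Reference value: the byte repeated four times in a 32-bit scalar.
std::uint32_t repeatByte32(std::uint8_t value) {
  return 0x01010101u * value;
}
```

Because all lanes are identical, the result does not depend on byte order: storing the low 4 bytes of splatByte(b) into a uint32_t yields the same value as repeatByte32(b).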

https://github.com/llvm/llvm-project/pull/166030

