[all-commits] [llvm/llvm-project] 49bc3f: [LowerMemIntrinsics] Optimize memset lowering
Fabian Ritter via All-commits
all-commits at lists.llvm.org
Wed Dec 3 00:55:58 PST 2025
Branch: refs/heads/users/ritter-x2a/08-20-_lowermemintrinsics_optimize_memset_lowering
Home: https://github.com/llvm/llvm-project
Commit: 49bc3f002d40f09d568952715a824a1e0fd2ed5d
https://github.com/llvm/llvm-project/commit/49bc3f002d40f09d568952715a824a1e0fd2ed5d
Author: Fabian Ritter <fabian.ritter at amd.com>
Date: 2025-12-03 (Wed, 03 Dec 2025)
Changed paths:
M llvm/include/llvm/Transforms/Utils/LowerMemIntrinsics.h
M llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
M llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
M llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
M llvm/lib/Target/NVPTX/NVPTXLowerAggrCopies.cpp
M llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
M llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
M llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memset.ll
M llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
M llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-mem-transfer.ll
M llvm/test/CodeGen/AMDGPU/lower-mem-intrinsics-threshold.ll
M llvm/test/CodeGen/AMDGPU/lower-mem-intrinsics.ll
M llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
A llvm/test/CodeGen/AMDGPU/memset-param-combinations.ll
M llvm/test/CodeGen/NVPTX/lower-aggr-copies.ll
M llvm/test/CodeGen/SPIRV/llvm-intrinsics/memset.ll
M llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-inline-non-constant-len.ll
Log Message:
-----------
[LowerMemIntrinsics] Optimize memset lowering
This patch changes the memset lowering to match the optimized memcpy lowering.
The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred
memory access type. If that type is larger than a byte, the memset is lowered
into two loops: a main loop that stores a sufficiently wide vector splat of the
SetValue with the preferred memory access type and a residual loop that covers
the remaining bytes individually. If the memset size is statically known, the
residual loop is replaced by a sequence of stores.
This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by
around 7-20x.
I'm planning a similar treatment for memset.pattern in a follow-up PR.
For SWDEV-543208.
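
The two-loop shape described in the log message can be illustrated with a small scalar C++ model. This is only a sketch of the idea, not the code in LowerMemIntrinsics.cpp: the function name loweredMemsetSketch and the constant kWideBytes are hypothetical, and the 16-byte width is an assumed stand-in for whatever access type TTI.getMemcpyLoopLoweringType would report on a given target.

```cpp
#include <cstddef>
#include <cstring>

// Assumed preferred access width in bytes (hypothetical; the real value
// comes from the target via TTI.getMemcpyLoopLoweringType).
constexpr std::size_t kWideBytes = 16;

// Hypothetical model of the lowered memset: a wide main loop followed by
// a byte-wise residual loop.
void loweredMemsetSketch(unsigned char *dst, unsigned char setValue,
                         std::size_t len) {
  // Build the "splat": the set value repeated across one wide access.
  unsigned char splat[kWideBytes];
  for (std::size_t i = 0; i < kWideBytes; ++i)
    splat[i] = setValue;

  // Main loop: one wide store per iteration over the largest multiple of
  // kWideBytes that fits in len.
  const std::size_t mainBytes = len - (len % kWideBytes);
  for (std::size_t off = 0; off < mainBytes; off += kWideBytes)
    std::memcpy(dst + off, splat, kWideBytes);

  // Residual loop: cover the remaining (len % kWideBytes) bytes one at a
  // time. With a statically known len, the patch emits a fixed sequence
  // of stores here instead of a loop.
  for (std::size_t off = mainBytes; off < len; ++off)
    dst[off] = setValue;
}
```

For a 37-byte memset with this assumed width, the main loop performs two 16-byte stores and the residual loop writes the last 5 bytes.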