[llvm] [NVPTX] Improve kernel byval parameter lowering (PR #136008)
Alex MacLean via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 17 07:29:21 PDT 2025
================
@@ -148,18 +139,40 @@ entry:
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
define dso_local ptx_kernel void @read_only_gep_asc0(ptr nocapture noundef writeonly %out, ptr nocapture noundef readonly byval(%struct.S) align 4 %s) local_unnamed_addr #0 {
-; COMMON-LABEL: define dso_local ptx_kernel void @read_only_gep_asc0(
-; COMMON-SAME: ptr noundef writeonly captures(none) [[OUT:%.*]], ptr noundef readonly byval([[STRUCT_S:%.*]]) align 4 captures(none) [[S:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
-; COMMON-NEXT: [[ENTRY:.*:]]
-; COMMON-NEXT: [[S1:%.*]] = alloca [[STRUCT_S]], align 4
-; COMMON-NEXT: [[S2:%.*]] = addrspacecast ptr [[S]] to ptr addrspace(101)
-; COMMON-NEXT: call void @llvm.memcpy.p0.p101.i64(ptr align 4 [[S1]], ptr addrspace(101) align 4 [[S2]], i64 8, i1 false)
-; COMMON-NEXT: [[B:%.*]] = getelementptr inbounds nuw i8, ptr [[S1]], i64 4
-; COMMON-NEXT: [[ASC:%.*]] = addrspacecast ptr [[B]] to ptr addrspace(101)
-; COMMON-NEXT: [[ASC0:%.*]] = addrspacecast ptr addrspace(101) [[ASC]] to ptr
-; COMMON-NEXT: [[I:%.*]] = load i32, ptr [[ASC0]], align 4
-; COMMON-NEXT: store i32 [[I]], ptr [[OUT]], align 4
-; COMMON-NEXT: ret void
+; SM_60-LABEL: define dso_local ptx_kernel void @read_only_gep_asc0(
+; SM_60-SAME: ptr noundef writeonly captures(none) [[OUT:%.*]], ptr noundef readonly byval([[STRUCT_S:%.*]]) align 4 captures(none) [[S:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; SM_60-NEXT: [[ENTRY:.*:]]
+; SM_60-NEXT: [[S1:%.*]] = alloca [[STRUCT_S]], align 4
+; SM_60-NEXT: [[S2:%.*]] = call ptr addrspace(101) @llvm.nvvm.internal.noop.addrspacecast.p101.p0(ptr [[S]])
+; SM_60-NEXT: call void @llvm.memcpy.p0.p101.i64(ptr align 4 [[S1]], ptr addrspace(101) align 4 [[S2]], i64 8, i1 false)
+; SM_60-NEXT: [[B:%.*]] = getelementptr inbounds nuw i8, ptr [[S1]], i64 4
+; SM_60-NEXT: [[ASC:%.*]] = addrspacecast ptr [[B]] to ptr addrspace(101)
+; SM_60-NEXT: [[ASC0:%.*]] = addrspacecast ptr addrspace(101) [[ASC]] to ptr
+; SM_60-NEXT: [[I:%.*]] = load i32, ptr [[ASC0]], align 4
----------------
AlexMaclean wrote:
> The question is whether the test was correct to assume that there will be no local copy.
Considering that we are only reading from the byval argument and the test is named read_only_gep_asc0, it seems to be reasonable. So why did we end up with an alloca then?
Choosing to lower a parameter that hasn't been explicitly marked as "grid_constant" as a grid-constant seems like an optimization decision NVPTXLowerArgs may freely make based on it's own set of heuristics. I think it is not a good idea for any other pass or front-end to make assumptions about whether or not this will happen (fortunately, as far as I know there are no cases of this today). In this case, the `addrspacecast` instructions are extra dangerous because `cvta.param` isn't supported on sm_60 so we're reliant on other optimizations to remove them in order to compile correctly.
In this specific test it looks like we're not treating this pointer as grid-constant because we treat the cast back to generic as an escape:
https://github.com/llvm/llvm-project/blob/728f6de4177a7e4d8030cb37ace525e2af97d247/llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp#L482-L483
It might make sense to allow this case as well. But I don't know if I fully understand all the ramifications or real world motivation for that so I'd prefer to leave it to the side for now, as this patch is already somewhat large.
https://github.com/llvm/llvm-project/pull/136008
More information about the llvm-commits
mailing list