[llvm] [NVPTX] Improve kernel byval parameter lowering (PR #136008)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 17 11:48:07 PDT 2025


================
@@ -148,18 +139,40 @@ entry:
 
 ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
 define dso_local ptx_kernel void @read_only_gep_asc0(ptr nocapture noundef writeonly %out, ptr nocapture noundef readonly byval(%struct.S) align 4 %s) local_unnamed_addr #0 {
-; COMMON-LABEL: define dso_local ptx_kernel void @read_only_gep_asc0(
-; COMMON-SAME: ptr noundef writeonly captures(none) [[OUT:%.*]], ptr noundef readonly byval([[STRUCT_S:%.*]]) align 4 captures(none) [[S:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
-; COMMON-NEXT:  [[ENTRY:.*:]]
-; COMMON-NEXT:    [[S1:%.*]] = alloca [[STRUCT_S]], align 4
-; COMMON-NEXT:    [[S2:%.*]] = addrspacecast ptr [[S]] to ptr addrspace(101)
-; COMMON-NEXT:    call void @llvm.memcpy.p0.p101.i64(ptr align 4 [[S1]], ptr addrspace(101) align 4 [[S2]], i64 8, i1 false)
-; COMMON-NEXT:    [[B:%.*]] = getelementptr inbounds nuw i8, ptr [[S1]], i64 4
-; COMMON-NEXT:    [[ASC:%.*]] = addrspacecast ptr [[B]] to ptr addrspace(101)
-; COMMON-NEXT:    [[ASC0:%.*]] = addrspacecast ptr addrspace(101) [[ASC]] to ptr
-; COMMON-NEXT:    [[I:%.*]] = load i32, ptr [[ASC0]], align 4
-; COMMON-NEXT:    store i32 [[I]], ptr [[OUT]], align 4
-; COMMON-NEXT:    ret void
+; SM_60-LABEL: define dso_local ptx_kernel void @read_only_gep_asc0(
+; SM_60-SAME: ptr noundef writeonly captures(none) [[OUT:%.*]], ptr noundef readonly byval([[STRUCT_S:%.*]]) align 4 captures(none) [[S:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; SM_60-NEXT:  [[ENTRY:.*:]]
+; SM_60-NEXT:    [[S1:%.*]] = alloca [[STRUCT_S]], align 4
+; SM_60-NEXT:    [[S2:%.*]] = call ptr addrspace(101) @llvm.nvvm.internal.noop.addrspacecast.p101.p0(ptr [[S]])
+; SM_60-NEXT:    call void @llvm.memcpy.p0.p101.i64(ptr align 4 [[S1]], ptr addrspace(101) align 4 [[S2]], i64 8, i1 false)
+; SM_60-NEXT:    [[B:%.*]] = getelementptr inbounds nuw i8, ptr [[S1]], i64 4
+; SM_60-NEXT:    [[ASC:%.*]] = addrspacecast ptr [[B]] to ptr addrspace(101)
+; SM_60-NEXT:    [[ASC0:%.*]] = addrspacecast ptr addrspace(101) [[ASC]] to ptr
+; SM_60-NEXT:    [[I:%.*]] = load i32, ptr [[ASC0]], align 4
----------------
Artem-B wrote:

Thank you for double checking the reason for us still making a copy. 

I agree that the test is somewhat questionable and the IR should not have been casing argument pointer to generic AS and back to param space and it may indeed be broken in each of these directions.

> In this case, the addrspacecast instructions are extra dangerous because cvta.param isn't supported on sm_60

Correct. That test indeed works only because both the alloca and the ASCs got eliminated because the code is trivial.
I guess that was partially the idea for the test that byval lowering can see through additional ASCs and I've goofed up by adding a supposedly no-op but actually invalid ASC to generic and back, and got lucky that it just happened to give me the result I expected, so I did not pay attention to what happened in-between test IR and the PTX.

For now we can remove or comment out this test.

> It might make sense to allow this case as well. 

Yup. We still do want the read-only access checks to be able to see through ASCs. The code you've pointed to was basically intended to allow generic->param ASCs added by the pass inferring address spaces, and we basically considered everything else to be an escape. Now that ASC from param to generic does exist in some cases, considering them acceptable during read-only-ness tracking would make sense, too. Whether you want to incorporate it into this patch or leave it out is up to you.




https://github.com/llvm/llvm-project/pull/136008


More information about the llvm-commits mailing list