[llvm] [NVPTX] Improve kernel byval parameter lowering (PR #136008)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 16 16:14:56 PDT 2025
================
@@ -148,18 +139,40 @@ entry:
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
define dso_local ptx_kernel void @read_only_gep_asc0(ptr nocapture noundef writeonly %out, ptr nocapture noundef readonly byval(%struct.S) align 4 %s) local_unnamed_addr #0 {
-; COMMON-LABEL: define dso_local ptx_kernel void @read_only_gep_asc0(
-; COMMON-SAME: ptr noundef writeonly captures(none) [[OUT:%.*]], ptr noundef readonly byval([[STRUCT_S:%.*]]) align 4 captures(none) [[S:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
-; COMMON-NEXT: [[ENTRY:.*:]]
-; COMMON-NEXT: [[S1:%.*]] = alloca [[STRUCT_S]], align 4
-; COMMON-NEXT: [[S2:%.*]] = addrspacecast ptr [[S]] to ptr addrspace(101)
-; COMMON-NEXT: call void @llvm.memcpy.p0.p101.i64(ptr align 4 [[S1]], ptr addrspace(101) align 4 [[S2]], i64 8, i1 false)
-; COMMON-NEXT: [[B:%.*]] = getelementptr inbounds nuw i8, ptr [[S1]], i64 4
-; COMMON-NEXT: [[ASC:%.*]] = addrspacecast ptr [[B]] to ptr addrspace(101)
-; COMMON-NEXT: [[ASC0:%.*]] = addrspacecast ptr addrspace(101) [[ASC]] to ptr
-; COMMON-NEXT: [[I:%.*]] = load i32, ptr [[ASC0]], align 4
-; COMMON-NEXT: store i32 [[I]], ptr [[OUT]], align 4
-; COMMON-NEXT: ret void
+; SM_60-LABEL: define dso_local ptx_kernel void @read_only_gep_asc0(
+; SM_60-SAME: ptr noundef writeonly captures(none) [[OUT:%.*]], ptr noundef readonly byval([[STRUCT_S:%.*]]) align 4 captures(none) [[S:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; SM_60-NEXT: [[ENTRY:.*:]]
+; SM_60-NEXT: [[S1:%.*]] = alloca [[STRUCT_S]], align 4
+; SM_60-NEXT: [[S2:%.*]] = call ptr addrspace(101) @llvm.nvvm.internal.noop.addrspacecast.p101.p0(ptr [[S]])
+; SM_60-NEXT: call void @llvm.memcpy.p0.p101.i64(ptr align 4 [[S1]], ptr addrspace(101) align 4 [[S2]], i64 8, i1 false)
+; SM_60-NEXT: [[B:%.*]] = getelementptr inbounds nuw i8, ptr [[S1]], i64 4
+; SM_60-NEXT: [[ASC:%.*]] = addrspacecast ptr [[B]] to ptr addrspace(101)
+; SM_60-NEXT: [[ASC0:%.*]] = addrspacecast ptr addrspace(101) [[ASC]] to ptr
+; SM_60-NEXT: [[I:%.*]] = load i32, ptr [[ASC0]], align 4
----------------
Artem-B wrote:
We're doing something really wrong here:
```
; SM_60-NEXT: [[S1:%.*]] = alloca [[STRUCT_S]], align 4
...
; SM_60-NEXT: [[B:%.*]] = getelementptr inbounds nuw i8, ptr [[S1]], i64 4
; SM_60-NEXT: [[ASC:%.*]] = addrspacecast ptr [[B]] to ptr addrspace(101)
; SM_60-NEXT: [[ASC0:%.*]] = addrspacecast ptr addrspace(101) [[ASC]] to ptr
```
`S1` is an alloca in the local AS, yet we still cast a pointer derived from it (`B`) to the param AS. We only get away with that because we cast it back to generic right away, and/or because the alloca + memcpy get eliminated and we end up pointing at the original location in the param space. Either way, the IR as captured by the test is wrong.
Granted, it was wrong before your patch, too, but this looks rather scary. If something ends up holding onto `ASC` and we then try to access it via `ld.param`, we'd be in trouble.
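For contrast, here is a rough sketch of what a self-consistent version of this access could look like, where the param-AS pointer is derived from the byval argument `%s` itself rather than from the alloca. This is hand-written for illustration and is not output of the pass. The function name, the value names, the `%struct.S` layout (two `i32` fields, inferred from the 8-byte memcpy), and the intrinsic declaration are assumptions; the intrinsic name is copied from the SM_60 check lines above.

```llvm
; Illustrative only: hand-written, not produced by NVPTXLowerArgs.
; Assumed layout: two i32 fields, matching the 8-byte memcpy in the test.
%struct.S = type { i32, i32 }

; Declaration assumed from the call shown in the SM_60 check lines.
declare ptr addrspace(101) @llvm.nvvm.internal.noop.addrspacecast.p101.p0(ptr)

define ptx_kernel void @read_only_gep_sketch(ptr %out, ptr byval(%struct.S) align 4 %s) {
entry:
  ; Get the param-AS view of the byval argument itself, not of a local copy.
  %s.param = call ptr addrspace(101) @llvm.nvvm.internal.noop.addrspacecast.p101.p0(ptr %s)
  ; Index and load entirely within the param AS; no alloca is involved.
  %b = getelementptr inbounds i8, ptr addrspace(101) %s.param, i64 4
  %i = load i32, ptr addrspace(101) %b, align 4
  store i32 %i, ptr %out, align 4
  ret void
}
```

Here every `addrspace(101)` pointer really does point into the param space, so it would be safe for something to hold onto `%b` and access it later via `ld.param`.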
https://github.com/llvm/llvm-project/pull/136008