[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

Mon Jul 1 05:57:22 PDT 2024

================
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck %s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:    [[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:    [[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:    [[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:    [[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:    [[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:    store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:    store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:    store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:    store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:    store float 1.000000e+00, ptr [[F]], align 4
+// CHECK-NEXT:    store double 1.000000e+00, ptr [[D]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:    [[CONV:%.*]] = sext i8 [[TMP0]] to i32
----------------
JonChesterfield wrote:

C promotes them to i32. C has a lot of rules around vararg type promotion that have not aged brilliantly.

If you want a i8 or i16, put it in a struct. C doesn't say anything about promoting that and amdgpu will pass it inlined into the struct, i.e. with no overhead. I believe nvptx will do likewise.

https://github.com/llvm/llvm-project/pull/96369