[PATCH] D142907: LangRef: Add "dynamic" option to "denormal-fp-math"

Mon Jan 30 11:20:48 PST 2023

tra added a comment.

My $.02, mostly on the style and the mechanics of applying the new attribute. FP semantics aspects are above my pay grade.

================
Comment at: clang/lib/CodeGen/CGCall.cpp:2059-2061
+  F.removeFnAttrs(AttrsToRemove);
+  addDenormalModeAttrs(Merged, MergedF32, FuncAttrs);
+  F.addFnAttrs(FuncAttrs);
----------------
IIUIC, this changes denorm mode attributes on the functions with dynamic denorm mode that we link in.
Will that be a problem if the same function, when linked into different modules, would end up with different attributes? E.g. if a function is externally visible and is intended to be common'ed across multiple modules. Should dynamic denorm mode be restricted to the functions private to the module only? We do typically internalize linked bitcode for CUDA, but I don't think it's something we can always implicitly assume.

================
Comment at: clang/test/CodeGenCUDA/Inputs/ocml-sample.cl:11
+
+half do_f16_stuff(half a, half b, half c) {
+  return __builtin_fmaf16(a, b, c) + 4.0h;
----------------
Cosmetic nit: order functions as f16/f32/f64?

================
Comment at: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu:15
+// RUN:   %S/Inputs/ocml-sample.cl -o %t.dynamic.full.bc
+
+
----------------
Do we want to verify that the compiled samples have the correct function attributes?

================
Comment at: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu:20
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.dynamic.f32.bc \
+// RUN:   -emit-llvm %s -o - | FileCheck -implicit-check-not=denormal-fp-math %s --check-prefixes=CHECK,DEFAULT
----------------
Would it be useful to check the attributes the functions have w/o linking the sample bitcode? In these tests one can infer expected attributes from the flags and comments, so I'm fine not having explicit checks for that.

================
Comment at: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu:77
+
+// CHECK: kernel_f32({{.*}}) #[[$KERNELATTR:[0-9]+]]
+__global__ void kernel_f32(float* out, float* a, float* b, float* c) {
----------------
Nit: CHECK-LABEL ?

================
Comment at: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu:90-91
+
+// CHECK: do_f32_stuff({{.*}}) #[[$FUNCATTR:[0-9]+]]
+// CHECK: do_f64_stuff({{.*}}) #[[$FUNCATTR]]
+
----------------
I assume these refer to linked in functions, not their calls. It may be useful to include match define/call to make it obvious.

================
Comment at: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu:94
+// We should not be littering call sites with the attribute
+// CHECK-NOT: denormal-fp-math
+
----------------
I'm not sure whether it does what it's intended to. 
AFAICT, at this point we will be past the call sites, so if it's intended to check the call sites in kernel_*, it will likely always succeed, even if we do litter call sites with unwanted attributes.

It's also possible that I have a wrong idea about what the expected IR looks like. If you could post it for reference, that would be helpful.

================
Comment at: llvm/test/CodeGen/Generic/denormal-fp-math-cl-opt.ll:3
+
+; Check that we annotated the command line flag annotates the IR with the appropriate attributes
+
----------------
Edit: `Check that the command line flag annotates the IR with the appropriate attributes.`

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142907/new/

https://reviews.llvm.org/D142907