[PATCH] D145238: [NVPTX] Expose LDU builtins

Fri Mar 3 14:30:57 PST 2023

tra added a comment.

Nice. Thank you!

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:18104
 CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, const CallExpr *E) {
-  auto MakeLdg = [&](unsigned IntrinsicID) {
+  auto HasHalfSupport = [&](unsigned BuiltinID) {
+    auto &Context = getContext();
----------------
I'd add a comment describing a meaning of the fields in the returned pair.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:18116
+    case NVPTX::BI__nvvm_ldu_h:
+      BuiltinName = "__nvvm_ldu_h";
+      break;
----------------
Can we use the standard `StringRef Name = getContext().BuiltinInfo.getName(BuiltinID);` to figure out the builtin name?

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:18261-18263
+      std::string ErrMsg{HalfSupport.second};
+      CGM.Error(E->getExprLoc(),
+                ErrMsg.append(" requires native half type support.").c_str());
----------------
Nit: this would be a bit more readable:
```
std::string BuiltinName{HalfSupport.second};
CGM.Error(E->getExprLoc(),  BuiltinName + " requires native half type support.")` 
```
You may also consider changing returned `BuiltinName` to be `std::string`, so you would not need an explicit temp var. 

================
Comment at: clang/test/CodeGen/builtins-nvptx-native-half-type-err.c:3
+//
+// RUN: not %clang_cc1 -DLDG -fsyntax-only -ffp-contract=off -triple nvptx-unknown-unknown -target-cpu \
+// RUN:   sm_75 -target-feature +ptx70 -fcuda-is-device -x cuda -emit-llvm -o - %s 2>&1 \
----------------
I think we could've done it all in one run if we were to do both ldg and ldu in one function.

================
Comment at: llvm/test/CodeGen/NVPTX/ldu-ldg.ll:33
+; ldu.global.u16
+  %val = tail call i16 @llvm.nvvm.ldu.global.i.i16.p1(ptr addrspace(1) %ptr, i32 4)
+  ret i16 %val
----------------
Should alignment be 2 ?

================
Comment at: llvm/test/CodeGen/NVPTX/ldu-ldg.ll:60
+define double @test_ldu_f64(ptr addrspace(1) %ptr) {
+; ldu.global.u64
+  %val = tail call double @llvm.nvvm.ldu.global.f.f64.p1(ptr addrspace(1) %ptr, i32 8)
----------------
Hmm. I wonder why we end up with `u64` here and not `b64`. Not that it matters in this case, but it is a discrepancy vs. `f16`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145238/new/

https://reviews.llvm.org/D145238