[PATCH] D145238: [NVPTX] Expose LDU builtins
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Mar 3 14:30:57 PST 2023
tra added a comment.
Nice. Thank you!
================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:18104
CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, const CallExpr *E) {
- auto MakeLdg = [&](unsigned IntrinsicID) {
+ auto HasHalfSupport = [&](unsigned BuiltinID) {
+ auto &Context = getContext();
----------------
I'd add a comment describing a meaning of the fields in the returned pair.
================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:18116
+ case NVPTX::BI__nvvm_ldu_h:
+ BuiltinName = "__nvvm_ldu_h";
+ break;
----------------
Can we use the standard `StringRef Name = getContext().BuiltinInfo.getName(BuiltinID);` to figure out the builtin name?
================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:18261-18263
+ std::string ErrMsg{HalfSupport.second};
+ CGM.Error(E->getExprLoc(),
+ ErrMsg.append(" requires native half type support.").c_str());
----------------
Nit: this would be a bit more readable:
```
std::string BuiltinName{HalfSupport.second};
CGM.Error(E->getExprLoc(), BuiltinName + " requires native half type support.")`
```
You may also consider changing returned `BuiltinName` to be `std::string`, so you would not need an explicit temp var.
================
Comment at: clang/test/CodeGen/builtins-nvptx-native-half-type-err.c:3
+//
+// RUN: not %clang_cc1 -DLDG -fsyntax-only -ffp-contract=off -triple nvptx-unknown-unknown -target-cpu \
+// RUN: sm_75 -target-feature +ptx70 -fcuda-is-device -x cuda -emit-llvm -o - %s 2>&1 \
----------------
I think we could've done it all in one run if we were to do both ldg and ldu in one function.
================
Comment at: llvm/test/CodeGen/NVPTX/ldu-ldg.ll:33
+; ldu.global.u16
+ %val = tail call i16 @llvm.nvvm.ldu.global.i.i16.p1(ptr addrspace(1) %ptr, i32 4)
+ ret i16 %val
----------------
Should alignment be 2 ?
================
Comment at: llvm/test/CodeGen/NVPTX/ldu-ldg.ll:60
+define double @test_ldu_f64(ptr addrspace(1) %ptr) {
+; ldu.global.u64
+ %val = tail call double @llvm.nvvm.ldu.global.f.f64.p1(ptr addrspace(1) %ptr, i32 8)
----------------
Hmm. I wonder why we end up with `u64` here and not `b64`. Not that it matters in this case, but it is a discrepancy vs. `f16`.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D145238/new/
https://reviews.llvm.org/D145238
More information about the cfe-commits
mailing list