[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

Matt Arsenault via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue May 30 03:19:17 PDT 2023


arsenm added inline comments.


================
Comment at: clang/include/clang/Driver/Options.td:1030
   NegFlag<SetFalse>>;
+def mprintf_kind_EQ : Joined<["-"], "mprintf-kind=">, Group<m_Group>,
+  HelpText<"Specify the printf lowering scheme (AMDGPU only), allowed values are "
----------------
I'm a bit worried this is introducing a side-ABI option not captured in the triple or at least module flags 


================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:4695
+      // Device side compilation printf
+      if (Args.getLastArg(options::OPT_mprintf_kind_EQ))
+        CmdArgs.push_back(Args.MakeArgString(
----------------
Braces 


================
Comment at: clang/test/CodeGenHIP/printf_nonhostcall.cpp:64
+// CHECK-NEXT:    [[PRINTBUFFNEXTPTR4:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PRINTBUFFNEXTPTR3]], i64 [[TMP13]]
+// CHECK-NEXT:    [[TMP21:%.*]] = ptrtoint ptr [[TMP1]] to i64
+// CHECK-NEXT:    store i64 [[TMP21]], ptr addrspace(1) [[PRINTBUFFNEXTPTR4]], align 8
----------------
Can directly store the pointer, don't ptrtoint? Should have some other pointer address spaces tested, the spec is quiet on how %p is supposed to work with different sized pointers 


================
Comment at: clang/test/CodeGenHIP/printf_nonhostcall.cpp:228
+  return printf(s, 10);
+}
----------------
Need some vector tests, especially 3 x vectors 


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:211
+struct StringData {
+  std::string Str = "";
+  bool isConst = true;
----------------
Don't need = ""


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:212
+  std::string Str = "";
+  bool isConst = true;
+  Value *RealSize = nullptr;
----------------
Move last 


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:224
+static Value *callBufferedPrintfStart(
+    IRBuilder<> &Builder, ArrayRef<Value *> &Args, Value *Fmt,
+    bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
----------------
ArrayRef should be passed by value 


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:235
+  else {
+    auto LenWithNull = getStrlenWithNull(Builder, Fmt);
+
----------------
arsenm wrote:
> No auto 
No auto 


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:235-238
+    auto LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+    // Align the computed length to next 8 byte boundary
+    auto TempAdd = Builder.CreateAdd(
----------------
No auto 


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:294
+  Type *Tys_alloc[1] = {Builder.getInt32Ty()};
+  Type *I8Ptr = Builder.getInt8PtrTy(1);
+  FunctionType *FTy_alloc = FunctionType::get(I8Ptr, Tys_alloc, false);
----------------
I suppose you can't access the AMDGPU enum from Utils. You could use DL.getDefaultGlobalsAddrSpace 


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:381
+    } else {
+      if (Args[i]->getType()->isDoubleTy())
+        WhatToStore.push_back(Args[i]);
----------------
Don't see why isDoubleTy would be special cased here and not part of fitArgInto64Bits


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:395
+          Builder.getInt8Ty(), PtrToStore,
+          M->getDataLayout().getTypeStoreSize(toStore->getType()),
+          "PrintBuffNextPtr");
----------------
This should probably be getTypeAllocSize 


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:369
+        // to alignment padding is not populated with any specific value
+        // here, I feel this would be safe as long as runtime is sync with
+        // the offsets.
----------------
sameerds wrote:
> Avoid first person comments. 
You have better alignment info than Align(1)


================
Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:458
+    auto CreateControlDWord = M->getOrInsertFunction(
+        StringRef("__ockl_create_control_dword"), Builder.getInt32Ty(),
+        Builder.getInt32Ty(), Int1Ty, Int1Ty);
----------------
sameerds wrote:
> vikramRH wrote:
> > arsenm wrote:
> > > vikramRH wrote:
> > > > arsenm wrote:
> > > > > Do we really need another ockl control variable for this? Why isn't it a parameter? printf=stdout always 
> > > > There are certain HIP API's such as "hip_assert" that output to stderr. currently such API's are supported via hostcalls. Although this implementation does not currently support the API's ,its kept as an option. 
> > > Right but the way to handle that would be a parameter for where to output, not an externally set global 
> > I am not clear here, you expect additional inputs to device lib function ?
> @arsenm, this "control word" written into the buffer. In that sense, it is indeed a parameter passed from device to host as part of the printf packet. It is not yet another global variable.
I'm not following why this part requires introducing another call into device libs. It's expanding this not-really defined API. I'd expect this to be a trivial function we can expand inline here and write directly into the parameter buffer?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427



More information about the cfe-commits mailing list