[PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

Thu Jan 28 11:50:15 PST 2016

rnk added inline comments.

================
Comment at: lib/CodeGen/CGCUDABuiltin.cpp:105-108
@@ -104,2 +104,6 @@
   } else {
-    BufferPtr = Builder.Insert(new llvm::AllocaInst(
+    // Insert our alloca not into the current BB, but into the function's entry
+    // block.  This is important because nvvm doesn't support alloca -- if we
+    // put the alloca anywhere else, llvm may eventually output
+    // stacksave/stackrestore intrinsics, which cause our nvvm backend to choke.
+    auto *Alloca = new llvm::AllocaInst(
----------------
That allocas for local variables should always go in the entry block is widely known among LLVM and clang developers. Most readers won't need this comment, unless you expect that people working on CUDA lack that background. Plus, if you use CreateTempAlloca, there won't be any question about which insert point should be used.
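
To illustrate why a helper like CreateTempAlloca removes the insert point question, here is a toy model in plain C++. It is only a sketch: `Inst`, `Block`, `Function`, `Builder`, `createTempAlloca`, and `buildExample` are hypothetical stand-ins invented for this example, not LLVM or clang classes. The helper ignores the builder's current block and always places the alloca at the top of the entry block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for IR concepts. These are NOT LLVM classes.
struct Inst {
  std::string Text;
  bool IsAlloca;
};

struct Block {
  std::string Name;
  std::vector<Inst> Insts;
};

struct Function {
  std::vector<Block> Blocks; // Blocks[0] is the entry block.
};

struct Builder {
  Function &F;
  std::size_t CurBlock; // Index of the block ordinary instructions go to.

  // Append an ordinary instruction at the current insert point.
  void emit(const std::string &Text) {
    F.Blocks[CurBlock].Insts.push_back(Inst{Text, false});
  }

  // Hypothetical analogue of CreateTempAlloca: ignores CurBlock and puts
  // the alloca in the entry block, after any allocas already there.
  void createTempAlloca(const std::string &Name, unsigned NumBytes) {
    std::vector<Inst> &Entry = F.Blocks.front().Insts;
    auto Pos = Entry.begin();
    while (Pos != Entry.end() && Pos->IsAlloca)
      ++Pos;
    Entry.insert(Pos, Inst{"%" + Name + " = alloca [" +
                               std::to_string(NumBytes) + " x i8]",
                           true});
  }
};

// Build a two-block function and request a buffer while emitting into the
// second block, which is the situation the patch is dealing with.
Function buildExample() {
  Function F;
  F.Blocks.push_back(Block{"entry", {}});
  F.Blocks.push_back(Block{"loop", {}});
  Builder B{F, 0};
  B.emit("br label %loop");
  B.CurBlock = 1;                // Now emitting into "loop".
  B.createTempAlloca("buf", 16); // Still lands in "entry".
  B.emit("call @vprintf");
  return F;
}
```

In this model the caller never chooses where the alloca goes, so a comment explaining the placement at each call site becomes unnecessary.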

================
Comment at: lib/CodeGen/CGCUDABuiltin.cpp:109
@@ -106,1 +108,3 @@
+    // stacksave/stackrestore intrinsics, which cause our nvvm backend to choke.
+    auto *Alloca = new llvm::AllocaInst(
         llvm::Type::getInt8Ty(Ctx), llvm::ConstantInt::get(Int32Ty, BufSize),
----------------
You can still use CreateTempAlloca by making an `[N x i8]` LLVM array type. You'll have to use CreateStructGEP below for forming GEPs. Overall I think that'd be nicer, since you don't need to worry about insertion at all.


http://reviews.llvm.org/D16664