[PATCH] D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Oct 11 10:13:26 PDT 2017


tra added inline comments.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9726
+  case NVPTX::BI__hmma_m16n16k16_ld_c_f16:
+    case NVPTX::BI__hmma_m16n16k16_ld_c_f32:{
+    Address Dst = EmitPointerWithAlignment(E->getArg(0));
----------------
jlebar wrote:
> weird indentation?
My emacs and clang-format keep fighting over case indentation... Fixed.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9733
+      return nullptr;
+    bool isColMajor = isColMajorArg.getZExtValue();
+    unsigned IID;
----------------
jlebar wrote:
> Urg, this isn't a bool?  Do we want it to be?
There are no explicit declarations for these builtins in CUDA headers. Callers of these builtins pass 0/1, and the corresponding intrinsic described in the [[ http://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#nvvm-intrin-warp-level-matrix-ld | NVVM-IR spec ]] shows the argument type as i32, so I've made the type an integer in clang.
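
For illustration only, here is a minimal standalone sketch of the idea: an integer layout flag is narrowed to a bool once and then used to pick a row or column variant. The names `LoadVariant` and `selectLoadVariant` are hypothetical and do not appear in the patch or in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the row/column intrinsic variants;
// these are not real LLVM intrinsic IDs.
enum class LoadVariant { RowStride, ColStride };

// The layout flag is an integer (i32 in the NVVM-IR spec), so it is
// converted to a bool once and then used to select the variant.
LoadVariant selectLoadVariant(std::uint64_t layoutFlag) {
  // Assumption for this sketch: callers only ever pass 0 or 1.
  assert(layoutFlag <= 1 && "layout flag is expected to be 0 or 1");
  const bool isColMajor = layoutFlag != 0;
  return isColMajor ? LoadVariant::ColStride : LoadVariant::RowStride;
}
```

Keeping the parameter an integer matches what callers already pass, and the conversion to bool stays local to the selection step.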




================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9762
+    //auto EltTy = cast<PointerType>(Src->getType())->getElementType();
+    // Returned are 8 16x2 elements.
+    for (unsigned i = 0; i < NumResults; ++i) {
----------------
jlebar wrote:
> s/8/NumElements/?
> s/16/f16/?
> 
> Maybe it would be better to write it as "Return value has type [[f16 x 2] x NumResults]."?
That comment was part of a leftover block. The particular types are irrelevant here. All we care about is storing whatever the intrinsic call returned (4 or 8 elements of v2f16 or float) in the destination array (which is int[]).
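
As a rough host-side analogy (not the IR that the patch emits), the sketch below copies the bit pattern of each 32-bit result into an int array without interpreting the element type. The helper name `storeAsInts` is hypothetical, and float is used as the example element type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies each 32-bit result into an int slot bit-for-bit. The element type
// does not matter to the store; only the 32-bit width does.
std::vector<std::int32_t> storeAsInts(const float *results,
                                      std::size_t numResults) {
  static_assert(sizeof(float) == sizeof(std::int32_t),
                "sketch assumes 32-bit float");
  // Per the comment above, the intrinsic returns 4 or 8 elements.
  assert(numResults == 4 || numResults == 8);
  std::vector<std::int32_t> dst(numResults);
  for (std::size_t i = 0; i < numResults; ++i) {
    // memcpy preserves the bit pattern; no numeric conversion happens.
    std::memcpy(&dst[i], &results[i], sizeof(std::int32_t));
  }
  return dst;
}
```

A pair of f16 values packed into 32 bits could be handled the same way, since the copy never looks inside the element.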


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9802
+    llvm::Type *ParamType = Intrinsic->getFunctionType()->getParamType(1);
+    SmallVector<Value*, 10> Values;
+    Values.push_back(Builder.CreatePointerCast(Dst, VoidPtrTy));
----------------
jlebar wrote:
> Nit, we know that there won't ever be more than 8 elements...
The vector also holds two extra arguments, the destination buffer and the stride, so it needs room for up to 10 values.
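
The count can be sketched in plain C++ as follows. This is an illustration only: `buildStoreArgs` and the string placeholders are hypothetical, and placing the stride last is an assumption of this sketch, not something the quoted diff shows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxFragments = 8;  // at most 8 result elements
constexpr std::size_t kExtraArgs = 2;     // destination buffer and stride
static_assert(kMaxFragments + kExtraArgs == 10,
              "inline capacity of 10 covers the largest call");

// Builds a placeholder argument list: destination, fragments, stride.
std::vector<std::string> buildStoreArgs(std::size_t numFragments) {
  assert(numFragments <= kMaxFragments);
  std::vector<std::string> args;
  args.reserve(numFragments + kExtraArgs);
  args.push_back("dst");
  for (std::size_t i = 0; i < numFragments; ++i)
    args.push_back("frag" + std::to_string(i));
  args.push_back("stride");  // assumed position for this sketch
  return args;
}
```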


https://reviews.llvm.org/D38742




