[PATCH] D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70

Justin Lebar via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Oct 11 09:40:51 PDT 2017


jlebar accepted this revision.
jlebar added inline comments.
This revision is now accepted and ready to land.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9726
+  case NVPTX::BI__hmma_m16n16k16_ld_c_f16:
+    case NVPTX::BI__hmma_m16n16k16_ld_c_f32:{
+    Address Dst = EmitPointerWithAlignment(E->getArg(0));
----------------
weird indentation?


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9733
+      return nullptr;
+    bool isColMajor = isColMajorArg.getZExtValue();
+    unsigned IID;
----------------
Urg, this isn't a bool?  Do we want it to be?


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9761
+
+    //auto EltTy = cast<PointerType>(Src->getType())->getElementType();
+    // Returned are 8 16x2 elements.
----------------
Accidentally left over?


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9762
+    //auto EltTy = cast<PointerType>(Src->getType())->getElementType();
+    // Returned are 8 16x2 elements.
+    for (unsigned i = 0; i < NumResults; ++i) {
----------------
s/8/NumElements/?
s/16/f16/?

Maybe it would be better to write it as "Return value has type [[f16 x 2] x NumResults]."?


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9784
+    unsigned IID;
+    unsigned NumResults = 8;
+    // PTX Instructions (and LLVM instrinsics) are defined for slice _d_, yet
----------------
Nit, at this point it's probably better to assign NumResults in each branch, since there are only two.  clang should make sure that we don't accidentally use it uninitialized.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9786
+    // PTX Instructions (and LLVM instrinsics) are defined for slice _d_, yet
+    // for some reason nvcc buildtins are using _c_.
+    switch(BuiltinID) {
----------------
s/are using/use/


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9800
+    }
+    Function * Intrinsic = CGM.getIntrinsic(IID);
+    llvm::Type *ParamType = Intrinsic->getFunctionType()->getParamType(1);
----------------
spacing.  (Probably just worth clang-formatting this and the other patch.)


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9802
+    llvm::Type *ParamType = Intrinsic->getFunctionType()->getParamType(1);
+    SmallVector<Value*, 10> Values;
+    Values.push_back(Builder.CreatePointerCast(Dst, VoidPtrTy));
----------------
Nit, we know that there won't ever be more than 8 elements...


https://reviews.llvm.org/D38742





More information about the cfe-commits mailing list