[PATCH] D30057: [NVPTX] Added support for v2f16 operations.

Wed Feb 22 17:22:59 PST 2017

tra marked 5 inline comments as done.
tra added inline comments.

================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:869
+      // using two f16 elements packed into an i32.
+      fromTypeWidth *= 2;
+      if (NumElts == 2)
----------------
jlebar wrote:
> I'm confused how this works without additional changes elsewhere.  We increase fromTypeWidth, but this doesn't affect the selection of Opcode below.  So how do we know that this is a packed-i32 load rather than a regular f16 load?
fromTypeWidth=16, vecType = NVPTX::PTXLdStInstCode::V2 would normally result in `ld.v2.b16`. Unfortunately, you can't use it to load v2f16, because it does not accept 32-bit registers that we use for v2f16. This code changes load width to 32 (which now can load v2f16 bits into a 32-bit register) and reduces number of elements to compensate for it -- we emit `ld.b32`.

llvm keeps splitting vectors until it ends up with a legal register type. If a vector of f16 has even number of elements, it's split into chunks of v2f16, otherwise it gets split into scalar f16 elements.

It also appears that vector types don't get passed to this function as they are either handled by tryLoadVector() or split by legalizer into scalars. This patch introduces a load of v2f16 element which will be the only vector to be handled here.

================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:2410
+    }
+  }
+
----------------
jlebar wrote:
> Should this be unified with the similar code above?
It turns out that most of this code wasn't necessary.  All vector types (except v2f16) are either handled in tryStoreVector() or scalarized. I've removed unneeded code.

================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2333
+            // here and then bitcast it back.
+            // TODO: consider switching to getMemIntrinsicNode() which can
+            // deal with v2f16 better.
----------------
jlebar wrote:
> , which
TODO discarded. getLoad/getStore are the right way to create loads/stores as they can handle unaligned loads/stores correctly, which getMemIntrinsicNode() does not.

================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2856
+    return N->getSExtValue() == 1;
+}]>;
+
----------------
jlebar wrote:
> Could we give these slightly louder names, maybe ImmediateZero and ImmediateOne or immZero/immOne?  Otherwise it's not really clear that "i1imm" and "imm0" are completely different things -- they look too similar.
I can just use 0/1 in the instruction pattern below without wrapping them into a predicate.
I've moved these predicates and simplified subvector extraction instructions below.

================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2873
+                             [(set Float16Regs:$dst,
+                               (extractelt (v2f16 Float16x2Regs:$src), imm1:$c))]>;
+
----------------
jlebar wrote:
> What if the vector element index is not an immediate?  Do we just fail to select?  I'd think we should be able to handle that.
Yes. We did fail to select. I've added custom lowering for that case, so it works now. I'm not sure if it's needed, though. 

https://reviews.llvm.org/D30057