[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)

Tue Apr 22 15:24:02 PDT 2025

================
@@ -1274,17 +1292,39 @@ bool NVPTXDAGToDAGISel::tryLDGLDU(SDNode *N) {
   EVT OrigType = N->getValueType(0);
   EVT EltVT = Mem->getMemoryVT();
   unsigned NumElts = 1;
+
+  std::optional<unsigned> Opcode;
+
   if (EltVT.isVector()) {
     NumElts = EltVT.getVectorNumElements();
     EltVT = EltVT.getVectorElementType();
     // vectors of 8/16bits type are loaded/stored as multiples of v4i8/v2x16
     // elements.
-    if ((EltVT == MVT::f16 && OrigType == MVT::v2f16) ||
+    if ((EltVT == MVT::f32 && OrigType == MVT::v2f32) ||
+        (EltVT == MVT::f16 && OrigType == MVT::v2f16) ||
         (EltVT == MVT::bf16 && OrigType == MVT::v2bf16) ||
         (EltVT == MVT::i16 && OrigType == MVT::v2i16) ||
         (EltVT == MVT::i8 && OrigType == MVT::v4i8)) {
       assert(NumElts % OrigType.getVectorNumElements() == 0 &&
              "NumElts must be divisible by the number of elts in subvectors");
+      if (N->getOpcode() == ISD::LOAD ||
+          N->getOpcode() == ISD::INTRINSIC_W_CHAIN) {
----------------
AlexMaclean wrote:

Why special-case these? Can we just use the untyped variant in all cases?

https://github.com/llvm/llvm-project/pull/126337