[llvm] [NVPTX] Add TMA Bulk Copy Intrinsics (PR #138679)

Wed May 7 01:15:28 PDT 2025

================
@@ -2720,28 +2720,46 @@ void NVPTXDAGToDAGISel::SelectCpAsyncBulkTensorReduceCommon(SDNode *N,
   ReplaceNode(N, CurDAG->getMachineNode(Opcode, DL, N->getVTList(), Ops));
 }
 
-void NVPTXDAGToDAGISel::SelectCpAsyncBulkS2G(SDNode *N) {
+void NVPTXDAGToDAGISel::SelectCpAsyncBulkS2GCommon(SDNode *N, bool HasMask) {
----------------
durga4github wrote:

In general, the cache-hint pattern is common across several families of TMA intrinsics. So, if there is a way to handle it in TableGen, that could simplify a good chunk of the C++ implementations.

Let us wait to hear from Artem.

https://github.com/llvm/llvm-project/pull/138679