[all-commits] [llvm/llvm-project] fa7f0e: [NVPTX] Add Bulk Copy Prefetch Intrinsics (#123226)

Abhilash Majumder via All-commits all-commits at lists.llvm.org
Thu Jan 23 03:20:06 PST 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: fa7f0e582bc25a91d89dab7c488a1619060f9bef
      https://github.com/llvm/llvm-project/commit/fa7f0e582bc25a91d89dab7c488a1619060f9bef
  Author: Abhilash Majumder <30946547+abhilash1910 at users.noreply.github.com>
  Date:   2025-01-23 (Thu, 23 Jan 2025)

  Changed paths:
    M llvm/docs/NVPTXUsage.rst
    M llvm/include/llvm/IR/IntrinsicsNVVM.td
    M llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
    M llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
    M llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
    M llvm/test/CodeGen/NVPTX/cp-async-bulk.ll

  Log Message:
  -----------
  [NVPTX] Add Bulk Copy Prefetch Intrinsics (#123226)

This patch adds NVVM intrinsics and NVPTX codegen for the
cp.async.bulk.prefetch.L2.* variants.

- These intrinsics optionally take a cache hint; a boolean flag
   argument indicates whether the cache-hint operand is used.
- Lit tests are added for all combinations of these intrinsics in
   cp-async-bulk.ll.
- The generated PTX is verified with a 12.3 ptxas executable.
- Docs for these intrinsics are added to the NVPTXUsage.rst file.
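
As a rough illustration of the flag-controlled cache hint, a call site might look like the sketch below. The signature shown (global-address-space source pointer, 32-bit size, 64-bit cache hint, i1 flag) is an assumption and should be checked against NVPTXUsage.rst; the function name @prefetch_example is hypothetical.

```llvm
; Assumed declaration; verify against NVPTXUsage.rst.
declare void @llvm.nvvm.cp.async.bulk.prefetch.L2(ptr addrspace(1), i32, i64, i1)

; Hypothetical caller, for illustration only.
define void @prefetch_example(ptr addrspace(1) %src, i32 %size, i64 %ch) {
  ; Flag = 1: the cache-hint operand %ch is used.
  call void @llvm.nvvm.cp.async.bulk.prefetch.L2(ptr addrspace(1) %src, i32 %size, i64 %ch, i1 1)
  ; Flag = 0: the cache-hint operand is not used (passed here as 0).
  call void @llvm.nvvm.cp.async.bulk.prefetch.L2(ptr addrspace(1) %src, i32 %size, i64 0, i1 0)
  ret void
}
```

The flag is written as a constant in both calls, on the assumption that it must be known at compile time for the backend to select between the plain and cache-hint forms of the instruction.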

PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch


Co-authored-by: abmajumder <abmajumder at nvidia.com>




