[all-commits] [llvm/llvm-project] 86888e: [mlir][sparse][gpu] generate proper memcpy in/out ...

Fri Apr 21 09:30:59 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 86888e420c41ebb07fa1a8818ea9af218b015fe3
      https://github.com/llvm/llvm-project/commit/86888e420c41ebb07fa1a8818ea9af218b015fe3
  Author: Aart Bik <ajcbik at google.com>
  Date:   2023-04-21 (Fri, 21 Apr 2023)

  Changed paths:
    M mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
    M mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir
    M mlir/test/Dialect/SparseTensor/GPU/gpu_matmul.mlir
    M mlir/test/Dialect/SparseTensor/GPU/gpu_matvec.mlir
    A mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matvec-const.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-mma-2-4-f16.mlir

  Log Message:
  -----------
  [mlir][sparse][gpu] generate proper memcpy in/out host and device

Host registration is a convenient way to get CUDA kernels
running, but it may be slow and does not work for all buffers
(e.g., global constants). This revision instead moves buffers
between host and device with proper alloc/copy/dealloc chains,
made asynchronous to increase overlap. The host registration
mechanism is kept under a flag for the output buffer, purely
for experimentation while this project ramps up.
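For illustration only, the asynchronous alloc/copy/dealloc pattern described above can be sketched in the MLIR `gpu` dialect. This is a hand-written sketch, not the IR this revision actually emits: the function name `@copy_chain_sketch` and the buffer types are hypothetical, and the kernel launch is elided. Each op lists the tokens it depends on and returns a new one, so independent steps (here, the two allocations) may proceed concurrently before the chain is joined and the host blocks with a synchronous `gpu.wait`.

```mlir
// Hypothetical sketch (not the IR emitted by this revision): move a host
// input to the device, leave room for a kernel, then copy the output back.
func.func @copy_chain_sketch(%hx: memref<?xf64>, %hy: memref<?xf64>) {
  %c0 = arith.constant 0 : index
  %n = memref.dim %hx, %c0 : memref<?xf64>
  %m = memref.dim %hy, %c0 : memref<?xf64>
  // Start an asynchronous chain with no dependencies.
  %t0 = gpu.wait async
  // Input: allocate on the device and copy in; the copy waits on the alloc.
  %dx, %t1 = gpu.alloc async [%t0] (%n) : memref<?xf64>
  %t2 = gpu.memcpy async [%t1] %dx, %hx : memref<?xf64>, memref<?xf64>
  // Output: this allocation depends only on %t0, so it may overlap the copy above.
  %dy, %t3 = gpu.alloc async [%t0] (%m) : memref<?xf64>
  // A kernel launch reading %dx and writing %dy would go here (elided);
  // with a real kernel, the copy-out below would depend on its token instead.
  %t4 = gpu.memcpy async [%t2, %t3] %hy, %dy : memref<?xf64>, memref<?xf64>
  // Release device memory only after the copy-out has completed.
  %t5 = gpu.dealloc async [%t4] %dx : memref<?xf64>
  %t6 = gpu.dealloc async [%t5] %dy : memref<?xf64>
  // Block the host until the whole chain has finished.
  gpu.wait [%t6]
  return
}
```

By contrast, the host-registration path passes a host buffer (cast to an unranked memref, as that op requires) to `gpu.host_register`, so kernels access host memory directly and no device copy is made; that is the mechanism this revision keeps only behind a flag for the output.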

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D148682