[all-commits] [llvm/llvm-project] a014fb: [OpenMP] Improve D2D memcpy to use more efficient ...

Shilei Tian via All-commits all-commits at lists.llvm.org
Thu Jun 4 13:59:23 PDT 2020


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: a014fbbc219fc8e1dbce382fd6f9280c3b720219
      https://github.com/llvm/llvm-project/commit/a014fbbc219fc8e1dbce382fd6f9280c3b720219
  Author: Shilei Tian <tianshilei1992 at gmail.com>
  Date:   2020-06-04 (Thu, 04 Jun 2020)

  Changed paths:
    M openmp/libomptarget/include/omptargetplugin.h
    M openmp/libomptarget/plugins/cuda/src/rtl.cpp
    M openmp/libomptarget/plugins/exports
    M openmp/libomptarget/src/api.cpp
    M openmp/libomptarget/src/device.cpp
    M openmp/libomptarget/src/device.h
    M openmp/libomptarget/src/rtl.cpp
    M openmp/libomptarget/src/rtl.h
    A openmp/libomptarget/test/offloading/d2d_memcpy.c

  Log Message:
  -----------
  [OpenMP] Improve D2D memcpy to use more efficient driver API

Summary:
In the current implementation, D2D memcpy first copies the data back to the host
and then copies it from the host to the destination device. This is very
inefficient if the device supports native D2D memcpy, as CUDA does.

In this patch, D2D memcpy first tries to use the natively supported driver API.
If that fails, it falls back to the original approach. It is worth noting that
D2D memcpy in this scenario covers two cases:
- Same device: this is the D2D memcpy in the CUDA context.
- Different devices: this is the PeerToPeer memcpy in the CUDA context.
My implementation merges these two cases. It chooses the best API according to
the source device and destination device.
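
The control flow described above can be illustrated with a minimal sketch. It is
not the code from this patch: MockDevice, CopyPath, native_copy,
host_staged_copy and d2d_memcpy are hypothetical names, and device memory is
simulated with host vectors so the example runs without a GPU. In a real CUDA
plugin the native step would presumably map to driver calls such as
cuMemcpyDtoD or cuMemcpyPeer; that mapping is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a target device. A real plugin would hold driver
// handles; here "device memory" is simulated with a host-side vector.
struct MockDevice {
  int id;
  bool supports_native_d2d;           // assumption: capability is known up front
  std::vector<unsigned char> memory;  // simulated device memory
};

// Which path the copy ended up taking.
enum class CopyPath { NativeSameDevice, NativePeerToPeer, HostStaged };

// Try the "native" copy. Returns false if either device lacks support, which
// models a failing driver call.
bool native_copy(MockDevice &src, std::size_t src_off, MockDevice &dst,
                 std::size_t dst_off, std::size_t n) {
  if (!src.supports_native_d2d || !dst.supports_native_d2d)
    return false;
  // memmove tolerates overlap, which can occur when src and dst are the same
  // device.
  std::memmove(dst.memory.data() + dst_off, src.memory.data() + src_off, n);
  return true;
}

// Original approach: device -> host staging buffer -> device.
void host_staged_copy(MockDevice &src, std::size_t src_off, MockDevice &dst,
                      std::size_t dst_off, std::size_t n) {
  std::vector<unsigned char> host(n);
  std::memcpy(host.data(), src.memory.data() + src_off, n);  // device to host
  std::memcpy(dst.memory.data() + dst_off, host.data(), n);  // host to device
}

// Merged entry point: try the native path first, fall back to host staging.
CopyPath d2d_memcpy(MockDevice &src, std::size_t src_off, MockDevice &dst,
                    std::size_t dst_off, std::size_t n) {
  assert(n > 0);
  assert(src_off + n <= src.memory.size());
  assert(dst_off + n <= dst.memory.size());
  if (native_copy(src, src_off, dst, dst_off, n))
    return src.id == dst.id ? CopyPath::NativeSameDevice
                            : CopyPath::NativePeerToPeer;
  host_staged_copy(src, src_off, dst, dst_off, n);
  return CopyPath::HostStaged;
}
```

In this sketch the caller always goes through d2d_memcpy and never selects a
path itself. The returned CopyPath exists only to make the chosen path
observable in the example.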

Reviewers: jdoerfert, AndreyChurbanov, grokos

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D80649



