[Openmp-commits] [PATCH] D106960: [OffloadArch] Library to query properties of current offload architecture

Wed Aug 4 10:20:27 PDT 2021

tra added a comment.

In D106960#2925610 <https://reviews.llvm.org/D106960#2925610>, @ye-luo wrote:

> my second GPU is NVIDIA 3060Ti (sm_86)
> I build my app daily with -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80.
>
> About sm_80 binaries being able to run on sm_86:
> https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#application-compatibility-on-ampere

Keep in mind that binaries compiled for sm_80 will likely run a lot slower on sm_86. sm_86 has distinctly different hardware, and the code generated for sm_80 will be sub-optimal for it.
I don't have Ampere cards to compare, but sm_70 binaries running on sm_75 reached only about half the speed of the same code compiled for sm_75 when operating on fp16.
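The distinction between "runs" and "runs well" can be sketched as a small classifier. This is a minimal Python sketch, assuming NVIDIA's documented rule for cubin (SASS) binaries: a binary built for sm_XY runs on a GPU with the same major version and an equal or higher minor version. The helper names `parse_sm` and `classify` are hypothetical, the parser assumes plain `sm_<major><minor>` names without suffixes, and PTX JIT compilation to newer architectures is deliberately ignored.

```python
import re


def parse_sm(arch):
    """Split a plain NVPTX arch name such as "sm_86" into (major, minor).

    Assumption: the last digit is the minor version ("sm_100" -> (10, 0)).
    """
    m = re.fullmatch(r"sm_(\d+)(\d)", arch)
    if m is None:
        raise ValueError(f"unrecognized arch: {arch!r}")
    return int(m.group(1)), int(m.group(2))


def classify(binary_arch, device_arch):
    """Classify a cubin built for binary_arch running on device_arch.

    Only the cubin rule is modeled; embedded PTX and JIT are ignored.
    """
    b_major, b_minor = parse_sm(binary_arch)
    d_major, d_minor = parse_sm(device_arch)
    if (b_major, b_minor) == (d_major, d_minor):
        return "native"        # code generated for this exact hardware
    if b_major == d_major and b_minor < d_minor:
        return "compatible"    # loads and runs, but not tuned for the device
    return "incompatible"      # different major, or older device


print(classify("sm_80", "sm_86"))  # compatible
```

Both the sm_80-on-sm_86 case and the sm_70-on-sm_75 case land in "compatible": the binary loads, but, as noted above, it was not generated for that hardware.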

NVIDIA didn't provide a performance tuning guide for Ampere, but here is what it had to say about Volta/Turing:
https://docs.nvidia.com/cuda/turing-tuning-guide/index.html#tensor-operations

> Any binary compiled for Volta will run on Turing, but Volta binaries using Tensor Cores will only be able to reach half of Turing's Tensor Core peak performance. 
> Recompiling the binary specifically for Turing would allow it to reach the peak performance.
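Recompiling for the exact device amounts to passing that device's own compute capability as the target arch. A minimal sketch, assuming the major and minor numbers come from a device query (for example the `major`/`minor` fields that CUDA's `cudaGetDeviceProperties` fills in `cudaDeviceProp`); the helper `offload_arch_flags` is hypothetical and only reproduces the two arch-selection flags from the build line quoted above.

```python
def offload_arch_flags(major, minor, triple="nvptx64-nvidia-cuda"):
    """Return the clang flags that select sm_<major><minor> for OpenMP offload.

    Hypothetical helper: major/minor are assumed to come from a device query.
    """
    return [f"-Xopenmp-target={triple}", f"-march=sm_{major}{minor}"]


# Example: the 3060 Ti mentioned above is sm_86 (compute capability 8.6).
print(" ".join(offload_arch_flags(8, 6)))
# -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_86
```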

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106960