<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi Henry,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<font size="2"><span style="font-size:11pt">> Since the SPIR-V BE might not land in LLVM soon, we will set up the compilation flow<br>
> to produce the SPIR-V binary by using the LLVM-SPIR-V translator [7]<br>
> which is used in our experimental branch.</span></font></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Can you provide more details regarding this? Do you plan to integrate the</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
translator as an external tool?</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Overall, there seem to be a huge overlap with what we need for OpenCL so it would</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
be good to make sure we are aligned and the new functionality is reusable for OpenCL</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
too.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Cheers,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Anastasia<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="appendonsend"></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> llvm-dev <llvm-dev-bounces@lists.llvm.org> on behalf of Henry Linjamäki via llvm-dev <llvm-dev@lists.llvm.org><br>
<b>Sent:</b> 09 August 2021 07:57<br>
<b>To:</b> cfe-dev@lists.llvm.org <cfe-dev@lists.llvm.org><br>
<b>Cc:</b> llvm-dev@lists.llvm.org <llvm-dev@lists.llvm.org>; yaxun.liu@amd.com <yaxun.liu@amd.com><br>
<b>Subject:</b> [llvm-dev] [RFC][HIPSPV] Emitting HIP device code as SPIR-V</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt">
<div class="PlainText">Hi all,<br>
<br>
HIP is a C++ Runtime API and kernel language that allows developers to<br>
create portable applications for AMD and NVIDIA GPUs from a single<br>
source code [0]. There are also projects for running HIP code on Intel<br>
GPU platforms via the Intel Level Zero API [1] called HIPLZ [3] and<br>
HIPCL [2], which runs HIP programs in OpenCL devices with certain<br>
advanced features supported. Both of these backends consume SPIR-V<br>
binaries.<br>
<br>
We are proposing a patch set to be upstreamed that enables SPIR-V<br>
emission through the HIP code path. The end goal of the patches to be<br>
submitted is to emit SPIR-V binaries from HIP device code so it can be<br>
embedded into executables for OpenCL-like environments (at least for<br>
starters). Our current focus is on the two above-mentioned projects,<br>
HIPCL and HIPLZ which are both work-in-progress HIP<br>
implementations. They itself do not consume SPIR-V, but the device<br>
binaries are handed over to the OpenCL and Intel Level Zero APIs,<br>
respectively.<br>
<br>
Coarsely, the current process of translating the HIP code to SPIR-V in<br>
LLVM/Clang involves:<br>
<br>
* Retargeting HIP device code generation to the SPIR-V target.<br>
* Mapping address spaces in HIP to corresponding ones in SPIR-V.<br>
* Expanding HIP features, which can not be directly modeled in SPIR-V<br>
(e.g. dynamic shared memory).<br>
<br>
The HIPSPIRV experimental branch is available at [4]. Note that it is<br>
not yet in a state we intend to propose for upstreaming, but shaping<br>
up the patches is a work in progress. Before proceeding to shape up<br>
and submit the patches, we would like to get feedback for the plans we<br>
have for upstreaming. In the following sections, we open up the above<br>
points further and sketch our plans for changes to LLVM (mostly to the<br>
Clang tool) to achieve the goal.<br>
<br>
Retargeting device codegen<br>
==========================<br>
<br>
For making the HIP toolchain to emit and embed SPIR-V we are<br>
tentatively planning the following changes to the LLVM/Clang:<br>
<br>
* Introduce, at minimum, a 'spirv64' architecture type in Triple. This<br>
is what the SPIR-V backend [5] (SPIR-V BE) effort is planning to<br>
upstream. We would like to upstream this change in advance to<br>
specify the HIP SPIR-V device code target, potentially before the<br>
SPIR-V BE work lands.<br>
<br>
* Implement a new SPIRVTargetInfo and fill it with necessary<br>
information. For HIPCL/-LZ we are planning to adjust the address<br>
space mapping in a way which is discussed later in the ‘address<br>
space mapping’ section.<br>
<br>
* Introduce a clang option to override the HIP device code target. We<br>
are interested in the option ‘--offload=<target>’ discussed in the<br>
'Unified offload option for CUDA/HIP/OpenMP'-thread [6]. This option<br>
would suit this use case well. As far as we know, the subject has<br>
not advanced further from the discussion - is anyone working on it?<br>
<br>
* Compilation driver:<br>
<br>
HIP offload builder is changed to retrieve the offload device target<br>
from the --offload option. If it is not present, it can fall back to<br>
AMD's default target for avoiding changing the current default HIP<br>
compilation behavior.<br>
<br>
Temporarily change Driver to force clang to emit LLVM bitcode for<br>
SPIR-V targets in the backend compilation phase. Otherwise, the<br>
compilation will fail due to the lack of the real SPIR-V BE in many<br>
parts of the code. Reworked HIPToolChain takes care of translating<br>
the bitcode to SPIR-V during the linking phase. When the SPIR-V BE<br>
lands in LLVM, we can revert this change.<br>
<br>
* Introduce ’hipspv’ as an OS or environment type in Triple. The<br>
primary and the current use of the type is to select device offload<br>
toolchain for HIPCL/-LZ.<br>
<br>
* Implement a new toolchain class 'HIPSPVToolChain' in clang which is<br>
selected when the HIP device target is specified to be<br>
‘spirv64-unknown-hipspv’ with the --offload option. Since the SPIR-V<br>
BE might not land in LLVM soon, we will set up the compilation flow<br>
to produce the SPIR-V binary by using the LLVM-SPIR-V translator [7]<br>
which is used in our experimental branch.<br>
<br>
One important thing the toolchain does is to run one or several LLVM<br>
IR passes, which are needed by the HIPCL/LZ runtime, on the final<br>
fully linked device bitcode. The passes are required to be run<br>
during link time - all user specified device code and HIPCL/LZ<br>
device library routines have to be visible when the passes are<br>
run. The reason for the requirement is explained in the 'HIP code<br>
expansion' section. HIPSPVToolChain will use the opt tool for<br>
running the passes at link time.<br>
<br>
* Currently, HIPToolChain is derived from ROCmToolchain and its long<br>
chain of super classes (AMDGPUToolChain, Generic_ELF and<br>
Generic_GCC). The new upstreamed target would not logically belong<br>
under the AMDGPU/ROCm family so it does not make sense to derive the<br>
HIPCL toolchain from the HIP toolchain. Therefore, we propose to:<br>
<br>
- Create a new base HIP tool chain, 'BaseHIPToolChain' or just<br>
'HIPToolChain', derived directly from ToolChain and put any<br>
HIP-related code that is common or that can be reused in the<br>
derived toolchains there.<br>
<br>
- Derive a new HIPSPVToolChain from HIPToolChain.<br>
<br>
- Rebase the HIPToolChain under the HIPToolChain and rename it to<br>
HIPAMDToolChain. Since the current HIPToolChain depends on methods<br>
in the super classes (e.g. AMDGPUToolChain’s getParsedTargetID)<br>
the rebased class is planned to be a proxy class to avoid code<br>
duplication and to reduce the amount of changes. Another option to<br>
refactor the current HIPToolChain would be to use multiple<br>
heritance but that leads to dreaded diamond class structure which<br>
probably is not a great choice.<br>
<br>
With the current plan, HIPToolChain is not going to have much code<br>
to be shared with the derived classes - so far only a bit of the<br>
“fat binary” construction code is in sight for sharing, so the<br>
immediate gains for the effort seems small. However, The TC’s layout<br>
is more logical and it may spark more HIP implementations, as well<br>
as help refactoring when going forward.<br>
<br>
<br>
Address space mapping<br>
=====================<br>
<br>
Translating HIP device code to valid SPIR-V binary requires tweaks on<br>
pointers:<br>
<br>
Pointers without address space (AS) qualification in HIP programs are<br>
considered “flat” pointers - they can point to function local,<br>
__device__, __shared__ and __constant__ memory space dynamically,<br>
which matches the idea of ‘generic’ pointers introduced in OpenCL<br>
2.0. Therefore, the logical choice for the flat pointers is to map<br>
them to generic pointers of SPIR-V’s OpenCL environment. HIPCL’s and<br>
HIPLZ’s SPIR-V environment mandates that the kernel pointer parameters<br>
must point to __global, __local or __constant memory (these are named<br>
differently in SPIR-V; using OpenCL names as they are more<br>
familiar). So HIP pointer parameters in the HIP kernel (__global__)<br>
functions would be mapped to global pointers. Otherwise, HIP pointers<br>
with AS qualifiers are mapped to SPIR-V equivalent, if suitable.<br>
<br>
Now, there are significant differences between HIP’s __constant__ and<br>
SPIR-V/OpenCL’s constant address space:<br>
<br>
* In HIP, __constant__ globals can be altered on the host side with<br>
the hipMemcpyToSymbol() API function. In the OpenCL’s host API you<br>
cannot do this.<br>
<br>
(Side-note: OpenCL host API does not have an equivalent method for<br>
hipMemcpyToSymbol but HIPCL currently supports hipMemcpyToSymbol for<br>
the global __global variables via Intel’s<br>
clGetDeviceGlobalVariablePointerINTEL API extension, but we are<br>
planning to inject shadow kernel commands that access the global<br>
variables instead for portability.)<br>
<br>
* In HIP flat pointers can point to __constant__ memory. In OpenCL<br>
this is not the case with __generic pointers, which means __constant<br>
pointers cannot be casted to __generic pointers and vice versa.<br>
<br>
There are a couple ways to deal with constants:<br>
<br>
* Map __constant__ to __global space in SPIR-V. That way we can<br>
generate code that works and is simple to implement. Of course, we<br>
lose the optimization/placing benefits of constant memory.<br>
<br>
* Transform the code after clang codegen (by an LLVM pass) by<br>
converting the __constant objects to kernel arguments. This covers<br>
the hipMemcpyToSymbol() case. There is still the constant-to-generic<br>
cast issue, so we would have to use the previous point as the<br>
fallback.<br>
<br>
We plan to start by upstreaming the first option, and time permitting,<br>
improve by implementing the second option.<br>
<br>
The planned changes to Clang to achieve the aforementioned AS mapping<br>
are as follows:<br>
<br>
* Define address space mapping in the new, aforementioned<br>
SPIRVTargetInfo to map CUDA address spaces (which the HIP reuses) to<br>
do the mapping mentioned earlier. Default AS (0) used for the flat<br>
pointers are mapped to the SPIR-V’s ‘generic’. We intend this<br>
mapping being enabled when the language mode is HIP.<br>
<br>
* Change SPIRABIInfo to coerce kernel AS-unqualified pointer arguments<br>
to __global ones. Pointer arguments in regular device functions<br>
receive the __generic AS qualifier via the address space mapping<br>
defined in SPIRVTargetInfo in the above point.<br>
<br>
<br>
HIP code expansion<br>
==================<br>
<br>
There are features in HIP language which do not have direct<br>
counterparts in SPIR-V’s OpenCL environment and those features need to<br>
be rewritten before translation to SPIR-V (in the future, lowering to<br>
SPIR-V machine code through the new BE). The non-exhaustive list of<br>
features that need to be expanded includes:<br>
<br>
* Dynamic shared memory allocation (DSM): It is an array which is<br>
declared globally in LLVM IR and its actual size determined at<br>
kernel launch. OpTypeRuntimeArray in SPIR-V is the closest thing to<br>
model this object, alas, it requires shader capability.<br>
<br>
* abort() builtin: No counterpart in SPIR-V/OpenCL.<br>
(Note: the behavior is not well specified in the HIP spec<br>
either. Assuming it terminates the whole grid if any work item<br>
reaches it. AMD’s abort definition calls __builtin_trap).<br>
<br>
* printf(): OpenCL’s printf takes the format string as ‘__constant__<br>
char*’ while in HIP the format string does not have to reside in<br>
constant memory.<br>
<br>
* Texture objects. These roughly correspond to image and sampler<br>
objects of OpenCL combined. Also, texture objects carry more<br>
information for the texture functions than image+sampler objects do.<br>
<br>
* Texture references. Same as above but these are program global<br>
objects. In OpenCL, image objects cannot reside in the program<br>
global space.<br>
<br>
HIPCL/-LZ’s solution to the DSM allocation case is that the runtime<br>
allocates a shared buffer and passes it to the kernel as an additional<br>
argument (which is hidden from the user). The device code is modified<br>
so that the DSM object is replaced with the new kernel<br>
argument. Various other cases listed will be handled similarly:<br>
<br>
* For the printf case we tentatively replace the printf calls with a<br>
function that packs their arguments to an additional buffer passed<br>
as additional kernel argument and do the printing on the host side.<br>
<br>
* Texture objects will be tentatively split to image and sampler<br>
objects and possibly auxiliary struct to carry texture<br>
settings. This means at least that the kernel parameter listing<br>
needs to be rewritten for the Texture objects.<br>
<br>
* For the texture reference we tentatively planned replacing the<br>
global texture objects also with a number of additional kernel<br>
arguments.<br>
<br>
For this and other HIP features we need to apply LLVM IR passes to<br>
perform modifications on the device code. In many cases the passes<br>
should be run when the device code (as LLVM bitcode) is fully<br>
linked. This is simply achieved as the HIP offload mechanism already<br>
emits device code as LLVM bitcode in RDC mode (-fgpu-rdc), so during<br>
linking we do receive the device code as LLVM bitcode where to apply<br>
these expansions with full view of the device code.<br>
<br>
The current plan for implementing this is to make the HIPSPVToolChain<br>
to build a linker that uses llvm-link for linking device code, opt for<br>
running the IR passes needed and the external llvm-spirv tool (llc in<br>
the future when the SPIR-V BE lands) for emitting the SPIR-V<br>
binary. We load the passes from a path the user provides<br>
via --hip-link-pass-path (name pending) or automatically from HIP<br>
runtime’s installation location by using the search logic provided by<br>
ROCmInstallationDetector.<br>
<br>
There is interest in upstreaming the HIPCL/-LZ passes from the<br>
HIPCL/-LZ repositories in the future for reduced maintenance<br>
burden. However, we are not attempting to upstream them initially, as<br>
they are not yet completed and are subject to rapid changes. Question<br>
is: Where should the passes eventually be put in within the LLVM<br>
project tree? Could it be OK to add a new directory under Clang for<br>
tool chain passes?<br>
<br>
<br>
Testing<br>
=======<br>
<br>
We will provide llvm-lit tests for our toolchain in the upstream. We<br>
also want to add tests to make sure clang who will run the HIPCL/-LZ<br>
runtime passes get run at device code link time. For this we need a<br>
dummy pass plugin that the clang loads during the test.<br>
<br>
When the new LLVM SPIR-V BE work lands on LLVM, we will add SPIR-V<br>
assembly checks that are relevant for HIPSPV.<br>
<br>
<br>
References<br>
==========<br>
<br>
[0]: <a href="https://rocmdocs.amd.com/en/latest/Programming_Guides/Programming-Guides.html">
https://rocmdocs.amd.com/en/latest/Programming_Guides/Programming-Guides.html</a><br>
[1]: <a href="https://spec.oneapi.com/level-zero/latest/index.html">https://spec.oneapi.com/level-zero/latest/index.html</a><br>
[2]: <a href="https://github.com/cpc/hipcl">https://github.com/cpc/hipcl</a><br>
[3]: <a href="https://github.com/jz10/anl-gt-gpu">https://github.com/jz10/anl-gt-gpu</a><br>
[4]: <a href="https://github.com/parmance/llvm-project/tree/hip2spirv-v5">https://github.com/parmance/llvm-project/tree/hip2spirv-v5</a><br>
[5]: <a href="https://github.com/KhronosGroup/LLVM-SPIRV-Backend">https://github.com/KhronosGroup/LLVM-SPIRV-Backend</a><br>
[6]: <a href="https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html">
https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html</a><br>
[7]: <a href="https://github.com/KhronosGroup/SPIRV-LLVM-Translator">https://github.com/KhronosGroup/SPIRV-LLVM-Translator</a><br>
_______________________________________________<br>
LLVM Developers mailing list<br>
llvm-dev@lists.llvm.org<br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</div>
</span></font></div>
</body>
</html>