<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Hi Henry,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<font size="2"><span style="font-size:11pt">> Since the SPIR-V BE might not land in LLVM soon, we will set up the compilation flow<br>

>  to produce the SPIR-V binary by using the LLVM-SPIR-V translator [7]<br>

>  which is used in our experimental branch.</span></font></div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Can you provide more details regarding this? Do you plan to integrate the</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

translator as an external tool?</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Overall, there seem to be a huge overlap with what we need for OpenCL so it would</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

be good to make sure we are aligned and the new functionality is reusable for OpenCL</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

too.</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Cheers,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Anastasia<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div id="appendonsend"></div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

<br>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> llvm-dev <llvm-dev-bounces@lists.llvm.org> on behalf of Henry Linjamäki via llvm-dev <llvm-dev@lists.llvm.org><br>

<b>Sent:</b> 09 August 2021 07:57<br>

<b>To:</b> cfe-dev@lists.llvm.org <cfe-dev@lists.llvm.org><br>

<b>Cc:</b> llvm-dev@lists.llvm.org <llvm-dev@lists.llvm.org>; yaxun.liu@amd.com <yaxun.liu@amd.com><br>

<b>Subject:</b> [llvm-dev] [RFC][HIPSPV] Emitting HIP device code as SPIR-V</font>

<div> </div>

</div>

<div class="BodyFragment"><font size="2"><span style="font-size:11pt">

<div class="PlainText">Hi all,<br>

<br>

HIP is a C++ Runtime API and kernel language that allows developers to<br>

create portable applications for AMD and NVIDIA GPUs from a single<br>

source code [0]. There are also projects for running HIP code on Intel<br>

GPU platforms via the Intel Level Zero API [1] called HIPLZ [3] and<br>

HIPCL [2], which runs HIP programs in OpenCL devices with certain<br>

advanced features supported. Both of these backends consume SPIR-V<br>

binaries.<br>

<br>

We are proposing a patch set to be upstreamed that enables SPIR-V<br>

emission through the HIP code path. The end goal of the patches to be<br>

submitted is to emit SPIR-V binaries from HIP device code so it can be<br>

embedded into executables for OpenCL-like environments (at least for<br>

starters). Our current focus is on the two above-mentioned projects,<br>

HIPCL and HIPLZ which are both work-in-progress HIP<br>

implementations. They itself do not consume SPIR-V, but the device<br>

binaries are handed over to the OpenCL and Intel Level Zero APIs,<br>

respectively.<br>

<br>

Coarsely, the current process of translating the HIP code to SPIR-V in<br>

LLVM/Clang involves:<br>

<br>

* Retargeting HIP device code generation to the SPIR-V target.<br>

* Mapping address spaces in HIP to corresponding ones in SPIR-V.<br>

* Expanding HIP features, which can not be directly modeled in SPIR-V<br>

  (e.g. dynamic shared memory).<br>

<br>

The HIPSPIRV experimental branch is available at [4]. Note that it is<br>

not yet in a state we intend to propose for upstreaming, but shaping<br>

up the patches is a work in progress. Before proceeding to shape up<br>

and submit the patches, we would like to get feedback for the plans we<br>

have for upstreaming. In the following sections, we open up the above<br>

points further and sketch our plans for changes to LLVM (mostly to the<br>

Clang tool) to achieve the goal.<br>

<br>

Retargeting device codegen<br>

==========================<br>

<br>

For making the HIP toolchain to emit and embed SPIR-V we are<br>

tentatively planning the following changes to the LLVM/Clang:<br>

<br>

* Introduce, at minimum, a 'spirv64' architecture type in Triple. This<br>

  is what the SPIR-V backend [5] (SPIR-V BE) effort is planning to<br>

  upstream. We would like to upstream this change in advance to<br>

  specify the HIP SPIR-V device code target, potentially before the<br>

  SPIR-V BE work lands.<br>

<br>

* Implement a new SPIRVTargetInfo and fill it with necessary<br>

  information. For HIPCL/-LZ we are planning to adjust the address<br>

  space mapping in a way which is discussed later in the ‘address<br>

  space mapping’ section.<br>

<br>

* Introduce a clang option to override the HIP device code target. We<br>

  are interested in the option ‘--offload=<target>’ discussed in the<br>

  'Unified offload option for CUDA/HIP/OpenMP'-thread [6]. This option<br>

  would suit this use case well. As far as we know, the subject has<br>

  not advanced further from the discussion - is anyone working on it?<br>

<br>

* Compilation driver:<br>

<br>

  HIP offload builder is changed to retrieve the offload device target<br>

  from the --offload option. If it is not present, it can fall back to<br>

  AMD's default target for avoiding changing the current default HIP<br>

  compilation behavior.<br>

<br>

  Temporarily change Driver to force clang to emit LLVM bitcode for<br>

  SPIR-V targets in the backend compilation phase. Otherwise, the<br>

  compilation will fail due to the lack of the real SPIR-V BE in many<br>

  parts of the code. Reworked HIPToolChain takes care of translating<br>

  the bitcode to SPIR-V during the linking phase. When the SPIR-V BE<br>

  lands in LLVM, we can revert this change.<br>

<br>

* Introduce ’hipspv’ as an OS or environment type in Triple. The<br>

  primary and the current use of the type is to select device offload<br>

  toolchain for HIPCL/-LZ.<br>

<br>

* Implement a new toolchain class 'HIPSPVToolChain' in clang which is<br>

  selected when the HIP device target is specified to be<br>

  ‘spirv64-unknown-hipspv’ with the --offload option. Since the SPIR-V<br>

  BE might not land in LLVM soon, we will set up the compilation flow<br>

  to produce the SPIR-V binary by using the LLVM-SPIR-V translator [7]<br>

  which is used in our experimental branch.<br>

<br>

  One important thing the toolchain does is to run one or several LLVM<br>

  IR passes, which are needed by the HIPCL/LZ runtime, on the final<br>

  fully linked device bitcode. The passes are required to be run<br>

  during link time - all user specified device code and HIPCL/LZ<br>

  device library routines have to be visible when the passes are<br>

  run. The reason for the requirement is explained in the 'HIP code<br>

  expansion' section. HIPSPVToolChain will use the opt tool for<br>

  running the passes at link time.<br>

<br>

* Currently, HIPToolChain is derived from ROCmToolchain and its long<br>

  chain of super classes (AMDGPUToolChain, Generic_ELF and<br>

  Generic_GCC). The new upstreamed target would not logically belong<br>

  under the AMDGPU/ROCm family so it does not make sense to derive the<br>

  HIPCL toolchain from the HIP toolchain. Therefore, we propose to:<br>

<br>

  - Create a new base HIP tool chain, 'BaseHIPToolChain' or just<br>

    'HIPToolChain', derived directly from ToolChain and put any<br>

    HIP-related code that is common or that can be reused in the<br>

    derived toolchains there.<br>

<br>

  - Derive a new HIPSPVToolChain from HIPToolChain.<br>

<br>

  - Rebase the HIPToolChain under the HIPToolChain and rename it to<br>

    HIPAMDToolChain. Since the current HIPToolChain depends on methods<br>

    in the super classes (e.g. AMDGPUToolChain’s getParsedTargetID)<br>

    the rebased class is planned to be a proxy class to avoid code<br>

    duplication and to reduce the amount of changes. Another option to<br>

    refactor the current HIPToolChain would be to use multiple<br>

    heritance but that leads to dreaded diamond class structure which<br>

    probably is not a great choice.<br>

<br>

  With the current plan, HIPToolChain is not going to have much code<br>

  to be shared with the derived classes - so far only a bit of the<br>

  “fat binary” construction code is in sight for sharing, so the<br>

  immediate gains for the effort seems small. However, The TC’s layout<br>

  is more logical and it may spark more HIP implementations, as well<br>

  as help refactoring when going forward.<br>

<br>

<br>

Address space mapping<br>

=====================<br>

<br>

Translating HIP device code to valid SPIR-V binary requires tweaks on<br>

pointers:<br>

<br>

Pointers without address space (AS) qualification in HIP programs are<br>

considered “flat” pointers - they can point to function local,<br>

__device__, __shared__ and __constant__ memory space dynamically,<br>

which matches the idea of ‘generic’ pointers introduced in OpenCL<br>

2.0. Therefore, the logical choice for the flat pointers is to map<br>

them to generic pointers of SPIR-V’s OpenCL environment. HIPCL’s and<br>

HIPLZ’s SPIR-V environment mandates that the kernel pointer parameters<br>

must point to __global, __local or __constant memory (these are named<br>

differently in SPIR-V; using OpenCL names as they are more<br>

familiar). So HIP pointer parameters in the HIP kernel (__global__)<br>

functions would be mapped to global pointers. Otherwise, HIP pointers<br>

with AS qualifiers are mapped to SPIR-V equivalent, if suitable.<br>

<br>

Now, there are significant differences between HIP’s __constant__ and<br>

SPIR-V/OpenCL’s constant address space:<br>

<br>

* In HIP, __constant__ globals can be altered on the host side with<br>

  the hipMemcpyToSymbol() API function. In the OpenCL’s host API you<br>

  cannot do this.<br>

<br>

  (Side-note: OpenCL host API does not have an equivalent method for<br>

  hipMemcpyToSymbol but HIPCL currently supports hipMemcpyToSymbol for<br>

  the global __global variables via Intel’s<br>

  clGetDeviceGlobalVariablePointerINTEL API extension, but we are<br>

  planning to inject shadow kernel commands that access the global<br>

  variables instead for portability.)<br>

<br>

* In HIP flat pointers can point to __constant__ memory. In OpenCL<br>

  this is not the case with __generic pointers, which means __constant<br>

  pointers cannot be casted to __generic pointers and vice versa.<br>

<br>

There are a couple ways to deal with constants:<br>

<br>

* Map __constant__ to __global space in SPIR-V. That way we can<br>

  generate code that works and is simple to implement. Of course, we<br>

  lose the optimization/placing benefits of constant memory.<br>

<br>

* Transform the code after clang codegen (by an LLVM pass) by<br>

  converting the __constant objects to kernel arguments. This covers<br>

  the hipMemcpyToSymbol() case. There is still the constant-to-generic<br>

  cast issue, so we would have to use the previous point as the<br>

  fallback.<br>

<br>

We plan to start by upstreaming the first option, and time permitting,<br>

improve by implementing the second option.<br>

<br>

The planned changes to Clang to achieve the aforementioned AS mapping<br>

are as follows:<br>

<br>

* Define address space mapping in the new, aforementioned<br>

  SPIRVTargetInfo to map CUDA address spaces (which the HIP reuses) to<br>

  do the mapping mentioned earlier. Default AS (0) used for the flat<br>

  pointers are mapped to the SPIR-V’s ‘generic’. We intend this<br>

  mapping being enabled when the language mode is HIP.<br>

<br>

* Change SPIRABIInfo to coerce kernel AS-unqualified pointer arguments<br>

  to __global ones. Pointer arguments in regular device functions<br>

  receive the __generic AS qualifier via the address space mapping<br>

  defined in SPIRVTargetInfo in the above point.<br>

<br>

<br>

HIP code expansion<br>

==================<br>

<br>

There are features in HIP language which do not have direct<br>

counterparts in SPIR-V’s OpenCL environment and those features need to<br>

be rewritten before translation to SPIR-V (in the future, lowering to<br>

SPIR-V machine code through the new BE). The non-exhaustive list of<br>

features that need to be expanded includes:<br>

<br>

* Dynamic shared memory allocation (DSM): It is an array which is<br>

  declared globally in LLVM IR and its actual size determined at<br>

  kernel launch. OpTypeRuntimeArray in SPIR-V is the closest thing to<br>

  model this object, alas, it requires shader capability.<br>

<br>

* abort() builtin: No counterpart in SPIR-V/OpenCL.<br>

  (Note: the behavior is not well specified in the HIP spec<br>

  either. Assuming it terminates the whole grid if any work item<br>

  reaches it. AMD’s abort definition calls __builtin_trap).<br>

<br>

* printf(): OpenCL’s printf takes the format string as ‘__constant__<br>

  char*’ while in HIP the format string does not have to reside in<br>

  constant memory.<br>

<br>

* Texture objects. These roughly correspond to image and sampler<br>

  objects of OpenCL combined. Also, texture objects carry more<br>

  information for the texture functions than image+sampler objects do.<br>

<br>

* Texture references. Same as above but these are program global<br>

  objects. In OpenCL, image objects cannot reside in the program<br>

  global space.<br>

<br>

HIPCL/-LZ’s solution to the DSM allocation case is that the runtime<br>

allocates a shared buffer and passes it to the kernel as an additional<br>

argument (which is hidden from the user). The device code is modified<br>

so that the DSM object is replaced with the new kernel<br>

argument. Various other cases listed will be handled similarly:<br>

<br>

* For the printf case we tentatively replace the printf calls with a<br>

  function that packs their arguments to an additional buffer passed<br>

  as additional kernel argument and do the printing on the host side.<br>

<br>

* Texture objects will be tentatively split to image and sampler<br>

  objects and possibly auxiliary struct to carry texture<br>

  settings. This means at least that the kernel parameter listing<br>

  needs to be rewritten for the Texture objects.<br>

<br>

* For the texture reference we tentatively planned replacing the<br>

  global texture objects also with a number of additional kernel<br>

  arguments.<br>

<br>

For this and other HIP features we need to apply LLVM IR passes to<br>

perform modifications on the device code. In many cases the passes<br>

should be run when the device code (as LLVM bitcode) is fully<br>

linked. This is simply achieved as the HIP offload mechanism already<br>

emits device code as LLVM bitcode in RDC mode (-fgpu-rdc), so during<br>

linking we do receive the device code as LLVM bitcode where to apply<br>

these expansions with full view of the device code.<br>

<br>

The current plan for implementing this is to make the HIPSPVToolChain<br>

to build a linker that uses llvm-link for linking device code, opt for<br>

running the IR passes needed and the external llvm-spirv tool (llc in<br>

the future when the SPIR-V BE lands) for emitting the SPIR-V<br>

binary. We load the passes from a path the user provides<br>

via --hip-link-pass-path (name pending) or automatically from HIP<br>

runtime’s installation location by using the search logic provided by<br>

ROCmInstallationDetector.<br>

<br>

There is interest in upstreaming the HIPCL/-LZ passes from the<br>

HIPCL/-LZ repositories in the future for reduced maintenance<br>

burden. However, we are not attempting to upstream them initially, as<br>

they are not yet completed and are subject to rapid changes. Question<br>

is: Where should the passes eventually be put in within the LLVM<br>

project tree? Could it be OK to add a new directory under Clang for<br>

tool chain passes?<br>

<br>

<br>

Testing<br>

=======<br>

<br>

We will provide llvm-lit tests for our toolchain in the upstream. We<br>

also want to add tests to make sure clang who will run the HIPCL/-LZ<br>

runtime passes get run at device code link time. For this we need a<br>

dummy pass plugin that the clang loads during the test.<br>

<br>

When the new LLVM SPIR-V BE work lands on LLVM, we will add SPIR-V<br>

assembly checks that are relevant for HIPSPV.<br>

<br>

<br>

References<br>

==========<br>

<br>

[0]: <a href="https://rocmdocs.amd.com/en/latest/Programming_Guides/Programming-Guides.html">

https://rocmdocs.amd.com/en/latest/Programming_Guides/Programming-Guides.html</a><br>

[1]: <a href="https://spec.oneapi.com/level-zero/latest/index.html">https://spec.oneapi.com/level-zero/latest/index.html</a><br>

[2]: <a href="https://github.com/cpc/hipcl">https://github.com/cpc/hipcl</a><br>

[3]: <a href="https://github.com/jz10/anl-gt-gpu">https://github.com/jz10/anl-gt-gpu</a><br>

[4]: <a href="https://github.com/parmance/llvm-project/tree/hip2spirv-v5">https://github.com/parmance/llvm-project/tree/hip2spirv-v5</a><br>

[5]: <a href="https://github.com/KhronosGroup/LLVM-SPIRV-Backend">https://github.com/KhronosGroup/LLVM-SPIRV-Backend</a><br>

[6]: <a href="https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html">

https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html</a><br>

[7]: <a href="https://github.com/KhronosGroup/SPIRV-LLVM-Translator">https://github.com/KhronosGroup/SPIRV-LLVM-Translator</a><br>

_______________________________________________<br>

LLVM Developers mailing list<br>

llvm-dev@lists.llvm.org<br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

</div>

</span></font></div>

</body>

</html>