[Openmp-commits] [PATCH] D139287: [WIP][OpenMP] Introduce basic JIT support to OpenMP target offloading

Mon Dec 5 07:41:56 PST 2022

jhuber6 added a comment.

In D139287#3971024 <https://reviews.llvm.org/D139287#3971024>, @tianshilei1992 wrote:

> In D139287#3970996 <https://reviews.llvm.org/D139287#3970996>, @jhuber6 wrote:
>
>> Why do we have the JIT in the nextgen plugins? I figured that JIT would be handled by `libomptarget` proper rather than the plugins. I guess this is needed for per-kernel specialization? My idea of the rough pseudocode would be like this and we wouldn't need a complex class heirarchy. Also I don't know if we can skip `ptxas` by giving CUDA the ptx directly, we probably will need to invoke `lld` on the command line however right.
>>
>>   for each image:
>>     if image is bitcode
>>       image = compile(image)
>>    register(image)
>
> We could handle them in `libomptarget`, but that's gonna require we add another two interface functions: `is_valid_bitcode_image`, and `compile_bitcode_image`. It is doable. Handling them in plugin as a separate module can just reuse the two existing interfaces.

Would we need to consult the plugin? We can just check the `magic` directly, if it's bitcode we just compile it for its triple. If this was wrong then when the plugin gets the compiled image it will error.

>> Also I don't know if we can skip `ptxas` by giving CUDA the ptx directly, we probably will need to invoke `lld` on the command line however right.
>>
>>   for each image:
>>     if image is bitcode
>>       image = compile(image)
>>    register(image)
>
> We can give CUDA PTX directly, since the CUDA JIT is to just call `ptxas` instead of `ptxas -c`, which requires `nvlink` afterwards.

That makes it easier for us, so the only command line tool we need to call is `lld` for AMDGPU.

================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp:184
+
+  auto AddStream =
+      [&](size_t Task,
----------------
tianshilei1992 wrote:
> jhuber6 wrote:
> > tianshilei1992 wrote:
> > > Is there any way that we don't write it to a file here?
> > Why do we need to invoke LTO here? I figured that we could call the backend directly since we have no need to actually link any filies, and we may not have a need to run more expensive optimizations when the bitcode is already optimized. If you do that then you should be able to just use a `raw_svector_ostream` as your output stream and get the compiled output written to that buffer.
> For the purpose of this basic JIT support, we indeed just need backend. However, since we have the plan for super optimization, etc., having an optimization pipeline here is also useful.
We should be able to configure our own optimization pipeline in that case, we might want the extra control as well.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139287/new/

https://reviews.llvm.org/D139287