[PATCH] D145815: [Flang][Driver] Add support for fopenmp-is-device and fembed-offload-object to Flang ToolChain

Thu Mar 16 11:41:51 PDT 2023

jhuber6 added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/Flang.cpp:128
+  if (IsHostOffloadingAction) {
+    for (size_t i = 1; i < Inputs.size(); ++i) {
+      if (Inputs[i].getType() != types::TY_Nothing)
----------------
agozillon wrote:
> jhuber6 wrote:
> > agozillon wrote:
> > > awarzynski wrote:
> > > > agozillon wrote:
> > > > > awarzynski wrote:
> > > > > > What's the magic "1"? And given that the input count matters here - is there a test with multiple inputs?
> > > > > It aims to mimic the behavior of Clang: https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Clang.cpp#L4561 where the main input is skipped (the input currently being compiled or embedded into etc.), when adding to //-fembed-offload-object//. 
> > > > > 
> > > > > It does look different to Clang's as Clang has more cases and the logic is spread across the constructJob invocation, but the first if case is what the if statement inside of the loop and setting the loop index variable to 1 do. The HostOffloadingInputs array is what is being generated here, except we're skipping and directly applying it as arguments.
> > > > > 
> > > > > I tried to condense it a little in this case! Perhaps it loses readability though, I had hoped the comment might have kept it clear
> > > > Thanks for the link - that code in Clang doesn't really clarify what makes `Inputs[0]` special 🤔 . 
> > > > 
> > > > Let me rephrase my question - what's so special about the first input? (referred to in Clang as "main input") Is that something specific to OpenMP? For example, in this case:
> > > > ```
> > > > flang-new  -fopenmp  file.f90
> > > > ```
> > > > I assume that `inputs[0]` is "file.f90", so nothing will happen?
> > > > 
> > > > > I tried to condense it a little in this case! Perhaps it loses readability though, I had hoped the comment might have kept it clear
> > > > 
> > > > Nah, I think that your implementation is fine. It's my ignorance with respect to OpenMP that's the problem here ;-)
> > > It's not specific to OpenMP I believe, as far as I am aware Clang's supported offload models (SYCL and CUDA as well as OpenMP) use it! In Flang's case we only really care about OpenMP as I believe it's the only offload programming model supported.
> > > 
> > > In the case of the command: 
> > > 
> > > ```
> > > flang-new -fopenmp file.f90
> > > ``` 
> > > The code should never be executed as no part of the command will enable an offloading action (for device or host)! But yes inputs[0] would indeed refer to file.f90.
> > > 
> > > However, this code becomes relevant when you utilise an option that enables the clangDriver to perform some form of offloading action. For example a command like: 
> > > 
> > > ```
> > > flang-new -fopenmp --offload-arch=gfx90a file.f90 
> > > ```
> > > Will trigger two phase compilation, one for the host device (your resident CPU in this command) and one for the device (gfx90a in this command), the regular host pass will invoke like your provided command and the device pass will then invoke with -fopenmp-is-device in addition alongside the device triple. This generates two bitcode files from the one file, one containing the host code from the file, the other the device code (extracted from OpenMP target regions or declare target etc.). 
> > > 
> > > However, now we have two files, with both parts of our program, we need to conjoin them together, the clangDriver generates an embeddable fat-binary/binary using the clang-offload-packager and then invokes flang-new again, and this is where the above code becomes relevant, as this binary (or multiple binaries, if we target multiple devices in the same program) is what is passed to -fembed-offload-object! And inputs[0] in this case would refer to the output from the original host pass, which is what we want to embed the device binary into, so we wish to skip this original host output and only pass the subsequent inputs (which should be device binaries when the clangDriver initiates a host offloading action) we want to embed as -fembed-offload-object arguments. 
> > > 
> > > The offloading driver is quite complex and my knowledge of it is not perfect as I am not our resident expert on it unfortunately (so if anyone sees anything incorrect, please do chime in and correct me)! 
> > > 
> > > But hopefully this answers your question and gives you an idea of when and how this -fembed-offload-object comes into play, essentially a way to get the device binaries we wish to insert into the host binary, so it can load the binaries at runtime and execute them. Currently upstream Flang doesn't utilise this option of course, but we intend to use this as part of our OpenMP offloading efforts for AMD devices (whilst leaving the door open for other vendors devices as well). We are trying to re-use/mimic as much of the existing machinery that the clangDriver implements as we can! 
> > >  
> > The compiler requires at least one input file to run, otherwise it exits early. Therefore we're guaranteed to have at least one input file in the list. Some functions need an input file, usually to write some temp name to, and `Inputs[0]` is the easiest way to get an input file.
> Thank you very much @jhuber6! I should have added you as a subscriber/reviewer as well in hindsight, sorry about that. 
Sorry, this is two separate things. I was thinking about the driver's input list which behaves like I mentioned above.

Here is the input list to an actual compilation job. In this case we expect to only get one. E.g. `clang foo.c bar.c` gets turned into `clang -cc1 foo.c` and `clang -cc1 bar.c`. The extra files added here are intended to be used as additional input handled separately. So, for example consider the following OpenMP offloading compilation.

```
> clang input.c -fopenmp --offload-arch=gfx1030 -ccc-print-bindings
# "x86_64-unknown-linux-gnu" - "clang", inputs: ["input.c"], output: "/tmp/input-44c8b5.bc"
# "amdgcn-amd-amdhsa" - "clang", inputs: ["input.c", "/tmp/input-44c8b5.bc"], output: "/tmp/input-gfx1030-4471d9.bc"
# "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["/tmp/input-gfx1030-4471d9.bc"], output: "/tmp/input-83327d.out"
# "x86_64-unknown-linux-gnu" - "clang", inputs: ["/tmp/input-44c8b5.bc", "/tmp/input-83327d.out"], output: "/tmp/input-cdd693.o"
# "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["/tmp/input-cdd693.o"], output: "a.out"
```
The first input file is the actual file meant to be compiled. The other files are handled separately. For the `amdgcn-amd-amdhsa` triple the extra input is the host bitcode we use to match symbols between the host and the device. For the `x64` triple the extra input is a binary blob to be embedded into the host.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145815/new/

https://reviews.llvm.org/D145815