[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

Thu Jul 20 14:08:59 PDT 2023

AlexVlx added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:5542
+  if (!getLangOpts().HIPStdPar)
+    ErrorUnsupported(E, "builtin function");

----------------
efriedma wrote:
> This doesn't make sense; we can't just ignore bits of the source code.  I guess this is related to "the decision on their validity is deferred", but I don't see how you expect this to work.
This is one of the weirder parts, so let's consider the following example:

```cpp
void foo() { __builtin_ia32_pause(); }
void bar() { __builtin_trap(); }

void baz(const vector<int>& v) {
    return for_each(par_unseq, cbegin(v), cend(v), [](auto&& x) { if (x == 42) bar(); });
}
```

In the case above, what we'd offload to the accelerator, and ask the target BE to lower, is the implementation of `for_each`, and `bar`, because it is reachable from the latter. `foo` is not reachable by any execution path on the accelerator side, however it includes a builtin that is unsupported by the accelerator (unless said accelerator is x86, which is not impossible, but not something we're dealing with at the moment). If we were to actually error out early, in the FE, in these cases, there's almost no appeal to what is being proposed, because standard headers, as well as other libraries, are littered with various target specific builtins that are not going to be supported. This all builds on the core invariant of this model / extension / thingamabob, which is that the algorithms, and only the algorithms, are targets for offload. It thus follows that as long as code that is reachable from an algorithm's implementation is safe, all is fine, but we cannot know this in the FE / on an AST level, because we need the actual CFG. This part is handled in LLVM in the `SelectAcceleratorCodePass` that's in the last patch in this series.

Now, you will quite correctly observe that there's nothing preventing an user from calling `foo` in the callable they pass to an algorithm; they might read the docs / appreciate that this won't work, but even there they are not safe, because there via some opaque call chain they might end up touching some unsupported builtin. My intuition here, which is reflected above in letting builtins just flow through, is that such cases are better served with a compile time error, which is what will obtain once the target BE chokes trying to lower an unsupported builtin. It's not going to be a beautiful error, and we could probably prettify it somewhat if we were to check after we've done the accelerator code selection pass, but it will happen at compile time. Another solution would be to emit these as traps (poison + trap for value returning ones), but I am concerned that it would lead to really fascinating debug journeys.

Having said this, if there's a better way to deal with these scenarios, it would be rather nice. Similarly, if the above doesn't make sense, please let me know.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850