[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

Fri Jul 21 05:25:23 PDT 2023

AlexVlx added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:5542
+  if (!getLangOpts().HIPStdPar)
+    ErrorUnsupported(E, "builtin function");

----------------
efriedma wrote:
> AlexVlx wrote:
> > efriedma wrote:
> > > This doesn't make sense; we can't just ignore bits of the source code.  I guess this is related to "the decision on their validity is deferred", but I don't see how you expect this to work.
> > This is one of the weirder parts, so let's consider the following example:
> > 
> > ```cpp
> > void foo() { __builtin_ia32_pause(); }
> > void bar() { __builtin_trap(); }
> > 
> > void baz(const vector<int>& v) {
> >     return for_each(par_unseq, cbegin(v), cend(v), [](auto&& x) { if (x == 42) bar(); });
> > }
> > ```
> > 
> > In the case above, what we'd offload to the accelerator, and ask the target BE to lower, is the implementation of `for_each`, and `bar`, because it is reachable from the latter. `foo` is not reachable by any execution path on the accelerator side, however it includes a builtin that is unsupported by the accelerator (unless said accelerator is x86, which is not impossible, but not something we're dealing with at the moment). If we were to actually error out early, in the FE, in these cases, there's almost no appeal to what is being proposed, because standard headers, as well as other libraries, are littered with various target specific builtins that are not going to be supported. This all builds on the core invariant of this model / extension / thingamabob, which is that the algorithms, and only the algorithms, are targets for offload. It thus follows that as long as code that is reachable from an algorithm's implementation is safe, all is fine, but we cannot know this in the FE / on an AST level, because we need the actual CFG. This part is handled in LLVM in the `SelectAcceleratorCodePass` that's in the last patch in this series.
> > 
> > Now, you will quite correctly observe that there's nothing preventing an user from calling `foo` in the callable they pass to an algorithm; they might read the docs / appreciate that this won't work, but even there they are not safe, because there via some opaque call chain they might end up touching some unsupported builtin. My intuition here, which is reflected above in letting builtins just flow through, is that such cases are better served with a compile time error, which is what will obtain once the target BE chokes trying to lower an unsupported builtin. It's not going to be a beautiful error, and we could probably prettify it somewhat if we were to check after we've done the accelerator code selection pass, but it will happen at compile time. Another solution would be to emit these as traps (poison + trap for value returning ones), but I am concerned that it would lead to really fascinating debug journeys.
> > 
> > Having said this, if there's a better way to deal with these scenarios, it would be rather nice. Similarly, if the above doesn't make sense, please let me know.
> > 
> Oh, I see; you "optimistically" compile everything assuming it might run on the accelerator, then run LLVM IR optimizations, then determine late which bits of code will actually run on the accelerator, which then prunes the code which shouldn't run.
> 
> I'm not sure I really like this... would it be possible to infer which functions need to be run on the accelerator based on the AST?  I mean, if your API takes a lambda expression that runs on the accelerator, you can mark the lambda's body as "must be emitted for GPU", then recursively mark all the functions referred to by the lambda.
> 
> Emiting errors lazily from the backend means you get different diagnostics depending on the optimization level.
> 
> If you do go with this codegen-based approach, it's not clear to me how you detect that a forbidden builtin was called; if you skip the error handling, you just get a literal "undef".
`I'm not sure I really like this...` - actually, I am not a big fan either, however I think it's about the best one can do, given the constraints (consume standard C++, no annotations on the user side etc.). Having tried a few times in the past (and at least once in a different compiler), I don't quite think this can be done on an AST level. It would add some fairly awkward checking during template instantiation (no way to know earlier that a `CallableFoo` was passed to an offloadable algorithm), and it's a bit unwieldy to basically compute the CFG on the AST and mark reachable Callees at that point. Ignoring those, the main reason for which we cannot do this is that the interface is not constrained to only take lambdas, but callables in general, and that includes pointers to function as well. We don't deal with those today, but plan to, and there's a natural solution when operating on IR, assuming closed / internalised Modules (which is the case at least for AMDGPU at the moment). The final challenge pertains to the AST being per TU, with no cross-TU visibility, whereas with IR you can either pre-link the BC (implicitly or LTO) and then operate on the entire compilation. This is a problem with cases where `foo` defined in TU0 is reachable from `algorithm_bar_offloaded_impl` in TU1. So TL;DR, I think it would be more complex to do this on the AST and would end up more brittle / less future proof.

In what regards how to do deferred diagnostics, it think it can be done like this (I crossed streams in my prior reply when discussing this part, so it's actually nonsense): instead of emitting undef here, we can emit a builtin with the same signature, but with the name suffixed with e.g. (`__stdpar_unsupported`) or something similar. Then, when doing the reachability computation later, if we stumble upon a node in the CFG that contains a builtin suffixed with `__stdpar_unsupported` we error out, and can provide nice diagnostics since we'd have the call-chain handy. Thoughts?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850