[PATCH] D78655: [CUDA][HIP] Let non-capturing lambda be host device

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 23 10:43:11 PDT 2020


tra added a comment.

In D78655#2108047 <https://reviews.llvm.org/D78655#2108047>, @pfultz2 wrote:

> > Could you give an example to demonstrate current use and how it will break?
>
> Here is place where it would break:
>
> https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/src/targets/gpu/device/include/migraphx/gpu/device/multi_index.hpp#L129
>
> This change was already included in a fork of llvm in the rocm 3.5 and 3.6 releases, which is why this compiles. It also compiles with the hcc-based hip compilers that previous rocm versions used. It would be best if this change can be upstreamed, so we don't have to hold on to these extra changes in a fork.


It may be OK to require updated software in order to switch to a new compiler. E.g. it would be unreasonable to expect clang to compile all existing HCC code, nor did we promise to compile all existing CUDA code back when clang was at the point where HIP is now: a new compiler emerging in an ecosystem with existing code that compiles and works fine with the incumbent compiler, but needs some tweaks to compile and work with clang. There will be some back and forth before we reach an equilibrium where most things compile and work.

You may need to make some portability tweaks to your code to make it work with upstream clang, internal clang versions, and hcc. This is roughly what was done to existing CUDA code -- pretty much all major libraries that use CUDA (TensorFlow, Thrust, CUTLASS, CUB, PyTorch) needed minor tweaks to make them portable to clang.

Now, back to the specifics of your example. I'm still not 100% sure I understand what the problem is. Can you boil the use case down to an example on godbolt? Not just the lambda itself, but also the way it's intended to be used. It does not need to compile; I just need it in order to understand your use case and the problem.
I can imagine passing a lambda type as a template parameter, which would make it hard to predict or control where and how it will finally be instantiated or used, but it would be great to have a practical example.

> Part of the motivation for this change was that it wasn't always clear in code where the `__device__` attribute is needed with lambdas. It also makes them more consistent with `constexpr` lambdas and the hcc-based hip compiler. Including capturing lambdas will make this simpler and easier to understand.
> 
> If there are concerns about making it default for capturing lambdas, then can we at least just have a flag to enable this for capturing lambdas?

I've just pointed out that the assumption that having a capture implies having an enclosing function is invalid. We've already decided to proceed with promoting all lambdas in general, so it's mostly a matter of taking care of implementation details.
Dealing with capturing lambdas in a separate patch is one option. IMO it makes sense in general, as capturing lambdas do have their own distinct quirks, while promotion of non-capturing lambdas is relatively uncontroversial.
If Sam decides to incorporate support for capturing lambdas in this patch, we could still do it by restricting the capturing-lambda promotion to lambdas within a function scope only. I.e. lambdas created at global scope would still be host.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78655/new/

https://reviews.llvm.org/D78655
