[PATCH] D79744: clang: Add address space to indirect abi info and use it for kernels

Johannes Doerfert via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 20 16:00:55 PDT 2020


jdoerfert added a comment.

In D79744#2047482 <https://reviews.llvm.org/D79744#2047482>, @arsenm wrote:

> For the purpose here, only the callee exists. This is essentially a freestanding function, the entry point to the program. There is no caller function, and in the future I would like to make a call to amdgpu_kernel an IR verifier error. (Technically OpenCL device enqueue is an exception to this, but we don't treat this as a call. Instead there's a lot of library magic to invoke the kernel. From the perspective of the callee nothing changes, since it's still not allowed to modify the incoming argument buffer, nor is it aware it was called this way.)


Did you consider a `callback` annotation for the device enqueue call? While that might not change anything *now*, I'm expecting interesting optimization opportunities there at some point "soon".
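
For concreteness, a callback annotation on an enqueue-style entry point could look roughly like the sketch below. This is only an illustration: `enqueue_kernel_stub`, `kernel_fn`, `add_one`, and `run_example` are made-up names, not the actual OpenCL device enqueue library interface. The only real piece is clang's `callback` function attribute, which is guarded here so that compilers without it simply ignore the annotation.

```c
#include <stdio.h>

/* Use clang's `callback` attribute when available; expand to nothing otherwise. */
#if defined(__has_attribute)
#  if __has_attribute(callback)
#    define CALLBACK(...) __attribute__((callback(__VA_ARGS__)))
#  endif
#endif
#ifndef CALLBACK
#  define CALLBACK(...)
#endif

typedef void (*kernel_fn)(void *);

/* Hypothetical stand-in for the library routine that ends up invoking a kernel.
 * The annotation says: this function may call `kernel`, passing `args` to it. */
CALLBACK(kernel, args)
void enqueue_kernel_stub(kernel_fn kernel, void *args);

void enqueue_kernel_stub(kernel_fn kernel, void *args) {
  /* A real implementation would hide this behind "library magic";
   * here the callback is simply forwarded. */
  kernel(args);
}

/* Hypothetical "kernel": increments the integer it is handed. */
static void add_one(void *args) { *(int *)args += 1; }

int run_example(void) {
  int value = 41;
  enqueue_kernel_stub(add_one, &value);
  return value;
}
```

With the annotation in place, the optimizer can see the relationship between the call site of `enqueue_kernel_stub` and `add_one` even though there is no direct call between them, which is the kind of information I would expect to become useful later.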

> The load-from-constant nature needs to be exposed earlier, so I think this necessarily involves changing the convention lowering in some way and it's just a question of what it looks like. To summarize, the 2.5 options I've come up with are:
> 
> 1. Use constant byval parameters, as this patch does. This requires the least implementation effort but doesn't exactly fit in with byval as defined.

And, as was noted in the `byval` lang ref patch (D79636 <https://reviews.llvm.org/D79636>), there is a reasonable argument to be made for phasing out `byval` in favor of some explicit copying. If that happens, this solution should not be "the last `byval` user". Also, `byval` arguments could be used as a scratchpad by smart optimizations. I don't really believe this is a great choice, but by now I can see that the others aren't either.

> 2. Replace all IR argument uses with loads from a constant offset from an intrinsic call. This still needs to leave the IR arguments in place because we do need to know the original argument sizes and offsets, but they would never have a use (or I would need to invent some other method of tracking this information).
> 3. Keep clang IR generation unchanged, but move the pass that lowers arguments to loads earlier and hack out aggregate IR loads before SROA makes things worse. This is really just a kludgier version of option 2. We do ultimately do this late in the backend to enable vectorization, but it does seem to make the middle end optimizer unhappy.




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79744/new/

https://reviews.llvm.org/D79744




