[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 25 16:20:48 PST 2022


tra planned changes to this revision.
tra added a comment.

Getting rid of byval helps getting rid of locals in quite a few places, but runs into a new problem. 😕

Looks like this change does have unexpected side-effects.
When we need to dynamically index into a struct passed directly, there's no easy way to do it as `extractvalue` only supports constant indices. 
With `byval` aggregates LLVM uses GEP which does allow using dynamic indexing. 
Alloca would only show up after `nvptx-lower-args` pass and that by that time IR would often be simple enough to eliminate that alloca. 
Now, clang generates  a local copy early on and, indexes into it dynamically with GEP... and then LLVM fails to eliminate the local copy because SROA fails to deal with dynamic indices and that in turn prevents IR optimizations that were possible without alloca.
https://github.com/llvm/llvm-project/issues/51734

That's rather unfortunate. This regression is serious enough to be a showstopper for my use case.



================
Comment at: clang/lib/CodeGen/TargetInfo.cpp:7193
          EIT->getNumBits() > 64))
       return getNaturalAlignIndirect(Ty, /* byval */ true);
   }
----------------
jdoerfert wrote:
> When is this ever hit and should we not disable byval here too while we are at it?
Basically it's saying "pass as byval pointer if it's an int that's larger than what we can lower".
Yes, I think passing such integer directly would make sense.

We may hit this if clang wants to pass `__i128` (do larger int types exist in clang?). 
I think (some of) this may be a leftover from the days when we didn't support i128 in CUDA/NVPTX. I think we do now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118084/new/

https://reviews.llvm.org/D118084



More information about the cfe-commits mailing list