[PATCH] D79744: clang: Add address space to indirect abi info and use it for kernels
Matt Arsenault via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed May 20 13:44:06 PDT 2020
arsenm added a comment.
In D79744#2040731 <https://reviews.llvm.org/D79744#2040731>, @rjmccall wrote:
> In D79744#2040434 <https://reviews.llvm.org/D79744#2040434>, @jdoerfert wrote:
>
> > In D79744#2040380 <https://reviews.llvm.org/D79744#2040380>, @rjmccall wrote:
> >
> > > In D79744#2040348 <https://reviews.llvm.org/D79744#2040348>, @jdoerfert wrote:
> > >
> > > > Drive by, I haven't read the entire history.
> > > >
> > > > In D79744#2040208 <https://reviews.llvm.org/D79744#2040208>, @rjmccall wrote:
> > > >
> > > > > I don't understand why `noalias` is even a concern if the whole buffer passed to the kernel is read-only. `noalias` is primarily about proving non-interference, but if you can tell IR that the buffer is read-only, that's a much more powerful statement.
> > > >
> > > >
> > > > The problem is that it is a "per-pointer" attribute and not "per-object". Given two argument pointers, where one is marked `readonly`, may still alias. Similarly, an access to a global, indirect accesses, ... can write the "readonly" memory.
> > >
> > >
> > > Oh, do we really not have a way to mark that memory is known to be truly immutable for a time? That seems like a really bad limitation. It should be doable with a custom alias analysis at least.
> >
> >
> > `noalias` + `readone` on an argument basically implies immutable for the function scope. I think we have invariant intrinsics that could do the trick as well, though I'm not too familiar with those. I was eventually hoping for paired/scoped `llvm.assumes` which would allow `noalias` + `readnone` again. Then there is `invariant` which can be placed on a load instruction. Finally, TBAA has a "constant memory" tag (or something like that), but again it is a per-access thing. That are all the in-tree ways I can think of right now.
> >
> > Custom alias analysis can probably be used to some degree but except address spaces I'm unsure we have much that you can attach to a pointer and that "really sticks".
> >
> > >>> Regardless, if you do need `noalias`, you should be able to emit the same IR that we'd emit if pointers to all the fields were assigned into `restrict` local variables and then only those variables were subsequently used.
> > >>
> > >> We are still debating the local restrict pointer support. Once we merge that in it should be usable here.
> > >
> > > I thought that was finished a few years ago; is it really not considered usable yet? Or does "we" not just mean LLVM here?
> >
> > Yes, I meant "we" = LLVM here. Maybe we talk about different things. I was referring to local restrict qualified variables, e.g., `double * __restrict Ptr = ...`. The proposal to not just ignore the restrict (see https://godbolt.org/z/jLzjR3) came last year, it was a big one and progress unfortunately stalled (partly my fault). Now we are just about to see a second push to get it done.
> > Is that what you meant too?
>
>
> I thought I remembered Hal doing a lot of work on local restrict a few years ago. I'm probably just misremembering, or I didn't realize that the work never landed or got pulled out later.
>
> Okay. So where we're at is that you'd like to add a new argument-passing convention that's basically "passed indirect but immutable", implying that the frontend has to copy it in order to create the mutable local parameter. That's not actually a totally ridiculous convention in principle, although it has poor worst-case behavior (copies on both sides), and that happens to be what the frontend will often have to conservatively emit. I would still prefer not to add new argument-passing conventions just to satisfy short-term optimization needs, though. Are there any other reasonable options?
For the purpose here, only the callee exists. This is essentially a freestanding function, the entry point to the program. There is no caller function, and in the future I would like to make a call to amdgpu_kernel an IR verifier error (technically OpenCL device enqueue is an exception to this, but we don't treat this as a call. Instead there's a lot of library magic to invoke the kernel. From the perspective of the callee nothing changes, since it's still not allowed to modify the incoming argument buffer or aware it was called this way).
The load-from-constant nature needs to be exposed earlier, so I think this necessarily involves changing the convention lowering in some way and it's just a question of what it looks like. To summarize the 2.5 options I've come up with are
1. Use constant byval parameters, as this patch does. This requires the least implementation effort but doesn't exactly fit in with byval as defined.
2. Replace all IR argument uses with loads from a constant offset from an intrinsic call. This still needs to leave the IR arguments in place because we do need to know the original argument sizes and offsets, but they would never have a use (or I would need to invent some other method of tracking this information)
3. Keep clang IR generation unchanged, but move the pass that lowers arguments to loads earlier and hack out aggregate IR loads before SROA makes things worse. This is really just a kludgier version of option 2. We do ultimately do this late in the backend to enable vectorization, but it does seem to make the middle end optimizer unhappy
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D79744/new/
https://reviews.llvm.org/D79744
More information about the cfe-commits
mailing list