[llvm] [NVPTX] Teach NVPTX about predicates (PR #67468)

Tue Oct 3 08:05:38 PDT 2023

ldrumm wrote:

@Artem-B Sorry for the delay.

Your questions are definitely valid. I'm investigating some reported cases where divergent branches are killing performance, and I should have conclusions on that soon for several GPU generations.

Before I'm confident in these data, I think it's safe to make a few observations:

- I don't think this complicates the NVPTX backend much. There are a couple of places where we need to add the default predicates manually for custom DAGToDAG lowering, but for the most part the TableGen changes mean that llvm will do the right thing with default predicates and the inversion switch.
- We don't know what `ptxas` can reliably do regarding if-conversion et al, and having the ability to control this in llvm gives us more power to affect code generation.
- I *have* seen `ptxas` if-convert very trivial cases in the wild, but llvm has more information about the control flow graph and so can reason about it better.
- In my testing I've seen that divergent control flow can still be very expensive. NVIDIA's marketing documentation for Ampere suggests the hardware can now eliminate most of this cost, but the issues I'm looking at indicate that this holds only for simple cases. Older hardware still suffers.
- I'm guessing PTX exposes generalized predicates for a reason. Not adding them limits what we can do in the backend.

https://github.com/llvm/llvm-project/pull/67468