[llvm] [NVPTX] Do not run the NVVMReflect pass as part of the normal pipeline (PR #121834)

Mon Jan 6 15:52:04 PST 2025

Artem-B wrote:

> Running it twice is a valid option.

The problem, IIUC, is that in some compilation modes we may run optimizations without the constants for the reflect pass being set properly, so an early run may pick the wrong branch -- something that the late reflect pass would not be able to undo.
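To make the failure mode concrete, here is a toy model in plain C++. It is not the actual pass: `ToyFunction`, `ReflectParams` and `runReflect` are hypothetical names for illustration. The only behavior taken from the real pass is that an unset reflect parameter folds to 0.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the set of reflect inputs known at some point
// in the pipeline (e.g. {"__CUDA_FTZ", 1}).
using ReflectParams = std::map<std::string, int>;

// Toy model of a function guarded by a single __nvvm_reflect("__CUDA_FTZ").
struct ToyFunction {
  bool hasReflectCall = true;  // the reflect query has not been folded yet
  std::string keptBranch;      // which branch survived folding
};

// Toy model of one run of the reflect pass: fold the query to a constant
// and keep only the matching branch. Unset parameters fold to 0.
void runReflect(ToyFunction &f, const ReflectParams &params) {
  if (!f.hasReflectCall)
    return;  // already folded; the other branch is gone for good
  auto it = params.find("__CUDA_FTZ");
  int value = (it == params.end()) ? 0 : it->second;
  f.keptBranch = value ? "ftz" : "non-ftz";
  f.hasReflectCall = false;
}
```

In this model, calling `runReflect(f, {})` early and `runReflect(f, {{"__CUDA_FTZ", 1}})` late leaves `f.keptBranch` as `"non-ftz"`: the second run has nothing left to fold.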

> I don't think this will make a considerable difference, since it's usually guarding some very shallow code paths.

I don't think it's always the case. There are functions in libdevice where `__nvvm_reflect()` is used multiple times (i.e. not just a single top-level `if`).

>  I feel like we're still getting a full optimization pipeline and this likely will be easily optimized out.

It's a maybe. Considering that `__nvvm_reflect()` only depends on a string, its branches may be optimizable, as long as they have no other `__nvvm_reflect()` calls in them. However, libdevice does have some functions where that's not the case.
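For illustration, the shape I have in mind looks roughly like the sketch below. It is not a real libdevice function. `reflect_stub`, `pick_variant` and the `g_*` globals are made-up stand-ins so that it compiles as ordinary host C++. In real device code the queries would be `__nvvm_reflect()` calls that stay opaque until the reflect pass runs.

```cpp
#include <cstring>

// Hypothetical knob values; in a real compilation these come from the
// target options and are only known once the reflect inputs are set.
static int g_ftz = 0;
static int g_arch = 0;

// Stand-in for __nvvm_reflect(): maps a parameter name to its value,
// returning 0 for anything it does not know about.
int reflect_stub(const char *name) {
  if (std::strcmp(name, "__CUDA_FTZ") == 0)
    return g_ftz;
  if (std::strcmp(name, "__CUDA_ARCH") == 0)
    return g_arch;
  return 0;
}

// Nested reflect queries: each outer branch contains another query, so
// until both are resolved all four variants have to be kept around.
int pick_variant() {
  if (reflect_stub("__CUDA_FTZ")) {
    if (reflect_stub("__CUDA_ARCH") >= 700)
      return 3;  // FTZ, newer arch
    return 2;    // FTZ, older arch
  }
  if (reflect_stub("__CUDA_ARCH") >= 700)
    return 1;    // non-FTZ, newer arch
  return 0;      // non-FTZ, older arch
}
```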

The practical impact will likely be limited to the heavy functions with multiple `__nvvm_reflect()` calls in branches, which will potentially keep those branches hanging around and blocking optimizations throughout the normal optimization pipeline. Whether the limited optimizations in the back-end will be sufficient to produce good-enough code for those functions -- I do not know. It will likely be rare, but that's small consolation for the folks who use those functions.

>  ignore it if there's no SM passed.

Reflect can be used with other parameters, so an SM-only check is, generally speaking, not sufficient as an on/off switch.

I think the decision of where the reflect pass should run should be tied to the earliest point where the reflect inputs get set. For CUDA, it's the beginning of the pipeline. For OpenMP and stand-alone compilation it's probably somewhere closer to the back-end (or wherever we may link in libdevice, or do LTO, or some other point where we finally know what we're actually compiling for).

https://github.com/llvm/llvm-project/pull/121834