[llvm] Add Dead Block Elimination to NVVMReflect (PR #144171)

Tue Jun 17 16:38:28 PDT 2025

Artem-B wrote:

Fascinating. Looks like the `AMDGPUExpandFeaturePredicatesPass` in #134016 is a close relative of NVVMReflect; it just checks LLVM features instead of hard-coded constants.
While it does attempt to resolve some of those feature detections at the AST level when possible, I think it has the same problem when the code is generated for a generic architecture such as SPIR-V, and it will still require some sort of DCE to be applied in order to work reliably.

> The previous version did the simple DCE which was good enough to eliminate trivial uses, but we definitely need something more complicated for it to be completely robust. 

Do you have an example?

> We do this at O0 because otherwise it's useless as a means to guard potentially invalid code for that target.

Yes, the invalid code should either never show up in IR (AST-level resolution in #134016) or be eliminated by something at the IR level. In this case we're arguing about how to do that, and the options are:
- custom folding in the NVVMReflect pass.
- use existing passes to do the job, but apply them only when explicitly enabled by the user (this preserves the status quo for users who do not care, and gives an escape hatch to those who run into the issue).

Now, it looks like we have another question -- how robust is DCE, or the folding code in this patch?

> In either case this is a bit of a hack because a lot of target specific features require +ptx73 or whatever, which would then need to be made a function level attribute and then propagated somehow.

Yup. That's why I think that the right way out of this well-painted corner is to provide per-target bitcode which is intended to be valid for the given target. I do not think we can guarantee "generic" IR functionality for everyone at once. There will always be corner cases when we mix it with other IR modules, unless the IR in question is itself very generic, or we have some way to enforce that the wrong parts were eliminated before we mix it with any other IR.

E.g. in this case, the const-folding and DCE would work fine if they were applied to libdevice only. However, nothing stops users from using `__nvvm_reflect()` wherever they want, in an arbitrarily convoluted way that we may not always be able to const-eval at build time.
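
To make the distinction concrete, here is a rough host-side sketch of the two shapes of use. Everything in it is illustrative: `reflect_stub`, `Query`, `direct_guard` and `indirect_guard` are hypothetical names, the value 700 is made up, and the stub only stands in for `__nvvm_reflect()`, which the NVVMReflect pass resolves rather than an ordinary function body. Whether a particular pipeline can fold the second form depends on what it manages to inline and propagate, so treat that part as an assumption, not a statement about any specific pass.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for __nvvm_reflect(). The real call is rewritten to a
// constant by the NVVMReflect pass; this stub just returns a made-up value.
static int reflect_stub(const char *key) {
  return std::strcmp(key, "__CUDA_ARCH") == 0 ? 700 : 0;
}

// Direct use: the result feeds a compare and a branch immediately. Once the
// call is replaced by a constant, the dead arm is trivially identifiable.
int direct_guard() {
  if (reflect_stub("__CUDA_ARCH") >= 700)
    return 1; // would hold the target-specific code
  return 0;   // generic fallback
}

// Indirect use: both the callee and the key arrive through a struct, so the
// branch condition is only a constant if the caller's values are propagated
// into this function first (assumption: not guaranteed without optimization).
using ReflectFn = int (*)(const char *);
struct Query {
  ReflectFn fn;
  const char *key;
};

int indirect_guard(const Query &q) {
  int arch = q.fn(q.key);
  return arch >= 700 ? 1 : 0;
}
```

Both functions compute the same answer at run time; the difference is only in how much work is needed to prove, at build time, which arm is dead.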

I don't want to keep adding more magic to NVVMReflect. If we want a corner-case workaround, I think it should be an explicitly enabled thing, with a clear understanding of the trade-offs involved. If we want a robust solution -- this PR is not it.

https://github.com/llvm/llvm-project/pull/144171