[llvm] [CVP] Implement type narrowing for LShr (PR #119577)

Mon Dec 16 08:53:24 PST 2024

nikic wrote:

Thanks for the example. The important distinction here is that we aggressively narrow integer divisions because on many CPUs the latency of integer division is significantly lower at smaller bit widths. This is not the case for lshr. For lshr, this optimization only really makes sense insofar as it removes unnecessary ext+trunc patterns. And that is not specific to lshr: you can narrow most integer ops under various constraints.
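To make the "various constraints" part concrete for lshr, here is a minimal sketch in plain C++ (not LLVM code; the function names are hypothetical and chosen only for illustration). It models an i32 lshr and its i8-narrowed form, and checks exhaustively that the two agree only when the bits above the narrow width are zero.

```cpp
#include <cassert>
#include <cstdint>

// Wide form, modelling: lshr i32 %v, %s
std::uint32_t lshr_wide(std::uint32_t v, unsigned s) {
    return v >> s;
}

// Narrowed form, modelling: zext (lshr i8 (trunc %v), %s) to i32.
// Shift amounts must be valid for the narrow type, hence the assert.
std::uint32_t lshr_narrowed(std::uint32_t v, unsigned s) {
    assert(s < 8 && "shift amount must be below the narrow bit width");
    std::uint8_t t = static_cast<std::uint8_t>(v);       // trunc i32 -> i8
    std::uint8_t r = static_cast<std::uint8_t>(t >> s);  // lshr i8
    return r;                                            // zext i8 -> i32
}

// Returns true if both forms agree for every v in [0, max_value]
// and every shift amount in [0, 7].
bool narrowing_is_sound(std::uint32_t max_value) {
    for (std::uint32_t v = 0; v <= max_value; ++v) {
        for (unsigned s = 0; s < 8; ++s) {
            if (lshr_wide(v, s) != lshr_narrowed(v, s)) {
                return false;
            }
        }
    }
    return true;
}
```

With inputs limited to values that fit in 8 bits the two forms are interchangeable. As soon as a higher bit can be set (for example `v = 0x100`), the truncation drops information and the narrowed form is wrong, which is why the transform depends on proving the top bits are zero.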

We already have various optimizations that do this, e.g. see canEvaluateTruncated in InstCombine and TruncInstCombine in AggressiveInstCombine. I think the reason they don't work for your sample is that computeKnownBits fails to determine that the top bits are zero. It's probably just a matter of adding support for isSignedMinMaxIntrinsicClamp in computeKnownBits. (If we didn't fold the ashr to lshr, it would probably also work via ComputeNumSignBits, which does handle the clamp pattern.)
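As a rough illustration of the reasoning computeKnownBits would need for a clamp, here is a standalone C++ sketch. It is not the LLVM implementation and does not use LLVM APIs; `clamp_signed`, `leading_zeros`, and `known_leading_zeros_of_clamp` are hypothetical helpers, and the bounds are assumed to be constants with lo <= hi.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Models the clamp pattern smax(smin(x, hi), lo) on i32.
std::int32_t clamp_signed(std::int32_t x, std::int32_t lo, std::int32_t hi) {
    assert(lo <= hi && "clamp bounds must be ordered");
    return std::max(std::min(x, hi), lo);
}

// Number of leading zero bits in a 32-bit value (32 for zero).
unsigned leading_zeros(std::uint32_t v) {
    unsigned n = 0;
    for (std::uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1) {
        ++n;
    }
    return n;
}

// Lower bound on the number of top bits known to be zero in the clamp result.
// If the lower bound is non-negative, every result lies in [lo, hi] with
// 0 <= lo, so it has at least as many leading zeros as hi does.
// If the result may be negative, the sign bit is not known to be zero.
unsigned known_leading_zeros_of_clamp(std::int32_t lo, std::int32_t hi) {
    if (lo < 0 || hi < lo) {
        return 0;
    }
    return leading_zeros(static_cast<std::uint32_t>(hi));
}
```

For a clamp to [0, 255] this reports 24 known-zero top bits, which is the fact an i32 lshr would need in order to be evaluated in 8 bits. For a clamp to [-128, 127] it reports none, since the value can be negative; that case is covered by sign-bit reasoning rather than known-zero bits.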

https://github.com/llvm/llvm-project/pull/119577