[llvm] [AArch64] Use pattern to select bf16 fpextend (PR #137212)

Mon Apr 28 06:28:32 PDT 2025

john-brawn-arm wrote:

> As per #118966 I was hoping the opposite would be better. I can see the advantage if there are fast-math fpexts/fptruncs that can be removed, but I was hoping that we could lower the extends/rounds so that the shifts and whatnot can be optimized nicely as they should be. Otherwise you miss fairly basic codegen opportunities.

All of the changes in test output as a result of this patch are better than or equivalent to what we currently have, as far as I can tell. If there's a situation where converting to a shift earlier rather than later is better, we don't have a test for it.
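For anyone following along, the "shift" here refers to the fact that a bf16 value has the same bit layout as the upper 16 bits of an IEEE single, so extending it to f32 amounts to a 16-bit left shift of the raw bits. A minimal C++ sketch of that equivalence (the helper name `bf16_bits_to_float` is hypothetical and not something from the patch):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not taken from the patch: widen the raw bits of a
// bf16 value to a float. bf16 shares its sign, exponent and top 7 mantissa
// bits with binary32, so the extension is a 16-bit left shift.
float bf16_bits_to_float(std::uint16_t bits) {
    std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
    float out;
    // memcpy is the portable way to reinterpret the bit pattern.
    std::memcpy(&out, &wide, sizeof out);
    return out;
}
```

For example, the bf16 bit pattern `0x3F80` widens to `0x3F800000`, which is `1.0f`.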

> I am mostly against patterns that produce multiple output instructions, especially anything that is a cross-register-bank copy. `SUBREG_TO_REG` and `EXTRACT_SUBREG` are fine as they don't produce instructions. But I'm not sure if `(i32 (SUBREG_TO_REG (i32 0), (bf16 FPR16:$Rn), hsub))` is really valid.

This results in a `COPY` being implicitly added later, but having an explicit `COPY_TO_REGCLASS` here is probably better. I'll do that.

https://github.com/llvm/llvm-project/pull/137212