[llvm] [AArch64] Use pattern to select bf16 fpextend (PR #137212)

Thu May 1 07:03:52 PDT 2025

davemgreen wrote:

> All of the changes in test output as a result of this patch are better or equivalent to what we currently have, as far as I can tell, so if there's a situation where converting to a shift earlier rather than later is better we don't have a test for it.

Yeah, some of the old codegen certainly looks like it is not doing as well as it should be. We obviously wouldn't have tests for everything possible. I was thinking about cases like `fpext(load)` with noneon, which should turn into a scalar `gpr load + gpr shift`, not a `fpr load + fpr->gpr move + gpr shift`. But that doesn't even work before! (It crashes or generates wrong instructions.) There are other cases, like how copysign gets expanded, that should be optimizing better than they are at the moment.
```
define float @test(ptr %a) {
  %l = load bfloat, ptr %a
  %e = fpext bfloat %l to float
  ret float %e
}
```
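
For reference, the reason a plain integer shift is enough here: bf16 has the same layout as the top 16 bits of an IEEE binary32 value, so the extension is exact and amounts to shifting the raw bits left by 16. A minimal C++ sketch of that `gpr load + gpr shift` idea follows; the helper names are hypothetical and not taken from the patch.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): widen raw bf16 bits to float.
// bf16 is the high half of a binary32, so the conversion is a 16-bit shift.
float bf16_bits_to_float(std::uint16_t bits) {
    std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof out);  // bit-cast without aliasing issues
    return out;
}

// Mirrors the IR above: an integer 16-bit load followed by the shift,
// with no floating-point register needed until the final result.
float load_and_extend(const void *p) {
    std::uint16_t bits;
    std::memcpy(&bits, p, sizeof bits);
    return bf16_bits_to_float(bits);
}
```

For example, the bf16 bit pattern `0x3F80` widens to `0x3F800000`, which is `1.0f`.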

When there is just one instruction being generated it looks OK; it's only the noneon patterns that worry me, and those are of less importance overall (i.e. this sounds ok, but...). As far as I can see this won't currently work without +bf16, as it relies on seeing the `fpext(fpround())` after legalization, and the fpround will equally be expanded. It is a lot more code, and emitting it with a pattern sounds unreasonable, but did you give it any thought?

(Would it be possible to have the fpext(fpround) optimization happen as part of getNode(), so that it happens almost immediately and doesn't have the requirement that the fpround and fpext are legal operations?)

https://github.com/llvm/llvm-project/pull/137212