[llvm] [AArch64] Improve code generation of bool vector reduce operations (PR #115713)

Thu Dec 12 03:14:10 PST 2024

Il-Capitano wrote:

You're right. The issue is that the fast-math start-up object sets the FZ bit (bit 24) of FPCR, meaning that [_denormalized single-precision and double-precision inputs to, and outputs from, floating-point instructions are flushed to zero_](https://developer.arm.com/documentation/ddi0601/2024-09/AArch64-Registers/FPCR--Floating-point-Control-Register). A non-zero bit pattern whose exponent field is all zeros is a denormal, so with FZ set it is flushed to zero and compares equal to `0.0`. This means that we can't reliably use `fcmp d0, #0.0` to check for an all-zero bit pattern.
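To make the failure mode concrete, here is a rough software model of the comparison, not the actual lowering from the patch. The helper names (`bits_to_double`, `is_subnormal`, `fcmp_eq_zero`) are hypothetical, and the `flush_to_zero` parameter only approximates what FPCR.FZ does to the inputs of `fcmp`; the sample pattern `0xff` assumes a 64-bit mask in which only the lowest byte lane is set.

```python
import math
import struct
import sys


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit integer pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_subnormal(x: float) -> bool:
    """True for non-zero finite values below the smallest normal double."""
    return x != 0.0 and math.isfinite(x) and abs(x) < sys.float_info.min


def fcmp_eq_zero(bits: int, flush_to_zero: bool) -> bool:
    """Model of `fcmp d0, #0.0` followed by an EQ check (hypothetical helper).

    With flush_to_zero=True, subnormal inputs are replaced by zero before
    the comparison, approximating the effect of FPCR.FZ being set.
    """
    x = bits_to_double(bits)
    if flush_to_zero and is_subnormal(x):
        x = 0.0
    return x == 0.0


# Assumed mask layout: only the lowest byte lane is non-zero.
mask = 0x00000000000000FF

# Without FZ the non-zero pattern is correctly seen as "not zero".
print(fcmp_eq_zero(mask, flush_to_zero=False))
# With FZ the same pattern is flushed and wrongly compares equal to zero.
print(fcmp_eq_zero(mask, flush_to_zero=True))
```

Under this model the two calls disagree for the same input, which is the miscompile: an "any lane set" reduction would report false even though a lane is set.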

To me it seems there's no way to use the `fcmp` trick and get correct lowering in all cases, since FPCR is configured at link time, i.e. a TU compiled with `-O3` can still be affected if the `-mdaz-ftz` flag is used in the link command.

There's one part of this patch that could still be used to optimize boolean reductions (used for `reduce_xor` here). I'll open a new PR for that in the next few days.

https://github.com/llvm/llvm-project/pull/115713