[llvm] [X86] Lower `minimum`/`maximum`/`minimumnum`/`maximumnum` using bitwise operations (PR #170069)

Wed Dec 10 04:08:08 PST 2025

valadaptive wrote:

Thinking about this a bit more, I think doing two min/max operations and combining them is a lot more specialized and tricky than what I've implemented here, and can be deferred to a follow-up. The operations are asymmetric, and you have to make sure the result follows LLVM's NaN semantics.

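- `minps(x, y) | minps(y, x)` for a non-numeric minimum is asymmetric with regard to the maximum. `minps`/`maxps` return the second operand on ties and NaNs, so ORing both orders picks `-0.0` for `(+0.0, -0.0)`. That is right for the minimum but wrong for the maximum, and switching to AND fixes the zeros but no longer guarantees a NaN result. Getting the non-numeric maximum therefore requires more operations, as seen in Cranelift.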
- `minps(x, y) | minps(y, x)` also violates LLVM's [NaN payload semantics](https://llvm.org/docs/LangRef.html#behavior-of-floating-point-nan-values) (emphasis mine):

  > Floating-point math operations that return a NaN are an exception from the general principle that LLVM implements IEEE-754 semantics. Unless specified otherwise, the following rules apply whenever the IEEE-754 semantics say that a NaN value is returned: the result has a non-deterministic sign; the quiet bit and payload are non-deterministically chosen from the following set of options:
  >
  >   - The quiet bit is set and the payload is all-zero. (“Preferred NaN” case)
  >
  >   - The quiet bit is set and the payload is copied from any input operand that is a NaN. (“Quieting NaN propagation” case)
  >
  >   - The quiet bit and payload are copied from any input operand that is a NaN. (“Unchanged NaN propagation” case)
  >
  >   - The quiet bit is set and the payload is picked from a target-specific set of “extra” possible NaN payloads. The set can depend on the input operand values. **This set is empty on x86 and ARM,** but can be non-empty on other architectures. (For instance, on wasm, if any input NaN does not have the preferred all-zero payload or any input NaN is an SNaN, then this set contains all possible payloads; otherwise, it is empty. On SPARC, this set consists of the all-one payload.)

  The bitwise OR merges the non-NaN operand's mantissa bits into the NaN's payload, which in general yields a payload matching none of the options above. LLVM explicitly does not allow this. You can fix it with an extra fixup step afterwards, like Cranelift does, but that makes the speed tradeoff even murkier.

- Extending those operations to prefer the non-NaN operand (as `minimumnum`/`maximumnum` require) seems tricky. Not only do you now have to check *two* source operands for NaN (with a single `minps`/`maxps`, you only have to check whether one source operand is NaN), but whatever bitwise tricks you perform must not create signaling NaN values. Under LLVM's semantics, it's fine to pass through a signaling NaN input, but you may not *create* a signaling NaN if none of the input operands were signaling NaNs.
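To make the first two points concrete, here is a scalar C model of a single SSE lane. It assumes that `MINPS`/`MAXPS` return their first operand only when the comparison is true and otherwise return the second operand (equal values, signed zeros, or a NaN in either operand). The helper names (`x86_min`, `or_trick_min`, `or_trick_max`) are hypothetical. This is a sketch of the trick under discussion, not what this PR or Cranelift actually emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret between a float and its IEEE-754 binary32 bit pattern. */
static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float float_of(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Model of one lane of MINPS/MAXPS: the first operand wins only if the
   comparison is true; ties, signed zeros and NaNs return the second operand. */
static float x86_min(float a, float b) { return a < b ? a : b; }
static float x86_max(float a, float b) { return a > b ? a : b; }

/* The "OR both operand orders" trick for a NaN-propagating minimum. */
static uint32_t or_trick_min(float x, float y) {
    return bits_of(x86_min(x, y)) | bits_of(x86_min(y, x));
}

/* The naively mirrored trick for maximum; it gets signed zeros wrong. */
static uint32_t or_trick_max(float x, float y) {
    return bits_of(x86_max(x, y)) | bits_of(x86_max(y, x));
}
```

With `(+0.0, -0.0)` in either order, `or_trick_min` yields `-0.0` (bits `0x80000000`), which is correct, but `or_trick_max` yields `-0.0` as well, which is the wrong maximum. With `x` a quiet NaN with an all-zero payload (`0x7FC00000`) and `y = 1.25f` (`0x3FA00000`), `or_trick_min` returns `0x7FE00000`: a NaN whose payload bit was taken from `y`, which none of the LangRef options permit.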
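The signaling-NaN rule can be written as a bit-level check that any candidate lowering would have to pass. This sketch assumes IEEE-754 binary32 with the x86/ARM convention that a set mantissa MSB marks a quiet NaN. `naive_fixup` is a deliberately bad, hypothetical fixup (not Cranelift's) that tries to undo the OR trick by clearing the non-NaN operand's mantissa bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXP_MASK  0x7F800000u /* binary32 exponent field */
#define MANT_MASK 0x007FFFFFu /* binary32 mantissa field */
#define QUIET_BIT 0x00400000u /* mantissa MSB: set means quiet NaN */

static bool is_nan_bits(uint32_t u) {
    return (u & EXP_MASK) == EXP_MASK && (u & MANT_MASK) != 0;
}
static bool is_snan_bits(uint32_t u) {
    return is_nan_bits(u) && (u & QUIET_BIT) == 0;
}

/* True if `result` is a signaling NaN even though neither input was one,
   i.e. the lowering created an sNaN, which LLVM forbids. */
static bool creates_snan(uint32_t a, uint32_t b, uint32_t result) {
    return is_snan_bits(result) && !is_snan_bits(a) && !is_snan_bits(b);
}

/* Hypothetical, intentionally flawed fixup: when x is NaN,
   minps(x, y) | minps(y, x) equals x | y; then try to remove y's
   contribution by clearing y's mantissa bits. This can also clear
   bits that x itself had set, including the quiet bit. */
static uint32_t naive_fixup(uint32_t nan_x, uint32_t y) {
    uint32_t ored = nan_x | y;
    return ored & ~(y & MANT_MASK);
}
```

For a quiet NaN `0x7FC00001` and `y = 1.5f` (`0x3FC00000`), `y`'s mantissa bit coincides with the quiet bit, so the fixup produces `0x7F800001`: a signaling NaN that neither input was.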

I think loading the sign masks from memory can be avoided. LLVM already "materializes" an all-ones value using something like `pcmpeqd xmm0, xmm0`, and you can extend that to "materialize" any mask value (a run of ones followed by zeros, or vice versa) by performing a left or right shift afterwards; for example, `pslld xmm0, 31` turns the all-ones value into the `0x80000000` sign mask in each lane. That's a different part of the codebase, however, and it would have much broader performance implications.
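A portable model of the mask-materialization idea: every 32-bit lane starts as all-ones (what `pcmpeqd xmm, xmm` produces regardless of the register's prior contents) and is then shifted logically, mirroring `pslld`/`psrld`. The helper `materialize_mask` is hypothetical, and the sketch says nothing about whether LLVM's constant materialization should adopt this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4 /* a 128-bit XMM register holds four 32-bit lanes */

/* Hypothetical helper: build the same run-of-ones mask in every lane
   without a constant-pool load.
   shift_left = true  -> ones then zeros (pslld), e.g. n = 31 gives 0x80000000
   shift_left = false -> zeros then ones (psrld), e.g. n = 1  gives 0x7FFFFFFF */
static void materialize_mask(uint32_t out[LANES], bool shift_left, unsigned n) {
    for (int i = 0; i < LANES; i++) {
        uint32_t lane = 0xFFFFFFFFu; /* pcmpeqd xmm, xmm: all-ones lane */
        if (n > 31) {
            lane = 0; /* pslld/psrld zero the lane for counts above 31 */
        } else {
            lane = shift_left ? lane << n : lane >> n;
        }
        out[i] = lane;
    }
}
```

A left shift by 31 gives the sign mask `0x80000000` and a right shift by 1 gives the absolute-value mask `0x7FFFFFFF`. Each costs two register-only instructions instead of a load.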

https://github.com/llvm/llvm-project/pull/170069