[all-commits] [llvm/llvm-project] bdd365: [MLIR] Fix `ComplexToStandard` lowering of `comple...

Thu Dec 12 08:12:01 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: bdd365825d0766b6991c8f5443f8a9f76e75011a
      https://github.com/llvm/llvm-project/commit/bdd365825d0766b6991c8f5443f8a9f76e75011a
  Author: Benoit Jacob <jacob.benoit.1 at gmail.com>
  Date:   2024-12-12 (Thu, 12 Dec 2024)

  Changed paths:
    M mlir/lib/Conversion/ComplexToStandard/ComplexToStandard.cpp
    M mlir/test/Conversion/ComplexToStandard/convert-to-standard.mlir

  Log Message:
  -----------
  [MLIR] Fix `ComplexToStandard` lowering of `complex::MulOp` (#119591)

A complex multiplication should lower simply to the familiar 4 real
multiplications, 1 real addition, 1 real subtraction. No special-casing
of infinite or NaN values should be made, instead the complex numbers
should be thought as just vectors of two reals, naturally bottoming out
on the reals' semantics, IEEE754 or otherwise. That is what nearly
everybody else is doing ("nearly" because at the end of this PR
description we pinpoint the actual source of this in C99 `_Complex`),
and this pattern, by trying to do something different, was generating
much larger code, which was much slower and a departure from the
naturally expected floating-point behavior.

This code had originally been introduced in
https://reviews.llvm.org/D105270, which stated this rationale:
> The lowering handles special cases with NaN or infinity like C++.

I don't think that the C++ standard is a particularly important thing to
follow in this instance. What matters more is what people actually do in
practice with complex numbers, which rarely involves the C++
`std::complex` library type.

But out of curiosity, I checked, and the above statement seems
incorrect. The [current C++
standard](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4928.pdf)
library specification for `std::complex` does not say anything about the
implementation of complex multiplication: paragraph `[complex.ops]`
falls back on `[complex.member.ops]` which says:
> Effects: Multiplies the complex value rhs by the complex value *this
and stores the product in *this.

I also checked cppreference which often has useful information in case
something changed in a c++ language revision, but likewise, nothing at
all there:
https://en.cppreference.com/w/cpp/numeric/complex/operator_arith3

Finally, I checked in Compiler Explorer what Clang 19 currently
generates:
https://godbolt.org/z/oY7Ks4j95
That is just the familiar 4 multiplications.... and then there is some
weird check (`fcmp`) and conditionally a call to an external `__mulsc3`.
Googled that, found this StackOverflow answer:
https://stackoverflow.com/a/49438578

Summary: this is not about C++ (this post confirms my reading of the C++
standard not mandating anything about this). This is about C, and it
just happens that this C++ standard library implementation bottoms out
on code shared with the C `_Complex` implementation.

Another nuance missing in that SO answer: this is actually
[implementation-defined
behavior](https://en.cppreference.com/w/c/preprocessor/impl). There are
two modes, controlled by
```c
#pragma STDC CX_LIMITED_RANGE {ON,OFF,DEFAULT}
```
It is implementation-defined which is the default. Clang defaults to
OFF, but that's just Clang. In that mode, the check is required:

https://en.cppreference.com/w/c/language/arithmetic_types#Complex_floating_types
And the specific point in the [C99
standard](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)
is: `G.5.1 Multiplicative operators`.

But set it to ON and the check is gone:
https://godbolt.org/z/aG8fnbYoP

Summary: the argument has moved from C++ to C --- and even there, to
implementation-defined behavior with a standard opt-out mechanism.

Like with C++, I maintain that the C standard is not a particularly
meaningful thing for MLIR to follow here, because people doing business
with complex numbers tend to lower them to real numbers themselves, or
have their own specialized complex types, either way not relying on
C99's `_Complex` type --- and the very poor performance of the
`CX_LIMITED_RANGE OFF` behavior (default in Clang) is certainly a key
reason why people who care prefer to stay away from `_Complex` and
`std::complex`.

A good example that's relevant to MLIR's space is CUDA's `cuComplex`
type (used in the cuBLAS CGEMM interface). Here is its multiplication
function. The comment about competitiveness is interesting: it's not a
quirk of this particular function, it's the spirit underpinning
numerical code that matters.

https://github.com/tpn/cuda-samples/blob/1bf5cd15c51ce80fc9b387c0ff89a9f535b42bf5/v8.0/include/cuComplex.h#L106-L120
```c

/* This implementation could suffer from intermediate overflow even though
 * the final result would be in range. However, various implementations do
 * not guard against this (presumably to avoid losing performance), so we 
 * don't do it either to stay competitive.
 */
__host__ __device__ static __inline__ cuFloatComplex cuCmulf (cuFloatComplex x,
                                                              cuFloatComplex y)
{
    cuFloatComplex prod;
    prod = make_cuFloatComplex  ((cuCrealf(x) * cuCrealf(y)) - 
                                 (cuCimagf(x) * cuCimagf(y)),
                                 (cuCrealf(x) * cuCimagf(y)) + 
                                 (cuCimagf(x) * cuCrealf(y)));
    return prod;
}
```

Another instance in CUTLASS:
https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/complex.h#L231-L236

Signed-off-by: Benoit Jacob <jacob.benoit.1 at gmail.com>

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications