[llvm-dev] Complex proposal v2

Thu Aug 29 08:04:50 PDT 2019

All,

Here is the second revision of the proposal for a complex type in LLVM.
It clarifies a few things that came up during discussion and adds
additional operations for complex types.

                         -David

Proposal to Support Complex Operations in LLVM
----------------------------------------------

Revision History

v1 - Initial proposal [1]
v2 - This proposal
     - Added complex of all existing floating point types
     - Made complex a special aggregate
     - Specified literal syntax
     - Added special index values "real" and "imag" for insertvalue/extractvalue
       of complex
     - Added czip intrinsic to create vector of complex from vectors of
       real/imaginary
     - Added extractreal and extractimag intrinsics for vectors of complex
     - Added masked vector intrinsics

Abstract
--------

Several vendors and individuals have proposed first-class complex
support in LLVM.  Goals of this proposal include better optimization,
diagnostics and general user experience.

Introduction and Motivation
---------------------------

Recently the topic of complex numbers arose on llvm-dev with several
developers expressing a desire for first-class IR support for complex
[2] [3].  Interest in complex numbers in LLVM goes back much further
[4].

Currently clang chooses to represent standard types like "double
complex" and "std::complex<float>" as structure types containing two
scalar fields, for example {double, double}.  Consequently, arrays of
complex type are represented as, for example, [8 x {double, double}].
This has consequences for how clang converts complex operations to
LLVM IR.  In general, clang emits loads of the individual real and
imaginary parts and feeds them into arithmetic operations.
Vectorization results in many shufflevector operations to massage the
data into sequences suitable for vector arithmetic.

All of the real/imaginary data manipulation obscures the underlying
arithmetic.  It makes it difficult to reason about the algebraic
properties of expressions.  For expressiveness and optimization
ability, it would be useful to have a higher-level representation for
complex in LLVM IR.  In general, it is desirable to defer lowering of
complex until the optimizer has had a reasonable chance to exploit its
properties.

First-class support for complex can also improve the user experience.
Diagnostics could express concepts in the complex domain instead of
referring to expressions containing shuffles and other low-level data
manipulation.  Users who wish to examine IR directly will see much
less gobbledygook and can more easily reason about the IR.

Types
-----

This proposal introduces new aggregate types to represent complex
numbers.

c16      - Complex of 16-bit float
c32      - like float complex or std::complex<float>
c64      - like double complex or std::complex<double>
x86_c80  - Complex of x86_fp80
c128     - like long double complex or std::complex<long double>
ppc_c128 - Complex of ppc_fp128

Note that the references to C and C++ types above are simply
explanatory.  Nothing in this proposal assumes any particular
high-level language type will map to the above LLVM types.

The "underlying type" of a complex type is the type of its real and
imaginary components.

Each complex type is twice the size of its underlying
type.  The real part of the complex shall appear first in the layout
of the types.  The format of the real and imaginary parts is the same
as for the complex type's underlying type.  This should map to most
common data representations of complex in various languages.
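
As a rough illustration of the layout rule above, here is a minimal C++
sketch that uses std::complex<double> as a stand-in for c64.  The alias
C64 and the two helper names are hypothetical, and the analogy to
std::complex is only an assumption for illustration; the proposal does
not tie c64 to any source-language type.

```cpp
#include <complex>

// Stand-in for the proposed c64 type (assumption, for illustration only).
using C64 = std::complex<double>;

// The complex value is twice the size of its underlying type.
static_assert(sizeof(C64) == 2 * sizeof(double),
              "c64 stand-in should be two doubles wide");

// C++ guarantees an array view of std::complex: element 0 is the real
// part and element 1 is the imaginary part, i.e. real comes first.
inline double realViaLayout(const C64 &z) {
  return reinterpret_cast<const double (&)[2]>(z)[0];
}

inline double imagViaLayout(const C64 &z) {
  return reinterpret_cast<const double (&)[2]>(z)[1];
}
```

For a value C64(1.5, -2.5), realViaLayout reads the first double and
imagViaLayout the second, matching "real part first" in the text above.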

These types are *not* considered floating point types for the purposes
of Type::isFloatTy and friends, llvm_anyfloat_ty, etc. in order to
limit surprises when introducing these types.  New APIs will allow
querying and creation of complex types:

bool Type::isComplexTy()          const;
bool Type::isComplex16Ty()        const;
bool Type::isComplex32Ty()        const;
bool Type::isComplex64Ty()        const;
bool Type::isComplexX86_FP80Ty()  const;
bool Type::isComplex128Ty()       const;
bool Type::isComplexPPC_FP128Ty() const;

The types are a special kind of aggregate, giving them access to the
insertvalue and extractvalue operations with special notation (see
below).

Analogous ValueTypes will be used by intrinsics.

def c16       : ValueType<32,  uuu>;
def c32       : ValueType<64,  vvv>;
def c64       : ValueType<128, www>;
def x86c80    : ValueType<160, xxx>;
def c128      : ValueType<256, yyy>;
def ppcc128   : ValueType<256, zzz>;

def llvm_anycomplex_ty : LLVMType<Any>;
def llvm_c16_ty        : LLVMType<c16>;
def llvm_c32_ty        : LLVMType<c32>;
def llvm_c64_ty        : LLVMType<c64>;
def llvm_x86c80_ty     : LLVMType<x86c80>;
def llvm_c128_ty       : LLVMType<c128>;
def llvm_ppcc128_ty    : LLVMType<ppcc128>;

The numbering of the ValueTypes will be determined after discussion.
It may be desirable to insert them before the existing vector types,
grouping them with the other scalar types, or to put them somewhere
else.

Literals
--------

Literal complex values have the special spelling
'(' <fp constant> ('+'|'-') <fp constant>'i' ')':

%v1 = c64 ( 5.67 + 1.56i )
%v2 = c64 ( 55.87 - 4.23i )
%v3 = c64 ( 55.87 + -4.23i )
%v4 = c32 ( 8.24e+2 + 0.0i )
%v5 = c16 ( 0.0 + 0.0i )

Note that the literal representation requires an explicit
specification of the imaginary part, even if zero.  A "redundant" '+'
before a negative imaginary part (as in %v3) is allowed to facilitate
reuse of floating point constants.
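
To make the literal grammar concrete, here is a small C++ sketch of a
recognizer for it.  The function name parseComplexLiteral is
hypothetical and this is not the LLVM assembly parser; it only assumes
the spelling described above: parentheses, a real part, a '+' or '-',
and a mandatory imaginary part followed by 'i'.

```cpp
#include <cctype>
#include <complex>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses "( <fp> +|- <fp>i )" into out.
// Returns false if the text does not match the grammar.
bool parseComplexLiteral(const std::string &text, std::complex<double> &out) {
  const char *p = text.c_str();
  auto skipSpace = [&p] {
    while (std::isspace(static_cast<unsigned char>(*p)))
      ++p;
  };

  skipSpace();
  if (*p != '(')
    return false;
  ++p;

  char *end = nullptr;
  const double re = std::strtod(p, &end); // strtod skips leading spaces
  if (end == p)
    return false;
  p = end;

  skipSpace();
  if (*p != '+' && *p != '-')
    return false;
  const bool negate = (*p == '-');
  ++p;

  // The imaginary part is mandatory and may carry its own sign,
  // which covers the "redundant" form ( a + -bi ).
  const double im = std::strtod(p, &end);
  if (end == p)
    return false;
  p = end;
  if (*p != 'i')
    return false;
  ++p;

  skipSpace();
  if (*p != ')')
    return false;
  ++p;
  skipSpace();
  if (*p != '\0')
    return false;

  out = std::complex<double>(re, negate ? -im : im);
  return true;
}
```

Under this sketch "( 55.87 - 4.23i )" and "( 55.87 + -4.23i )" parse to
the same value, while "( 5.67 )" is rejected because the imaginary part
is missing.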

Operations
----------

This proposal overloads existing floating point instructions for
complex types in order to leverage existing expression optimizations:

c64 %res   = fadd c64 %a, c64 %b
v8c64 %res = fsub v8c64 %a, v8c64 %b
c128 %res  = fmul c128 %a, c128 %b
v4c64 %res = fdiv v4c64 %a, v4c64 %b
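
For reference, the scalar arithmetic that these overloaded instructions
stand for can be sketched in C++ as below.  The struct C64 and the
function names are hypothetical, and the textbook formulas are an
assumption: the proposal does not say how NaN, infinity, or overflow
cases are handled, and this sketch ignores them.

```cpp
// Hypothetical stand-in for a c64 value: real part first, then imaginary.
struct C64 {
  double re;
  double im;
};

// fadd / fsub act component-wise.
C64 addC64(C64 a, C64 b) { return {a.re + b.re, a.im + b.im}; }
C64 subC64(C64 a, C64 b) { return {a.re - b.re, a.im - b.im}; }

// fmul: (a + bi)(c + di) = (ac - bd) + (ad + bc)i
C64 mulC64(C64 a, C64 b) {
  return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

// fdiv: multiply by the conjugate of the divisor and scale by |b|^2.
// No special-casing of zero or non-finite divisors is attempted here.
C64 divC64(C64 a, C64 b) {
  const double d = b.re * b.re + b.im * b.im;
  return {(a.re * b.re + a.im * b.im) / d, (a.im * b.re - a.re * b.im) / d};
}
```

Keeping a single fmul in the IR, rather than the four multiplies and two
adds written out here, is what lets the optimizer treat the operation
algebraically before lowering.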

The only valid comparisons of complex values shall be equality:

i1 %res = eq c32 %a, c32 %b
i8 %res = eq v8c32 %a, v8c32 %b
i1 %res = ne c64 %a, c64 %b
i8 %res = ne v8c64 %a, v8c64 %b

select is defined for complex:

c32 %res   = select i1 %cmp, c32 %a, c32 %b
v4c64 %res = select i4 %cmp, v4c64 %a, v4c64 %b

Complex values may be cast to other complex types:

c32 %res = fptrunc c64 %a to c32
c64 %res = fpext c32 %a to c64

insertvalue and extractvalue may be used with the special index values
"real" and "imag":

%real = f32 extractvalue c32 %a, real
%real = c64 insertvalue c64 undef, f64 %r, real
%cplx = c64 insertvalue c64 %real, f64 %i, imag

The pseudo-value "real" shall evaluate to the integer constant zero
and the pseudo-value "imag" shall evaluate to the integer constant
one, as if extractvalue/insertvalue were written with 0/1.  The use of
any other index with a complex value is undefined.
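
The index rule above can be pictured with a short C++ sketch in which
"real" and "imag" are simply names for indices 0 and 1 of a two-element
aggregate.  The enum ComplexIndex, the struct C64, and both function
names are hypothetical and only model the described behaviour.

```cpp
#include <array>
#include <cstddef>

// "real" and "imag" are just names for aggregate indices 0 and 1.
enum ComplexIndex : std::size_t { Real = 0, Imag = 1 };

// Hypothetical stand-in for a c64 value viewed as a two-element aggregate.
struct C64 {
  std::array<double, 2> parts;
};

// Models extractvalue: read one component.
double extractValue(const C64 &z, ComplexIndex idx) { return z.parts[idx]; }

// Models insertvalue: return a copy with one component replaced.
C64 insertValue(C64 z, double v, ComplexIndex idx) {
  z.parts[idx] = v;
  return z;
}
```

Building a value then mirrors the IR example: insert the real part into
an unspecified value, then insert the imaginary part into the result.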

We also overload existing intrinsics:

declare c16      @llvm.sqrt.c16(c16 %val)
declare c32      @llvm.sqrt.c32(c32 %val)
declare c64      @llvm.sqrt.c64(c64 %val)
declare x86_c80  @llvm.sqrt.x86_c80(x86_c80 %val)
declare c128     @llvm.sqrt.c128(c128 %val)
declare ppc_c128 @llvm.sqrt.ppc_c128(ppc_c128 %val)

declare c16      @llvm.pow.c16(c16 %val, c16 %power)
declare c32      @llvm.pow.c32(c32 %val, c32 %power)
declare c64      @llvm.pow.c64(c64 %val, c64 %power)
declare x86_c80  @llvm.pow.x86_c80(x86_c80 %val, x86_c80 %power)
declare c128     @llvm.pow.c128(c128 %val, c128 %power)
declare ppc_c128 @llvm.pow.ppc_c128(ppc_c128 %val, ppc_c128 %power)

declare c16      @llvm.sin.c16(c16 %val)
declare c32      @llvm.sin.c32(c32 %val)
declare c64      @llvm.sin.c64(c64 %val)
declare x86_c80  @llvm.sin.x86_c80(x86_c80 %val)
declare c128     @llvm.sin.c128(c128 %val)
declare ppc_c128 @llvm.sin.ppc_c128(ppc_c128 %val)

declare c16      @llvm.cos.c16(c16 %val)
declare c32      @llvm.cos.c32(c32 %val)
declare c64      @llvm.cos.c64(c64 %val)
declare x86_c80  @llvm.cos.x86_c80(x86_c80 %val)
declare c128     @llvm.cos.c128(c128 %val)
declare ppc_c128 @llvm.cos.ppc_c128(ppc_c128 %val)

declare c16      @llvm.log.c16(c16 %val)
declare c32      @llvm.log.c32(c32 %val)
declare c64      @llvm.log.c64(c64 %val)
declare x86_c80  @llvm.log.x86_c80(x86_c80 %val)
declare c128     @llvm.log.c128(c128 %val)
declare ppc_c128 @llvm.log.ppc_c128(ppc_c128 %val)

declare half      @llvm.fabs.c16(c16 %val)
declare float     @llvm.fabs.c32(c32 %val)
declare double    @llvm.fabs.c64(c64 %val)
declare x86_fp80  @llvm.fabs.x86_c80(x86_c80 %val)
declare fp128     @llvm.fabs.c128(c128 %val)
declare ppc_fp128 @llvm.fabs.ppc_c128(ppc_c128 %val)
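
The return types above suggest that fabs of a complex value yields its
magnitude in the underlying type.  The proposal does not spell this
out, so the following C++ sketch treats that reading as an assumption;
the struct C64 and the name fabsC64 are hypothetical.

```cpp
#include <cmath>

// Hypothetical stand-in for a c64 value.
struct C64 {
  double re;
  double im;
};

// Assumed meaning of llvm.fabs.c64: the magnitude sqrt(re^2 + im^2),
// returned as the underlying type.  std::hypot avoids intermediate
// overflow that a naive sqrt(re*re + im*im) could hit.
double fabsC64(C64 z) { return std::hypot(z.re, z.im); }
```

For the value 3 + 4i this gives a magnitude of 5, up to rounding.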

Conversion to/from half-precision overloads the existing intrinsics.

llvm.convert.to.c16.* - Overloaded intrinsic to convert to c16.

declare c16 @llvm.convert.to.c16.c32(c32 %val)
declare c16 @llvm.convert.to.c16.c64(c64 %val)

llvm.convert.from.c16.* - Overloaded intrinsic to convert from c16.

declare c32 @llvm.convert.from.c16.c32(c16 %val)
declare c64 @llvm.convert.from.c16.c64(c16 %val)

In addition, new intrinsics will be used for complex-specific
operations:

llvm.cconj.* - Overloaded intrinsic to compute the conjugate of a
               complex value

declare c16      @llvm.cconj.c16(c16 %val)
declare c32      @llvm.cconj.c32(c32 %val)
declare c64      @llvm.cconj.c64(c64 %val)
declare x86_c80  @llvm.cconj.x86_c80(x86_c80 %val)
declare c128     @llvm.cconj.c128(c128 %val)
declare ppc_c128 @llvm.cconj.ppc_c128(ppc_c128 %val)
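
As a minimal sketch of what cconj computes, the C++ below negates the
imaginary part and leaves the real part alone.  The struct C64 is a
hypothetical stand-in and the function only models the scalar case.

```cpp
// Hypothetical stand-in for a c64 value.
struct C64 {
  double re;
  double im;
};

// Conjugate: keep the real part, flip the sign of the imaginary part.
C64 cconj(C64 z) { return {z.re, -z.im}; }
```

Applying cconj twice returns the original value, which is the kind of
algebraic identity an optimizer can use once the operation is visible
as a single intrinsic.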

llvm.czip.* - Overloaded intrinsic to create a vector of complex from
              two vectors of floating-point type (not all variants
              shown)

declare v4c32 @llvm.czip.v4c32(v4f32 %real, v4f32 %imag)
declare v4c64 @llvm.czip.v4c64(v4f64 %real, v4f64 %imag)
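
The effect of czip can be sketched in C++ as an element-wise pairing of
two equal-length vectors.  The struct C32 and the std::array-based
signature are hypothetical; they only model the lane-by-lane behaviour
described above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a c32 value.
struct C32 {
  float re;
  float im;
};

// Lane i of the result is (real[i], imag[i]).
template <std::size_t N>
std::array<C32, N> czip(const std::array<float, N> &real,
                        const std::array<float, N> &imag) {
  std::array<C32, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = C32{real[i], imag[i]};
  return out;
}
```

In memory the result interleaves the two inputs (re0, im0, re1, im1,
...), which is the data movement that today shows up as shufflevector
operations.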

llvm.extractreal.* - Overloaded intrinsic to create a vector of
                     floating-point type from the real portions of a
                     vector of complex (not all variants shown)

declare v4f32 @llvm.extractreal.v4c32(v4c32 %val)
declare v4f64 @llvm.extractreal.v4c64(v4c64 %val)

llvm.extractimag.* - Overloaded intrinsic to create a vector of
                     floating-point type from the imaginary portions
                     of a vector of complex (not all variants shown)

declare v4f32 @llvm.extractimag.v4c32(v4c32 %val)
declare v4f64 @llvm.extractimag.v4c64(v4c64 %val)
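
The two extraction intrinsics can be pictured as the inverse of czip.
The C++ sketch below is only a model of the lane-wise behaviour; the
struct C32 and the function names extractReal and extractImag are
hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a c32 value.
struct C32 {
  float re;
  float im;
};

// Lane i of the result is the real part of val[i].
template <std::size_t N>
std::array<float, N> extractReal(const std::array<C32, N> &val) {
  std::array<float, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = val[i].re;
  return out;
}

// Lane i of the result is the imaginary part of val[i].
template <std::size_t N>
std::array<float, N> extractImag(const std::array<C32, N> &val) {
  std::array<float, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = val[i].im;
  return out;
}
```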

Masked intrinsics are also overloaded.  The complex types are
considered a single logical entity and thus the mask bits correspond
to the complex value as a whole, not the individual real and imaginary
parts:

llvm.masked.load.* - Overloaded intrinsic to load complex under mask
                     (not all variants shown)

declare v4c32 @llvm.masked.load.v4c32.p0v4c32(<4 x c32>* %ptr,
                                              i32 %alignment,
                                              <4 x i1> %mask,
                                              <4 x c32> %passthrough)

declare v8c64 @llvm.masked.load.v8c64.p0v8c64(<8 x c64>* %ptr,
                                              i32 %alignment,
                                              <8 x i1> %mask,
                                              <8 x c64> %passthrough)
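
To illustrate "one mask bit per complex value", here is a C++ sketch of
the masked load semantics.  The struct C32 and the function maskedLoad
are hypothetical, the alignment argument is omitted, and the behaviour
is modelled on the existing masked load intrinsic.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a c32 value.
struct C32 {
  float re;
  float im;
};

// One mask bit covers a whole complex element: both parts are loaded
// together, or both come from the passthrough operand.
template <std::size_t N>
std::array<C32, N> maskedLoad(const C32 *ptr, const std::array<bool, N> &mask,
                              const std::array<C32, N> &passthrough) {
  std::array<C32, N> out = passthrough;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      out[i] = ptr[i]; // masked-off lanes are never read from memory
  return out;
}
```

A mask of <4 x i1> therefore controls four complex elements, that is,
eight underlying floats, not four.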

llvm.masked.store.* - Overloaded intrinsic to store complex under mask
                      (not all variants shown)

declare void @llvm.masked.store.v4c32.p0v4c32(<4 x c32> %val,
                                              <4 x c32>* %ptr,
                                              i32 %alignment,
                                              <4 x i1> %mask)

declare void @llvm.masked.store.v8c64.p0v8c64(<8 x c64> %val,
                                              <8 x c64>* %ptr,
                                              i32 %alignment,
                                              <8 x i1> %mask)

llvm.masked.gather.* - Overloaded intrinsic to gather complex under
                       mask (not all variants shown)

declare v4c32 @llvm.masked.gather.v4c32.p0v4c32(<4 x c32*> %ptrs,
                                                i32 %alignment,
                                                <4 x i1> %mask,
                                                <4 x c32> %passthrough)

declare v8c64 @llvm.masked.gather.v8c64.p0v8c64(<8 x c64*> %ptrs,
                                                i32 %alignment,
                                                <8 x i1> %mask,
                                                <8 x c64> %passthrough)

llvm.masked.scatter.* - Overloaded intrinsic to scatter complex under
                        mask (not all variants shown)

declare void @llvm.masked.scatter.v4c32.p0v4c32(<4 x c32> %val,
                                                <4 x c32*> %ptrs,
                                                i32 %alignment,
                                                <4 x i1> %mask)

declare void @llvm.masked.scatter.v8c64.p0v8c64(<8 x c64> %val,
                                                <8 x c64*> %ptrs,
                                                i32 %alignment,
                                                <8 x i1> %mask)

llvm.masked.expandload.* - Overloaded intrinsic to expandload complex
                           under mask (not all variants shown)

declare v4c32 @llvm.masked.expandload.v4c32.p0v4c32(c32* %ptr,
                                                    <4 x i1> %mask,
                                                    <4 x c32> %passthrough)

declare v8c64 @llvm.masked.expandload.v8c64.p0v8c64(c64* %ptr,
                                                    <8 x i1> %mask,
                                                    <8 x c64> %passthrough)

llvm.masked.compressstore.* - Overloaded intrinsic to compressstore
                              complex under mask (not all variants
                              shown)

declare void @llvm.masked.compressstore.v4c32.p0v4c32(<4 x c32> %val,
                                                      c32* %ptr,
                                                      <4 x i1> %mask)

declare void @llvm.masked.compressstore.v8c64.p0v8c64(<8 x c64> %val,
                                                      c64* %ptr,
                                                      <8 x i1> %mask)
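
The compressstore behaviour for complex can be sketched in C++ as
follows, modelled on the existing compressstore intrinsic.  The struct
C64 and the function maskedCompressStore are hypothetical; returning
the number of elements written is an addition for illustration only,
since the intrinsic itself returns void.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a c64 value.
struct C64 {
  double re;
  double im;
};

// Selected complex elements are written to consecutive memory slots,
// each as a whole (real, imaginary) pair.  Unselected lanes write nothing.
template <std::size_t N>
std::size_t maskedCompressStore(const std::array<C64, N> &val, C64 *ptr,
                                const std::array<bool, N> &mask) {
  std::size_t next = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      ptr[next++] = val[i];
  return next; // illustration only: the intrinsic returns void
}
```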

Conclusion
----------

This proposal introduces new complex types, overloads existing
floating point instructions and intrinsics for common complex
operations, and adds new intrinsics for complex-specific operations.

Goals of this work include better reasoning about complex operations
within LLVM, leading to better optimization, reporting and overall
user experience.

This is a draft and subject to change.

[1] http://lists.llvm.org/pipermail/llvm-dev/2019-July/133558.html
[2] http://lists.llvm.org/pipermail/llvm-dev/2019-April/131516.html
[3] http://lists.llvm.org/pipermail/llvm-dev/2019-April/131523.html
[4] http://lists.llvm.org/pipermail/llvm-dev/2010-December/037072.html