[Mlir-commits] [mlir] 0efb0dd - [mlir] Partially update the conversion-to-llvm document

Thu Dec 17 13:00:18 PST 2020

Author: Alex Zinenko
Date: 2020-12-17T22:00:09+01:00
New Revision: 0efb0dd978014c9ca5ef4cd93516a0cd6e77f185

URL: https://github.com/llvm/llvm-project/commit/0efb0dd978014c9ca5ef4cd93516a0cd6e77f185
DIFF: https://github.com/llvm/llvm-project/commit/0efb0dd978014c9ca5ef4cd93516a0cd6e77f185.diff

LOG: [mlir] Partially update the conversion-to-llvm document

This document was not updated after the LLVM dialect type system had been
reimplemented and was using an outdated syntax. Rewrite the part of the
document that concerns type conversion and prepare the ground for splitting it
into a document that explains how built-in types are converted and a separate
document that explains how standard types and functions are converted, which
will better correspond to the fact that built-in types do not belong to the
standard dialect.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D93486

Added: 
    

Modified: 
    mlir/docs/ConversionToLLVMDialect.md

Removed: 
    


################################################################################
diff  --git a/mlir/docs/ConversionToLLVMDialect.md b/mlir/docs/ConversionToLLVMDialect.md
index 27b732015f9f..778eea6184c9 100644

--- a/mlir/docs/ConversionToLLVMDialect.md
+++ b/mlir/docs/ConversionToLLVMDialect.md
@@ -1,16 +1,19 @@
 # Conversion to the LLVM Dialect
 
-Conversion from the Standard to the [LLVM Dialect](Dialects/LLVM.md) can be
-performed by the specialized dialect conversion pass by running:
+Conversion from several dialects that rely on
+[built-in types](LangRef.md#builtin-types) to the
+[LLVM Dialect](Dialects/LLVM.md) is expected to be performed through the
+[Dialect Conversion](DialectConversion.md) infrastructure.
 
-```shell
-mlir-opt -convert-std-to-llvm <filename.mlir>
-```
+The conversion of types and that of the overall module structure is described in
+this document. Individual conversion passes provide a set of conversion patterns
+for ops in different dialects, such as `-convert-std-to-llvm` for ops in the
+[Standard dialect](Dialects/Standard.md) and `-convert-vector-to-llvm` in the
+[Vector dialect](Dialects/Vector.md). *Note that some conversions subsume the
+others.*
 
-It performs type and operation conversions for a subset of operations from
-standard dialect (operations on scalars and vectors, control flow operations) as
-described in this document. We use the terminology defined by the
-[LLVM IR Dialect description](Dialects/LLVM.md) throughout this document.
+We use the terminology defined by the
+[LLVM Dialect description](Dialects/LLVM.md) throughout this document.
 
 [TOC]
 
@@ -22,19 +25,19 @@ Scalar types are converted to their LLVM counterparts if they exist. The
 following conversions are currently implemented:
 
 -   `i*` converts to `!llvm.i*`
+-   `bf16` converts to `!llvm.bfloat`
 -   `f16` converts to `!llvm.half`
 -   `f32` converts to `!llvm.float`
 -   `f64` converts to `!llvm.double`
 
-Note: `bf16` type is not supported by LLVM IR and cannot be converted.
-
 ### Index Type
 
-Index type is converted to a wrapped LLVM IR integer with bitwidth equal to the
-bitwidth of the pointer size as specified by the
-[data layout](https://llvm.org/docs/LangRef.html#data-layout) of the LLVM module
-[contained](Dialects/LLVM.md#context-and-module-association) in the LLVM Dialect
-object. For example, on x86-64 CPUs it converts to `!llvm.i64`.
+Index type is converted to an LLVM dialect integer type with bitwidth equal to
+the bitwidth of the pointer size as specified by the
+[data layout](Dialects/LLVM.md#data-layout-and-triple) of the closest enclosing
+module.
+For example, on x86-64 CPUs it converts to `!llvm.i64`. This behavior can be
+overridden by the type converter configuration, which is often exposed as a pass
+option by conversion passes.
 
 ### Vector Types
 
@@ -45,31 +48,54 @@ size with element type converted using these conversion rules. In the
 n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types
 of one-dimensional vectors.
 
-For example, `vector<4 x f32>` converts to `!llvm<"<4 x float>">` and `vector<4
-x 8 x 16 x f32>` converts to `!llvm<"[4 x [8 x <16 x float>]]">`.
+For example, `vector<4 x f32>` converts to `!llvm.vec<4 x float>` and `vector<4
+x 8 x 16 x f32>` converts to `!llvm.array<4 x array<8 x vec<16 x float>>>`.
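
The nesting rule can be sketched in C as a small string builder. `vector_to_llvm_type` is a hypothetical helper, not an MLIR API. It assumes the element type has already been converted (for example `float` for `f32`) and produces (n-1) `array` wrappers around a 1-D `vec` of the innermost size.

```c
#include <stdio.h>
#include <string.h>

/* Append text to buf without writing past cap bytes (truncates if full). */
static void append(char *buf, size_t cap, const char *text) {
  size_t used = strlen(buf);
  snprintf(buf + used, cap - used, "%s", text);
}

/* Hypothetical helper: print the LLVM dialect type an n-D MLIR vector with
 * the given shape converts to. `elem` is the already-converted element type. */
static void vector_to_llvm_type(const long *shape, int rank, const char *elem,
                                char *buf, size_t cap) {
  char piece[64];
  buf[0] = '\0';
  append(buf, cap, "!llvm.");
  /* Outer (n-1) dimensions become nested arrays. */
  for (int i = 0; i < rank - 1; ++i) {
    snprintf(piece, sizeof piece, "array<%ld x ", shape[i]);
    append(buf, cap, piece);
  }
  /* The innermost dimension stays a 1-D LLVM vector. */
  snprintf(piece, sizeof piece, "vec<%ld x %s>", shape[rank - 1], elem);
  append(buf, cap, piece);
  /* Close every array wrapper opened above. */
  for (int i = 0; i < rank - 1; ++i)
    append(buf, cap, ">");
}
```

With shape `{4, 8, 16}` and element `"float"`, this yields `!llvm.array<4 x array<8 x vec<16 x float>>>`, matching the example above.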
 
-### Memref Types
+### Ranked Memref Types
 
 Memref types in MLIR have both static and dynamic information associated with
-them. The dynamic information comprises the buffer pointer as well as sizes and
+them. In the general case, the dynamic information describes dynamic sizes in
+the logical indexing space and any symbols bound to the memref. This dynamic
+information must be present at runtime in the LLVM dialect equivalent type.
+
+In practice, the conversion supports two conventions:
+
+-   the default convention for memrefs in the
+    **[strided form](LangRef.md#strided-memref)**;
+-   a "bare pointer" conversion for statically-shaped memrefs with default
+    layout.
+
+The choice between conventions is specified at type converter construction time
+and is often exposed as an option by conversion passes.
+
+Memrefs with arbitrary layouts are not supported. Instead, these layouts can be
+factored out of the type and used as part of index computation for operations
+that read from and write into a memref with the default layout.
+
+#### Default Convention
+
+The dynamic information comprises the buffer pointer as well as sizes and
 strides of any dynamically-sized dimensions. Memref types are normalized and
-converted to a descriptor that is only dependent on the rank of the memref. The
-descriptor contains:
-
-1.  the pointer to the data buffer, followed by
-2.  the pointer to properly aligned data payload that the memref indexes,
-    followed by
-3.  a lowered `index`-type integer containing the distance between the beginning
-    of the buffer and the first element to be accessed through the memref,
-    followed by
-4.  an array containing as many `index`-type integers as the rank of the memref:
-    the array represents the size, in number of elements, of the memref along
-    the given dimension. For constant MemRef dimensions, the corresponding size
-    entry is a constant whose runtime value must match the static value,
-    followed by
-5.  a second array containing as many 64-bit integers as the rank of the MemRef:
-    the second array represents the "stride" (in tensor abstraction sense), i.e.
-    the number of consecutive elements of the underlying buffer.
+converted to a _descriptor_ that is only dependent on the rank of the memref.
+The descriptor contains the following fields in order:
+
+1.  The pointer to the data buffer as allocated, referred to as "allocated
+    pointer". This is only useful for deallocating the memref.
+2.  The pointer to the properly aligned data payload that the memref indexes,
+    referred to as "aligned pointer".
+3.  A converted `index`-type integer containing the distance in number of
+    elements between the beginning of the (aligned) buffer and the first
+    element to be accessed through the memref, referred to as "offset".
+4.  An array containing as many converted `index`-type integers as the rank of
+    the memref: the array represents the size, in number of elements, of the
+    memref along the given dimension. For constant memref dimensions, the
+    corresponding size entry is a constant whose runtime value must match the
+    static value.
+5.  A second array containing as many converted `index`-type integers as the
+    rank of the memref: the second array represents the "stride" (in tensor
+    abstraction sense), i.e. the number of consecutive elements of the
+    underlying buffer one needs to jump over to get to the next logically
+    indexed element.
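
As a minimal C sketch, assuming `index` lowers to a 64-bit integer and the element type is `f32`, the descriptor of a rank-2 memref could be mirrored by the struct below. `MemRef2DF32` and `memref_2d_f32_at` are hypothetical names; MLIR's C runner utilities define a comparable C++ template, `StridedMemRefType`. The accessor shows how the offset and strides combine into an element address.

```c
#include <stdint.h>

/* Hypothetical C mirror of the descriptor for memref<?x?xf32>, assuming
 * `index` lowers to a 64-bit integer. Field order follows the list above. */
typedef struct {
  float *allocated;   /* 1. pointer returned by the allocator (for freeing) */
  float *aligned;     /* 2. aligned pointer that indexing starts from */
  int64_t offset;     /* 3. distance, in elements, from the aligned pointer */
  int64_t sizes[2];   /* 4. size of each dimension, in elements */
  int64_t strides[2]; /* 5. elements to jump over per step in each dimension */
} MemRef2DF32;

/* Element (i, j) lives at aligned + offset + i * strides[0] + j * strides[1]. */
static float *memref_2d_f32_at(const MemRef2DF32 *m, int64_t i, int64_t j) {
  return m->aligned + m->offset + i * m->strides[0] + j * m->strides[1];
}
```

Because only the offset, sizes and strides change, a subview (for example a 2x2 window starting at element (1, 1) of a 3x4 row-major buffer: offset 5, strides {4, 1}) can share the allocated and aligned pointers of the original memref.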
 
 For constant memref dimensions, the corresponding size entry is a constant whose
 runtime value matches the static value. This normalization serves as an ABI for
@@ -80,125 +106,187 @@ resulting in a struct containing two pointers + offset.
 Examples:
 
 ```mlir
-memref<f32> -> !llvm<"{ float*, float*, i64 }">
-memref<1 x f32> -> !llvm<"{ float*, float*, i64, [1 x i64], [1 x i64] }">
-memref<? x f32> -> !llvm<"{ float*, float*, i64, [1 x i64], [1 x i64] }">
-memref<10x42x42x43x123 x f32> -> !llvm<"{ float*, float*, i64, [5 x i64], [5 x i64] }">
-memref<10x?x42x?x123 x f32> -> !llvm<"{ float*, float*, i64, [5 x i64], [5 x i64]  }">
+memref<f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64)>
+memref<1 x f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64,
+                                 array<1 x i64>, array<1 x i64>)>
+memref<? x f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64,
+                                 array<1 x i64>, array<1 x i64>)>
+memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64,
+                                               array<5 x i64>, array<5 x i64>)>
+memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64,
+                                             array<5 x i64>, array<5 x i64>)>
 
 // Memref types can have vectors as element types
-memref<1x? x vector<4xf32>> -> !llvm<"{ <4 x float>*, <4 x float>*, i64, [1 x i64], [1 x i64] }">
+memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr<vec<4 x float>>,
+                                             ptr<vec<4 x float>>, i64,
+                                             array<1 x i64>, array<1 x i64>)>
 ```
 
-If the rank of the memref is unknown at compile time, the memref is converted to
-an unranked descriptor that contains:
-
-1.  a 64-bit integer representing the dynamic rank of the memref, followed by
-2.  a pointer to a ranked memref descriptor with the contents listed above.
+#### Bare Pointer Convention
 
-Dynamic ranked memrefs should be used only to pass arguments to external library
-calls that expect a unified memref type. The called functions can parse any
-unranked memref descriptor by reading the rank and parsing the enclosed ranked
-descriptor pointer.
+Ranked memrefs with static shape and default layout can be converted into an
+LLVM dialect pointer to their element type. Only the default alignment is
+supported in such cases, e.g. the `alloc` operation cannot have an alignment
+attribute.
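
Under the bare pointer convention, the callee receives only the element pointer, so shape and layout must be known statically. A hypothetical C function standing in for one that receives `memref<10x42xf32>` might look as follows. The row-major indexing assumes the default (identity) layout, which is the only layout this convention accepts.

```c
#include <stddef.h>

/* Static shape of the hypothetical memref<10x42xf32> argument. */
enum { ROWS = 10, COLS = 42 };

/* Receives memref<10x42xf32> as a bare pointer: no sizes, strides or offset
 * are passed, so they are baked in from the static type. */
static float sum_10x42(const float *data) {
  float total = 0.0f;
  for (size_t i = 0; i < ROWS; ++i)
    for (size_t j = 0; j < COLS; ++j)
      total += data[i * COLS + j]; /* default layout: strides {42, 1} */
  return total;
}
```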
 
 Examples:
 
 ```mlir
-// unranked descriptor
-memref<*xf32> -> !llvm<"{i64, i8*}">
+memref<f32> -> !llvm.ptr<float>
+memref<10x42 x f32> -> !llvm.ptr<float>
+
+// Memrefs with vector types are also supported.
+memref<10x42 x vector<4xf32>> -> !llvm.ptr<vec<4 x float>>
 ```
 
-**In function signatures,** `memref` is passed as a _pointer_ to the structured
-defined above to comply with the calling convention.
+### Unranked Memref Types
 
-Example:
+Unranked memrefs are converted to an unranked descriptor that contains:
+
+1.  a converted `index`-typed integer representing the dynamic rank of the
+    memref;
+2.  a type-erased pointer (`!llvm.ptr<i8>`) to a ranked memref descriptor with
+    the contents listed above.
+
+This descriptor is primarily intended for interfacing with rank-polymorphic
+library functions. The pointer to the ranked memref descriptor points to memory
+_allocated on the stack_ of the function in which it is used.
+
+Note that stack allocations may be emitted at a location where the unranked
+memref first appears, e.g., a cast operation, and remain live throughout the
+lifetime of the function; this may lead to stack exhaustion if used in a loop.
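
A rank-polymorphic consumer of an unranked descriptor can be sketched in C, assuming a 64-bit `index` and an `f32` element type. The struct and function names are hypothetical. The flexible array member stands in for the sizes and strides arrays, whose length is only known from `rank` at runtime.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical rank-generic view of a ranked f32 descriptor: two pointers and
 * the offset, followed by `rank` sizes and then `rank` strides. */
typedef struct {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes_then_strides[]; /* sizes[0..rank), then strides[0..rank) */
} RankedDescF32;

/* Hypothetical C mirror of memref<*xf32>, i.e. !llvm.struct<(i64, ptr<i8>)>. */
typedef struct {
  int64_t rank;     /* dynamic rank */
  void *descriptor; /* type-erased pointer to the ranked descriptor */
} UnrankedMemRefF32;

/* A rank-polymorphic callee: read the rank first, then walk the sizes. */
static int64_t num_elements(UnrankedMemRefF32 m) {
  const RankedDescF32 *d = (const RankedDescF32 *)m.descriptor;
  int64_t n = 1;
  for (int64_t k = 0; k < m.rank; ++k)
    n *= d->sizes_then_strides[k];
  return n;
}
```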
+
+Examples:
 
 ```mlir
-// A function type with memref as argument
-(memref<?xf32>) -> ()
-// is transformed into the LLVM function with pointer-to-structure argument.
-!llvm<"void({ float*, float*, i64, [1 x i64], [1 x i64]}*) ">
+// Unranked descriptor.
+memref<*xf32> -> !llvm.struct<(i64, ptr<i8>)>
 ```
 
+The bare pointer convention does not support unranked memrefs.
+
 ### Function Types
 
-Function types get converted to LLVM function types. The arguments are converted
-individually according to these rules. The result types need to accommodate the
-fact that LLVM IR functions always have a return type, which may be a Void type.
-The converted function always has a single result type. If the original function
-type had no results, the converted function will have one result of the wrapped
-`void` type. If the original function type had one result, the converted
-function will also have one result converted using these rules. Otherwise, the result
-type will be a wrapped LLVM IR structure type where each element of the
-structure corresponds to one of the results of the original function, converted
-using these rules. In high-order functions, function-typed arguments and results
-are converted to a wrapped LLVM IR function pointer type (since LLVM IR does not
-allow passing functions to functions without indirection) with the pointee type
-converted using these rules.
+Function types get converted to LLVM dialect function types. The arguments are
+converted individually according to these rules, except for `memref` types in
+function arguments and higher-order functions, which are described below. The
+result types need to accommodate the fact that LLVM functions always have a
+return type, which may be an `!llvm.void` type. The converted function always
+has a single result type. If the original function type had no results, the
+converted function will have one result of the `!llvm.void` type. If the
+original function type had one result, the converted function will also have one
+result converted using these rules. Otherwise, the result type will be an LLVM
+dialect structure type where each element of the structure corresponds to one of
+the results of the original function, converted using these rules.
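
To illustrate result packing, a hypothetical C function with the shape of `!llvm.func<struct<(i64, double)> (i32, float)>` returns both results in one struct. This mirrors only the shape of the converted signature; it is not a claim about how the platform C ABI returns such a struct. The names `QuxResults` and `qux` are illustrative.

```c
#include <stdint.h>

/* Both results of (i32, f32) -> (i64, f64) travel in a single struct. */
typedef struct {
  int64_t res0;
  double res1;
} QuxResults;

/* Arbitrary body: the point is the packed return type, not the arithmetic. */
static QuxResults qux(int32_t a, float b) {
  QuxResults r = {(int64_t)a * 2, (double)b + 0.5};
  return r;
}
```

A caller then unpacks the fields, much as converted code extracts elements from the returned LLVM dialect structure.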
 
 Examples:
 
 ```mlir
-// zero-ary function type with no results.
+// Zero-ary function type with no results:
 () -> ()
-// is converted to a zero-ary function with `void` result
-!llvm<"void ()">
+// is converted to a zero-ary function with `void` result.
+!llvm.func<void ()>
 
-// unary function with one result
+// Unary function with one result:
 (i32) -> (i64)
-// has its argument and result type converted, before creating the LLVM IR function type
-!llvm<"i64 (i32)">
+// has its argument and result type converted, before creating the LLVM dialect
+// function type.
+!llvm.func<i64 (i32)>
 
-// binary function with one result
+// Binary function with one result:
 (i32, f32) -> (i64)
 // has its arguments handled separately
-!llvm<"i64 (i32, float)">
+!llvm.func<i64 (i32, float)>
 
-// binary function with two results
+// Binary function with two results:
 (i32, f32) -> (i64, f64)
-// has its result aggregated into a structure type
-!llvm<"{i64, double} (i32, f32)">
+// has its result aggregated into a structure type.
+!llvm.func<struct<(i64, double)> (i32, float)>
+```
+
+#### Functions as Function Arguments or Results
 
-// function-typed arguments or results in higher-order functions
+Higher-order function types, i.e. types of functions that have other functions
+as arguments or results, are converted differently to accommodate the fact that
+LLVM IR does not allow for function-typed values. Instead, functions are
+expected to be passed into and returned from other functions _by pointer_.
+Therefore, function-typed function arguments and results are converted to
+pointer-to-function types. The pointee type is converted using these rules.
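
C has the same restriction and the same workaround: functions are passed and returned through function pointers. Below is a hypothetical C counterpart of `(() -> ()) -> (() -> ())`, which the rules above turn into `!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)>`. The names are illustrative.

```c
/* Pointer to a function taking no arguments and returning nothing,
 * the analogue of ptr<func<void ()>>. */
typedef void (*VoidFn)(void);

static int calls = 0;
static void bump(void) { ++calls; }

/* Takes a function by pointer, calls it, and returns the same pointer. */
static VoidFn apply_and_return(VoidFn f) {
  f();
  return f;
}
```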
+
+Examples:
+
+```mlir
+// Function-typed arguments or results in higher-order functions:
 (() -> ()) -> (() -> ())
-// are converted into pointers to functions
-!llvm<"void ()* (void ()*)">
+// are converted into pointers to functions.
+!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)>
+
+// These rules apply recursively: a function type taking a function that takes
+// another function
+( ( (i32) -> (i64) ) -> () ) -> ()
+// is converted into a function type taking a pointer-to-function that takes
+// another pointer-to-function.
+!llvm.func<void (ptr<func<void (ptr<func<i64 (i32)>>)>>)>
 ```
 
-## Calling Convention
+#### Memrefs as Function Arguments
 
-### Function Signature Conversion
+When used as function arguments, both ranked and unranked memrefs are converted
+into a list of arguments that represents each _scalar_ component of their
+descriptor. This is intended for some compatibility with the C ABI, in which
+structure types would need to be passed by pointer, leading to the need for
+allocations and related issues, as well as for aliasing annotations, which are
+currently attached to pointers in function arguments. Having scalar components
+means that each size and stride is passed as an individual value.
 
-LLVM IR functions are defined by a custom operation. The function itself has a
-wrapped LLVM IR function type converted as described above. The function
-definition operation uses MLIR syntax.
+When used as function results, memrefs are converted as usual, i.e. each memref
+is converted to a descriptor struct (default convention) or to a pointer (bare
+pointer convention).
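
Combining the two rules above, `(memref<?x?xf32>, f32) -> ()` expands, assuming a 64-bit `index`, to `!llvm.func<void (ptr<float>, ptr<float>, i64, i64, i64, i64, i64, float)>`. A hypothetical C function with that flattened parameter list might look as follows; the name `fill_2d` is illustrative.

```c
#include <stdint.h>

/* Flattened memref<?x?xf32> descriptor (two pointers, offset, two sizes,
 * two strides) followed by the trailing f32 argument. */
static void fill_2d(float *allocated, float *aligned, int64_t offset,
                    int64_t size0, int64_t size1,
                    int64_t stride0, int64_t stride1, float value) {
  (void)allocated; /* the allocated pointer is only needed for deallocation */
  for (int64_t i = 0; i < size0; ++i)
    for (int64_t j = 0; j < size1; ++j)
      aligned[offset + i * stride0 + j * stride1] = value;
}
```

Because every size and stride arrives as a separate scalar, a strided view (for example the 2x2 window at offset 1 of a 2x3 row-major buffer, strides {3, 1}) is passed without building any intermediate structure.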
 
 Examples:
 
 ```mlir
-// zero-ary function type with no results.
-func @foo() -> ()
-// gets LLVM type void().
-llvm.func @foo() -> ()
-
-// function with one result
-func @bar(i32) -> (i64)
-// gets converted to LLVM type i64(i32).
-func @bar(!llvm.i32) -> !llvm.i64
-
-// function with two results
-func @qux(i32, f32) -> (i64, f64)
-// has its result aggregated into a structure type
-func @qux(!llvm.i32, !llvm.float) -> !llvm<"{i64, double}">
-
-// function-typed arguments or results in higher-order functions
-func @quux(() -> ()) -> (() -> ())
-// are converted into pointers to functions
-func @quux(!llvm<"void ()*">) -> !llvm<"void ()*">
-// the call flow is handled by the LLVM dialect `call` operation supporting both
-// direct and indirect calls
+// A memref descriptor appearing as function argument:
+(memref<f32>) -> ()
+// gets converted into a list of individual scalar components of a descriptor.
+!llvm.func<void (ptr<float>, ptr<float>, i64)>
+
+// The list of arguments is linearized and one can freely mix memref and other
+// types in this list:
+(memref<f32>, f32) -> ()
+// which gets converted into a flat list.
+!llvm.func<void (ptr<float>, ptr<float>, i64, float)>
+
+// For nD ranked memref descriptors:
+(memref<?x?xf32>) -> ()
+// the converted signature will contain 2n+1 `index`-typed integer arguments,
+// offset, n sizes and n strides, per memref argument type.
+!llvm.func<void (ptr<float>, ptr<float>, i64, i64, i64, i64, i64)>
+
+// Same rules apply to unranked descriptors:
+(memref<*xf32>) -> ()
+// which get converted into their components.
+!llvm.func<void (i64, ptr<i8>)>
+
+// However, returning a memref from a function is not affected:
+() -> (memref<?xf32>)
+// gets converted to a function returning a descriptor structure.
+!llvm.func<struct<(ptr<float>, ptr<float>, i64, array<1 x i64>, array<1 x i64>)> ()>
+
+// If multiple memref-typed results are returned:
+() -> (memref<f32>, memref<f64>)
+// their descriptor structures are additionally packed into another structure,
+// potentially with other non-memref typed results.
+!llvm.func<struct<(struct<(ptr<float>, ptr<float>, i64)>,
+                   struct<(ptr<double>, ptr<double>, i64)>)> ()>
 ```
 
+## Calling Convention for Standard Calls
+
+<!-- TODO: This should be moved to a separate file, and the remaining file
+     renamed to decouple the description of built-in type conversion from
+     standard dialect ops conversion. -->
+
 ### Result Packing
 
 In case of multi-result functions, the returned values are inserted into a