<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr"></div><div dir="ltr">Hi Chris,</div><div dir="ltr"><br>On Dec 18, 2018, at 8:45 PM, Chris Lattner <<a href="mailto:clattner@nondot.org">clattner@nondot.org</a>> wrote:<br><br></div><blockquote type="cite"><div dir="ltr"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 5, 2018, at 10:41 AM, Adam Nemet <<a href="mailto:anemet@apple.com" class="">anemet@apple.com</a>> wrote:</div><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi all,<div dir="ltr" class=""></div><div class=""><br class=""></div><div class="">After the previous RFC[1], there were multiple discussions on the ML and in person at the DevMtg. I will summarize the options discussed and propose a path forward.</div><div class=""><br class=""></div><div class=""><font face="Menlo" class="">===========================</font></div><div class="">Options</div><div class=""><span style="font-family: Menlo;" class="">===========================</span></div><div class=""><br class=""></div><div class="">A. Extend VectorType to be multidimensional</div><div class=""><br class=""></div><div class="">B. Flatten matrices into the current VectorType. Matrix shape and layout information is passed to matrix intrinsics. All matrix operations including element-wise matrix operations are implemented via intrinsics</div><div class=""><br class=""></div><div class="">C. 
Same as B but padding is explicitly managed by shufflevector instructions, element-wise operations are implemented via built-in operators (e.g. fadd)</div><div class=""><br class=""></div><div class=""><span style="font-family: Menlo;" class="">===========================</span></div><div class="">tl;dr</div><div class=""><span style="font-family: Menlo;" class="">===========================</span></div><div class=""><br class=""></div><div class="">There was some support for option A to introduce first-class matrices (or multidimensional vectors) but also many concerns. I have sketched out many examples in IR and flattening matrices to vectors does not seem to present any clear show-stoppers. Thus, I am scaling back the proposal to option B which is a more incremental step. </div></div></div></div></blockquote><div><br class=""></div><div>Seems reasonable to start small, learn, then build out from there.</div><div><br class=""></div><blockquote type="cite" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><span style="background-color: rgba(255, 255, 255, 0);" class="">Throughout this work, an important goal is to provide a matrix-aware IRBuilder API. 
E.g.:</span></div><div class=""><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 20.3px;" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><br class=""></span></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""> Value *CreateMatrixAdd(<span style="color: rgb(2, 37, 199);" class="">Value</span> *<span style="color: rgb(202, 48, 199);" class="">Op0</span>, <span style="color: rgb(2, 37, 199);" class="">Value</span> *<span style="color: rgb(202, 48, 199);" class="">Op1</span>,</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""> <span style="color: rgb(2, 37, 199);" class="">unsigned</span> <span style="color: rgb(202, 48, 199);" class="">Rows</span>, <span style="color: rgb(2, 37, 199);" class="">unsigned</span> <span style="color: rgb(202, 48, 199);" class="">Cols</span>,</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; color: rgb(0, 197, 199);" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><span style="" class=""> </span><span style="color: rgb(2, 37, 199);" class="">MatrixLayout</span><span style="" class=""> </span><span style="color: rgb(202, 48, 199);" class="">ML</span><span style="" class=""> </span>/* row/column-major, padding */<span style="" class="">);</span></span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; color: rgb(0, 197, 199);" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><br class=""></span></div></div></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><span 
style="background-color: rgba(255, 255, 255, 0);" class="">This will allow for simpler front-ends and would allow us to swap out the generated IR if the design needs to change.</span></div></div></div></div></blockquote><div><br class=""></div><div>Why does this have to be on IRBuilder? If you were writing this in Swift, then these would be an extension on IRBuilder that added some methods like this, but could be kept out of core.</div><div><br class=""></div><div>C++ isn’t as sophisticated, but you could still define these as a separate header file with functions like:</div><div><br class=""></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>Value *BuilderCreateMatrixAdd(IRBuilder &B, … other args)</div><div><br class=""></div><div>This would avoid adding a bunch of matrix specific stuff to the core IRBuilder class and header.</div></div></div></blockquote><div><br></div><div>Sure, that works too.</div><br><blockquote type="cite"><div dir="ltr"><div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><br class=""></div><div class=""><span style="font-family: Menlo;" class="">===========================</span></div><div class="">Details </div><div class=""><span style="font-family: Menlo;" class="">===========================</span></div><div class=""><br class=""></div><div class=""><span style="font-size: 13px;" class=""><font face="Menlo" class="">-------------------------</font></span></div><div class="">Introduction</div><div class=""><span style="font-family: Menlo; font-size: 13px;" class="">-------------------------</span></div><div class=""><br class=""></div><div class="">Representing multi-dimensional vectors in the IR through types makes the IR more expressive (<b class="">option A</b>). 
Additionally, if we have a new type we have the freedom to implicitly map it to a layout. E.g. <3 x 3 x float> could imply column-major order and one element of padding between each column. When it’s passed or returned from functions it should be passed in 3 vector registers.</div><div class=""><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">This is a sample IR to add two 3x3 matrices followed by a matrix multiply with a 3x2:</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%a</span> = <span style="color: rgb(201, 27, 0);" class="">load</span> <3 x 3 x float>, <3 x 3 x float>* <span style="color: rgb(202, 48, 199);" class="">%A</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%b</span> = <span style="color: rgb(201, 27, 0);" class="">load</span> <3 x 3 x float>, <3 x 3 x float>* <span style="color: rgb(202, 48, 199);" class="">%B</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%c</span> = <span style="color: rgb(201, 27, 0);" class="">load</span> <3 x 2 x float>, <3 x 2 x float>* <span style="color: rgb(202, 48, 199);" class="">%C</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 13px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: 
normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%add</span> = <span style="color: rgb(201, 27, 0);" class="">fadd</span> <3 x 3 x float> <span style="color: rgb(202, 48, 199);" class="">%a</span>, <span style="color: rgb(202, 48, 199);" class="">%b</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%mul</span> = <span style="color: rgb(201, 27, 0);" class="">call</span> <3 x 2 x float> @llvm.matrix.multiply(<3 x 3 x float> <span style="color: rgb(202, 48, 199);" class="">%add</span>,</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""> <3 x 2 x float> <span style="color: rgb(202, 48, 199);" class="">%c</span>)</div></div></div></div></div></div></div></blockquote><div><blockquote type="cite" class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div class=""><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;"><div class="" style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;"><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;"><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;"> <span class="" style="color: rgb(201, 27, 0);">store</span><span class="Apple-tab-span" style="white-space: pre;"> </span><3 x 2 x float><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="" style="color: rgb(202, 48, 199);">%mul</span>, <3 x 2 x float>* <span class="" style="color: rgb(202, 48, 199);">%MUL</span></div></div></div></div></div></div></div></blockquote><div class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" class="" 
style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div class=""><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;"><div class="" style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;"><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;"><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;"><span class="" style="color: rgb(202, 48, 199);"><br class=""></span></div></div></div></div></div></div></div></div></div><div>LLVM requires name mangling the type into the intrinsic, but yeah.</div></div></div></blockquote><div><br></div><div>Yes, we have that in our prototype. I just removed it from here for readability.</div><br><blockquote type="cite"><div dir="ltr"><div><br class=""><blockquote type="cite" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><br class=""></div></div></div></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Note that the type always implies a layout here. If multiple layouts appear in the same module beyond the default specified by DataLayout, we would represent them in the type, e.g. 
<3 x 3 x float column-major pad(1)> specifying column-major layout with a single element of padding after each 3-element column vector.</div></div></div></div></blockquote><div><br class=""></div>Right.</div><div><br class=""><blockquote type="cite" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Also note that we’re using the built-in fadd operator for element-wise operations and an intrinsic for non-element-wise operations like matrix multiply.</div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Instead of extending the type system, we can map the matrix instances onto existing types. The vector type is a natural fit as it can be considered a SequentialType with Rows * Columns elements of the element type. For operations like matrix multiply, we just need to pass the shape information for the extra dimension.</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">One question arises, though: how should padding be handled? For one, performing operations like division on the padding can cause spurious faults. But even for non-trapping operations, excluding padding should be an option. For example, in the case of a <3 x 3 x double>, we may want to lower a single row/column into the combination of a 128-bit vector register (2 elts) and a scalar rather than two vectors. This should be more beneficial for power. 
Thus we want to make padding explicit in the IR.</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">One option is to expose the shape to all operations including element-wise operations. This is <b class="">option B</b>. With that, the above sequence looks like this:</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><br class=""></div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%a</span> = <span style="color: rgb(201, 27, 0);" class="">load</span> <b class=""><12 x float></b>, <12 x float>* <span style="color: rgb(202, 48, 199);" class="">%A</span>, align 16</div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%b</span> = <span style="color: rgb(201, 27, 0);" class="">load</span> <12 x float>, <12 x float>* <span style="color: rgb(202, 48, 199);" class="">%B</span>, align 16</div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%c</span> = <span style="color: rgb(201, 27, 0);" class="">load</span> <8 x float>, <8 x float>* <span style="color: rgb(202, 48, 199);" class="">%C</span>, align 16</div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal; min-height: 13px;" class=""><br class=""></div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <span style="color: rgb(202, 48, 199);" class="">%add</span> = <span style="color: rgb(201, 27, 0);" class="">call</span> <12 x float> @llvm.matrix.fadd(<12 x float> <span style="color: 
rgb(202, 48, 199);" class="">%a</span>, <12 x float> <span style="color: rgb(202, 48, 199);" class="">%b</span>,</b></div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <span class="Apple-tab-span" style="white-space: pre;"> </span> <span class="Apple-tab-span" style="white-space: pre;"> </span> <span class="Apple-tab-span" style="white-space: pre;"> </span> <span class="Apple-tab-span" style="white-space: pre;"> </span> <span class="Apple-tab-span" style="white-space: pre;"> </span> <span style="color: rgb(0, 197, 199);" class="">; 3 x 3 column-major:</span></b></div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <span style="color: rgb(2, 37, 199);" class="">i32</span> 3, <span style="color: rgb(2, 37, 199);" class="">i32</span> 3, <span style="color: rgb(2, 37, 199);" class="">i1</span> <span style="color: rgb(201, 27, 0);" class="">true</span>)</b></div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal; min-height: 13px;" class=""><br class=""></div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%mul</span> = <span style="color: rgb(201, 27, 0);" class="">call</span> <8 x float> @llvm.matrix.multiply(<12 x float> <span style="color: rgb(202, 48, 199);" class="">%add</span>, <8 x float> <span style="color: rgb(202, 48, 199);" class="">%c</span>,</div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal; color: rgb(0, 197, 199);" class=""><span style="" class=""><span class="Apple-tab-span" style="white-space: pre;"> </span> </span>; 3 x 3 3 x 2 column-major:</div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span 
style="color: rgb(2, 37, 199);" class="">i32</span> 3, <span style="color: rgb(2, 37, 199);" class="">i32</span> 3, <span style="color: rgb(2, 37, 199);" class="">i32</span> 3, <span style="color: rgb(2, 37, 199);" class="">i32</span> 2, <span style="color: rgb(2, 37, 199);" class="">i1</span> <span style="color: rgb(201, 27, 0);" class="">true</span>)</div><div style="font-family: Menlo; font-size: 11px; margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(201, 27, 0);" class="">store</span> <8 x float> <span style="color: rgb(202, 48, 199);" class="">%mul</span>, <8 x float>* <span style="color: rgb(202, 48, 199);" class="">%MUL</span>, align 16</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""> </div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Each computation takes full shape information. The matrix shape is described by the row and column dimensions, which are passed to the intrinsic as constant parameters. We can also pass layout information, such as whether the matrices are laid out in row-major or column-major order. We’re using column-major order in the example, and as such the 3 x 3 x float matrix is flattened into a 12 x float vector with one element of padding at the end of each column.</div></div></div></div></blockquote><div><br class=""></div><div>I don’t understand this. What is the benefit of providing layout info to element-wise operations? This defeats the goal of having simple lowering and representation: you are encoding an ND vector form into the IR in a really ugly way, and this will cause a proliferation of intrinsics that are redundant with the core ops.</div></div></div></blockquote><div><br></div><div>The reason we need that information is so that, for example, we can lower an operation on a 3-element column into a vector of 2 and a scalar op. 
This should be beneficial for power consumption since, for example, in the case of a 3x3 with a single element of padding, rather than operating on 12 elements you’d operate on only 9 (vector ops consume more power than their scalar counterparts).</div><div><br></div><div>That said, we should be able to remove these intrinsics in the long term. Once we have masking on the core ops in the IR, we should be able to express the same semantics without dedicated intrinsics.</div><br><blockquote type="cite"><div dir="ltr"><div><div><br class=""></div><div>Also, are 2d matrices really as general as we want to go here? Generally you go from 1 to 2 to N, and it seems like you are proposing going from 1 (scalar) to 2 (vectors) to 3 (2d arrays) without giving N. If you want to provide layout info for general ND arrays, a single bit is not going to be enough nor is your row/col size representation.</div></div></div></blockquote><div><br></div><div>Yes, we could start with an ND-ready interface too. I was going to start with just matrix because that is my immediate need and then generalize from there, but I guess we can start with the more generalized intrinsic even if we first only focus on the 2D implementation?</div><br><blockquote type="cite"><div dir="ltr"><div><br class=""><blockquote type="cite" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">The amount of padding does not require any new parameters. We can compute it using the shape information and the size of the flattened matrix (e.g. 
%c which is a <3 x 2 x float> also has one element of padding: Elts / Columns - Rows = 8 / 2 - 3 = 1).</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">In order to expose the padding elements to element-wise operations (fadd), option B maps those to intrinsics. We can expose the padding elements in other ways such that we can still use the built-in element-wise operators. One way would be to extend the vector types to specify the padding, something like <12 x float pad(3, 7, 11)> (John McCall’s idea); another would be to remove the padding with explicit shufflevectors (Chandler’s idea). I explored the latter under <b class="">option C</b>. With option C, the same sequence of operations looks like this:</div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: 'Helvetica Neue'; color: rgb(69, 69, 69); min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(69, 69, 69);" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%a.padded</span> = <span style="color: rgb(201, 27, 0);" class="">load</span> <12 x float>, <12 x float>* <span style="color: rgb(202, 48, 199);" class="">%A</span>, align 16</div><div style="margin: 0px; font-stretch: normal; line-height: normal; color: rgb(0, 197, 199);" class=""><span style="" class=""> </span>; remove padding</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <span style="color: rgb(202, 48, 199);" class="">%a</span> = <span style="color: rgb(201, 27, 0);" class="">shufflevector</span> <12 x float> <span style="color: rgb(202, 48, 199);" class="">%a.padded</span>, <12 x float> <span style="color: rgb(201, 27, 0);" class="">undef</span>,</b></div><div 
style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <9 x i32> <i32 0, <span style="color: rgb(2, 37, 199);" class="">i32</span> 1, <span style="color: rgb(2, 37, 199);" class="">i32</span> 2, <span style="color: rgb(2, 37, 199);" class="">i32</span> 4, <span style="color: rgb(2, 37, 199);" class="">i32</span> 5, <span style="color: rgb(2, 37, 199);" class="">i32</span> 6, <span style="color: rgb(2, 37, 199);" class="">i32</span> 8, <span style="color: rgb(2, 37, 199);" class="">i32</span> 9, <span style="color: rgb(2, 37, 199);" class="">i32</span> 10></b></div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 13px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%b.padded</span> = <span style="color: rgb(201, 27, 0);" class="">load</span> <12 x float>, <12 x float>* <span style="color: rgb(202, 48, 199);" class="">%B</span>, align 16</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <span style="color: rgb(202, 48, 199);" class="">%b</span> = <span style="color: rgb(201, 27, 0);" class="">shufflevector</span> <12 x float> <span style="color: rgb(202, 48, 199);" class="">%b.padded</span>, <12 x float> <span style="color: rgb(201, 27, 0);" class="">undef</span>,</b></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <9 x i32> <i32 0, <span style="color: rgb(2, 37, 199);" class="">i32</span> 1, <span style="color: rgb(2, 37, 199);" class="">i32</span> 2, <span style="color: rgb(2, 37, 199);" class="">i32</span> 4, <span style="color: rgb(2, 37, 199);" class="">i32</span> 5, <span style="color: rgb(2, 37, 199);" class="">i32</span> 6, <span style="color: rgb(2, 37, 199);" class="">i32</span> 8, <span style="color: rgb(2, 37, 199);" class="">i32</span> 9, <span style="color: rgb(2, 37, 199);" 
class="">i32</span> 10></b></div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 13px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""> <span style="color: rgb(202, 48, 199);" class="">%c.padded</span> = <span style="color: rgb(201, 27, 0);" class="">load</span> <8 x float>, <8 x float>* <span style="color: rgb(202, 48, 199);" class="">%C</span>, align 16</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <span style="color: rgb(202, 48, 199);" class="">%c</span> = <span style="color: rgb(201, 27, 0);" class="">shufflevector</span> <8 x float> <span style="color: rgb(202, 48, 199);" class="">%c.padded</span>, <8 x float> <span style="color: rgb(201, 27, 0);" class="">undef</span>,</b></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <6 x i32> <i32 0, <span style="color: rgb(2, 37, 199);" class="">i32</span> 1, <span style="color: rgb(2, 37, 199);" class="">i32</span> 2, <span style="color: rgb(2, 37, 199);" class="">i32</span> 4, <span style="color: rgb(2, 37, 199);" class="">i32</span> 5, <span style="color: rgb(2, 37, 199);" class="">i32</span> 6></b></div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 13px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 13px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><b class=""> <span style="color: rgb(202, 48, 199);" class="">%add</span> = <span style="color: rgb(201, 27, 0);" class="">fadd</span> <9 x float> <span style="color: rgb(202, 48, 199);" class="">%a</span>, <span style="color: rgb(202, 48, 199);" class="">%b</span></b></div></div></div></div></div></blockquote><div><br class=""></div><div>I don’t understand why you’re trying to avoid adding padding. 
If you are worried about snans, then it seems that you could arrange for the producers of padding to have some guaranteed properties instead of being undef.</div></div></div></blockquote><div><br></div><div>It’s more the extra work/power consumed by the padding elements that concerns me.</div><br><blockquote type="cite"><div dir="ltr"><div><div><br class=""></div><blockquote type="cite" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><div class=""><br class=""></div></div><div style="font-size: 14px;" class=""><span style="font-family: Menlo; font-size: 13px;" class="">------------------------- </span></div><div class="">Matrix Operation Lowering and Fusion</div><div class=""><div style="font-size: 14px;" class=""><span style="font-family: Menlo; font-size: 13px;" class="">------------------------- </span></div></div><div class=""><span style="font-family: Menlo; font-size: 13px;" class=""><br class=""></span></div><div class="">Common to all of these options is that we are proposing a new IR pass that pre-legalizes matrix operations by lowering them to operations that are natively supported by the HW. This means decomposing the operations into native SIMD operations.</div><div class=""><br class=""></div><div class="">This pass will be used to also de-interleave a chain of matrix operations to manage register pressure. </div></div></div></blockquote><div><br class=""></div><div>Cool. 
While I’m not really thrilled with yet another “codegenprepare” style pass, I agree it is the most pragmatic given the lack of pervasive global isel etc.</div><div><br class=""></div><div>Just an observation: this pass can be dropped in today and would be useful for large vectors, independent of your matrix work.</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><br class=""></div><div class="">Note that we only have shape and layout information on computations. We don’t have them on other instructions like: load, store, phi, select, bitcast, memcpy intrinsic etc. Since the shape and layout information is critical to avoid unnecessary shuffles when working on rows/columns we need to recover this by propagating this information to all matrix operations.</div></div></div></blockquote><div><div><div class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div class=""><br class=""></div></div></div></div></div></div><div>My sense is that this info is important for your lowering, and your approach of using dataflow analysis to recover this will fail in some cases.</div><div><br class=""></div><div>Since layout and padding information is important, it seems most logical to put this into the type. 
Doing so would make it available in all these places.</div><div><br class=""></div>That said, I still don’t really understand why you *need* it.<br class=""></div></div></blockquote><div><br></div><div>This seems like the main sticking point so let’s close on this first and see if my answers above are satisfying.</div><div><br></div><div>Thanks for taking a look!</div><div><br></div><div>Adam</div><br><blockquote type="cite"><div dir="ltr"><div><div><br class=""></div><div>-Chris</div></div><br class=""></div></blockquote></body></html>