<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Oct 16, 2018 at 11:12 AM Chris Lattner via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space">On Oct 10, 2018, at 11:09 PM, Adam Nemet via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:</div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><div style="word-wrap:break-word;line-break:after-white-space"><div dir="auto" style="word-wrap:break-word;line-break:after-white-space">Hi,<div><br></div><div>We are proposing first-class type support for a new matrix type.  </div></div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>Interesting!  Here are some thoughts; I’m sorry, but I haven’t read the responses downthread.</div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div><br></div><blockquote type="cite"><div><div style="word-wrap:break-word;line-break:after-white-space"><div dir="auto" style="word-wrap:break-word;line-break:after-white-space"><div>This is a natural extension of the current vector type with an extra dimension.<br>For example, this is what the IR for a matrix multiply would look like for a 4x4 matrix with element type float:<br><br><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div>%0 = load <4 x 4 x float>, <4 x 4 x float>* %a, align 16</div><div>%1 = load <4 x 4 x float>, <4 x 4 x float>* %b, align 16</div><div>%2 = call <4 x 4 x float> @llvm.matrix.multiply.m4_4f32.m4_4f32.m4_4f32(<4 x 4 x float> %0, <4 x 4 x float> %1)</div><div>store <4 x 4 x float> %2, <4 x 4 x float>* %c, align 
16</div></blockquote></div></div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>LLVM already has a pretty general vector type (arbitrary number of elements).  I’m aware of hardware that has rectangular vectors, e.g. NVIDIA tensor cores and a proprietary in-house Google design with non-square vector registers.</div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div><br></div><blockquote type="cite"><div><div style="word-wrap:break-word;line-break:after-white-space"><div dir="auto" style="word-wrap:break-word;line-break:after-white-space"><div><div>Currently we support element-wise binary operations, matrix multiply, matrix-scalar multiply, matrix transpose, and extract/insert of an element.  Besides the regular full-matrix load and store, we also support loading and storing a matrix as a submatrix of a larger matrix in memory.  We are also planning to implement vector-extract/insert and matrix-vector multiply.<br><br>All of these are currently implemented as intrinsics.  Where applicable, we also plan to support these operations with native IR instructions (e.g. add/fadd).<br></div></div></div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div>Ok, I agree that supporting the existing pointwise vector operations makes sense.</div><div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><br><blockquote type="cite"><div><div style="word-wrap:break-word;line-break:after-white-space"><div dir="auto" style="word-wrap:break-word;line-break:after-white-space"><div><div>These are exposed in clang via builtins.  E.g. 
the above operation looks like this in C/C++:<br><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div>typedef float mf4x4_t __attribute__((matrix_type(4, 4)));</div><div><br></div><div>mf4x4_t multiply(mf4x4_t a, mf4x4_t b) {</div><div>  return __builtin_matrix_multiply(a, b);</div><div>}</div></blockquote></div></div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>I’d recommend splitting the clang discussion from the LLVM discussion, since completely different tradeoffs are involved.  I’ll focus on the LLVM IR side of things.</div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div><br></div><div><br></div><blockquote type="cite"><div><div style="word-wrap:break-word;line-break:after-white-space"><div dir="auto" style="word-wrap:break-word;line-break:after-white-space"><div><div>** Benefits **<br><br>Having matrices represented as IR values allows for the usual algebraic and redundancy optimizations.  But most importantly, by lifting memory aliasing concerns, we can guarantee vectorization to target-specific vectors.  <br></div></div></div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>Right, it is basically the same benefit as having a vector type.  You also get the ability to have specific alignments, etc.</div><div><br></div><div>I think there are several options in the design space here:</div><div><br></div><div>1. Do nothing to the type system, but just use the existing vector types (<16 x float> in this case) with a new set of operations.</div><div>2. Introduce a “matrix” concept and associated operations.</div><div>3. 
Introduce N-dimensional vector registers and associated operations.</div><div><br></div><div><br></div><div>Personally, I’d be in favor of going with #1, followed by #3, followed distantly by #2.</div></div></div></blockquote><div><br></div><div>FWIW, I strongly prefer #1 to the other options.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><div><br></div><div>The reason I’m opposed to a matrix *type* is that this is far too specific a concept to put into LLVM.  We don’t even have signedness of integers in the type system: the instruction set is the major load-bearing part of the IR design, and the instruction set is extensible through intrinsics.</div></div></div></blockquote><div><br></div><div>Strongly agree.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div>Arguing in favor of #1: AFAICT, you only need to add the new intrinsics to do matmul, etc.  You could just define them to take 1D vectors but apply math to them that interprets them as a 2D space.  This is completely an IR-level modeling issue, and would be a very non-invasive patch.  You’re literally just adding a few intrinsics.  All the pointwise operations and insert/extract/shuffles will “just work”.  The frontend handles mapping 2D indices to 1D indices.<br></div></div></blockquote><div><br></div><div>Even better, it makes it easy to support interesting row-major and column-major style operations without further diversification of the type system.</div></div></div>
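<div>To make option #1 concrete, here is a rough C++ sketch of what "1D vectors interpreted as a 2D space" could mean, with the 2D-to-1D index mapping a frontend would have to do. Everything in it (Layout, flatIndex, matmulFlat) is a hypothetical name made up for illustration; it is not an LLVM intrinsic or anything from the proposal, and a real lowering would of course emit vector code rather than scalar loops.</div>

```cpp
#include <array>
#include <cstddef>

// Hypothetical illustration only: none of these names exist in LLVM or Clang.
enum class Layout { RowMajor, ColMajor };

// Map a 2D (row, col) index onto a flat 1D index for a rows x cols matrix.
constexpr std::size_t flatIndex(std::size_t row, std::size_t col,
                                std::size_t rows, std::size_t cols,
                                Layout layout) {
  return layout == Layout::RowMajor ? row * cols + col : col * rows + row;
}

// Multiply an M x K matrix by a K x N matrix. Operands and result are plain
// flat arrays; only the index math treats them as two-dimensional.
template <std::size_t M, std::size_t K, std::size_t N>
std::array<float, M * N> matmulFlat(const std::array<float, M * K> &a,
                                    const std::array<float, K * N> &b,
                                    Layout layout) {
  std::array<float, M * N> c{};
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float sum = 0.0f;
      for (std::size_t k = 0; k < K; ++k) {
        sum += a[flatIndex(i, k, M, K, layout)] *
               b[flatIndex(k, j, K, N, layout)];
      }
      c[flatIndex(i, j, M, N, layout)] = sum;
    }
  }
  return c;
}
```

<div>The dimensions cannot be deduced from the flat array sizes, so a caller spells them out, e.g. matmulFlat&lt;4, 4, 4&gt;(a, b, Layout::RowMajor) for two 16-element operands. The layout is just another argument to the operation here, which is the point about row-major and column-major variants not needing anything new in the type system.</div>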