Hi Chris,

On Oct 23, 2018, at 11:19 AM, Chris Lattner <clattner@nondot.org> wrote:

> On Oct 17, 2018, at 11:57 PM, Tim Shen <timshen@google.com> wrote:
>
>> On Tue, Oct 16, 2018, 11:12 Chris Lattner via cfe-dev <cfe-dev@lists.llvm.org> wrote:
>>
>>>> Adding a new dedicated first-class type has several advantages over mapping them directly to existing IR types like vectors in the front end. Matrices have the unique requirement that both rows and columns need to be accessed efficiently. By retaining the original type, we can analyze entire chains of operations (after inlining) and determine the most efficient *intermediate layout* for the matrix values between ABI-observable points (calls, memory operations).
>>>
>>> I don't understand this point at all.
>>
>> I *think* what it says is that a matrix type like <4 x 4 x i32> can be designed so that it does not imply a data layout (row major, column major, etc.), leaving passes free to transpose the data into another layout when that is profitable.
>>
>> It also seems to say that there could be a global analysis pass that assigns one layout per use and then inserts the necessary transposes, with the goal of reaching a global performance maximum.
>>
>> However, the argument seems to imply that a vector type like <16 x i32> can't support this. In favor of option #1, I argue that a plain <16 x i32> enables the same optimization opportunities, as long as the uses are not on ABI boundaries.
> Adam and I discussed this at the devmtg, and indeed his idea is to have a "codegen prepare" sort of pass that does some amount of pre-legalization of matrices (which should also be applicable to large vectors), with the goal of reducing register pressure etc.
>
> Adam, can you please summarize the discussions you had and what you see as the next steps here? Thanks!
>
> -Chris

I'd like to write up the main alternatives (flattened vector + shape/layout-aware intrinsics vs. an N-dimensional vector type) and contrast them with the IR at the various stages; a rough sketch of the two follows at the end of this mail.

I am busy with some internal stuff at the moment but hoping to get to this next week.

Thanks,
Adam
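
To make the comparison concrete, here is a minimal sketch of what the two alternatives could look like. It is purely illustrative and rests on assumptions: neither the intrinsic names nor the <4 x 4 x float> type syntax below exist in trunk today; they are placeholders for whatever we end up proposing.

  ; Alternative 1: flattened vectors plus shape/layout-aware intrinsics.
  ; The matrix values are ordinary <16 x float>; the operand shapes (4x4 here)
  ; are carried as immediate arguments on a hypothetical intrinsic, so the
  ; layout between ABI-observable points is left entirely to the optimizer.
  %a = load <16 x float>, <16 x float>* %pa
  %b = load <16 x float>, <16 x float>* %pb
  %c = call <16 x float> @llvm.matrix.multiply.v16f32(<16 x float> %a,
                                                      <16 x float> %b,
                                                      i32 4, i32 4, i32 4)
  store <16 x float> %c, <16 x float>* %pc

  ; Alternative 2: a dedicated N-dimensional vector type (hypothetical syntax).
  ; The shape is part of the type; the layout stays unspecified until a
  ; "codegen prepare" style pass picks one and materializes any transposes.
  %a = load <4 x 4 x float>, <4 x 4 x float>* %pa
  %b = load <4 x 4 x float>, <4 x 4 x float>* %pb
  %c = call <4 x 4 x float> @llvm.matrix.multiply.m4x4f32(<4 x 4 x float> %a,
                                                          <4 x 4 x float> %b)
  store <4 x 4 x float> %c, <4 x 4 x float>* %pc

In both cases the point under discussion is the same: between the loads, stores and calls, passes are free to pick row major, column major, or something else entirely, and only have to agree on a layout at the ABI-observable points.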