<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div><blockquote type="cite" class=""><div class="">On Dec 19, 2018, at 11:09 AM, Stephen Canon via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><blockquote type="cite" class="">On Dec 18, 2018, at 10:18 PM, Adam Nemet &lt;<a href="mailto:anemet@apple.com" class="">anemet@apple.com</a>&gt; wrote:<br class=""></blockquote><div class=""><blockquote type="cite" class=""><br class=""><div class=""><div dir="auto" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><div dir="ltr" class=""><div class=""><div class="">I don’t understand this.  What is the benefit of providing layout info to elementwise operations?
This defeats the goal of having simple lowering and representation: you are encoding an ND vector form into the IR in a really ugly way, and this will cause a proliferation of intrinsics that are redundant with the core ops.</div></div></div></blockquote><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class=""></div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">The reason we need that information is so that, for example, we can lower an operation on a 3-element column into a 2-element vector op and a scalar op.
This should be beneficial for power consumption: for example, for a 3x3 matrix with one element of padding per column, you’d operate on only 9 elements rather than 12 (vector ops consume more power than their scalar counterparts).</div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class=""></div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">That said, we should be able to remove these intrinsics in the long term.  Once we have masking on the core ops in the IR, we should be able to express the same semantics without dedicated intrinsics.</div></div></div></blockquote><br class=""></div><div class="">There may be some cases where this holds (maybe with 5x5 or something), but most of the time I would expect to get better power from doing a four-element vector op with one wasted lane than doing two arithmetic ops (plus possibly extracts and inserts, depending on physical layout details).</div><div class=""><br class=""></div><div class="">Explicit masking or arranging for zeros in the padding lanes seems like a better way forward to me.</div><div class="">– Steve</div></div></div></blockquote><br class=""></div><div>I spent some time chatting with Adam about this and have a better understanding of his concerns here. 
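</div><div><br class=""></div><div>To make the lane accounting in this thread concrete, here is a hypothetical sketch in plain Python (it is not LLVM IR and not an LLVM API; <code>column_mask</code>, <code>masked_add</code>, and <code>split_widths</code> are names invented for illustration). It assumes a column-major 3x3 matrix whose columns are each padded to 4 lanes, and shows two things: a per-column lane mask derived only from the row count and the padded width is enough to restrict an elementwise add to the 9 live elements, while the alternative lowering splits each 3-element column into a 2-element vector op plus a scalar op.</div><pre>
```python
# Hypothetical sketch (not an LLVM API): a column-major matrix whose columns
# are padded to a fixed vector width, e.g. a 3x3 matrix stored as 3 columns
# of 4 lanes, where lane 3 of each column is padding.


def column_mask(rows, width):
    """Lane mask for one padded column: live rows are True, padding is False."""
    return [lane in range(rows) for lane in range(width)]


def masked_add(a, b, rows, cols, width):
    """Elementwise a + b over padded column-major storage.

    Models one width-lane op per column: only lanes enabled by the mask are
    computed, and padding lanes keep the value from `a` (merge masking).
    """
    mask = column_mask(rows, width)
    out = list(a)
    for c in range(cols):
        base = c * width
        for lane, live in enumerate(mask):
            if live:
                out[base + lane] = a[base + lane] + b[base + lane]
    return out


def split_widths(rows, max_width):
    """Alternative lowering: split one column into power-of-two pieces.

    Assumes max_width is a power of two; for 3 rows this gives [2, 1],
    i.e. a 2-element vector op plus a scalar op per column.
    """
    pieces = []
    remaining = rows
    width = max_width
    while remaining:
        if remaining // width:  # a piece of this width still fits
            pieces.append(width)
            remaining -= width
        else:
            width //= 2
    return pieces


if __name__ == "__main__":
    rows, cols, width = 3, 3, 4
    pad = 0.0
    # Column-major storage, each column padded with one extra lane.
    a = [1.0, 2.0, 3.0, pad, 4.0, 5.0, 6.0, pad, 7.0, 8.0, 9.0, pad]
    b = [10.0] * len(a)
    print(column_mask(rows, width))              # [True, True, True, False]
    print(sum(column_mask(rows, width)) * cols)  # 9 live lanes out of 12
    print(masked_add(a, b, rows, cols, width))
    print(split_widths(rows, width))             # [2, 1]
```
</pre><div>The sketch only models which lanes are computed; it makes no claim about the relative power cost of the two lowerings on real hardware.</div><div><br class=""></div><div>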
It seems to me that if having masking intrinsics is the long-term solution we want, we should do that now (for add and sub) rather than building arbitrary matrix layout info into intrinsics, since a mask has all the information that we actually need.</div><div><br class=""></div><div>– Steve</div></body></html>