<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>Michael, </div><div><br></div><div>I understand that you are very motivated to get the HLE change in, but I still think that the right approach is to implement target-specific intrinsics start-to-end. The current approach is unacceptable because it adds complexity, hurts compile time, and does not add any value for most users. Please implement this as intrinsics without any changes to the IR. </div><div><br></div><div>Thanks,</div><div>Nadav</div><div><br></div><br><div><div>On Apr 17, 2013, at 10:10 AM, Michael Liao &lt;<a href="mailto:michael.liao@intel.com">michael.liao@intel.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">Hi Nadav<br><br>On Wed, 2013-04-17 at 09:13 -0700, Nadav Rotem wrote:<br><blockquote type="cite"><br><blockquote type="cite">Here, I am only talking about hinting atomic instruction IR with HLE. Hinting existing atomic instructions is straightforward, as it captures the goal of maintaining a portable IR between targets with HLE and those without HLE. In addition, all atomic-aware optimizations will work the same.<br><br></blockquote><br><br>You are still proposing to change the IR by adding new metadata.<br></blockquote><br>Yeah, the IR change is the same as the one in the previous proposal. 
As far as I heard, the concern was only about adding a new feature in SelectionDAG, not in the IR.<br><br><blockquote type="cite"><br><br><blockquote type="cite">For this proposal, adding a lowering pass isn't as straightforward as adding a trivial feature in SelectionDAG directly, but it is still obvious enough to understand.<span class="Apple-converted-space"> </span><br></blockquote><br><br>Neither one of these changes is trivial. Both are very intrusive. People who don't care about HLE can't opt out and have to pay the cost of the added complexity and compile time (see below).<span class="Apple-converted-space"> </span><br></blockquote><br>Could you list the optimizations that handle atomic instructions and also check the metadata attached to them? AFAIK, none of the existing optimizations around atomic instructions check metadata at all. As the HLE hint doesn't change the semantics of atomic instructions at all, if you don't care about HLE, you don't need to check it. They will work the same.<br><br><blockquote type="cite"><br><blockquote type="cite">The major benefit of adding a lowering pass that translates atomic instructions into target-native atomic intrinsics is that it will reduce the burden of implementing all atomic instructions on the backend side. Most target hardware only provides minimal atomic support instructions (i.e. LL/SC in all RISC targets); with this pass, those targets only need to care about the codegen of LL/SC.<br><br></blockquote><br><br>Why is it the job of the compiler to implement HLE intrinsics on non-HLE targets? It needs to be in a library.<span class="Apple-converted-space"> </span><br></blockquote><br>It's not a part specific to HLE support but a refactoring of how we share atomic instruction code generation among targets, as I notice that most targets in our backends cannot support the full set of atomic instructions and each backend duplicates too much effort on supporting atomic instructions. 
I added it here because the major issue in the previous discussion was the new changes added in the DAG. This refactoring will bypass the DAG to some extent.<br><br><blockquote type="cite"><br><blockquote type="cite"><blockquote type="cite"><br><br>Other compilers have other considerations, and I don't think that we need to compromise compile time or flexibility for this, especially if we have other alternatives.<span class="Apple-converted-space"> </span><br></blockquote><br>Could you elaborate on the compile-time overhead? From my measurements, this pass is fast, as it has O(N) complexity and only processes atomic instructions. If we kept track of whether a function has atomic instructions or not, we could be even faster by skipping functions without them entirely.<br><br><br></blockquote><br><br>Yes, it's an O(N) pass that scans all of the instructions and does something with some of the instructions. The problem is that most people who don't care about HLE will still have to pay the cost of the HLE implementation.<br></blockquote><br>Again, that pass is added to refactor the atomic instruction code generation in our current backends. By itself, it's not an HLE-specific part. The reason I put it in this proposal is that people have concerns about adding new features in SelectionDAG.<br><br>In addition, this pass is easily skipped by keeping track of whether a function has atomic instructions. If it has none, we skip this pass. 
The overhead is quite manageable.<br><br>- Michael<br><br><blockquote type="cite"><br><br><br><blockquote type="cite">I agree with you on hardware features like CRC or encryption, but not on the HLE feature, which is a common feature for hardware with TM support.<br></blockquote><br><br>I don't think that HLE is very common.<span class="Apple-converted-space"> </span><br><br><blockquote type="cite">I don't think the GCC community failed to consider the approach you mentioned. In fact, if you look at the GCC mailing list, people originally rejected that proposal and preferred the approach of hinting atomic builtins.<br><br></blockquote><br>I would like to see the HLE features implemented as intrinsics from start to finish without adding any burden on the rest of the compiler. </blockquote></div></blockquote></div><br></body></html>