<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Sep 3, 2016 at 8:43 AM, Mehdi Amini via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><div></div><span class="gmail-"><blockquote type="cite"><div><span style="font-family:helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline">The key bit here is that I can describe transformations in terms of these abstract domains without knowing anything about how the frontend might be using such a domain or how the backend might lower it.  In particular, if I have the sequence:</span></div></blockquote></span><span class="gmail-"><blockquote type="cite"><div><span style="font-family:helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline">%v = load i64, %p atomic scope {domain3 only}</span></div></blockquote></span><span class="gmail-"><blockquote type="cite"><div><span style="font-family:helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline">fence seq_cst scope={domain1 only}</span></div></blockquote></span><span class="gmail-"><blockquote type="cite"><div><span 
style="font-family:helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline">%v2 = load i64, %p atomic scope {domain3 only}</span></div></blockquote></span><span class="gmail-"><blockquote type="cite"><div><span style="font-family:helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline">I can tell that the two loads aren't order with respect to the fence and that I can do load forwarding here.</span></div></blockquote></span><span class="gmail-"></span><span class="gmail-"><div><br></div></span><div>I see the current proposal as a strip-down version what you describe: the optimizer can reason about operations inside a single scope, but can’t assume anything cross-scope (they may or may not interact with each other).</div><div><br></div><div>What you describes seems like having always non-overlapping domains (from the optimizer point of view), and require the frontend to express the overlapping by attaching a “list" of domains that an atomic operation interacts with.</div></div></div></blockquote><div><br><br></div></div>There is another way to tackle this, and Chandler had hinted at it in an old thread:<br><a href="http://lists.llvm.org/pipermail/llvm-dev/2015-January/080236.html">http://lists.llvm.org/pipermail/llvm-dev/2015-January/080236.html</a><br><br>Quoting from Chandler's email:<br>"Essentially, I think target-independent optimizations are still attractive, but we might want to just force them to go through an actual target-implemented API to interpret the scopes rather than making the interpretation work from first principles. 
I just worry that the targets are going to be too different and we may fail to accurately predict future targets' needs."<br><br></div><div class="gmail_extra">Note that in Philip's example above, the optimization is not really asking whether the two loads are ordered. It is asking whether the second load can be moved before the fence. Whatever the question, it can be posed to the target as a simple predicate, for example "isOrdered(inst1, inst2)" or "canEliminate(store1, store2)". The latter covers the case where the optimizer wants to eliminate a store that is followed by another store to the same location. The target can interpret the scopes however it likes and return true or false.<br><br></div><div class="gmail_extra">The advantage is that the optimizer no longer needs to know anything about the scopes. For example, in memory models like OpenCL, the scopes are nested, so it should be sufficient to set just one bit in the mask and have it "automatically include" the lower bits. The optimizer does not need to know that. In fact, the implementation need not even be a bitmask. It can just be a set of opaque "sigils" like in the original design.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">In practice, I wonder how often scopes will really affect optimizations. At least on targets that have memory models similar to OpenCL 2.x, it's likely that most queries have answers independent of scopes.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">Sameer.<br></div></div>
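<div>P.S. To make the predicate idea concrete, here is a rough, standalone C++ sketch. Nothing in it is an existing LLVM API: the names Inst, ScopeOracle, DisjointDomainOracle and canForwardLoadAcrossFence are all made up for illustration, and the "same scope id means ordered" rule is only a toy stand-in for whatever a real target would decide. The point is just that the generic helper never looks inside the scope value.</div>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for an IR instruction (not an LLVM type).
enum class Opcode { Load, Store, Fence };

struct Inst {
  Opcode op;
  std::uint32_t scope;  // opaque to the optimizer; only the target interprets it
  const void *addr;     // accessed location (unused for fences)
};

// Hypothetical target-implemented query interface.
class ScopeOracle {
public:
  virtual ~ScopeOracle() = default;
  // Does the target require a and b to stay in program order?
  virtual bool isOrdered(const Inst &a, const Inst &b) const = 0;
  // May the first store be dropped because the second overwrites it?
  virtual bool canEliminate(const Inst &first, const Inst &second) const = 0;
};

// Toy target (an assumption for this sketch): scopes are disjoint domains,
// and a fence only orders accesses that carry the same scope id.
class DisjointDomainOracle : public ScopeOracle {
public:
  bool isOrdered(const Inst &a, const Inst &b) const override {
    const bool involvesFence = a.op == Opcode::Fence || b.op == Opcode::Fence;
    return involvesFence && a.scope == b.scope;
  }
  bool canEliminate(const Inst &first, const Inst &second) const override {
    return first.op == Opcode::Store && second.op == Opcode::Store &&
           first.addr == second.addr && first.scope == second.scope;
  }
};

// Target-independent helper: it asks the target and never inspects the scope.
bool canForwardLoadAcrossFence(const ScopeOracle &target, const Inst &first,
                               const Inst &fence, const Inst &second) {
  return first.op == Opcode::Load && second.op == Opcode::Load &&
         first.addr == second.addr && !target.isOrdered(fence, second);
}
```

<div>With this toy target, two loads in domain 3 separated by a fence in domain 1 can be forwarded, while a fence in domain 3 blocks the forwarding. A target with nested scopes would only need a different isOrdered; the helper would not change.</div>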