<div dir="ltr"><div>Thank you for the summary, James.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 10, 2020 at 9:01 PM James Y Knight <<a href="mailto:jyknight@google.com">jyknight@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">I view what happened is that we've introduced effectively a new form of IR (pre-coroutine-split IR), yet when doing so, we failed to realize the impact that a "suspend" allowing "thread identity" to change would have on the rest of LLVM. So, oops! And now we need to design what things really should be looking like here going forward.<br></div><div class="gmail_quote"><div><br></div><div>Let me try to make the two proposals so far more concrete.</div><div><br></div><div>Assumptions:</div><div>- We're not going to move CoroSplit early in the pipeline, so this issue really does need to be solved.</div><div>- We should not change the meaning of the C `attribute((const))` to prohibit reading the thread-identity, but, we can change what LLVM IR that transforms into.</div><div>- As a prerequisite, we have already solved the TLS-address-is-a-constant issue -- e.g. by moving TLS address lookup to an intrinsic (I have some other comments about that issue, separately).</div></div></div></blockquote><div><br></div><div>Sounds right.<br></div><div> </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Proposal 1:</div><div>- The readnone attribute is redefined to prohibit depending on thread id.</div></div></div></blockquote><div><br></div><div>As I've explained before, I don't think this is a redefinition. It is the context in which the definition is interpreted that has changed, not the definition itself.<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>- Create a new attribute "readthreadidonly", which is similar to readnone, except that it may access the current thread identity.</div><div>- The new @llvm.threadlocaladdr intrinsic (or whatever it ends up as) is given this attribute.</div><div></div><div>- Clang's `attribute((const))` emits this attribute, instead of readnone.</div><div>- Any other frontends which emit the "readnone" attribute may also need to be updated and emit readthreadidonly instead of readonly.</div><div>- Optimization passes that are oblivious to the new readthreadidonly will fail correct -- they won't <i>misoptimize</i> by moving such a call across an llvm.coro.suspend call (good) -- but they also will regress performance of straightline code (bad).</div><div>- To avoid regressing performance, we'll need to update all optimization passes which currently handle readnone, to also handle readthreadidonly similarly.</div><div> - Option 1: (quick'n'dirty) treat readthreadidonly as readnone (allowing arbitrary movement) on any function which doesn't contain an llvm.coro.suspend, and treat readthreadidonly as inaccessiblememonly if there are any suspends.</div><div> - Option 2: actually understand that readthreadid can be moved anywhere except across an llvm.coro.suspend call.</div></div></div></blockquote><div><br></div><div>I wonder if there's an:</div><div><br></div><div>Option 3: After (or during) coroutine splitting, upgrade readthreadidonly to readnone.</div><div><br></div><div>This is based on the assumption that after coroutine splitting, threadid is no longer mutable state.</div><div><br></div><div>Doing the upgrade any earlier is incorrect because of function inlining.</div><div><br></div><div>This is functionally a subset of what Option 1 is doing, but requires changes to fewer transforms.<br></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Proposal 2:</div><div>- The readnone attribute (continues to) allow depending on thread id.</div><div><div>- C frontend's `attribute((const))` continues to emit the readnone attribute.</div><div>- The new @llvm.threadlocaladdr intrinsic is marked readnone.</div><div></div></div><div>- (Optional) add a not_threadid_dependent attribute, which says that a function doesn't care which thread it's invoked on.<br></div><div>- Neither clang nor other frontends need to change which IR attributes they emit.</div><div></div><div>- Optimization passes must be taught not to move readnone calls across llvm.coro.suspend, or they will be incorrect.</div><div> - Option 1: (quick'n'dirty) treat 'readnone' as 'inaccessiblememonly' in any function which contains a llvm.coro.suspend.</div><div> - Option 2: actually understand that readnone can move anywhere in the function except across an llvm.coro.suspend call (unless not_threadid_dependent is present, in which case it can also move across that).</div><div></div><div><br></div><div>I'm still slightly in favor of proposal 2, but either seems OK. Both seem like they'll be effectively the same amount of work to implement in LLVM. I feel like having the "not_threadid_dependent" attribute be separate from "readnone" (doesn't access memory) is logical and consistent -- and having them be distinct may potentially also be useful in other circumstances.</div><div> </div></div></div>
</blockquote></div><div><br></div><div>If we compare the two proposals, they ultimately offer the same level of expressiveness:</div><div><br></div><div>p1 readnone == p2 readnone+not_threadid_dependent</div><div>p1 readthreadidonly == p2 readnone<br></div><div>p1 _no attribute_ == p2 _no attribute_<br></div><div><br></div><div>... and so yes, it's plausible that they're effectively the same amount of work in the end.</div><div><br></div><div>The arguments against each of the proposals as far as I'm aware:</div><div><br></div><div><div>Downside of proposal 1: it introduces an attribute "readthreadidonly", which seems unappealing, presumably because it feels like too much special-casing of threadid.<br></div><div><br></div><div>Downside of proposal 2: it changes the meaning of "read none" from "reads no mutable state" to "reads almost no mutable state", which is unintuitive and invites bugs due to misunderstandings.</div><div><br></div>I agree that a "readthreadidonly" seems unappealing, but the downside of proposal 2 weighs far heavier to me. It adds permanent mental baggage that everybody working with LLVM IR will have to carry around with them for the indefinite future.</div><div><br></div><div>Perhaps there are ways to address the downside of proposal 1? For example, could we make a parameterized version of `readnone` that reads like `readnoneexcept(threadid)`? This could also fit nicely with a `readnoneexcept(inaccessiblemem)`.</div><div><br></div><div>Cheers,</div><div>Nicolai<br></div>-- <br><div dir="ltr" class="gmail_signature">Lerne, wie die Welt wirklich ist,<br>aber vergiss niemals, wie sie sein sollte.</div></div>