<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Oct 25, 2016 at 10:48 PM, Chris Lattner <span dir="ltr"><<a href="mailto:clattner@apple.com" target="_blank">clattner@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div>Responding to both of your emails in one, sorry for the delay:</div><div><br></div><div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><span><div>On Oct 25, 2016, at 11:20 AM, Peter Collingbourne <<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>> wrote:</div></span><div><div dir="ltr"><div class="gmail_extra"><span><div class="gmail_extra">I think there are a couple of additional considerations we should make here:</div></span><div class="gmail_extra"><ul><li>What are we trying to model? To me it's clear that GlobalConstant is for modelling integers, not pointers. That alone may not necessarily be enough to motivate a representational change, but…</li></ul></div></div></div></div></div></div></blockquote><div>I understand where you’re coming from, but I think we’re modeling three different things, and disagreeing about how to clump them together. The three things I see in flight are:</div><div><br></div><div>1) typical globals that are laid out in some unknown way in the address space.</div><div>2) globals that may be tied to a specific knowable address range due to a limited compilation model (e.g. a deeply embedded core) that fits into an immedaite range (e.g. 0…255, 0…65536, etc).</div><div>3) Immediates that are treated as symbolic for CFI’s perspective (so they can’t just be used as a literal immediate) that are resolved at link time, but are known to have limited range.</div><div><br></div><div>There is also "4) immediates with an obvious known value”, but those are obviously ConstantInt’s and not interesting to discuss here.</div><div><br></div><div>The design I’m arguing for is to clump #2 and #3 into the same group.</div></div></div></blockquote><div><br></div><div>I am not sure if this is sound if we want the no-alias assumption (see also below) to hold for #2 but not for #3.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div><div> This can be done one of two different ways, but both ways use the same “declaration side” reference, which has a !range metadata attached to it. The three approaches I see are:</div><div><br></div><div>a) Introduce a new GlobalConstant definition, whose value is the concrete address that the linker should resolve.</div><div>b) Use an alias as the definition, whose body is a ptrtoint constant of the same value.</div><div>c) Use a zero size globalvariable with a range metadata specifying the exact address decided.</div><div><br></div><div>I’m not very knowledgable about why approach b won’t work, but if it could, it seems preferable because it fits in with our current model.</div></div></div></blockquote><div><br></div><div>b would work in that it would give us the right bits in the object file, but it would be a little odd to use a different type for declarations as for definitions. That said, I don't have a strong objection to it.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div><div>After that, approach C seems like a nice fallback because it again doesn’t require major extensions to our model. Finally, approach A seems like the most explicit model, but has the disadvantage of introducing a new IR concept that will be rarely used (and thus prone to bugs due to it not being widely tested).</div><span><div><br></div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_extra"><ul><li>If we make our representation fit the model, does that improve the quality of the code? There's clearly been a long standing assumption that GlobalValues are of pointer type. Breaking that assumption has in fact found potential bugs (see e.g. my changes to ValueTracking.cpp in D25930, as well as D25917 which was a requirement for the representational change), and seems like it may help prevent bugs in the future, so I'd say that the answer is most likely yes.</li></ul></div></div></div></div></div></div></blockquote></span>I’d argue the other side of it. The quality of the code is higher if we have invariants (like all globals are pointers) because that simplifies assumptions by eliminating cases where “is a pointer” appears to be true, but isn’t actually true in all cases.</div></div></blockquote><div><br></div><div>We'd still be able to have an "is a pointer" invariant, but on a specific subset of globals. In practice I'd expect that a frontend that chooses to make that assumption can use GlobalPointer in place of GlobalValue.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div>I’m not an expert on CFI or how widely it will ultimately impact the compiler hacker consciousness, but I’m pretty sure that the current model for globals and functions will remain more prominent. If you choose to break this invariant, you’ll be continually swimming upstream against assumptions made throughout the compiler, both in code written today but also in code written in the future.</div></div></blockquote><div><br></div><div>I think this is as least as likely to be true in the case where we do choose to use pointer modelling, as those assumptions are going to exist no matter what representation we choose, at least until they are found and fixed. The difference is that when assumptions are broken, they will be more likely to break loudly (e.g. casting assertion failures) as opposed to subtly as could have happened with the non-null assumption in ValueTracking. CFI will have full integration testing on supported platforms, and has been deployed in Chrome on Linux since August, so at least the important known use cases should continue to work. The unimportant use cases may not, but that seems like it could be the case for any IR feature.</div><div><br></div><div>You can see for yourself from D25930 that (relatively speaking) very few places in the entire codebase need to care about this at the type system level. I was in fact surprised at how little code I needed to change for an assumption that has apparently existed since 2001. Granted this doesn't reveal non-type-system-level assumptions, but as I mentioned those assumptions would exist anyway.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><span>On Oct 25, 2016, at 6:52 PM, Peter Collingbourne <<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>> wrote:<br></span><div><span><blockquote type="cite"><br><div><div dir="ltr"><div class="gmail_extra">So, to summarise the overall consensus on this thread:</div><div class="gmail_extra">- We should introduce a GlobalConstant class which supports definitions and declarations</div></div></div></blockquote><div><br></div></span><div>Why would globalconstant need to support declarations? Seems like an external global should be able to handle this case.</div></div></div></blockquote><div><br></div><div>An external globalvariable? I am not sure how to reconcile this with your comment on D25878:</div><div><br></div><div>> LLVM IR global variables should always be assumable that they don't alias.</div><div><br></div><div>I'm thinking of the case where we have two globalconstants which happen to have the same value.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div><span><blockquote type="cite"><div><div dir="ltr"><div class="gmail_extra">- It should derive from GlobalValue</div><div class="gmail_extra">- No type changes for GlobalValue (i.e. still required to be pointer type)</div><div class="gmail_extra">- To communicate the range we should use !range metadata and support it on GlobalConstant as well as GlobalVariable</div><div class="gmail_extra"><br></div><div class="gmail_extra">Although I am not convinced that this is the right representation I appear to be in the minority here, so if this seems reasonable to folks I will go ahead and implement it.<br></div></div></div></blockquote><br></span></div><div>I’m not trying to steamroll you over: what is your specific concern?</div></div></blockquote><div><br></div><div>My specific concern is pointer modeling. I think Chandler put it well in his most recent mail.</div><div><br></div><div>Thanks,</div></div>-- <br><div class="gmail-m_-5392016208327710415gmail-m_-4934007853574112141m_5506649115950594877gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div>
</div></div>