<div dir="ltr"><div dir="ltr">I have written a longer article as a byproduct of thinking through the problem space of this proposal: <a href="https://nhaehnle.blogspot.com/2021/06/can-memcpy-be-implemented-in-llvm-ir.html">https://nhaehnle.blogspot.com/2021/06/can-memcpy-be-implemented-in-llvm-ir.html</a></div><div><br></div><div>What happened is that I ended up questioning some really fundamental things, such as whether we can even implement memcpy in LLVM IR. :) The answer is a qualified yes, and the question turned out to be a good framework for thinking about the fundamentals of what is discussed here, so I published the article in the hope that others find it useful.</div><div><br></div><div>tl;dr: This discussion is ultimately all about pointer provenance. LLVM IR has a gap in expressiveness when it comes to provenance, with surprising consequences for memcpy (and similar operations). From an aesthetics point of view, filling this gap has a lot of appeal, and the "byte" proposal points in that direction. However, I have some issues with the details of the proposal, and it is so intrusive that it needs to be justified by more than just aesthetics.<br><br>The correctness issues in the problem space can be solved by much less intrusive means. The justification for the more intrusive means would be better alias analysis, but I don't think that case has been made well enough so far. We should also consider alternatives (though I don't think there are any that are truly simple).<br><br>Apart from that, we need to be much more precise in our documentation of pointer provenance in LangRef (e.g., what does llvm.memcpy do, exactly? The mentioned bug 37469 could technically be a bug in the loop idiom recognizer!), and I like the idea of an `unrestrict(p)` instruction as a simpler and more evocative spelling of `inttoptr(ptrtoint(p))`.<br><br>I would also like to better understand how this interacts with the C99 "restrict" work that Jeroen pointed out. 
Overall, this is an important discussion to have but I feel we're only at the very beginning.<br></div><div><br></div><div>tl;dr of the tl;dr: It's complicated :)<br></div><div><br></div><div>Cheers,</div><div>Nicolai<br></div><div><br></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 10, 2021 at 1:15 AM Hal Finkel via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

On 6/9/21 12:03, Chris Lattner wrote:<br>

> On Jun 6, 2021, at 8:52 AM, Hal Finkel &lt;<a href="mailto:hal.finkel.llvm@gmail.com" target="_blank">hal.finkel.llvm@gmail.com</a>&gt; wrote:<br>

>> I'll take this opportunity to point out that, at least historically, <br>

>> the reason why a desire to optimize around ptrtoint keeps resurfacing <br>

>> is because:<br>

>><br>

>>  1. Common optimizations introduce them into code that did not <br>

>> otherwise have them (SROA, for example, see convertValue in SROA.cpp).<br>

>><br>

>>  2. They're generated by some of the ABI code for argument passing <br>

>> (see clang/lib/CodeGen/TargetInfo.cpp).<br>

>><br>

>>  3. They're present in certain performance-sensitive code idioms <br>

>> (see, for example, ADT/PointerIntPair.h).<br>

>><br>

>> It seems to me that, if there's design work to do in this area, one <br>

>> should consider addressing these now-long-standing issues where we <br>

>> introduce ptrtoint by replacing this mechanism with some other one.<br>

>><br>

> I completely agree.  These all have different solutions; I’d prefer to <br>

> tackle them one by one.<br>

><br>

> -Chris<br>

><br>

<br>

I agree: these three problems have different solutions. Also, <br>

let me add that I see three quasi-separable discussions here (accounting <br>

for past discussions on the same topic):<br>

<br>

  1. Do we have a consistency problem with how we treat pointers and <br>

their provenance information? The answer here is yes (see, e.g., the GVN <br>

examples from this thread).<br>

<br>

  2. Do we need to do more than be as conservative as possible around <br>

ptrtoint/inttoptr usages? This is relevant because trying to be clever <br>

here is often where inconsistencies around our pointer semantics are <br>

exposed, although it's not always the case that problems involve <br>

inttoptr. Addressing the points I raised above will lessen the <br>

motivation to be more aggressive here (although, in itself, that will <br>

not fix the semantic inconsistencies around pointers).<br>

<br>

  3. Does introducing a byte type help resolve the semantic issues <br>

around pointers? I don't yet understand why this might help.<br>

<br>

Thanks again,<br>

<br>

Hal<br>

<br>

<br>


</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature">Learn how the world really is,<br>but never forget how it should be.</div></div>