[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM
Nuno Lopes via llvm-dev
llvm-dev at lists.llvm.org
Fri Jun 11 04:18:09 PDT 2021
Thank you for the writeup!
I think it summarizes the problem & the solution space pretty well and clearly. Well worth a read for anyone interested in contributing to the discussion.
From: Nicolai Hähnle
Sent: 11 June 2021 06:47
Subject: Re: [cfe-dev] [llvm-dev] [RFC] Introducing a byte type to LLVM
I have written a longer article as a byproduct of thinking through the problem space of this proposal: https://nhaehnle.blogspot.com/2021/06/can-memcpy-be-implemented-in-llvm-ir.html
What happened is that I ended up questioning some really fundamental things, like, can we even implement memcpy? :) The answer is a qualified Yes, but I found it to be a good framework for thinking about the fundamentals of what is discussed here, so I published this in the hope that others find it useful.
tl;dr: This discussion is ultimately all about pointer provenance. There is a gap in the expressiveness of LLVM IR when it comes to that, with surprising consequences for memcpy (and similar operations). From an aesthetics point of view, filling this gap has a lot of appeal, and the "byte" proposal points in that direction. However, I have some issues with the details of the proposal, and it is so intrusive that it needs to be justified by more than just aesthetics.
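To make the memcpy question concrete, here is a minimal C sketch of the situation. It is not taken from the article; `bytewise_copy`, `struct node` and `copy_and_load` are hypothetical names chosen for illustration. In C, copying an object through `unsigned char` is well defined, so the copied pointer must remain usable. The open question in this thread is how the equivalent loop of byte-sized loads and stores in LLVM IR should carry the pointer's provenance along.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a hand-written memcpy that moves one byte at a time. */
static void *bytewise_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; ++i)
        d[i] = s[i];            /* each byte may be a fragment of a pointer */
    return dst;
}

struct node {
    int *payload;               /* a pointer that lives in memory */
    int tag;
};

/* Copies a struct that holds a pointer, then loads through the copy. */
static int copy_and_load(void) {
    int value = 42;
    struct node a = { &value, 7 };
    struct node b;

    bytewise_copy(&b, &a, sizeof a);

    assert(b.payload == a.payload);  /* same address after the copy */
    return *b.payload + b.tag;       /* the load relies on provenance surviving */
}
```

The final load is only valid if the pointer reassembled from individual bytes is still allowed to access `value`, which is the property a byte-wise memcpy in IR has to preserve.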
The correctness issues in the problem space can be solved by much less intrusive means. The justification for the more intrusive means would be better alias analysis, but I don't think this case has been built well enough so far. We should also consider alternatives (though I don't think there are any that are truly simple).
Apart from that, we need to be much more precise in our documentation of pointer provenance in LangRef (e.g.: what does llvm.memcpy do, exactly -- the mentioned bug 37469 could technically be a bug in the loop idiom recognizer!), and I like the idea of an `unrestrict(p)` instruction as a simpler and more evocative spelling of `inttoptr(ptrtoint(p))`.
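As a rough source-level analogue of `inttoptr(ptrtoint(p))`, the round trip can be sketched in C as below. This is only an illustration under assumptions: `roundtrip` and `store_through_roundtrip` are hypothetical names, `uintptr_t` is assumed to be available, and the exact IR a frontend emits for these casts is not specified here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C analogue of the suggested `unrestrict(p)`:
   the address is unchanged, it is only routed through an integer. */
static int *roundtrip(int *p) {
    uintptr_t bits = (uintptr_t)(void *)p;   /* roughly ptrtoint */
    return (int *)(void *)bits;              /* roughly inttoptr */
}

/* Stores through the round-tripped pointer and returns the updated value. */
static int store_through_roundtrip(void) {
    int x = 1;
    int *q = roundtrip(&x);

    assert(q == &x);   /* the round trip yields the original address */
    *q = 5;            /* the store must be observed through x */
    return x;
}
```

The address never changes; what the round trip affects is how much the optimizer may assume about which objects the resulting pointer can alias.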
I would also like to better understand how this interacts with the C99 "restrict" work that Jeroen pointed out. Overall, this is an important discussion to have but I feel we're only at the very beginning.
tl;dr of the tl;dr: It's complicated :)
On Thu, Jun 10, 2021 at 1:15 AM Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
On 6/9/21 12:03, Chris Lattner wrote:
> On Jun 6, 2021, at 8:52 AM, Hal Finkel <hal.finkel.llvm at gmail.com> wrote:
>> I'll take this opportunity to point out that, at least historically,
>> the reason why a desire to optimize around ptrtoint keeps resurfacing
>> is because:
>> 1. Common optimizations introduce them into code that did not
>> otherwise have them (SROA, for example, see convertValue in SROA.cpp).
>> 2. They're generated by some of the ABI code for argument passing
>> (see clang/lib/CodeGen/TargetInfo.cpp).
>> 3. They're present in certain performance-sensitive code idioms
>> (see, for example, ADT/PointerIntPair.h).
>> It seems to me that, if there's design work to do in this area, one
>> should consider addressing these now-long-standing issues where we
>> introduce ptrtoint by replacing this mechanism with some other one.
> I completely agree. These all have different solutions; I'd prefer to
> tackle them one by one.
I agree; these three problems have different solutions. Also,
let me add that I see three quasi-separable discussions here (accounting
for past discussions on the same topic):
1. Do we have a consistency problem with how we treat pointers and
their provenance information? The answer here is yes (see, e.g., the GVN
examples from this thread).
2. Do we need to do more than be as conservative as possible around
ptrtoint/inttoptr usages? This is relevant because trying to be clever
here is often where inconsistencies around our pointer semantics are
exposed, although it's not always the case that problems involve
inttoptr. Addressing the points I raised above will lessen the
motivation to be more aggressive here (although, in itself, that will
not fix the semantic inconsistencies around pointers).
3. Does introducing a byte type help resolve the semantic issues
around pointers? I don't yet understand why this might help.
Learn how the world really is,
but never forget how it should be.