[cfe-dev] [RFC] Introducing a byte type to LLVM

Mon Jun 7 05:25:49 PDT 2021

> You are making an assumption here.

The purpose of the example is to make an assumption about what IR we
start with, and to show that it becomes wrong after optimization. I am
not sure I understand your comment here.

> Same here, by just looking at the IR code here, I don't think you can
> really be sure what the type of the thing being copied is.

That is exactly the point. We do not know what type we are copying: it may
be an integer, or it may be a pointer. Importantly, the ptrtoint
instruction has disappeared, so the optimized code, which simply returns
an integer, can let the pointer escape. Using a byte type instead of i64
removes such implicit pointer casts and therefore helps AA catch this case.
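
As a rough illustration, C++ source along the following lines could
plausibly lower to something like the first IR snippet quoted below. This
is only a sketch: the function name copy_and_expose and the use of char**
are assumptions made for the illustration and are not taken from the
original example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (name and signature are assumptions for this sketch).
// It copies the bytes of one pointer object into another with memcpy,
// reloads the copy as a pointer, and returns it as an integer.
std::uint64_t copy_and_expose(char **dst, char **src) {
    assert(dst != nullptr && src != nullptr);

    // Raw byte copy: nothing here says the bytes being copied are a pointer.
    std::memcpy(dst, src, sizeof(char *));

    // Typed load of the copied value as a pointer.
    char *loaded = *dst;

    // Explicit pointer-to-integer cast, the source-level analogue of the
    // ptrtoint instruction in the unoptimized IR.
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(loaded));
}
```

In the optimized IR, the memcpy and the cast are both gone and the same
value is simply loaded and returned as an i64, which is why the cast is no
longer visible to the analysis.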

> One can do bitcasts etc, to obscure the actual type of the bytes being
> copied.
> In both those examples, 8 bytes are copied, and the same value is
> returned. So the end program will function the same when run.
> Essentially, there is not enough information in the above code to
> determine if the 8 bytes copied are part of a pointer or not.
> For AA analysis, I would say, more information is needed.

This is just an example of a wrong optimization. It is important because
bugs like the following one arise partly from this exact optimization:
https://bugs.llvm.org/show_bug.cgi?id=37469
The memcpy is replaced with load/store pairs, and store forwarding then
happens incorrectly. I did not show the example in full, which may have
caused some confusion. I am happy to elaborate if this is still unclear!

Thanks,
George

On Mon, Jun 7, 2021 at 2:58 PM James Courtier-Dutton <james.dutton at gmail.com>
wrote:

> On Fri, 4 Jun 2021 at 17:35, George Mitenkov via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
> >
> > Hi Johannes,
> >
> > Sure! The underlying problem is that raw-memory access handlers are treated
> > as integers, while they are not really integers. Especially std::byte that specifically
> > states that it has raw-memory access semantics. This semantic mismatch can make
> > AA wrong and a pointer to escape.
> >
> > Consider the following LLVM IR that copies a pointer:
> You are making an assumption here. By just looking at the IR code
> here, I don't think you can really be
> sure what the type of the thing being copied is.
> > %src8 = bitcast i8** %src to i8*
> > %dst8 = bitcast i8** %dst to i8*
> > call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst8, i8* %src8, i32 8, i1 false)
> > %load = load i8*, i8** %dst
> > %addr = ptrtoint i8* %load to i64
> > ret i64 %addr
> >
> > If we optimize the call to memcpy, then the IR becomes
> Same here, by just looking at the IR code here, I don't think you can
> really be sure what the type of the thing being copied is.
> > %src64 = bitcast i8** %src to i64*
> > %dst64 = bitcast i8** %dst to i64*
> > %addr = load i64, i64* %src64, align 1
> > store i64 %addr, i64* %dst64, align 1
> > ret i64 %addr
> >
>
> One can do bitcasts etc, to obscure the actual type of the bytes being
> copied.
> In both those examples, 8 bytes are copied, and the same value is
> returned. So the end program will function the same when run.
> Essentially, there is not enough information in the above code to
> determine if the 8 bytes copied are part of a pointer or not.
> For AA analysis, I would say, more information is needed.
>
> One can only really be sure what type those bytes are, and that they
> are a pointer when they are actually used as a pointer argument to a
> LOAD or STORE.
> There are some other operations that can also be used to infer whether
> it is a pointer or not, but the LOAD/STORE is the simplest example.
>
> Kind Regards
>
> James
>