[cfe-dev] [RFC] Introducing a byte type to LLVM

Johannes Doerfert via cfe-dev cfe-dev at lists.llvm.org
Mon Jun 7 07:56:22 PDT 2021


On 6/7/21 7:25 AM, George Mitenkov wrote:
> The purpose of an example is to make an assumption about what IR
> we have, and to show that it becomes wrong after optimization. I am not
> sure I get your comment here.

FWIW, I didn't understand the comment either.


>> Same here, by just looking at the IR code here, I don't think you can
>> really be sure what the type of the thing being copied is.
> That is exactly the point. We do not know what type we are copying - it may
> be an integer, or it may be a pointer. Importantly, we can see that the
> ptrtoint instruction disappeared, and the optimized code returning the
> integer can actually let the pointer escape. Using a byte type instead of
> i64 removes implicit pointer casts and therefore helps AA to catch this case.

This is a fundamental issue I have with this example and the explanation:
What does "catch this case" mean? The pointer does escape, so what is
there to catch or improve?


>> One can do bitcasts etc, to obscure the actual type of the bytes being
>> copied.
>> In both those examples, 8 bytes are copied, and the same value is
>> returned. So the end program will function the same when run.
>> Essentially, there is not enough information in the above code to
>> determine if the 8 bytes copied are part of a pointer or not.
>> For AA analysis, I would say, more information is needed.
> This is just an example of a wrong optimization. It is important because
> bugs like this appear partially because of this exact optimization:
> https://bugs.llvm.org/show_bug.cgi?id=37469
> The memcpy is replaced with load/store pairs and store forwarding happens
> incorrectly. I haven't shown it in full, and hence there may be a bit of
> confusion. I am happy to elaborate more if this is still unclear!

I still believe the ptrtoint in there is the problem, or rather our
handling of it. That said, I have not understood what the byte type
would actually do for this example, or at least not what it would
do that is different from the proper handling of ptrtoint.
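
For concreteness, here is a minimal C sketch of the source-level pattern
that the first IR snippet quoted below could correspond to. It is not taken
from the thread: the function name copy_and_expose is hypothetical, and
sizeof(void *) stands in for the 8 bytes copied in the IR. The explicit
cast at the end plays the role of the ptrtoint instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy one pointer-sized
   object from src to dst as raw bytes, then return the copied pointer
   converted to an integer. */
uintptr_t copy_and_expose(void **dst, void **src) {
    assert(dst != NULL && src != NULL);

    /* Raw byte copy: nothing here says whether the bytes are a
       pointer or an integer. */
    memcpy(dst, src, sizeof(void *));

    /* Typed load of the copied value as a pointer. */
    void *loaded = *dst;

    /* Explicit pointer-to-integer cast (ptrtoint at the IR level). */
    return (uintptr_t)loaded;
}
```

If an optimizer replaces the memcpy with an integer load/store pair and
forwards the loaded integer to the return, the cast no longer appears in
the code even though the returned value is still the pointer's address.
That is the situation the two IR snippets are meant to contrast.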

~ Johannes


> Thanks,
> George
>
>
> On Mon, Jun 7, 2021 at 2:58 PM James Courtier-Dutton <james.dutton at gmail.com>
> wrote:
>
>> On Fri, 4 Jun 2021 at 17:35, George Mitenkov via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>>> Hi Johannes,
>>>
>>> Sure! The underlying problem is that raw-memory access handlers are treated
>>> as integers, while they are not really integers. This applies especially to
>>> std::byte, which is specifically stated to have raw-memory access semantics.
>>> This semantic mismatch can make AA wrong and let a pointer escape.
>>>
>>> Consider the following LLVM IR that copies a pointer:
>> You are making an assumption here. By just looking at the IR code
>> here, I don't think you can really be
>> sure what the type of the thing being copied is.
>>> %src8 = bitcast i8** %src to i8*
>>> %dst8 = bitcast i8** %dst to i8*
>>> call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst8, i8* %src8, i32 8, i1 false)
>>> %load = load i8*, i8** %dst
>>> %addr = ptrtoint i8* %load to i64
>>> ret i64 %addr
>>>
>>> If we optimize the call to memcpy, then the IR becomes
>> Same here, by just looking at the IR code here, I don't think you can
>> really be sure what the type of the thing being copied is.
>>> %src64 = bitcast i8** %src to i64*
>>> %dst64 = bitcast i8** %dst to i64*
>>> %addr = load i64, i64* %src64, align 1
>>> store i64 %addr, i64* %dst64, align 1
>>> ret i64 %addr
>>>
>> One can do bitcasts etc, to obscure the actual type of the bytes being
>> copied.
>> In both those examples, 8 bytes are copied, and the same value is
>> returned. So the end program will function the same when run.
>> Essentially, there is not enough information in the above code to
>> determine if the 8 bytes copied are part of a pointer or not.
>> For AA analysis, I would say, more information is needed.
>>
>> One can only really be sure what type those bytes are, and that they
>> are a pointer when they are actually used as a pointer argument to a
>> LOAD or STORE.
>> There are some other operations that can also be used to infer whether
>> it is a pointer or not, but the LOAD/STORE is the simplest example.
>>
>> Kind Regards
>>
>> James
>>

