[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM

Nuno Lopes via llvm-dev llvm-dev at lists.llvm.org
Fri Jun 4 12:06:28 PDT 2021


On 4 Jun 2021, at 11:24, George Mitenkov wrote:

Hi all,

Together with Nuno Lopes and Juneyoung Lee, we propose to add a new byte
type to LLVM to fix miscompilations due to load type punning. Please see
the proposal below. It would be great to hear feedback, comments, and
suggestions!


Motivation
==========

char and unsigned char are considered to be universal holders in C. They
can access raw memory and are used to implement memcpy. i8 is LLVM's
counterpart, but it does not have such semantics. Giving it those
semantics is also not desirable, as it would disable many optimizations.

I don’t believe this is correct. LLVM does not have an innate
concept of typed memory. The type of a global or local allocation
is just a roundabout way of giving it a size and default alignment,
and similarly the type of a load or store just determines the width
and default alignment of the access. There are no restrictions on
what types can be used to load or store from certain objects.

C-style type aliasing restrictions are imposed using TBAA
metadata, which is unrelated to the IR type of the access.


It’s debatable whether LLVM considers memory to be typed or not. If we don’t consider memory to be typed, then *all* integer loads have to be treated as potentially escaping pointers. Example:
store i32* %p, i32** %q
%q2 = bitcast i32** %q to i64*
%v = load i64, i64* %q2

This program stores a pointer and then loads it back as an integer. So there’s an implicit pointer-to-integer cast, which escapes the pointer. If we allow this situation to happen, then the alias analysis code is broken, as are several optimizations. LLVM doesn’t consider loads as potential pointer escape sites. It would probably be a disaster (performance-wise) if it did!
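
To make the implicit cast concrete, here is a minimal sketch that puts the store/load pair from the example next to the explicit form it would be equivalent to if memory is untyped. The function names are hypothetical, and the sketch assumes typed-pointer IR syntax and 64-bit pointers.

```llvm
; Implicit form: the pointer is stored, then reloaded as an integer.
; Nothing in the load itself says that %p has been turned into an integer.
define i64 @implicit_cast(i32* %p, i32** %q) {
  store i32* %p, i32** %q
  %q2 = bitcast i32** %q to i64*
  %v = load i64, i64* %q2
  ret i64 %v
}

; Explicit form: the same value is produced, but the escape of %p is
; visible to analyses as a ptrtoint instruction.
define i64 @explicit_cast(i32* %p, i32** %q) {
  store i32* %p, i32** %q
  %p2 = load i32*, i32** %q
  %v = ptrtoint i32* %p2 to i64
  ret i64 %v
}
```

If the two functions must behave identically, every integer load like the one in @implicit_cast has to be treated as a possible ptrtoint.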


The introduction of the byte type allows us to make all pointer <-> integer casts explicit, so that we don’t have to treat all integer loads as escaping. It also allows us to say that LLVM optimizations are correct, and we “just” need to create a few new optimizations to get rid of the extra bytecast instructions when they are provably not needed.
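
As a rough illustration of that idea, the earlier example might look something like the following with a byte type. This is only a sketch: the type spelling b64, the operand form of bytecast, and the function name are assumptions made for illustration, and this does not parse with any released LLVM.

```llvm
; Hypothetical syntax: b64 (a 64-bit byte type) and the bytecast operand
; form are assumed here; neither exists in released LLVM.
define i64 @with_byte_type(i32* %p, i32** %q) {
  store i32* %p, i32** %q
  %q2 = bitcast i32** %q to b64*
  %b = load b64, b64* %q2        ; raw load of the stored bytes
  %v = bytecast b64 %b to i64    ; the pointer-to-integer cast is now explicit
  ret i64 %v
}
```

In this reading, only the bytecast needs to be treated as a potential escape site, and plain integer loads do not.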

TBAA is unrelated to the problem we are trying to solve here.


Nuno
