[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM

Sun Jun 6 11:11:08 PDT 2021

Hi Madhur,

> I can argue that if b8 is proposed then why not b16 for half? Why not b32
> for some other reason?

We do propose to have b<N> for different Ns, as per the proposal:

> we denote the byte type as b<N>, where N is the number of bits.

Maybe that was not explicit enough. But note that the only byte bit width
produced by the frontend is b8 (from char, unsigned char/std::byte). The other
bit widths come from LLVM optimizations, such as the already mentioned memcpy:

  %src8 = bitcast i8** %src to i8*
  %dst8 = bitcast i8** %dst to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst8, i8* %src8, i32 8, i1 false)

is transformed (by instcombine) into

  %src64 = bitcast i8** %src to i64*
  %dst64 = bitcast i8** %dst to i64*
  %val = load i64, i64* %src64
  store i64 %val, i64* %dst64

What we propose is to have roughly

  %src64 = bitcast i8** %src to b64*
  %dst64 = bitcast i8** %dst to b64*
  %val = load b64, b64* %src64
  store b64 %val, b64* %dst64

This just copies memory as-is and cannot introduce implicit
ptrtoint/inttoptr casts.
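
To make the source-level picture concrete, here is a rough C sketch of the
kinds of copies being discussed. The helper names are made up for
illustration and are not part of the proposal, and the 8-byte size in the IR
above assumes a target with 8-byte pointers. The first helper is the sort of
code that gives rise to the memcpy call, the second copies through unsigned
char (the accesses a frontend would emit as b8 under the proposal), and the
third routes the bytes through an integer, which only loosely mirrors the
i64 load/store that instcombine produces today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy one pointer-sized object with memcpy.
   With 8-byte pointers this corresponds to the 8-byte memcpy in the IR. */
void copy_ptr_memcpy(char **dst, char **src) {
    memcpy(dst, src, sizeof(char *));
}

/* Hypothetical helper: the same copy, one unsigned char at a time.
   unsigned char may access the raw bytes of any object in C. */
void copy_ptr_bytewise(char **dst, char **src) {
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t i = 0; i < sizeof(char *); ++i)
        d[i] = s[i];
}

/* Hypothetical helper: the copy routed through an integer object.
   Assumes uintptr_t exists and has the same size as a pointer. */
void copy_ptr_via_int(char **dst, char **src) {
    uintptr_t bits;
    assert(sizeof(bits) == sizeof(char *));
    memcpy(&bits, src, sizeof(bits));
    memcpy(dst, &bits, sizeof(bits));
}
```

All three helpers leave *dst holding the same pointer value as *src at the C
level. The question in this thread is only which IR types the optimizer should
use for the intermediate value.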

> Given the problem, *I'd say we should think about a way to annotate
> types with attribute or metadata or flags which optimizations can use
> to do a better job.* The attribute/metadata could carry the semantic
> meaning for the type. Frontends can generate this "type attribute/metadata"
> and optimizations can choose to use this extra information to do the
> better job. It would be a hint though and not a mandate for optimizations.
> This approach is very similar to attributes in LLVM IR and just like an IR
> function can have attributes, a type can also possess attributes/metadata.
> (Whether it should be an attribute or metadata is a choice but
> that would not deviate from the purpose).

I see your point and it is definitely easier to fix everything with
metadata/attributes. I am concerned with this approach because it just
postpones a bigger problem. There are already lots of metadata and
attributes, and the IR and optimizations have become complicated enough.
Instead of fixing a problem *while we can*, we just add attributes, then more
attributes... I believe that at some point this will just become legacy and
impossible to fix.

Thanks,
George

On Sun, Jun 6, 2021 at 12:32 PM Madhur Amilkanthwar via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi George,
>
> I don't think this is a scalable model to add a new type just to benefit
> an analysis and draw specific conclusions from it. I can argue that
> if b8 is proposed then why not b16 for half? Why not b32 for some
> other reason? This won't stop just there and one can go beyond
> and introduce types to benefit domain specific languages.
>
> Given the problem, *I'd say we should think about a way to annotate
> types with attribute or metadata or flags which optimizations can use
> to do a better job.* The attribute/metadata could carry the semantic
> meaning for the type. Frontends can generate this "type attribute/metadata"
> and optimizations can choose to use this extra information to do the
> better job. It would be a hint though and not a mandate for optimizations.
> This approach is very similar to attributes in LLVM IR and just like an IR
> function can have attributes, a type can also possess attributes/metadata.
> (Whether it should be an attribute or metadata is a choice but
> that would not deviate from the purpose).
>
> This approach is far more adoptable and convincing than introducing
> a whole new type which would be massive complexity for the type system.
>
>
>
> On Sun, Jun 6, 2021 at 2:32 PM James Courtier-Dutton via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Also, the comment below is wrong. At this point, arr3 is equivalent to
>> arr2, which is q.
>>
>>  // Now arr3 is equivalent to arr1, which is p.
>>   int *r;
>>   memcpy(&r, (unsigned char *)arr3, sizeof(r));
>>   // Now r is p.
>>   *p = 1;
>>   *r = 10;
>>
>>
>>
>> On Sun, 6 Jun 2021 at 08:54, James Courtier-Dutton
>> <james.dutton at gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I would also oppose adding a byte type, but mainly because the bug
>> > report mentioned (https://bugs.llvm.org/show_bug.cgi?id=37469) is not
>> > a bug at all.
>> > The example in the bug report is just badly written C code.
>> > Specifically:
>> >
>> > int main() {
>> >   int A[4], B[4];
>> >   printf("%p %p\n", A, &B[4]);
>> >   if ((uintptr_t)A == (uintptr_t)&B[4]) {
>> >     store_10_to_p(A, &B[4]);
>> >     printf("%d\n", A[0]);
>> >   }
>> >   return 0;
>> > }
>> >
>> > "int B[4];" allows values between 0 and 3 only, and referring to 4 in
>> > &B[4] is undef, so in my view, it is correctly optimised out which is
>> > why it disappears in -O3.
>> >
>> > Kind Regards
>> >
>> > James
>> >
>> >
>> > On Sun, 6 Jun 2021 at 05:26, Chris Lattner via cfe-dev
>> > <cfe-dev at lists.llvm.org> wrote:
>> > >
>> > > On Jun 4, 2021, at 11:25 AM, John McCall via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>> > >
>> > > On 4 Jun 2021, at 11:24, George Mitenkov wrote:
>> > >
>> > > Hi all,
>> > >
>> > > Together with Nuno Lopes and Juneyoung Lee we propose to add a new byte
>> > > type to LLVM to fix miscompilations due to load type punning. Please see
>> > > the proposal below. It would be great to hear the
>> > > feedback/comments/suggestions!
>> > >
>> > >
>> > > Motivation
>> > > ==========
>> > >
>> > > char and unsigned char are considered to be universal holders in C. They
>> > > can access raw memory and are used to implement memcpy. i8 is LLVM’s
>> > > counterpart but it does not have such semantics, which is also not
>> > > desirable as it would disable many optimizations.
>> > >
>> > > I don’t believe this is correct. LLVM does not have an innate
>> > > concept of typed memory. The type of a global or local allocation
>> > > is just a roundabout way of giving it a size and default alignment,
>> > > and similarly the type of a load or store just determines the width
>> > > and default alignment of the access. There are no restrictions on
>> > > what types can be used to load or store from certain objects.
>> > >
>> > > C-style type aliasing restrictions are imposed using tbaa
>> > > metadata, which are unrelated to the IR type of the access.
>> > >
>> > > I completely agree with John.  “i8” in LLVM doesn’t carry any
>> > > implications about aliasing (in fact, LLVM pointers are going towards
>> > > being typeless).  Any such thing occurs at the accesses, and is part
>> > > of TBAA.
>> > >
>> > > I’m opposed to adding a byte type to LLVM, as such semantic-carrying
>> > > types are entirely unprecedented, and would add tremendous complexity
>> > > to the entire system.
>> > >
>> > > -Chris
>> > >
>> > > _______________________________________________
>> > > cfe-dev mailing list
>> > > cfe-dev at lists.llvm.org
>> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
> --
> *Disclaimer: Views, concerns, thoughts, questions, ideas expressed in this
> mail are of my own and my employer has no take in it. *
> Thank You.
> Madhur D. Amilkanthwar
>