[PATCH] More precise aliasing for char arrays

Sanjin Sijaric ssijaric at codeaurora.org
Thu Jun 26 03:51:58 PDT 2014


Hi Arthur,

Thanks for looking at this.

The way I understood that comment was to mean that pointers to char can alias with anything. 

Section 3.10 of the C++11 standard says:

"If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type similar (as defined in 4.4) to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type
of the object,
— an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic
data members (including, recursively, an element or non-static data member of a subaggregate
or contained union),
— a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
— a char or unsigned char type."

(I don't have the C Standard handy at the moment)

The way I am reading this is that "char *p; p = ...; d = *p;", where d is an int, will result in "d = *p" not getting hoisted out, since *p is an lvalue of type char. That is defined behaviour.  On the other hand, for "int *p; p = &c; d = *p;" where c is a char, "d = *p" is free to get hoisted out since *p is an lvalue of type int, which isn't the dynamic type of the object (char), or any of the other classifications from the list.  Am I misinterpreting the standard here?

If not, then having "char *p;" and "b" as an int array should produce no change in the aliasing information.  With this change, aliasing is identical for the following compared to what is currently generated.  

    char *p;
    typedef struct {
      char a;
      int b[100];
      char c;
    } S;

    S x;

    void func1 (char d) {
      for (int i = 0; i < 100; i++) {
        x.b[i] += 1;
        d = *p;  <==== will not get hoisted
        x.a += d;
      }
    }

I get the following (for aarch64), where "!tbaa !7" that is associated with "d = *p" has the base of "omnipotent char", which will prevent it from getting hoisted.  For example, TBAA will determine that !7 is aliased with !1 when it's doing analysis on "store i32 %add, i32* %arrayidx, align 4, !tbaa !1" and "%2 = load i8* %1, align 1, !tbaa !7"  

!0 = metadata !{metadata !"clang version 3.5.0 "}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"int", metadata !3, i64 0}
!3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0}
!4 = metadata !{metadata !"Simple C/C++ TBAA"}
!5 = metadata !{metadata !6, metadata !6, i64 0}
!6 = metadata !{metadata !"any pointer", metadata !3, i64 0}
!7 = metadata !{metadata !3, metadata !3, i64 0}
!8 = metadata !{metadata !9, metadata !3, i64 0}
!9 = metadata !{metadata !"", metadata !3, i64 0, metadata !3, i64 4, metadata !3, i64 404}


for.body:                                         ; preds = %for.cond
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds [100 x i32]* getelementptr inbounds (%struct.S* @x, i32 0, i32 1), i32 0, i64 %idxprom
  %0 = load i32* %arrayidx, align 4, !tbaa !1
  %add = add nsw i32 %0, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !1  <============ !1 and !7 are aliased
  %1 = load i8** @p, align 8, !tbaa !5
  %2 = load i8* %1, align 1, !tbaa !7  <==========
  %conv = zext i8 %2 to i32
  %3 = load i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1, !tbaa !8
  %conv1 = zext i8 %3 to i32
  %add2 = add nsw i32 %conv1, %conv
  %conv3 = trunc i32 %add2 to i8
  store i8 %conv3, i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1, !tbaa !8
  %inc = add nsw i32 %i.0, 1
  br label %for.cond



This change is in CGExpr.cpp in EmitArraySubscriptExpr when dealing with accesses to non-VLAs, so that the load/store that is produced is annotated with a tbaa tag that doesn't have "omnipotent char" as the base.

If we go back to the original case with "int *p;" and b as the char array, today we get the following where "store i8 %conv1, i8* %arrayidx, align 1, !tbaa !1" and "%2 = load i32* %1, align 4, !tbaa !6 " are aliased with each other due to !tbaa !1 having the base type as "omnipotent char".  So, "x.b[i] = ..." is preventing "d = *p" from getting hoisted out.   This is undefined behaviour if I am reading the standard correctly.  If that is the case, I think we should be able to hoist "d = *p" out of the loop.

!0 = metadata !{metadata !"clang version 3.5.0 "}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0}
!3 = metadata !{metadata !"Simple C/C++ TBAA"}
!4 = metadata !{metadata !5, metadata !5, i64 0}
!5 = metadata !{metadata !"any pointer", metadata !2, i64 0}
!6 = metadata !{metadata !7, metadata !7, i64 0}
!7 = metadata !{metadata !"int", metadata !2, i64 0}
!8 = metadata !{metadata !9, metadata !2, i64 0}
!9 = metadata !{metadata !"", metadata !2, i64 0, metadata !2, i64 1, metadata !2, i64 101}

for.body:                                         ; preds = %for.cond
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds [100 x i8]* getelementptr inbounds (%struct.S* @x, i32 0, i32 1), i32 0, i64 %idxprom
  %0 = load i8* %arrayidx, align 1, !tbaa !1
  %conv = zext i8 %0 to i32
  %add = add nsw i32 %conv, 1
  %conv1 = trunc i32 %add to i8
  store i8 %conv1, i8* %arrayidx, align 1, !tbaa !1   <====== !1 and !6 are aliased
  %1 = load i32** @p, align 8, !tbaa !4
  %2 = load i32* %1, align 4, !tbaa !6  <=====
  %conv2 = trunc i32 %2 to i8
  %conv3 = zext i8 %conv2 to i32
  %3 = load i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1, !tbaa !8
  %conv4 = zext i8 %3 to i32
  %add5 = add nsw i32 %conv4, %conv3
  %conv6 = trunc i32 %add5 to i8
  store i8 %conv6, i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1, !tbaa !8
  %inc = add nsw i32 %i.0, 1
  br label %for.cond


With the change, we get the following, where !7 and !1 are not aliased with each other since !1 has the base type "char".

!0 = metadata !{metadata !"clang version 3.5.0 "}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"char", metadata !3, i64 0}
!3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0}
!4 = metadata !{metadata !"Simple C/C++ TBAA"}
!5 = metadata !{metadata !6, metadata !6, i64 0}
!6 = metadata !{metadata !"any pointer", metadata !3, i64 0}
!7 = metadata !{metadata !8, metadata !8, i64 0}
!8 = metadata !{metadata !"int", metadata !3, i64 0}
!9 = metadata !{metadata !10, metadata !3, i64 0}
!10 = metadata !{metadata !"", metadata !3, i64 0, metadata !3, i64 1, metadata !3, i64 101}

for.body:                                         ; preds = %for.cond
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds [100 x i8]* getelementptr inbounds (%struct.S* @x, i32 0, i32 1), i32 0, i64 %idxprom
  %0 = load i8* %arrayidx, align 1, !tbaa !1
  %conv = zext i8 %0 to i32
  %add = add nsw i32 %conv, 1
  %conv1 = trunc i32 %add to i8
  store i8 %conv1, i8* %arrayidx, align 1, !tbaa !1  <=====  !1 and !7 are not aliased anymore
  %1 = load i32** @p, align 8, !tbaa !5
  %2 = load i32* %1, align 4, !tbaa !7  <=====
  %conv2 = trunc i32 %2 to i8
  %conv3 = zext i8 %conv2 to i32
  %3 = load i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1, !tbaa !9
  %conv4 = zext i8 %3 to i32
  %add5 = add nsw i32 %conv4, %conv3
  %conv6 = trunc i32 %add5 to i8
  store i8 %conv6, i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1, !tbaa !9
  %inc = add nsw i32 %i.0, 1
  br label %for.cond

The "omnipotent char" as the base is preserved when dereferencing pointers to char, etc.  The only thing that changes is the underlying base type of the TBAA tag associated with accesses to elements of non-VLAs of type char.  When the base type on the tbaa tag is changed to "char", it should ensure that these accesses are not aliased with any pointers other than pointers to char.

Thanks,
Sanjin

-----Original Message-----
From: Arthur O'Dwyer [mailto:arthur.j.odwyer at gmail.com] 
Sent: Wednesday, June 25, 2014 7:35 PM
To: Sanjin Sijaric
Cc: cfe-commits
Subject: Re: [PATCH] More precise aliasing for char arrays

On Wed, Jun 25, 2014 at 3:26 PM, Sanjin Sijaric <ssijaric at codeaurora.org> wrote:
>
>>     int *p;
>>     typedef struct {
>>       char a;
>>       char b[100];
>>       char c;
>>     } S;
>>
>>     S x;
>>
>>     void func1 (char d) {
>>       for (int i = 0; i < 100; i++) {
>>         x.b[i] += 1;
>>         d = *p;
>>         x.a += d;
>>       }
>>     }
>>
>> It seems like you want the compiler to hoist the read of `*p` above the write to `x.b[i]`.
>> But that isn't generally possible, is it? because the caller might have executed
>>
>>    p = &x.b[3];
>>
>> before the call to func1.
>
> Here, "p" is a pointer to int, whereas b is a char array.  Wouldn't "p = &x.b[3];" break ansi aliasing rules?

I was under the impression that that was the entire point of "omnipotent char"!
See lines 106-108 in the very file you're changing:

00106     // Character types are special and can alias anything.
00107     // In C++, this technically only includes "char" and "unsigned char",
00108     // and not "signed char". In C, it includes all three. For now,
00109     // the risk of exploiting this detail in C++ seems likely to outweigh
00110     // the benefit.

Source: http://clang.llvm.org/doxygen/CodeGenTBAA_8cpp_source.html

–Arthur





More information about the cfe-commits mailing list