[PATCH] Fix for bug 21725: wrong results with union and strict-aliasing

Hal Finkel hfinkel at anl.gov
Tue Mar 17 14:34:57 PDT 2015


----- Original Message -----

> From: "Daniel Berlin" <dberlin at dberlin.org>
> To: "Jeroen Dobbelaere" <jeroen.dobbelaere at gmail.com>
> Cc: reviews+D8056+public+7b63cedb34d51bb1 at reviews.llvm.org, "Hal
> Finkel" <hfinkel at anl.gov>, cfe-commits at cs.uiuc.edu,
> "cfe-dev at cs.uiuc.edu Developers" <cfe-dev at cs.uiuc.edu>
> Sent: Tuesday, March 17, 2015 4:24:01 PM
> Subject: Re: [PATCH] Fix for bug 21725: wrong results with union and
> strict-aliasing

> On Tue, Mar 17, 2015 at 2:03 PM, Jeroen Dobbelaere <
> jeroen.dobbelaere at gmail.com > wrote:

> > On Tue, Mar 17, 2015 at 7:16 PM, hfinkel at anl.gov < hfinkel at anl.gov
> > >
> > wrote:
> 

> > > In http://reviews.llvm.org/D8056#141899 , @jeroen.dobbelaere
> > > wrote:
> > 
> 

> > > > Further investigation shows that
> > > > llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp is not ready to
> > > > handle struct types where multiple members have the same (0)
> > > > offset. The clang fix only makes sense once that issue is
> > > > resolved.
> > 
> 

> > > Can you work on this? If not (or even if you can), can you file a
> > > bug
> > > report with an example?
> > 
> 

> > > http://reviews.llvm.org/D8056
> > 
> 

> > > EMAIL PREFERENCES
> > 
> 
> > > http://reviews.llvm.org/settings/panel/emailpreferences/
> > 
> 

> > Yes, I am looking into it. I have a version now where 'most things'
> > work, but it seems that the tbaa information that clang produces is
> > not expressive enough when the struct/union has array members. For
> > example:
> 

> > union U03 {
> 
> > short mS[4];
> 
> > int mI;
> 
> > };
> 

> > results in:
> 

> > !5 = !{!"omnipotent char", !6, i64 0}
> 
> > !6 = !{!"Simple C/C++ TBAA"}
> 
> > !7 = !{!8, !8, i64 0}
> 
> > !8 = !{!"int", !5, i64 0}
> 
> > !9 = !{!10, !10, i64 0}
> 
> > !10 = !{!"short", !5, i64 0}
> 
> > !40 = !{!41, !8, i64 0}
> 
> > !41 = !{!"_ZTS3U03", !5, i64 0, !8, i64 0}
> 

> > Note that In !41, we loose the information about the 'short'
> > member.
> 

> I would start here :)

> You should make it produce info about all the members.
I agree. 

> > A function like:
> 

> > void test_u03b(union U03* a, union U03* b)
> 
> > {
> 
> > a->mS[0]=1;
> 
> > b->mI=2;
> 

> > // CHECK: Function: _Z9test_u03b{{.*}}
> 
> > //FIXME: CHECK: MayAlias: store i32 2, i32* %{{.*}}, align 4, !tbaa
> > !{{.*}} <-> store i16 1, i16* %{{.*}}, align 2, !tbaa !{{.*}}
> 
> > }
> 

> > results in wrong information: an access to mS[0] results in a
> > 'short'
> > (!10) access. The access to mI results in '!40'.
> 

> > Note that for the short '!10' access, we loose the information
> > about
> > the associated union.
> 

> > Because !41 refers to char '!5' and int '8', but not to short
> > '!10',
> > we cannot detect that there might
> 
> > be overlap between '!40' and '!10'.
> 

> RIght, but this is clearly a bug in the info above.

> > When we have two arrays of different type, the problem becomes even
> > worse as we now have no
> 
> > connection to the union any more from the two accesses...
> 

> I went through this on the whiteboard with folks when they designed
> this, and pointed this out - it's pretty easy to get into situations
> where you have two random pieces of memory, and they were part of a
> union, and you don't know this.

> You can make them as easy or as hard as you like.

> The answer I got was "we're fine with people having to make this
> explicit".
I don't understand what this means. How should they do that? The language has a defined set of semantics, and we need to follow them. At the very least, if we don't want to fully represent the TBAA hierarchy for some types (I'm not sure why we'd not do this, but...) then we should omit the metadata all together. 

-Hal 

> > One issue with getting array accesses right that I see, is that the
> > 'offset' field can suddenly be variable.
> 

> Not really. You can't say where it points, so the range would be
> "everything" anyway.
> You should just say it accesses the entire array (but not anything
> else).

> > Maybe a good enough (and correct) fallback that we should consider
> > is
> > to make such _union_ accesses alias with everything... (omnipotent
> > char) in stead of mapping it to the basic type.
> 

> It is possible, with the sets we have, to get the aliasing right :)
> I know this because GCC does it.

> This design is nearly identical to GCC's TBAA and subvariable design
> :)

> > Any thoughts ?
> 

> > Jeroen Dobbelaere
> 

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150317/e57682c9/attachment.html>


More information about the cfe-commits mailing list