[llvm] r217708 - llvm-profdata: Avoid undefined behaviour when reading raw profiles

David Blaikie dblaikie at gmail.com
Tue Sep 23 15:53:33 PDT 2014


On Tue, Sep 23, 2014 at 3:47 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2014-Sep-15, at 16:01, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >
> >
> >> On Mon, Sep 15, 2014 at 3:51 PM, Justin Bogner <mail at justinbogner.com>
> wrote:
> >> David Blaikie <dblaikie at gmail.com> writes:
> >> > On Fri, Sep 12, 2014 at 2:22 PM, Justin Bogner <mail at justinbogner.com>
> wrote:
> >> >
> >> >     Author: bogner
> >> >     Date: Fri Sep 12 16:22:55 2014
> >> >     New Revision: 217708
> >> >
> >> >     URL: http://llvm.org/viewvc/llvm-project?rev=217708&view=rev
> >> >     Log:
> >> >     llvm-profdata: Avoid undefined behaviour when reading raw profiles
> >> >
> >> >     The raw profiles that are generated in compiler-rt always add
> padding
> >> >     so that each profile is aligned, so we can simply treat files that
> >> >     don't have this property as malformed.
> >> >
> >> > Any chance of changing this to not rely on UB (this code looks like
> it's
> >> > violating C++ aliasing rules pretty consistently) and instead using
> memcpy?
> >>
> >> It's always casting from char *, which IIUC doesn't break aliasing
> >> rules, so I *think* that this isn't relying on UB.
> >>
> > [+Richard for some language lawyering]
> >
> > I believe the rules are:
> >
> > You can take a pointer to any entity, cast it to a char* then back, and
> continue using it as the original type. But this only works if the pointer
> was originally a pointer to that type you're dereferencing by, which isn't
> the case here.
> >
> > You can take a pointer to T (where T is a standard layout(?)), cast to
> pointer to char, read the bytes of the object then write them back into
> some other object (via a char* that actually points to a T - derived the
> same way as the original) you can do that safely.
> >
> > But none of this allows taking a char* that wasn't derived from a T*,
> casting it to a T* and reading.
> >
> > But this is just my high-level understanding, I don't actually know
> where the words are in the standard to support this position.
>
> C++11 has them in the bullet point list in section 3.10.
>

Ah, thanks 3.10p10 specifically. Huh, didn't know you could read an
unsigned int as an int, for example... that's a bit weird, but OK.


> Sorry about this, looks like I missed that subtlety :/.
>
> > Hoping Richard can fill us in/confirm/deny and/or I can go hunting.
> >
> > - David
> >
> >
> > Assuming I'm right about that, the current design avoids unnecessarily
> > copying from the file to an auxiliary structure just to read the data
> > and throw it away.
>
> I think the optimizer will see through the `memcpy`s anyway, so this
> probably isn't a big deal.
>

That's certainly the goal - file bugs whenever this doesn't work.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140923/c96febff/attachment.html>


More information about the llvm-commits mailing list