[LLVMdev] [cfe-dev] UB in TypeLoc casting

Mon Feb 11 12:40:30 PST 2013

On Mon, Feb 11, 2013 at 12:02 PM, David Blaikie <dblaikie at gmail.com> wrote:
> On Wed, Dec 5, 2012 at 1:06 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> On Thu, Nov 29, 2012 at 8:16 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>> On Thu, Nov 29, 2012 at 7:43 PM, Eli Friedman <eli.friedman at gmail.com>
>>> wrote:
>>>>
>>>> On Thu, Nov 29, 2012 at 3:49 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>> > Moving to LLVM dev to discuss the possibility of extending the cast
>>>> > infrastructure to handle this.
>>>> >
>>>> > On Tue, Nov 20, 2012 at 5:51 PM, John McCall <rjmccall at apple.com> wrote:
>>>> >> On Nov 18, 2012, at 5:05 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>> >>> TypeLoc casting looks bogus.
>>>> >>>
>>>> >>> TypeLoc derived types return true from classof when the dynamic type
>>>> >>> is not actually an instance of the derived type. The TypeLoc is then
>>>> >>> (erroneously) cast to the derived type and the derived type's implicit
>>>> >>> copy ctor is executed (so long as you remember to assign the result of
>>>> >>> cast<SpecificTypeLoc>), if you copy, otherwise the SpecificTypeLoc's
>>>> >>> member functions are actually invoked on a TypeLoc object - bogas, but
>>>> >>> it works (because there's no other members in the SpecificTypeLoc
>>>> >>> types).
>>>> >>
>>>> >> Yep.  See also LLVM's IntrinsicInst.  This kind of UB is very common
>>>> >> and
>>>> >> essentially harmless, but if you want to stamp it out, feel free.
>>>> >>
>>>> >>> Richard / Kostya: what's the word on catching this kind of UB
>>>> >>> (essentially: calling member functions of one type on an instance not
>>>> >>> of that type)? (in the specific case mentioned below, as discussed in
>>>> >>> the original thread, ASan or MSan might be able to help)
>>>> >>>
>>>> >>> Clang Dev: what should we do to fix this? I don't think it's helpful
>>>> >>> to add machinery to llvm::cast to handle concrete type conversion, so
>>>> >>> I think we should consider a non-llvm::cast solution for this
>>>> >>> hierarchy. Replace the llvm::isa calls with getTypeLocClass calls (or
>>>> >>> a separate wrapper for that) & add a SpecificTypeLoc(const TypeLoc&)
>>>> >>> ctor for every specific TypeLoc that has an appropriate assert & calls
>>>> >>> up through to the base copy ctor. It'll be a fair bit of code to add
>>>> >>> to TypeLoc, but nothing complicated.
>>>> >>>
>>>> >>> Any other ideas?
>>>> >>
>>>> >> I don't see why isa<> and cast<> aren't the right code interface for
>>>> >> this,
>>>> >> rather than reinventing something else.  You just need to teach cast<>
>>>> >> to construct and return a proper temporary.
>>>> >
>>>> > LLVM-dev folks: what do you reckon? Should we extend the cast
>>>> > infrastructure to be able to handle value type conversions?
>>>>
>>>> No, I don't think that makes sense; the casting infrastructure is
>>>> complicated enough without having to deal with this.  It would be easy
>>>> enough to implement a method on TypeLoc to handle this,
>>>>
>>>> > It doesn't really feel like the right fit to me, but I'll defer to the
>>>> > community's wisdom if it's overwhelming. I'd just like to see a
>>>> > solution wherein we can't write this particular kind of bug again,
>>>> > ideally.
>>>>
>>>> Attaching proof-of-concept solution which prevents this kind of bug.
>>>> (Ugly, and requires C++11 support as written, but potentially
>>>> interesting anyway.)
>>>
>>>
>>> The C++11 should be easy to remove (move the enable_if to the return type).
>>> This approach seems reasonable to me.
>>
>> Thanks Eli - that does seem to do the trick (took me a little while to
>> wrap my head around it, though). I've modified it, as Richard
>> suggested, to use enable_if in the return type & it builds in C++98
>> and correctly catches the TypeLoc usage.
>>
>> I'll have to find some time to actually go & fix TypeLoc before we can
>> submit this, though.
>>
>> (& technically we could allow casting where the parameter type is a
>> temporary, so long as you only cast to the same type - but that's not
>> much use. I wonder if we should trivially disallow casting when the
>> source/dest type are the same anyway? But I suppose there's no harm in
>> allowing that either, except that it might confuse the reader a bit
>> when they see a no-op cast like that)
>
> Looking into this I've fixed a few other things that were a little
> broken by the looseness of llvm::cast & friends (see r174853). I've
> been prototyping a fix to TypeLoc specifically, hit some hiccups in
> ASTMatchers (probably surmountable - Manuel helped me with r174862
> (might need one other change there)) but otherwise seems achievable.
>
> Beyond that, though, I've hit one hierarchy in the Static Analyzer
> that does this as well: ProgramPoint. On IRC Jordan Rose mentioned
> that there's another more pervasive use of this pattern in the Static
> Analyzer, the SVal hierarchy.
>
> So, Ted, how objectionable would it be for me to introduce something
> like (names subject to adjustment):
>
> template<typename T>
> T SVal::castAs();
>
> template<typename T>
> llvm::Optional<T> SVal::getAs();
>
> (the implementations of these functions might involve invoking private
> FooSVal(SVal) ctors in each SVal derived type - so adding a bit of
> boilerplate ctors to all those classes)

Richard Smith pointed out that this could be done without the
boilerplate with something like:

Derived d;
d.Base::operator=(base);

a bit nasty, but valid/well-defined & would avoid writing the
boilerplate delegating Derived(Base) ctors in every derived type

>
> and replace "cast<FooSVal>(bar)" with "bar.castAs<FooSVal>()" and
> dyn_cast with getAs similarly? & similarly for the ProgramPoint (&
> TypeLoc - which I'm already working on) hierarchies?
>
>
> Full disclosure:
> Obviously the more important part of this is fixing the easily-written
> bugs such as r174862 & that doesn't necessarily require us to fix the
> UB, nor to migrate these APIs away from the LLVM RTTI infrastructure.
>
> We could, for example, add value-type-conversion support to the LLVM
> RTTI infrastructure that checks that if you pass a temporary it would
> explicitly do a value conversion and return a value not a
> reference/pointer. (this wouldn't scale to dyn_cast, though - that'd
> have to return Optional<T> not T* since there's no T object for a T*
> to point to - we could still, technically, wedge this in to dyn_cast
> but that might be a bit much) And we could do this without fixing the
> UB type punning here (still reinterpret_casting the base to the
> derived type & using the implicit copy ctor) or while fixing the UB
> (require the derived types to have derived(base) ctors) but keeping
> the cast API usage the same.
>
> It's my personal opinion (& see Eli's agreement above) that it's best
> just to leave this out of the LLVM RTTI infrastructure entirely & have
> these hierarchies handle it themselves.
>
> Thanks,
> - David