[LLVMdev] [cfe-dev] UB in TypeLoc casting

David Blaikie dblaikie at gmail.com
Mon Feb 11 12:02:32 PST 2013


On Wed, Dec 5, 2012 at 1:06 PM, David Blaikie <dblaikie at gmail.com> wrote:
> On Thu, Nov 29, 2012 at 8:16 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>> On Thu, Nov 29, 2012 at 7:43 PM, Eli Friedman <eli.friedman at gmail.com>
>> wrote:
>>>
>>> On Thu, Nov 29, 2012 at 3:49 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>> > Moving to LLVM dev to discuss the possibility of extending the cast
>>> > infrastructure to handle this.
>>> >
>>> > On Tue, Nov 20, 2012 at 5:51 PM, John McCall <rjmccall at apple.com> wrote:
>>> >> On Nov 18, 2012, at 5:05 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>> >>> TypeLoc casting looks bogus.
>>> >>>
>>> >>> TypeLoc derived types return true from classof when the dynamic type
>>> >>> is not actually an instance of the derived type. The TypeLoc is then
>>> >>> (erroneously) cast to the derived type and the derived type's implicit
>>> >>> copy ctor is executed (so long as you remember to assign the result of
>>> >>> cast<SpecificTypeLoc>), if you copy, otherwise the SpecificTypeLoc's
>>> >>> member functions are actually invoked on a TypeLoc object - bogas, but
>>> >>> it works (because there's no other members in the SpecificTypeLoc
>>> >>> types).
>>> >>
>>> >> Yep.  See also LLVM's IntrinsicInst.  This kind of UB is very common
>>> >> and
>>> >> essentially harmless, but if you want to stamp it out, feel free.
>>> >>
>>> >>> Richard / Kostya: what's the word on catching this kind of UB
>>> >>> (essentially: calling member functions of one type on an instance not
>>> >>> of that type)? (in the specific case mentioned below, as discussed in
>>> >>> the original thread, ASan or MSan might be able to help)
>>> >>>
>>> >>> Clang Dev: what should we do to fix this? I don't think it's helpful
>>> >>> to add machinery to llvm::cast to handle concrete type conversion, so
>>> >>> I think we should consider a non-llvm::cast solution for this
>>> >>> hierarchy. Replace the llvm::isa calls with getTypeLocClass calls (or
>>> >>> a separate wrapper for that) & add a SpecificTypeLoc(const TypeLoc&)
>>> >>> ctor for every specific TypeLoc that has an appropriate assert & calls
>>> >>> up through to the base copy ctor. It'll be a fair bit of code to add
>>> >>> to TypeLoc, but nothing complicated.
>>> >>>
>>> >>> Any other ideas?
>>> >>
>>> >> I don't see why isa<> and cast<> aren't the right code interface for
>>> >> this,
>>> >> rather than reinventing something else.  You just need to teach cast<>
>>> >> to construct and return a proper temporary.
>>> >
>>> > LLVM-dev folks: what do you reckon? Should we extend the cast
>>> > infrastructure to be able to handle value type conversions?
>>>
>>> No, I don't think that makes sense; the casting infrastructure is
>>> complicated enough without having to deal with this.  It would be easy
>>> enough to implement a method on TypeLoc to handle this,
>>>
>>> > It doesn't really feel like the right fit to me, but I'll defer to the
>>> > community's wisdom if it's overwhelming. I'd just like to see a
>>> > solution wherein we can't write this particular kind of bug again,
>>> > ideally.
>>>
>>> Attaching proof-of-concept solution which prevents this kind of bug.
>>> (Ugly, and requires C++11 support as written, but potentially
>>> interesting anyway.)
>>
>>
>> The C++11 should be easy to remove (move the enable_if to the return type).
>> This approach seems reasonable to me.
>
> Thanks Eli - that does seem to do the trick (took me a little while to
> wrap my head around it, though). I've modified it, as Richard
> suggested, to use enable_if in the return type & it builds in C++98
> and correctly catches the TypeLoc usage.
>
> I'll have to find some time to actually go & fix TypeLoc before we can
> submit this, though.
>
> (& technically we could allow casting where the parameter type is a
> temporary, so long as you only cast to the same type - but that's not
> much use. I wonder if we should trivially disallow casting when the
> source/dest type are the same anyway? But I suppose there's no harm in
> allowing that either, except that it might confuse the reader a bit
> when they see a no-op cast like that)

Looking into this I've fixed a few other things that were a little
broken by the looseness of llvm::cast & friends (see r174853). I've
been prototyping a fix to TypeLoc specifically, hit some hiccups in
ASTMatchers (probably surmountable - Manuel helped me with r174862
(might need one other change there)) but otherwise seems achievable.

Beyond that, though, I've hit one hierarchy in the Static Analyzer
that does this as well: ProgramPoint. On IRC Jordan Rose mentioned
that there's another more pervasive use of this pattern in the Static
Analyzer, the SVal hierarchy.

So, Ted, how objectionable would it be for me to introduce something
like (names subject to adjustment):

template<typename T>
T SVal::castAs();

template<typename T>
llvm::Optional<T> SVal::getAs();

(the implementations of these functions might involve invoking private
FooSVal(SVal) ctors in each SVal derived type - so adding a bit of
boilerplate ctors to all those classes)

and replace "cast<FooSVal>(bar)" with "bar.castAs<FooSVal>()" and
dyn_cast with getAs similarly? & similarly for the ProgramPoint (&
TypeLoc - which I'm already working on) hierarchies?


Full disclosure:
Obviously the more important part of this is fixing the easily-written
bugs such as r174862 & that doesn't necessarily require us to fix the
UB, nor to migrate these APIs away from the LLVM RTTI infrastructure.

We could, for example, add value-type-conversion support to the LLVM
RTTI infrastructure that checks that if you pass a temporary it would
explicitly do a value conversion and return a value not a
reference/pointer. (this wouldn't scale to dyn_cast, though - that'd
have to return Optional<T> not T* since there's no T object for a T*
to point to - we could still, technically, wedge this in to dyn_cast
but that might be a bit much) And we could do this without fixing the
UB type punning here (still reinterpret_casting the base to the
derived type & using the implicit copy ctor) or while fixing the UB
(require the derived types to have derived(base) ctors) but keeping
the cast API usage the same.

It's my personal opinion (& see Eli's agreement above) that it's best
just to leave this out of the LLVM RTTI infrastructure entirely & have
these hierarchies handle it themselves.

Thanks,
- David



More information about the llvm-dev mailing list