[cfe-dev] -Wunreachable-code and templates

Tue Jan 10 00:20:55 PST 2012

[bumping with the latest sync'd patch]

On Fri, Dec 2, 2011 at 10:16 AM, David Blaikie <dblaikie at gmail.com> wrote:
> On Fri, Dec 2, 2011 at 8:36 AM, Ted Kremenek <kremenek at apple.com> wrote:
>> On Dec 2, 2011, at 12:14 AM, David Blaikie wrote:
>>
>> Before doing that tidyup I decided to have a go at the "simple" thing
>> - doing unreachable code on templates instead of their instantiations.
>> No doubt it's not the tidiest implementation (suppressing all the
>> other analysis based warnings when 'dependent' is true is a bit
>> intrusive/repetitive) but it seems to show the concept fairly well.
>>
>> I don't think we'd discussed in detail what problems might arise from
>> this approach - certainly we know cases that we can't prove (such as a
>> dependent function call which may or may not be noreturn) but
>> non-dependent calls to noreturn functions, for example, seem to work
>> correctly (without regressing your test cases that cover the dependent
>> noreturn case).
>>
>> Are there other cases I've missed where this approach could be problematic?
>>
>>
>> Hi David,
>>
>> Thank you for being persistent.  :)
>>
>> I think this approach is worth pursuing further only if there is an
>> algorithmic justification that it is doing the right thing.  Since each
>> instantiation of a template can have slightly different control-flow (mainly
>> due to noreturn, but there may be other cases we haven't considered), we'd
>> need to be able to say that the "control-flow" of the uninstantiated
>> template represents some kind of superset of the possible control-flow when
>> the template is instantiated (or at least a superset of the statements that
>> are reachable).
>>
>> If 'noreturn' is the only issue, then this may actually work.  The
>> uninstantiated template may have 'noreturn' calls, but by definition those
>> will appear in the instantiated template as well.  Ab instantiation may have
>> additional 'noreturn' calls, but my understanding is that this will only
>> cause more code to become unreachable in the particular instantiation.  Thus
>> as far as 'noreturn' is concerned, the uninstantiated template represents a
>> correct under-approximation of the unreachable code in all instantiation.
>>  Is this logic sound?
>
> I think the generalization goes something like this:
>
> To get a false positive with this approach it would be necessary for
> edges to be added to the CFG during the transformation from template
> pattern to template instantiation.
>
> So far as I can think this shouldn't be the case - my only concern
> would be exceptions. But it seems catch blocks are considered
> reachable even when the corresponding try has no code at all, so, at
> least for now, these aren't an issue. (but if we started adding in
> edges to catch blocks only when they were reachable, we'd have to err
> on the side of caution for any dependent expressions that might throw
> - adding edges to all catch blocks in such a case, rather than adding
> no edges initially & only adding the required edges at instantiation
> time)
>
>> If it is, my main concern then comes down to the following:
>>
>> (1) Are there are other issues that we are not considering where the
>> uninstantiated template doesn't faithfully provide an under approximation of
>> the reachable code?
>
> I've tested the exception case above & that seems to be always
> reachable anyway. I'll try my hand at looking through the template
> instnantiation code to see if there's any cases where we might end up
> adding edges rather than removing them.

I haven't really delved into this further - is there a particular
form/approach you'd like to see to prove this property? (that new
edges are never added when instantiating a template (except when
splitting an existing block in the CFG (eg: by a dependent call
becoming noreturn - splitting the block at the call site)))

>> (2) If we decide this approach is viable, which uninstantiated templates do
>> we analyze?  For performance reasons, I don't think we should analyze all
>> the uninstantiated templates, as we may find ourselves repeatedly analyzing
>> a huge portion of the Boost and the STL, etc.  My suggestion is that when we
>> encounter an instantiation, we go and check if we have already analyzed the
>> uninstantiated template for this warning, and if not, do the analysis.
>
> By repeatedly, you mean in each translation unit, right? Because we
> shouldn't have to visit them more than once per TU (we won't be
> visiting them per insntantiation). But, yes, the idea of analyzing
> every boost template someone includes when they're only using a subset
> does sound a little costly - I might be interested to compare times,
> though.
>
> Only analyzing instantiated templates (& keeping a list of those
> templates we've already seen/analyzed) seems OK to me, assuming that
> didn't represent a prohibitive memory cost to maintain the set of
> visited templates (can't see why it should, but it is a small
> time/space tradeoff all the same).

Is there a particular technique/suite/process I should use to try to
measure the possible performance regression of performing reachable
code analysis on all templates? (of course I may need to come up with
my own test cases using the STL and/or boost to really grind things a
bit)

>> (3) Somewhat tangential, we would want to analyze explicit template
>> instantiations an their own, since they represent "concrete" functions that
>> were provided, not instantiated ones.
>
> I'll double check this, but I assume my prototype changes already
> account for this, since they didn't regress your test cases.
>
>> (4) I don't know the if the CFGBuilder is going to work all that well on
>> uninstantiated templates.  There may be bugs here, or other fundamental
>> issues (which i don't know about).
>
> I'm planning to keep plodding along compiling LLVM & Clang with
> -Weverything (with a few things disabled) so I'll see what comes of
> it. With my prototype I didn't have any regressions across the Clang
> test suite, for what that's worth.
>
>> (5) We wouldn't want to do this approach wholesale for all analyses.  I
>> think it mainly makes sense for the reachable code warnings, e.g.
>> -Wunreachable-code or warning if a function doesn't return.  Analyses like
>> -Wuninitialized, etc., would want to keep working on instantiated templates
>> since (a) they need the more precise semantics of the instantiated templates
>> to work correctly and (b) it is okay if we flag a warning in one
>> instantiation that doesn't appear in an other (since that is a genuine
>> issue).
>
> The prototype patch attached to my previous email already accounted
> for this (I think) - just not in a very nice way.
>
> To formalize this requirement, though: Warnings that we only want to
> emit in reachable code should use a pessimistic definition of
> reachable code (err on the side of assuming code is unreachable) to
> avoid false positives. Warnings about unreachable code should use a
> pessimistic definition of unreachable code (err on the side of
> assuming code is reachable) to avoid false positives.
>
> So, yes, unreachable warnings can be done on templates but array
> bounds checking, return paths, etc, should be done on instantiations.
> Though, ideally, we could find some way to emit
> instantiation-independent problems once, rather than once for every
> instantiation (see the array bounds test case modifications in my
> previous patch for an example that exposes this problem) - either by
> analyzing the template first & trying to avoid
> instantiation-independent cases when we analyze the instantiation, or
> simply by filtering out the duplicates in some way whenever we visit
> an instantiation.

Thanks again,
- David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: template_unreachable.diff
Type: text/x-diff
Size: 4502 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120110/c81d37cd/attachment.diff>