[cfe-dev] -Wunreachable-code and templates

Fri Dec 2 10:16:31 PST 2011

On Fri, Dec 2, 2011 at 8:36 AM, Ted Kremenek <kremenek at apple.com> wrote:
> On Dec 2, 2011, at 12:14 AM, David Blaikie wrote:
>
> Before doing that tidyup I decided to have a go at the "simple" thing
> - doing unreachable code on templates instead of their instantiations.
> No doubt it's not the tidiest implementation (suppressing all the
> other analysis based warnings when 'dependent' is true is a bit
> intrusive/repetitive) but it seems to show the concept fairly well.
>
> I don't think we'd discussed in detail what problems might arise from
> this approach - certainly we know cases that we can't prove (such as a
> dependent function call which may or may not be noreturn) but
> non-dependent calls to noreturn functions, for example, seem to work
> correctly (without regressing your test cases that cover the dependent
> noreturn case).
>
> Are there other cases I've missed where this approach could be problematic?
>
>
> Hi David,
>
> Thank you for being persistent.  :)
>
> I think this approach is worth pursuing further only if there is an
> algorithmic justification that it is doing the right thing.  Since each
> instantiation of a template can have slightly different control-flow (mainly
> due to noreturn, but there may be other cases we haven't considered), we'd
> need to be able to say that the "control-flow" of the uninstantiated
> template represents some kind of superset of the possible control-flow when
> the template is instantiated (or at least a superset of the statements that
> are reachable).
>
> If 'noreturn' is the only issue, then this may actually work.  The
> uninstantiated template may have 'noreturn' calls, but by definition those
> will appear in the instantiated template as well.  Ab instantiation may have
> additional 'noreturn' calls, but my understanding is that this will only
> cause more code to become unreachable in the particular instantiation.  Thus
> as far as 'noreturn' is concerned, the uninstantiated template represents a
> correct under-approximation of the unreachable code in all instantiation.
>  Is this logic sound?

I think the generalization goes something like this:

To get a false positive with this approach it would be necessary for
edges to be added to the CFG during the transformation from template
pattern to template instantiation.

So far as I can think this shouldn't be the case - my only concern
would be exceptions. But it seems catch blocks are considered
reachable even when the corresponding try has no code at all, so, at
least for now, these aren't an issue. (but if we started adding in
edges to catch blocks only when they were reachable, we'd have to err
on the side of caution for any dependent expressions that might throw
- adding edges to all catch blocks in such a case, rather than adding
no edges initially & only adding the required edges at instantiation
time)

> If it is, my main concern then comes down to the following:
>
> (1) Are there are other issues that we are not considering where the
> uninstantiated template doesn't faithfully provide an under approximation of
> the reachable code?

I've tested the exception case above & that seems to be always
reachable anyway. I'll try my hand at looking through the template
instnantiation code to see if there's any cases where we might end up
adding edges rather than removing them.

> (2) If we decide this approach is viable, which uninstantiated templates do
> we analyze?  For performance reasons, I don't think we should analyze all
> the uninstantiated templates, as we may find ourselves repeatedly analyzing
> a huge portion of the Boost and the STL, etc.  My suggestion is that when we
> encounter an instantiation, we go and check if we have already analyzed the
> uninstantiated template for this warning, and if not, do the analysis.

By repeatedly, you mean in each translation unit, right? Because we
shouldn't have to visit them more than once per TU (we won't be
visiting them per insntantiation). But, yes, the idea of analyzing
every boost template someone includes when they're only using a subset
does sound a little costly - I might be interested to compare times,
though.

Only analyzing instantiated templates (& keeping a list of those
templates we've already seen/analyzed) seems OK to me, assuming that
didn't represent a prohibitive memory cost to maintain the set of
visited templates (can't see why it should, but it is a small
time/space tradeoff all the same).

> (3) Somewhat tangential, we would want to analyze explicit template
> instantiations an their own, since they represent "concrete" functions that
> were provided, not instantiated ones.

I'll double check this, but I assume my prototype changes already
account for this, since they didn't regress your test cases.

> (4) I don't know the if the CFGBuilder is going to work all that well on
> uninstantiated templates.  There may be bugs here, or other fundamental
> issues (which i don't know about).

I'm planning to keep plodding along compiling LLVM & Clang with
-Weverything (with a few things disabled) so I'll see what comes of
it. With my prototype I didn't have any regressions across the Clang
test suite, for what that's worth.

> (5) We wouldn't want to do this approach wholesale for all analyses.  I
> think it mainly makes sense for the reachable code warnings, e.g.
> -Wunreachable-code or warning if a function doesn't return.  Analyses like
> -Wuninitialized, etc., would want to keep working on instantiated templates
> since (a) they need the more precise semantics of the instantiated templates
> to work correctly and (b) it is okay if we flag a warning in one
> instantiation that doesn't appear in an other (since that is a genuine
> issue).

The prototype patch attached to my previous email already accounted
for this (I think) - just not in a very nice way.

To formalize this requirement, though: Warnings that we only want to
emit in reachable code should use a pessimistic definition of
reachable code (err on the side of assuming code is unreachable) to
avoid false positives. Warnings about unreachable code should use a
pessimistic definition of unreachable code (err on the side of
assuming code is reachable) to avoid false positives.

So, yes, unreachable warnings can be done on templates but array
bounds checking, return paths, etc, should be done on instantiations.
Though, ideally, we could find some way to emit
instantiation-independent problems once, rather than once for every
instantiation (see the array bounds test case modifications in my
previous patch for an example that exposes this problem) - either by
analyzing the template first & trying to avoid
instantiation-independent cases when we analyze the instantiation, or
simply by filtering out the duplicates in some way whenever we visit
an instantiation.

Thanks again for your help,
- David