[cfe-dev] fix for Clang PR 8419
Zhanyong Wan (λx.x x)
wan at google.com
Fri Nov 19 23:26:32 PST 2010
Thanks for thinking through this and sharing your analysis, Ted! I'm
all for fixing this in a principled way. However, the crash I
reported is blocking some experiments I'm doing. Therefore I'd really
like to put in a fix for the symptom first to unblock me, and then we
can take the time necessary to fix the whole thing the right way.
Would you oppose to that? Thanks,
On Fri, Nov 19, 2010 at 11:07 PM, Ted Kremenek <kremenek at apple.com> wrote:
> On Nov 19, 2010, at 5:24 PM, Zhanyong Wan (λx.x x) wrote:
>
>> I'll document this in my new patch. Also, how would you suggest to
>> add a test to catch violation of #2?
>
> Stepping back, I think we may need some extra design here, either in the ASTs or in the analyzer, to model lvalue-to-rvalue conversions correctly for C++.
>
> With C, lvalue-to-rvalue conversions happen in specific cases, e.g.:
>
> - Loading from a variable
>
> - Dereferencing a pointer
>
> Semantically such conversions result in a "load" of a value from a memory address. This is what 'EvalLoad()' does in GRExprEngine. Because the places where such conversions occur in C are fairly limited, up to now we have been able to model lvalue-to-rvalue conversions without much effort.
>
> With C++ (especially with the use of references) there is a lot more cases to reason about. It raises the question of whether we should try to recreate the logic in codegen in GRExprEngine to infer all of these cases, or to try and represent them explicitly in the AST.
>
> There are advantages to representing such conversions explicitly in the AST. For one, the distinction of an lvalue versus an rvalue in the CFG probably evaporates. Any place where we see an lvalue-to-rvalue conversion simply means the subexpression was a pointer (or reference), and the conversion represents a load. Moreover, different analyses (not just GRExprEngine) will be able to more adeptly reason about the semantics of analyzed code without having the burden to re-infer where all the lvalue-to-rvalue conversions take place.
>
> The other advantage is that it takes care of discrepanancies between C and C++ like the following:
>
> $ cat t.c
> void foo() {
> int a = 0;
> ++(++a);
> }
>
> $ clang -fsyntax-only t.c
> t.c:3:3: error: expression is not assignable
> ++(++a);
> ^ ~~~~~
> 1 error generated.
>
> $ clang -fsyntax-only -x c++ t.c
> $
>
> Here C and C++ differ on the semantics of '++'. When the file is compiled as C code, we get an error because the expression '++a' evaluates to an rvalue. When we compile the code as C++, however, the expression evaluate to an lvalue. This difference isn't represented in the AST at all.
>
> NOTE: This is also another reason why GRExprEngine might crash in some cases with '++' on C++ code, because it assumes the C semantics.
>
> To explain the difference some more, consider:
>
> int a = 0;
> int b = ++a;
>
> With C, the '++a' evaluates to an rvalue immediately, which is assigned to 'b'. In C++, the expression '++a' first evaluates to an lvalue, and then there is an implicit lvalue-to-rvalue conversion that loads the new value. Of course the compiler doesn't need to generate more IR to represent this at the execution level, but this is what is conceptually happening in the language.
>
> So to summarize, these discrepancies of how lvalues and rvalues are reasoned about in C versus C++ is really something we need to solve systematically in the analyzer (and/or ASTs) with a cohesive solution. That's why I think this individual case that you reported is only symptomatic of a bigger issue that needs a well-designed solution.
>
> Cheers,
> Ted
--
Zhanyong
More information about the cfe-dev
mailing list