[cfe-dev] Should we build semantically invalid nodes?

Thu Oct 23 08:29:49 PDT 2008

On Thu, Oct 23, 2008 at 12:02 AM, Argiris Kirtzidis <akyrtzi at gmail.com> wrote:
> Doug Gregor wrote:
>>
>> Hmm... in the case of an error when type-checking the casts, we should
>> not actually build a CXXCastExpr node. Rather, Sema::ActOnCXXCasts
>> should return "true" so that the parser knows that there was an error.
>>
>> I suggest that both CheckConstCast and CheckReinterpretCast return a
>> bool that is true when there is an error and false otherwise (this is
>> the scheme used elsewhere in Clang). Then, ActOnCXXCasts will never
>> build an ill-formed node in the AST (which is a good thing).
>>
>
> Is there any harm in building semantically invalid nodes ?

Yes, there is. It means that consumers of these nodes have to be able
to cope with potentially bad inputs. Then, each semantic-checking
routine needs to cope with (1) all correct inputs, (2) all incorrect
inputs that could come from a program that was well-formed up until
this point, and (3) all broken inputs that come from previously
ill-formed code. The standard always tells us what (1) and (2) can be
(sometimes directly, sometimes through exclusion), but (3) is an
almost unbounded set of bogus input depending on the twisted
imagination of our users and our ability to mangle bad inputs into
other bad inputs.

> It will not affect clients that care about semantics since they will not
> analyse the AST if there are errors (there could be invalid decls).
> The clients that only care about the textual representation of the program,
> and not about the semantics, will get seriously impacted by not building the
> nodes, e.g. a refactoring client will not be able to pickup the use of a
> variable because the expression that references it is not added to the AST.

If a client doesn't care about semantics, it can either (1) let Sema
do its work and then ignore the resulting types, or (2) provide a
different Action that doesn't do the type-checking.

> To mirror the decls marked as "invalid" we could have exprs marked as
> "invalid" too.

GCC does this, where essentially any node in the AST can be
"error_mark_node" if there was an error. A *lot* of code in GCC is
dedicated to checking for and propagating error_mark_node; it probably
ends up costing them performance, but the real cost is that they end
up fixing a lot of tiny "ice-on-invalid" bugs where someone forgot to
check for some outlandish ill-formed code that results in a weird AST
node or an error_mark_node where it isn't expected. It's always seemed
like a losing battle to me, and the code is littered with
error_mark_node checks.

  - Doug