[cfe-dev] how to tolerate the assertion failures in llvm and clang

Mon Jun 25 12:22:48 PDT 2018

On Mon, Jun 25, 2018 at 12:16 PM George Karpenkov <ekarpenkov at apple.com>
wrote:

>
> On Jun 25, 2018, at 12:13 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Mon, Jun 25, 2018 at 11:51 AM George Karpenkov <ekarpenkov at apple.com>
> wrote:
>
>>
>> On Jun 25, 2018, at 9:38 AM, David Blaikie via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>> So, yes, the reading of the Clang AST is a bit of a special case & hasn't
>> met this interpretation of "do not crash on arbitrary inputs" - but the
>> intent is that it does not crash on ASTs produced by the same version of
>> Clang (if you model the "system" as the totality of a single clang writing
>> something to disk then reading it back again - that is not expected to
>> crash). So any assertions that are firing in that situation should be, I
>> believe, considered to be programmer errors.
>>
>> Are you going outside that scope - and dealing with 'untrusted' ASTs
>> potentially generated from other versions of Clang? Coming from unknown
>> sources, etc?
>>
>>
>> Generating AST by hand, or taking them from elsewhere is known to be very
>> fragile, and we don’t really have a check for it.
>> In that case assertions are a *good* thing, as at least they say that
>> something went wrong as opposed to e.g. silently ignoring the AST.
>>
>
> Depends what the source is/what they're doing - might be a "best effort"
> sort of thing. "Hey, I have some AST lying around - analyze whatever parts
> of it you can".
>
>> If so, then, yeah - forking is probably the only good bet, I would
>> imagine.
>>
>>
>> A proper fix here would be to e.g. have an ASTVerifier which could take
>> an AST DAG and then return whether it’s valid or not.
>>
>
> Yep, that'd be the ideal - though that's a lot of work (the set of
> invariants is large/not well known - basically all we have is "ASTs are
> correct by construction in Clang" as the definition of invariants) & I
> could imagine that these folks aren't in a position to do all that work -
> so "try to load it and bail out on assertions/crashes" might be the best
> tradeoff for them.
>
>
> Sure, but I don’t see how not having assertions would help.
> By the time they have started running over the incorrect AST the tool is
> already in the broken state,
> and there’s no way to get out of there.
>
> A “best effort” solution would be to have a wrapper script which would
> restart the analysis without importing,
> and again, assertions would help here to know when to restart.
>

Sure - I think that's what everyone's talking about on this thread.

(OK, yeah, some of the queries were around whether parts of the AST could
be ignored/elided when they failed to maintain the invariants - and, yes,
that's probably impractical owing to the nature of the AST as not being...
a tree. (as we joke, it's more of a "concrete semantic graph"... ) - so,
yeah, ignoring parts of the AST that are invalid isn't especially practical
(I could imagine some really cunning fork/operation based probing to find
out what you can/can't do to this invalid AST, but it'd be super tricky
because of the lazy loading of ASTs & invariant checking - very
impractical))

>
>
>
>> Any solutions bypassing asserts are *bad* as they actively hide bugs,
>> which decreases the overall quality.
>>
>> Displaying errors to users is a different matter entirely, and should be
>> done in a different layer.
>>
>>
>> If you're within that scope - patches and bugs are likely to be quite
>> welcome. Though, again, if you're dealing with totally arbitrary C++ -
>> Clang is far from fuzzer-clean and it's not too hard to find
>> (especially/mostly invalid) inputs that can crash Clang - if those would
>> present a major problem to your product, then again forking may be
>> necessary (as much as it might be nice to cleanup all/most of those cases,
>> it would probably be a large undertaking).
>>
>> On Sun, Jun 24, 2018 at 2:25 AM Gábor Márton via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> Hi,
>>>
>>> We are struggling with the same problem with my team. We work on cross
>>> translation unit (CTU) static analysis. Sometimes the AST node of a
>>> function imported by the ASTImporter is not complete or wrong which causes
>>> assertion failures in the analysis engine or in the checkers. However if we
>>> had known that the function could not be imported then we wouldn't have
>>> started to import that, this way preventing the assert to fire and the
>>> crash of the process. Note, when the assertions are disabled, usually a
>>> segmentation fault happens.
>>>
>>> The original intention of the assertions in the LLVM infrastructure is
>>> to signal programming errors. They are really useful to catch errors - made
>>> by the LLVM/Clang programmer - as early as possible.
>>> See https://llvm.org/docs/CodingStandards.html#assert-liberally .
>>> Generally, we say that we should distinguish between programming errors
>>> and errors in the input. Programming errors should be handled with asserts,
>>> errors in the input should be handled with other means (some projects use
>>> exceptions).
>>> However, in our special case of the CTU, the AST is an input. Which can
>>> be wrongly assembled by the ASTImporter. Unfortunately the checkers cannot
>>> distinguish whether the error is because of a programming error in the
>>> checker itself or whether the AST as an input is wrong.
>>>
>>> Here is what we thought so far: Do a fork before the actual import and
>>> if the child died then don't start the import at all. This seems quite
>>> complicated and performance issues may arise. Also this may just hide some
>>> real errors, so we did not start to implement it. Rather we have the
>>> strategy now to gradually fix all the assertions. Sooner or later the
>>> ASTImporter will provide a complete and correct AST and the checkers are
>>> also changed to be able to handle a crippled AST. Until then we have
>>> crashes but every week we have fewer. And very importantly, we plan to hide
>>> the crash report from our users since it is useful only to the developers.
>>>
>>> Hope this helps,
>>> Gabor
>>>
>>>
>>>
>>> On Sun, Jun 24, 2018 at 10:24 AM alan snape via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> My team is developing a static analysis tool based on clang and llvm,
>>>> but the assertion failures in the source code of llvm and clang will always
>>>> crash the program execution, which is not acceptable in a **stable
>>>> product**. The tool analyzes the functions one by one in the Call Graph SCC
>>>> order, so is there any way to tolerate the assertion failures and continue
>>>> the analysis on the next function when assertion failures occur on calling
>>>> some APIs of clang and llvm? (crashes only the analysis of the function
>>>>  (the analysis methods of the FunctionDecl) being analyzed, not the
>>>> entire program)
>>>>
>>>> Thanks,
>>>> Ella
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180625/7587d383/attachment.html>