[cfe-dev] [analyzer][RFC] Get info from the LLVM IR for precision

Wed Aug 5 14:22:28 PDT 2020

Just to be clear, we should definitely avoid having our analysis results 
depend on optimization levels. It should be possible to avoid that, 
right? The way I imagined this, we're only interested in picking up LLVM 
analyses, which can be run over unoptimized IR just fine(?) We should 
probably not be optimizing the IR at all in the process(?)

On 05.08.2020 12:17, Artem Dergachev wrote:
> I'm excited that this is actually moving somewhere!
>
> Let's see what consequences we have here. I have some thoughts but 
> I don't immediately see any architecturally catastrophic consequences; 
> you're "just" generating an llvm::Function for a given AST FunctionDecl 
> "real quick" and looking at the attributes. This is happening 
> on-demand and cached, right? I'd love to hear more opinions. Here's 
> what I see:
>
> 1. We can no longer mutate the AST for analysis purposes without the 
> risk of screwing up subsequent codegen. And the risk would be pretty 
> high because hand-crafting ASTs is extremely difficult. Good thing we 
> aren't actually doing this.
>     1.1. But it sounds like for the CTU users it may amplify the 
> imperfections of ASTImporter.
>
> 2. Ok, yeah, we now may have crashes in CodeGen during analysis. 
> Normally they shouldn't be that bad because this would mean that 
> CodeGen would crash during normal compilation as well. And that's 
> rare; CodeGen crashes are much rarer than analyzer crashes. Of 
> course a difference can be triggered by #ifndef __clang_analyzer__, 
> but that would still be an instance of valid code crashing CodeGen, 
> so it should be rare.
>     2.1. Again, it's worse with CTU because imported ASTs have so far 
> never been tested for compatibility with CodeGen.
>
> Let's also talk about the benefits. First of all, *we still need the 
> source code available during analysis*. This isn't about peeking into 
> binary dependencies and it doesn't immediately aid CTU in any way; 
> this is entirely about improving upon conservative evaluation on the 
> currently available AST, for functions that are already available for 
> inlining but are not being inlined for whatever reason. In fact, in 
> some cases we may later prefer such LLVM IR-based evaluation to 
> inlining, which may improve analysis performance (i.e., less path 
> explosion) *and* correctness (e.g., avoid unjustified state splits).
>
> On 05.08.2020 08:29, Gábor Márton via cfe-dev wrote:
>> Hi,
>>
>> I have been working on a prototype that makes it possible to access 
>> the IR from the components of the Clang Static Analyzer.
>> https://reviews.llvm.org/D85319
>>
>> There are many important and useful analyses in the LLVM layer that 
>> we can use during the path sensitive analysis. Most notably, the 
>> "readnone" and "readonly" function attributes 
>> (https://llvm.org/docs/LangRef.html) which can be used to identify 
>> "pure" functions (those without side effects). In the prototype I am 
>> using the purity info from the IR to avoid invalidating any 
>> variables during conservative evaluation (when we evaluate a pure 
>> function). There are cases where we get false positives precisely 
>> because the invalidation is too conservative.
>>
>> Some further ideas to use info from the IR:
>> - We should invalidate only the arg regions for functions with the 
>> "argmemonly" attribute.
>> - Use the smarter invalidation in cross translation unit analysis 
>> too. We can get the IR for the other TUs as well.
>> - Run the Attributor 
>> <https://llvm.org/doxygen/structllvm_1_1Attributor.html> passes on 
>> the IR. We could get range values for return values or for arguments. 
>> These range values then could be fed to StdLibraryFunctionsChecker to 
>> make the proper assumptions. And we could do this in CTU mode too, 
>> these attributes could form some sort of a summary of these 
>> functions. Note that I don't expect a meaningful summary for more 
>> than a few percent of all the available functions.
>>
>> Please let me know if you have any further ideas about how we could 
>> use IR attributes (or anything else) during the symbolic execution.
>>
>> There are some concerns as well. There may be some source code that 
>> we cannot CodeGen, but we can still analyse with the current CSA. 
>> That is why I suppress CodeGen diagnostics in the prototype. But in 
>> the worst case we may run into assertions in CodeGen, and this may 
>> cause a regression in the whole analysis experience. This may be the 
>> case especially when we get a compile_commands.json from a project 
>> that is only ever compiled with, e.g., GCC.
>>
>> Thanks,
>> Gabor
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
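
To make the invalidation idea from the quoted proposal concrete, here is a minimal, self-contained sketch. It is not the code from D85319 and it uses no LLVM or analyzer APIs: MemoryEffect, Store and invalidateForCall are hypothetical names, and the "store" is just a map from region names to known values. It also assumes, for simplicity, that every tracked region is reachable by the callee, which is more pessimistic than the real conservative evaluation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the memory-effect attributes read off the IR.
enum class MemoryEffect { Unknown, ArgMemOnly, ReadOnly, ReadNone };

// Toy analyzer state: region name -> value currently known for it.
using Store = std::map<std::string, int>;

// Returns the bindings that survive a conservatively evaluated call.
Store invalidateForCall(const Store &store, MemoryEffect effect,
                        const std::set<std::string> &argRegions) {
  switch (effect) {
  case MemoryEffect::ReadNone:
  case MemoryEffect::ReadOnly:
    // The callee writes no memory, so nothing needs to be forgotten.
    return store;
  case MemoryEffect::ArgMemOnly: {
    // Only memory passed in through arguments may have been written.
    Store kept;
    for (const auto &binding : store)
      if (argRegions.count(binding.first) == 0)
        kept.insert(binding);
    return kept;
  }
  case MemoryEffect::Unknown:
    break;
  }
  // No information: in this toy model, forget every binding.
  return Store();
}
```

With a store holding a global `g` and a buffer `buf` that is passed to the call, a readnone or readonly callee keeps both bindings, an argmemonly callee drops only `buf`, and an unknown callee drops both.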

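The "on-demand and cached" question and the concern about code that cannot be CodeGen'ed can be sketched together. This is only an illustration under stated assumptions: IRSummary, IRSummaryCache and the generator callback are hypothetical, and the callback stands in for whatever actually produces the IR for a function. The point is that a summary is generated at most once per function, and that a failed generation falls back to the usual conservative behaviour.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical summary of what the IR says about one function.
struct IRSummary {
  bool available;    // false if IR generation failed for this function
  bool writesMemory; // meaningful only when available is true
};

class IRSummaryCache {
public:
  using Generator = std::function<IRSummary(const std::string &)>;

  explicit IRSummaryCache(Generator gen) : generate(std::move(gen)) {}

  // Generates the summary on first request, then serves it from the cache.
  const IRSummary &lookup(const std::string &functionName) {
    auto it = cache.find(functionName);
    if (it == cache.end())
      it = cache.emplace(functionName, generate(functionName)).first;
    return it->second;
  }

  // Invalidation is skipped only with positive evidence of no writes;
  // a missing summary means the caller stays conservative.
  bool maySkipInvalidation(const std::string &functionName) {
    const IRSummary &summary = lookup(functionName);
    return summary.available && !summary.writesMemory;
  }

private:
  Generator generate;
  std::map<std::string, IRSummary> cache;
};
```

Treating "no summary" the same as "may write memory" means a CodeGen failure costs precision for that one function without changing the results elsewhere.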