[cfe-dev] [analyzer][RFC] Get info from the LLVM IR for precision

Tue Sep 15 21:55:09 PDT 2020

When you read through my previous email, please replace all occurrences of
"CFG" with "CG". I wanted to refer to the CallGraph (CG) and not to the
ControlFlowGraph (CFG).

On Wed, Sep 9, 2020 at 5:01 PM Gábor Márton <martongabesz at gmail.com> wrote:

> Continuing with this topic, I have spent some time and investigated the
> GlobalsModRef analysis in detail. Below are my findings about what is
> missing currently from our infrastructure to effectively implement the
> 'pureness' analysis:
> 1) Clang CFG does not handle strongly connected components (SCCs).
> GlobalsModRef should do a bottom-up SCC traversal of the call graph. First
> we must analyze the leaf SCCs, and only then can we propagate the data up
> the call graph. Functions that call each other (they form an SCC in the
> graph) have to be handled in one step.
> 2) We cannot effectively get the "use information" from the Clang AST.
> During the mod/ref analysis we should take a look at all the uses of a
> global variable (see e.g. GlobalsAAResult::AnalyzeUsesOfPointer). In
> LLVM, every `Value` has a "use list" that keeps track of which other Values
> are using the Value.
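
To make the "leaf SCCs first, then propagate up" idea from 1) concrete, here is a minimal sketch. It is not the GlobalsModRef logic itself: the graph type, the function name and the single "writes a global" bit per function are all assumptions made for illustration, and the SCCs are assumed to be given already, callee-most first.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a call graph: function F calls every function
// listed in Calls[F]. Functions are identified by their index.
using CallGraphSketch = std::vector<std::vector<std::size_t>>;

// Computes, for every function, whether it may modify a global either
// directly or through a callee. BottomUpSCCs must list the SCCs of Calls
// with callee-most SCCs first; WritesGlobalLocally[F] says whether the body
// of F itself writes a global. All names here are illustrative only.
std::vector<bool> propagateModifiesGlobals(
    const CallGraphSketch &Calls,
    const std::vector<std::vector<std::size_t>> &BottomUpSCCs,
    const std::vector<bool> &WritesGlobalLocally) {
  std::vector<bool> Mods(Calls.size(), false);
  for (const auto &SCC : BottomUpSCCs) {
    // All members of an SCC can reach each other, so they share one summary.
    bool SCCMods = false;
    for (std::size_t F : SCC) {
      SCCMods = SCCMods || WritesGlobalLocally[F];
      // Callees outside this SCC were summarized in an earlier iteration;
      // callees inside it are still false here and are covered by the
      // local flags collected in this loop.
      for (std::size_t Callee : Calls[F])
        SCCMods = SCCMods || Mods[Callee];
    }
    for (std::size_t F : SCC)
      Mods[F] = SCCMods;
  }
  return Mods;
}
```

Because each SCC is visited exactly once and only after all of its callees, no fixed-point iteration is needed, which is the point of doing the traversal bottom-up.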
>
> Solving 1) seems quite straightforward. We could implement the
> Kosaraju-Sharir algorithm on the CFG: compute a DFS finishing order on the
> reverse CFG and then run a standard DFS on the original graph, picking the
> start nodes in decreasing finishing order.
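
A minimal sketch of that two-pass scheme, under stated assumptions: the graph is a plain adjacency list rather than any Clang data structure, and all names are made up for the example. The first DFS runs on the reversed (caller) edges, the second on the original (callee) edges, so the SCCs come out callee-most first, which is the order a bottom-up traversal wants.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a call graph: node N calls every node in G[N].
using Graph = std::vector<std::vector<std::size_t>>;

// Builds the graph with every edge flipped (callee -> caller).
static Graph reverseGraph(const Graph &G) {
  Graph R(G.size());
  for (std::size_t From = 0; From < G.size(); ++From)
    for (std::size_t To : G[From])
      R[To].push_back(From);
  return R;
}

// Post-order DFS: appends N to Out after all unseen successors of N.
static void dfs(const Graph &G, std::size_t N, std::vector<bool> &Seen,
                std::vector<std::size_t> &Out) {
  Seen[N] = true;
  for (std::size_t S : G[N])
    if (!Seen[S])
      dfs(G, S, Seen, Out);
  Out.push_back(N);
}

// Returns the SCCs of Calls, callee-most (leaf) SCCs first.
std::vector<std::vector<std::size_t>> bottomUpSCCs(const Graph &Calls) {
  const Graph Callers = reverseGraph(Calls);

  // Pass 1: DFS finishing order on the reversed graph.
  std::vector<bool> Seen(Calls.size(), false);
  std::vector<std::size_t> Finish;
  for (std::size_t N = 0; N < Calls.size(); ++N)
    if (!Seen[N])
      dfs(Callers, N, Seen, Finish);

  // Pass 2: DFS on the original graph in decreasing finishing order.
  // Each DFS tree found here is exactly one SCC.
  Seen.assign(Calls.size(), false);
  std::vector<std::vector<std::size_t>> SCCs;
  for (auto It = Finish.rbegin(); It != Finish.rend(); ++It) {
    if (Seen[*It])
      continue;
    SCCs.emplace_back();
    dfs(Calls, *It, Seen, SCCs.back());
  }
  return SCCs;
}
```

For a graph where function 0 calls 1, 1 calls 2, and 2 calls both 1 and 3, this yields three SCCs: first the leaf 3, then the mutually recursive pair 1 and 2, and finally 0. The recursive DFS is only for brevity; a real implementation would likely want an explicit stack to cope with deep call chains.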
>
> 2) seems a bit more complex.
> - One idea could be to do an AST visitation and build up the "user info"
> during that.
> - Another, more effective approach could be to register a custom
> ASTMutationListener and then override the DeclarationMarkedUsed
> <https://clang.llvm.org/doxygen/classclang_1_1ASTMutationListener.html#a4d05aa6f36bd676f21764e42c4b91ffc>
> method. This approach has the advantage that we don't have to traverse the
> whole AST to gather the "user" info; rather, we can build that up during
> parsing. The downside is that this could be a serious architectural
> change.
>
> Artem, Gabor,
> What do you think, what do you suggest: is it worth going on and
> implementing SCCs in the Clang CFG? Regarding 2), is it worth putting much
> effort into implementing that, and which method would you prefer?
> Perhaps implementing all these things would be wasted effort if we later
> decide that we do want to go forward with CIL (Clang Intermediate
> Language). Should we take a few steps back and rather put (huge) efforts
> into CIL, what do you think? In my opinion, CIL could take years until it is
> usable, so it does seem worth starting to implement the missing
> parts for the Clang CFG and AST.
>
> Thanks,
> Gabor
>
> On Tue, Aug 25, 2020 at 9:37 AM Gábor Márton <martongabesz at gmail.com>
> wrote:
>
>> > And as John says, that'd have the advantage of being more predictable;
>> we'd no longer have to investigate sudden changes in analysis results that
>> are in fact caused by backend changes.
>> I believe that all individual LLVM passes are implemented in a way that
>> we can reuse them in any exotic pipeline. Of course there are dependencies
>> between the passes, but besides that I don't think that Clang backend
>> changes should matter that much. Otherwise, custom pipelines would be a
>> nightmare to maintain.
>>
>> > In particular i'm worried for people who treat analyzer warnings as
>> errors in their builds; for them any update in the compiler would now cause
>> their build to fail
>> Well, we could protect them by swallowing all the diags from the CodeGen
>> part. And if CodeGen has diags then we could omit the IR.
>>
>> > So i believe that implementing as many of these analyses over the Clang
>> CFG (or in many cases it might be over the AST as well) would be beneficial
>> and should be done regardless of this experiment. Gabor, how much did you
>> try that? Because i believe you should try that and compare the results, at
>> least for some analyses that are easy to implement.
>> Yeah, I agree that it is worth trying to implement at least the simplest
>> ones in the Clang CFG. That way we would see if anything is missing from our
>> infra in the CSA and we could compare the results and their performance. I
>> am thinking about starting with the pureness info, which involves
>> implementing GlobalsModRef over the Clang CFG.
>>
>> > The reason why the use of LLVM IR in the static analyzer gets really
>> interesting is because there are already a great many analyses
>> implemented over it and getting access to them "for free" (in terms of
>> implementation cost) is fairly tempting. I think that's the only real
>> reason;
>> There is another reason (as G. Horvath mentions as well): many of the
>> analyses are quite painful to implement on our current CFG compared to an
>> already lowered representation like the LLVM IR. However, I agree that
>> maybe it should not be the LLVM IR that we need to lower. There is a
>> desire/attempt to use MLIR in Clang. I can't wait to hear the presentation
>> about CIL (Common MLIR Dialect for C/C++ and Fortran) in the upcoming LLVM
>> dev meeting, it would be great to know the status.
>> Still, I think it could take years until we can have a proper Clang
>> Intermediate Language incorporated into the Clang CFG. By contrast, we
>> could immediately start to use already implemented analyses on top of the
>> LLVM IR.
>>
>> Gabor
>>
>>
>> On Mon, Aug 17, 2020 at 1:31 PM Gábor Horváth <xazax.hun at gmail.com>
>> wrote:
>>
>>>
>>> On Sun, 16 Aug 2020 at 21:57, Artem Dergachev <noqnoqneo at gmail.com>
>>> wrote:
>>>
>>>>
>>>> So i believe that implementing as many of these analyses over the Clang
>>>> CFG (or in many cases it might be over the AST as well) would be beneficial
>>>> and should be done regardless of this experiment.
>>>>
>>>
>>> While I do agree that this would be awesome, I think many of those
>>> analyses are quite painful to implement on our current CFG compared to an
>>> already lowered representation like the LLVM IR, which can be canonicalized
>>> and has fewer corner cases and peculiarities to handle than
>>> the C++ language. Having the option to derive certain information from a
>>> representation that is easier to work with for some purposes might be
>>> useful for future analyses as well, not only for leveraging currently
>>> implemented analyses. Having a proper Clang IR could of course void this
>>> argument.
>>>
>>