[cfe-dev] [analyzer][RFC] Get info from the LLVM IR for precision

Wed Sep 9 08:01:09 PDT 2020

Continuing with this topic, I have spent some time and investigated the
GlobalsModRef analysis in detail. Below are my findings about what is
missing currently from our infrastructure to effectively implement the
'pureness' analysis:
1) Clang CFG does not handle strongly connected components (SCC).
GlobalsModRef should do a bottom-up SCC traversal of the call graph. First
we must analyze the leaf SCCs, and only then can we propagate the data up
the call graph. Functions that mutually call each other (and thus form an
SCC in the graph) must be handled in one step.
2) We cannot effectively get the "use information" from the Clang AST.
During the mod/ref analysis we should take a look at all the uses of a
global variable (see e.g. GlobalsAAResult::AnalyzeUsesOfPointer). In
LLVM, every `Value` has a "use list" that keeps track of which other Values
are using the Value.
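
To illustrate point 2), here is a minimal, self-contained sketch of what a
use list buys us. ToyGlobal, ToyUser and summarizeUses are hypothetical
names made up for this example; they are not LLVM or Clang APIs, and the
real GlobalsAAResult::AnalyzeUsesOfPointer handles many more kinds of users.
The sketch only shows that, once every global carries a list of its users,
a per-function mod/ref summary is a single walk over that list.

```cpp
#include <map>
#include <string>
#include <vector>

enum class AccessKind { Load, Store };

// Hypothetical stand-in for an instruction that uses a global.
struct ToyUser {
  AccessKind Kind;
  std::string Function; // name of the function containing the use
};

// Hypothetical stand-in for a global variable with an LLVM-style use list.
struct ToyGlobal {
  std::string Name;
  std::vector<ToyUser> Users;
};

// Bit flags: Ref = the function may read the global, Mod = it may write it.
enum ModRefBits : unsigned { NoModRef = 0, Ref = 1, Mod = 2 };

// Walk the use list once and record, per function, how the global is accessed.
std::map<std::string, unsigned> summarizeUses(const ToyGlobal &G) {
  std::map<std::string, unsigned> Summary;
  for (const ToyUser &U : G.Users)
    Summary[U.Function] |= (U.Kind == AccessKind::Load ? Ref : Mod);
  return Summary;
}
```

With the Clang AST there is no such list attached to a declaration, so the
equivalent of ToyGlobal::Users would have to be collected separately.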

Solving 1) seems quite straightforward. We could implement the
Kosaraju-Sharir algorithm on the CFG: have a topological sort on the
reverse CFG and then run a standard DFS.
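
As a rough sketch of the above, assuming the graph is given as a plain
adjacency list (Graph, stronglyConnectedComponents and propagateImpurity
are hypothetical names, not existing Clang code): this is the mirrored but
equivalent variant of Kosaraju-Sharir that takes the DFS post-order on the
original graph and then labels components on the reversed graph. The
components come out numbered in topological order of the condensation, so
walking them from the highest index down gives the bottom-up order needed
for propagating 'pureness'.

```cpp
#include <algorithm>
#include <vector>

using Graph = std::vector<std::vector<int>>; // G[u] lists the callees of u

// First pass: append nodes to Order in DFS post-order.
static void postOrder(const Graph &G, int U, std::vector<bool> &Seen,
                      std::vector<int> &Order) {
  Seen[U] = true;
  for (int V : G[U])
    if (!Seen[V])
      postOrder(G, V, Seen, Order);
  Order.push_back(U);
}

// Second pass: label everything reachable from U in the reversed graph.
static void assign(const Graph &Rev, int U, int Id, std::vector<int> &Comp) {
  Comp[U] = Id;
  for (int V : Rev[U])
    if (Comp[V] < 0)
      assign(Rev, V, Id, Comp);
}

// Returns Comp, where Comp[u] is the SCC index of node u. Edges between
// different SCCs always go from a lower index to a higher one.
std::vector<int> stronglyConnectedComponents(const Graph &G) {
  const int N = static_cast<int>(G.size());
  Graph Rev(N);
  for (int U = 0; U < N; ++U)
    for (int V : G[U])
      Rev[V].push_back(U);

  std::vector<bool> Seen(N, false);
  std::vector<int> Order;
  for (int U = 0; U < N; ++U)
    if (!Seen[U])
      postOrder(G, U, Seen, Order);

  std::vector<int> Comp(N, -1);
  int NumComps = 0;
  for (auto It = Order.rbegin(); It != Order.rend(); ++It)
    if (Comp[*It] < 0)
      assign(Rev, *It, NumComps++, Comp);
  return Comp;
}

// Bottom-up propagation: a function is impure if it touches a global itself
// or calls an impure function. All members of an SCC share one result.
std::vector<bool> propagateImpurity(const Graph &G,
                                    const std::vector<int> &Comp,
                                    const std::vector<bool> &LocallyImpure) {
  const int N = static_cast<int>(G.size());
  int NumComps = 0;
  for (int C : Comp)
    NumComps = std::max(NumComps, C + 1);
  std::vector<std::vector<int>> Members(NumComps);
  for (int U = 0; U < N; ++U)
    Members[Comp[U]].push_back(U);

  std::vector<bool> CompImpure(NumComps, false);
  for (int C = NumComps - 1; C >= 0; --C) // leaf SCCs first
    for (int U : Members[C]) {
      if (LocallyImpure[U])
        CompImpure[C] = true;
      for (int V : G[U])
        if (CompImpure[Comp[V]])
          CompImpure[C] = true;
    }

  std::vector<bool> Result(N);
  for (int U = 0; U < N; ++U)
    Result[U] = CompImpure[Comp[U]];
  return Result;
}
```

The recursion is kept for brevity; on large graphs an explicit stack would
be safer.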

2) seems a bit more complex.
- One idea could be to do an AST visitation and build up the "user info"
during that.
- Another, more efficient approach could be to register a custom
ASTMutationListener and then override the DeclarationMarkedUsed
<https://clang.llvm.org/doxygen/classclang_1_1ASTMutationListener.html#a4d05aa6f36bd676f21764e42c4b91ffc>
method. This approach has the advantage that we don't have to traverse the
whole AST to gather the "user" info; rather, we can build it up during
parsing. The drawback is that this could be a serious architectural
change.

Artem, Gabor,
What do you think? Is it worth going ahead and implementing SCCs in the
Clang CFG? Regarding 2), is it worth putting much effort into implementing
it, and which method would you prefer?
Perhaps implementing all these things would be wasted effort if we later
decide that we do want to go forward with CIL (Clang Intermediate
Language). Should we take a few steps back and rather put (huge) effort
into CIL? In my opinion, CIL could take years to become usable, so it does
seem worth starting to implement the missing parts for the Clang CFG and
AST.

Thanks,
Gabor

On Tue, Aug 25, 2020 at 9:37 AM Gábor Márton <martongabesz at gmail.com> wrote:

> > And as John says, that'd have the advantage of being more predictable;
> we'd no longer have to investigate sudden changes in analysis results that
> are in fact caused by backend changes.
> I believe that all individual LLVM passes are implemented in a way that we
> can reuse them in any exotic pipeline. Of course there are dependencies
> between the passes, but besides that I don't think that Clang backend
> changes should matter that much. Otherwise, custom pipelines would be a
> nightmare to maintain.
>
> > In particular i'm worried for people who treat analyzer warnings as
> errors in their builds; for them any update in the compiler would now cause
> their build to fail
> Well, we could protect them by swallowing all the diags from the CodeGen
> part. And if CodeGen has diags then we could omit the IR.
>
> > So i believe that implementing as many of these analyses over the Clang
> CFG (or in many cases it might be over the AST as well) would be beneficial
> and should be done regardless of this experiment. Gabor, how much did you
> try that? Because i believe you should try that and compare the results, at
> least for some analyses that are easy to implement.
> Yeah, I agree that it is worth trying to implement at least the simplest
> ones in the Clang CFG. Thus we would see if anything is missing from our
> infra in the CSA and we could compare the results and their performance. I
> am thinking about starting with the pureness info, that involves
> implementing GlobalsModRef over the Clang CFG.
>
> > The reason why the use of LLVM IR in the static analyzer gets really
> interesting is because there are already a huge lot of analyses already
> implemented over it and getting access to them "for free" (in terms of
> implementation cost) is fairly tempting. I think that's the only real
> reason;
> There is another reason (as G. Horvath mentions as well): many of the
> analyses are quite painful to implement on our current CFG compared to an
> already lowered representation like the LLVM IR. However, I agree that
> maybe it should not be the LLVM IR that we need to lower. There is a
> desire/attempt to use MLIR in Clang. I can't wait to hear the presentation
> about CIL (Common MLIR Dialect for C/C++ and Fortran) in the upcoming LLVM
> dev meeting, it would be great to know the status.
> Still, I think it could take years until we can have a proper Clang
> Intermediate Language incorporated into the Clang CFG. By contrast, we
> could immediately start to use already implemented analyses on top of the
> LLVM IR.
>
> Gabor
>
>
> On Mon, Aug 17, 2020 at 1:31 PM Gábor Horváth <xazax.hun at gmail.com> wrote:
>
>>
>> On Sun, 16 Aug 2020 at 21:57, Artem Dergachev <noqnoqneo at gmail.com>
>> wrote:
>>
>>>
>>> So i believe that implementing as many of these analyses over the Clang
>>> CFG (or in many cases it might be over the AST as well) would be beneficial
>>> and should be done regardless of this experiment.
>>>
>>
>> While I do agree that this would be awesome, I think many of those
>> analyses are quite painful to implement on our current CFG compared to an
>> already lowered representation like the LLVM IR which can be canonicalized
>> and there are fewer corner cases and peculiarities to handle compared to
>> the C++ language. Having the option to derive certain information from a
>> representation that is easier to work with for some purposes might be
>> useful for future analyses as well, not only for leveraging currently
>> implemented analyses. Having a proper Clang IR could of course void this
>> argument.
>>
>