[PATCH] D153059: [-Wunsafe-buffer-usage] Group parameter fix-its

Fri Jun 23 15:33:39 PDT 2023

ziqingluo-90 added inline comments.

================
Comment at: clang/lib/Analysis/UnsafeBufferUsage.cpp:2293
+    //      search of connected components.
+    if (!ParmsNeedFix.empty()) {
+      auto First = ParmsNeedFix.begin(), Last = First;
----------------
NoQ wrote:
> ziqingluo-90 wrote:
> > NoQ wrote:
> > > ziqingluo-90 wrote:
> > > > NoQ wrote:
> > > > > What about the situation where params aren't seen as unsafe yet, but they're discovered to be unsafe later? Eg.
> > > > > ```
> > > > > void foo(int *p, int *q) {
> > > > >   int *p2 = p;
> > > > >   p2[5] = 7;
> > > > >   int *q2 = q;
> > > > >   q2[5] = 7;
> > > > > }
> > > > > ```
> > > > > Will we notice that `p` and `q` need to be fixed simultaneously?
> > > > > 
> > > > > ---
> > > > > 
> > > > > I suspect that the problem of parameter grouping can't be solved by pre-populating strategy implications between all parameters. It'll either cause you to implicate all parameters (even the ones that will never need fixing), or cause you to miss connections between parameters that do need fixing (as the connection is discovered later).
> > > > > 
> > > > > Connection between parameters needs to be added *after* the implications algorithm has finished. And once it's there (might be already there if I missed something), then this part of code probably becomes unnecessary.
> > > > Yes.  They will be noticed.  Here is why:
> > > > 
> > > > The code here first forms a ring over `p` and  `q` in the assignment (directed) graph:  
> > > > ```
> > > > p <--> q
> > > > ```
> > > > Then the two initialization statements (implemented in [[ https://reviews.llvm.org/D150489 | D150489 ]]) add two more edges to the graph:
> > > > ```
> > > > p2 --> p <--> q <-- q2
> > > > ```
> > > > The algorithm later searches the graph starting from unsafe variables `p2` and `q2`, respectively,  Starting from `p2`, the algorithm discovers that `p2` and `p`, as well as `p` and `q` depend on each other.  Starting from `q2`,  the algorithm discovers that `q2` and `q`, as well as `p` and `q` depend on each other.   The dependency graph is sort of undirected. So eventually, the four variables `p2`, `p`, `q`, `q2` are in the same group.
> > > > 
> > > > The code here first forms a ring over `p` and  `q` in the assignment (directed) graph
> > > 
> > > I don't think it does. The ring is formed over the `ParamsNeedFix` list, which is empty because none of these parameters are listed in `UnsafeOps.byVar`.
> > Actually `p` and `q` will be in `ParamsNeedFix`.  Because they are implicated by `p2` and `q2`, who are unsafe variables.   `ParamsNeedFix` include parameters that need fix, either because they are used in unsafe operations or they are implicated by some Gadgets.
> > 
> > But anyway, I now think it's not a good idea to lump parameters with variables grouped by implication, as you pointed out that variable implication also implies bounds propagation. 
> > 
> > Instead, I think it might be more appropriate to make a variable group structured:  
> >   - Let's call it a //connected components// for a set of variables that need to be fixed together and sharing bounds information. (The term //connected components// has been used in the code to refer to a group of variables. It keeps its meaning here.)
> >   - Let's redefine a Variable Group to be a set of mutually disjoint //connected components//s (CCs).  If there are more than one CCs in a variable group,  each of the CCs contains at least one parameter.
> > 
> > For generating fix-its for a variable group, this patch still works as it only requires that variables in a variable group are either all fixed or given up as a whole.
> > For generating diagnostic messages, we now have more information about relations among grouped variables.
> > Actually `p` and `q` will be in ParamsNeedFix. Because they are implicated by `p2` and `q2`, who are unsafe variables.  `ParamsNeedFix` include parameters that need fix, either because they are used in unsafe operations or they are implicated by some Gadgets.
> 
> Ohh, right, it's populated twice, I even commented on the other part but didn't notice it populates the same list!
> 
> But in this case you'll probably join `p` and `q` in
> ```
> void foo(int *p, int *q) {
>   int *p2 = p;
>   p2[5] = 7;
>   int *q2 = q;
> }
> ```
> right? (In this case `q` is implicated by `q2` so it gets added to the list, but `q2` itself isn't unsafe or implicated by anything, so it doesn't need fixing, but connecting it to `p` would cause it to be fixed unnecessarily). So I still think this can't be correct unless you do it after the crawler finishes.
> 
> > But anyway, I now think it's not a good idea to lump parameters with variables grouped by implication, as you pointed out that variable implication also implies bounds propagation.
> 
> Yeah, I think it makes a lot more sense. Just look at variable groups already built by the existing crawler, and if more than one of them contains parameters, report all groups that contain parameters through one combined warning.
Though I'm going to remove this part,  achieving some agreement on this algorithm may help me understand the problem better.

> right? (In this case q is implicated by q2 so it gets added to the list, but q2 itself isn't unsafe or implicated by anything, so it doesn't need fixing, but connecting it to p would cause it to be fixed unnecessarily)
Right.  the code in this patch does not form `ParmsNeedFix` correctly.  `ParmsNeedFix` should be the set of parameters that are either unsafe or being implicated (transitively) by unsafe variables.   Then the group would be "correct" I think.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153059/new/

https://reviews.llvm.org/D153059