[PATCH] D153059: [-Wunsafe-buffer-usage] Group parameter fix-its

Ziqing Luo via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 21 12:57:07 PDT 2023


ziqingluo-90 added inline comments.


================
Comment at: clang/lib/Analysis/UnsafeBufferUsage.cpp:2077
+
   for (const auto &[VD, Fixables] : FixablesForUnsafeVars.byVar) {
     FixItsForVariable[VD] =
----------------
NoQ wrote:
> There's a bug in variable naming: `FixablesForUnsafeVars`actually contains fixables for *all* variables. Rashmi correctly renames it in D150489 to reflect its actual behavior. So IIUC you're generating fixits for all variables here, which is probably not what you want.
Thanks for pointing out the incorrectness of the variable name.   Actually, whether a variable is unsafe is irrelevant to this function.  The function only cares about all variables that need fixes.  So I think the incorrect name of the variable does not affect the functionality of my changes here.


================
Comment at: clang/lib/Analysis/UnsafeBufferUsage.cpp:2278-2281
+          if (isParameterOf(Impl.first, D))
+            ParmsNeedFix.insert(Impl.first);
+          if (isParameterOf(Impl.second, D))
+            ParmsNeedFix.insert(Impl.second);
----------------
NoQ wrote:
> Does this really go in both directions? Shouldn't it be just one direction?
> 
> ```lang=c++
> void foo(int *p) {
>   int *q = new int[10];
>   p = q;
>   q[5] = 7;
> }
> ```
> In this case `q` is an unsafe local buffer, but `p` is a (mostly) safe single-object pointer that doesn't need fixing, despite implication from `q` to `p`. Of course this example is somewhat unrealistic (people don't usually overwrite their parameters), but it is also the only situation where the other direction matters.
You are right!  It should be one-direction.


================
Comment at: clang/lib/Analysis/UnsafeBufferUsage.cpp:2293
+    //      search of connected components.
+    if (!ParmsNeedFix.empty()) {
+      auto First = ParmsNeedFix.begin(), Last = First;
----------------
NoQ wrote:
> What about the situation where params aren't seen as unsafe yet, but they're discovered to be unsafe later? Eg.
> ```
> void foo(int *p, int *q) {
>   int *p2 = p;
>   p2[5] = 7;
>   int *q2 = q;
>   q2[5] = 7;
> }
> ```
> Will we notice that `p` and `q` need to be fixed simultaneously?
> 
> ---
> 
> I suspect that the problem of parameter grouping can't be solved by pre-populating strategy implications between all parameters. It'll either cause you to implicate all parameters (even the ones that will never need fixing), or cause you to miss connections between parameters that do need fixing (as the connection is discovered later).
> 
> Connection between parameters needs to be added *after* the implications algorithm has finished. And once it's there (might be already there if I missed something), then this part of code probably becomes unnecessary.
Yes.  They will be noticed.  Here is why:

The code here first forms a ring over `p` and  `q` in the assignment (directed) graph:  
```
p <--> q
```
Then the two initialization statements (implemented in [[ https://reviews.llvm.org/D150489 | D150489 ]]) add two more edges to the graph:
```
p2 --> p <--> q <-- q2
```
The algorithm later searches the graph starting from unsafe variables `p2` and `q2`, respectively,  Starting from `p2`, the algorithm discovers that `p2` and `p`, as well as `p` and `q` depend on each other.  Starting from `q2`,  the algorithm discovers that `q2` and `q`, as well as `p` and `q` depend on each other.   The dependency graph is sort of undirected. So eventually, the four variables `p2`, `p`, `q`, `q2` are in the same group.



================
Comment at: clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp:13-15
+   expected-note{{change type of 'p' to 'std::span' to preserve bounds information, and change 'q' and 'a' to 'std::span' to propagate bounds information between them}} \
+   expected-note{{change type of 'q' to 'std::span' to preserve bounds information, and change 'p' and 'a' to 'std::span' to propagate bounds information between them}} \
+   expected-note{{change type of 'a' to 'std::span' to preserve bounds information, and change 'p' and 'q' to 'std::span' to propagate bounds information between them}}
----------------
NoQ wrote:
> QoL thing: //"to propagate bounds information between them"// isn't the correct reason in this case. Bounds information doesn't propagate between `p` and `q` and `a`. Each of them carries its own, completely unrelated bounds information.
> 
> We need to come up with a different reason. Or it might be better to combine the warnings into one warning here, so that we didn't need a reason:
> ```
> warning: 'p', 'q' and 'a' are unsafe pointers used for buffer access
> note: change type of 'p', 'q' and 'a' to 'std::span' to preserve bounds information
> ```
> 
> This gets a bit worse when parameters are implicated indirectly, but it still mostly works fine, for example:
> ```
> void foo(int *p, int *q) {
>   int *p2 = p;
>   p2[5] = 7;
>   int *q2 = q;
>   q2[5] = 7;
> }
> ```
> would produce:
> ```
> warning: 'p2' and 'q2' are unsafe pointers used for buffer access
> note: change type of 'p2' and 'q2' to 'std::span' to preserve bounds information, and
>       change 'p' and 'q' to 'std::span' to propagate bounds information between them
> ```
> 
> ---
> 
> In any case, this shows that the relationship between parameters (grouped together simply for being parameters) is very different from the relationship within groups of variables implicated by assignments (even if two or more of them are parameters). All the way down to the warning/note printer, we probably need to continue giving the parameter relationship a special treatment.
This is a good argument!  An edge from `p` to `q` in an assignment graph stands for not only that fixing `p` implicates fixing `q` but also that the bounds information is propagated from `q` to `p`.   In this sense, the way I connect parameters is inappropriate.

But my concern is that although treating parameters specially could improve the quality of the diagnostic messages, the code may be more complex and less re-used.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153059/new/

https://reviews.llvm.org/D153059



More information about the cfe-commits mailing list