[PATCH] D136811: -Wunsafe-buffer-usage: WIP: RFC: NFC: User documentation.

Thu Nov 3 13:37:20 PDT 2022

jkorous added a comment.

Thank you for the feedback Gábor!

================
Comment at: clang/docs/SafeBuffers.rst:36-37
+    hardened mode where C++ classes such as ``std::vector`` and ``std::span``,
+    together with their respective ``iterator`` classes, are protected
+    at runtime against buffer overflows. This changes buffer overflows from
+    extremely dangerous undefined behavior to predictable runtime crash.
----------------
xazax.hun wrote:
> Only buffer overflows, so comparing cases like:
> ```
> auto it1 = v1.begin();
> auto it2 = v2.begin();
> 
> if (it1 == it2) { ... }
> ```
> 
> are out of scope, right? 
> 
> The main reason I'm asking, because I am not sure how safe iterators are implemented and whether the same implementation would give opportunities for other safety checks without significant additional cost. 
Yes, this is out of scope for our diagnostics and code transformations. @ldionne Do you want to cover the safe iterator design part?

================
Comment at: clang/docs/SafeBuffers.rst:40-41
+  - Finally, in order to avoid bugs in newly converted code, the
+    Clang static analyzer provides a checker to find misconstructed
+    ``std::span`` objects.
+
----------------
xazax.hun wrote:
> Isn't this overly specific? What about `string_view`s? 
This is based on our current plans for near future.
While we are positive we will use `std::span` in the FixIts it is very unlikely we'd use `string_view`. That's why having a checker for `std::span` is more important for us now.

================
Comment at: clang/docs/SafeBuffers.rst:287-288
+
+The automatic fixit associated with the warning would transform this array
+into an ``std::array``:
+
----------------
xazax.hun wrote:
> I anticipate that doing these fixits for multi-dimensional arrays will be confusing. Is that in scope?
Do you mean that the dimensions get reversed if we replace `int arr_3d[3][4][5]` with `array<array<array<int, 5>, 4>,  3> arr_3d_r`?

================
Comment at: clang/docs/SafeBuffers.rst:324-326
+and binary compatibility. In order to avoid that, the fix will provide
+a compatibility overload to preserve the old functionality. The updated code
+produced by the fixit will look like this:
----------------
xazax.hun wrote:
> While this might be desirable for some projects, I can imagine other projects want to avoid generating the overload (just generate the fixes in one pass, and apply all of them in a second to avoid a "fixed" header render another TU invalid). But that would require fixits for the call sites. Do you plan to support those batch migration scenarios or is that out of scope?
We currently don't have a machinery that would allow us to analyze the whole project at once.

Two passes are generally not enough - that number is something like number of unique functions in the deepest call-stack.

Part of our fixits is attributing functions as unsafe and application of such fixits can create new warnings in callers.

Let's take an example:

```
foo(int* ptr) {
  ptr[5] = 5;
}

bar(int* ptr) {
  foo(ptr);
}

baz(int* ptr) {
  bar(ptr);
}
```

Initially it is not obvious that `bar` or `baz` might get transformed.

In the first iteration a fixit for `foo` is emitted and when applied we get:
```
foo(span<int> ptr) {
  ptr[5] = 5;
}

[[clang::unsafe_buffer_usage]] foo(int* ptr);

bar(int* ptr) {
  foo(ptr);
}

baz(int* ptr) {
  bar(ptr);
}
```

Only now we learn that `bar` (which can live in a diferent TU) should get transformed as well because it calls to unsafe function `foo`. But our analysis still won't see that `baz` will eventually have to be transformed too and it'll take another iteration.

================
Comment at: clang/docs/SafeBuffers.rst:352
+
+  - The compatibility overload contains a ``__placeholder__`` which needs
+    to be populated manually. In this case ``size`` is a good candidate.
----------------
aaron.ballman wrote:
> xazax.hun wrote:
> > Just a small concern. While users are not allowed to use identifiers like this, there is always a chance someone has a global variable of this name and the code ends up compiling. So, if we want to be absolutely safe, we should either have logic to pick a unique name, or not generate fixits in that case. Although, this is probably rare enough that we could just hand-wave.
> Agreed; we should not have a fixit that suggests use of a reserved identifier.
Our thinking is to actually use syntactically incorrect placeholder to make sure that users can't accidentally miss it.
This is related to:
https://reviews.llvm.org/D136811#inline-1324542

================
Comment at: clang/docs/SafeBuffers.rst:354-357
+  - Despite accepting a ``std::span`` which carries size information,
+    the fixed function still accepts ``size`` separately. It can be removed
+    manually, or it can be preserved, if ``size`` and ``buf.size()``
+    actually need to be different in your case.
----------------
xazax.hun wrote:
> I see this part a bit confusing. While it is true that we cannot be sure whether the size is actually the size of the buffer, I wonder if we should at least generate something to call it out the user has further action items. It could be a `/*revise me*/` comment after the size, or documentation, both, or something else. 
We have accepted that our analysis will have some limits. In situations where we have a reasonable suspicion that a certain parameter has the semantics of size we can somehow communicate that to the user. But it is very likely that in some cases the analysis won't be able to tell even that.

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136811/new/

https://reviews.llvm.org/D136811