[PATCH] D136811: -Wunsafe-buffer-usage: WIP: RFC: NFC: User documentation.

Thu Nov 3 14:05:51 PDT 2022

xazax.hun added inline comments.

================
Comment at: clang/docs/SafeBuffers.rst:31
+    convert large amounts of old code to conform to the warning;
+  - Attribute ``[[unsafe_buffer_usage]]`` lets you annotate custom functions as
+    unsafe, while providing a safe alternative that can often be suggested by
----------------
aaron.ballman wrote:
> xazax.hun wrote:
> > aaron.ballman wrote:
> > > Hmmmm that attribute name looks problematic (more below in that section).
> > With C++ modules gaining some momentum, I wonder if it would be possible/beneficial to be able to annotate module import/export declarations with this attribute to signal that the APIs and the content in a module was not yet hardened. 
> I think it makes sense on module export (the module author knows whether they are buffer-safe or not), but what do you have in mind for import where the user may not be in a position to make that assertion validly?
Wouldn't `only_safe_buffers` make sense on the import side to ensure that no unhardened code is being imported from a specific module?

================
Comment at: clang/docs/SafeBuffers.rst:40-41
+  - Finally, in order to avoid bugs in newly converted code, the
+    Clang static analyzer provides a checker to find misconstructed
+    ``std::span`` objects.
+
----------------
jkorous wrote:
> xazax.hun wrote:
> > Isn't this overly specific? What about `string_view`s? 
> This is based on our current plans for near future.
> While we are positive we will use `std::span` in the FixIts it is very unlikely we'd use `string_view`. That's why having a checker for `std::span` is more important for us now.
Does this mean you'd recommend using a `span<char>` over `string_view` in all cases? I think that might surprise some users, so I wonder if it is worth documenting. 

================
Comment at: clang/docs/SafeBuffers.rst:287-288
+
+The automatic fixit associated with the warning would transform this array
+into an ``std::array``:
+
----------------
jkorous wrote:
> xazax.hun wrote:
> > I anticipate that doing these fixits for multi-dimensional arrays will be confusing. Is that in scope?
> Do you mean that the dimensions get reversed if we replace `int arr_3d[3][4][5]` with `array<array<array<int, 5>, 4>,  3> arr_3d_r`?
Yeah, I can imagine some people wanting bounds safety but having strong opinions about the reversed dimensions in their code. Moreover, if there are multi-dimensional array typedefs in the nesting, that can make things more complicated.

================
Comment at: clang/docs/SafeBuffers.rst:324-326
+and binary compatibility. In order to avoid that, the fix will provide
+a compatibility overload to preserve the old functionality. The updated code
+produced by the fixit will look like this:
----------------
jkorous wrote:
> xazax.hun wrote:
> > While this might be desirable for some projects, I can imagine other projects want to avoid generating the overload (just generate the fixes in one pass, and apply all of them in a second to avoid a "fixed" header render another TU invalid). But that would require fixits for the call sites. Do you plan to support those batch migration scenarios or is that out of scope?
> We currently don't have a machinery that would allow us to analyze the whole project at once.
> 
> Two passes are generally not enough - that number is something like number of unique functions in the deepest call-stack.
> 
> Part of our fixits is attributing functions as unsafe and application of such fixits can create new warnings in callers.
> 
> Let's take an example:
> 
> ```
> foo(int* ptr) {
>   ptr[5] = 5;
> }
> 
> bar(int* ptr) {
>   foo(ptr);
> }
> 
> baz(int* ptr) {
>   bar(ptr);
> }
> ```
> 
> Initially it is not obvious that `bar` or `baz` might get transformed.
> 
> In the first iteration a fixit for `foo` is emitted and when applied we get:
> ```
> foo(span<int> ptr) {
>   ptr[5] = 5;
> }
> 
> [[clang::unsafe_buffer_usage]] foo(int* ptr);
> 
> bar(int* ptr) {
>   foo(ptr);
> }
> 
> baz(int* ptr) {
>   bar(ptr);
> }
> ```
> 
> Only now we learn that `bar` (which can live in a diferent TU) should get transformed as well because it calls to unsafe function `foo`. But our analysis still won't see that `baz` will eventually have to be transformed too and it'll take another iteration.
I see, this explains it all, thanks! I'd love to have this motivating example in the design document why a 2-pass fix-it-all model does not work.

================
Comment at: clang/docs/SafeBuffers.rst:354-357
+  - Despite accepting a ``std::span`` which carries size information,
+    the fixed function still accepts ``size`` separately. It can be removed
+    manually, or it can be preserved, if ``size`` and ``buf.size()``
+    actually need to be different in your case.
----------------
jkorous wrote:
> xazax.hun wrote:
> > I see this part a bit confusing. While it is true that we cannot be sure whether the size is actually the size of the buffer, I wonder if we should at least generate something to call it out the user has further action items. It could be a `/*revise me*/` comment after the size, or documentation, both, or something else. 
> We have accepted that our analysis will have some limits. In situations where we have a reasonable suspicion that a certain parameter has the semantics of size we can somehow communicate that to the user. But it is very likely that in some cases the analysis won't be able to tell even that.
Not being able to cover all the cases sounds fine to me, and I'm glad you plan to communicate potential size params to the user when there is reasonable certainty.

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136811/new/

https://reviews.llvm.org/D136811