[PATCH] D153059: [-Wunsafe-buffer-usage] Group parameter fix-its

Tue Jun 20 17:14:33 PDT 2023

NoQ added a comment.

I'll just leave this tiny bit of analysis here, that we did on the whiteboard a while ago.

> The "intermediate" overload `void f(std::span<int>, int *)` stays there but usually is not useful. If there are more than two parameters need to be fixed, there will be more such useless "intermediate" overloads.

This isn't even the biggest problem. The bigger problem is that //the process doesn't actually stop at Step 2//.

Let me rewrite this more slowly.

Step 0 - the initial code:

  // original body, original signature
  void foo(int *p, int *q, size_t sz) {
     int i = p[5]; // warning: unsafe buffer usage over 'p'
     int j = q[5]; // warning: unsafe buffer usage over 'q'
  }

Step 1 - fix parameter `p`:

  // FIXED: original body, new signature
  void foo(span<int> p, int *q, size_t sz) {
     int i = p[5];
     int j = q[5]; // warning: unsafe buffer usage over 'q'
  }

  // NEW: compatibility overload from step 1 - original signature
  [[unsafe_buffer_usage]]
  void f(int *p, int *q, size_t sz) {
     foo(span<int>(p, <#size#>), q, sz);
  }

Step 2 - fix parameter `q` as well:

  // FIXED: original code, even newer signature
  void foo(span<int> p, span<int> q, size_t sz) {
     int i = p[5];
     int j = q[5];
  }

  // NEW: compatibility overload from step 2
  [[unsafe_buffer_usage]]
  void foo(span<int> p, int *q, size_t sz) {
     foo(p, span<int>(q, <#size#>), sz);
  }

  // UNCHANGED: compatibility overload from step 1 - still carries original signature
  [[unsafe_buffer_usage]]
  void foo(int *p, int *q, size_t sz) {
     foo(span<int>(p, sz), q, sz); // warning: unsafe buffer usage over 'q'
  }

Step 3 - now we have a warning inside the overload from step 1 because it's no longer calling the newest safe function, but it's now calling the compatibility overload from step 2. So we need to change the parameter `q` of the bottom-most function to `span`.

  // UNCHANGED: original code, even newer signature - unchanged
  void foo(span<int> p, span<int> q, size_t sz) {
     int i = p[5];
     int j = q[5];
  }

  // UNCHANGED: compatibility overload from step 2
  [[unsafe_buffer_usage]]
  void foo(span<int> p, int *q, size_t sz) {
     foo(p, span<int>(q, sz), sz);
  }

  // FIXED: compatibility overload from step 1 - new safe signature
  [[unsafe_buffer_usage]]
  void foo(int *p, span<int> q, size_t sz) {
     foo(span<int>(p, sz), q, sz);
  }

  // NEW: compatibility overload from step 3 - original signature of the overload from step 1, aka simply original signature
  [[unsafe_buffer_usage]]
  [[unsafe_buffer_usage]] // sic!
  void foo(int *p, int *q, size_t sz) {
     foo(p, span<int>(q, <#size#>), sz); // warning: unsafe buffer usage over 'p'
  }

There are a few things that look wrong here.

The set of overloads still makes sense; we get ourselves 4 overloads in all combinations of spans and raw pointers. This on its own makes sense.

We've also somehow got the new raw overload (with two raw pointers) carry //two instances// of `[[unsafe_buffer_usage]]`. It lowkey makes sense because the function is unsafe with respect to both parameters, but we probably don't want people to write such code.

Now, here's where it gets really weird. You'd probably expect that new raw overload to call `foo(span<int>(p, sz), span<int>(q, sz), sz)`. If it just did that, it'd be perfectly reasonable code. But it doesn't do that! Indeed, by design the autogenerated compatibility overload calls the freshly-fixed function. Which in case of Step 3 was `foo(int *p, span<int> q, size_t sz)`.

Also, the new raw overload //already has a warning in it//! This is completely unacceptable. This happened because the function under fix was already annotated as `[[unsafe_buffer_usage]]`. The autofix wouldn't remove the attribute; the attribute is still completely appropriate there.

Finally, because there's a new warning, //the process doesn't stop after Step 3 either//. I'll leave imagining what Step 4 would look like as an easy exercise to the reader ^.^

----

So one way or another, we really don't want to be in the business of incrementally fixing parameters one-by-one. Fixing all parameters at once is much easier than fixing them incrementally and discarding all analysis state on each step. Analysis of "partially spanified" code is much harder than analysis of raw C-style code, so we prefer single-step transformation in most cases.

Another direction that we can explore is to avoid auto-fixing functions that already carry `[[unsafe_buffer_usage]]`. Indeed, it's already a compatibility overload and we know it; it's not *supposed* to be safe! This will prevent Step 3 from breaking things, so it'll leave us with 3 overloads - not ideal, but not incorrect either. It'll also eliminate the "double attribute" problem.

And we ultimately need to explore this direction because no matter how hard we avoid automatic partial transformations, people will still do manual partial transformations! And when they do, we need to demonstrate some reasonable behavior.

But for now let's focus on the basic use case.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153059/new/

https://reviews.llvm.org/D153059