[PATCH] D137379: [-Wunsafe-buffer-usage] Add warnings for unsafe buffer accesses by array subscript operations

Mon Dec 5 18:58:02 PST 2022

NoQ added inline comments.

================
Comment at: clang/lib/Analysis/UnsafeBufferUsage.cpp:168-169
+  static Matcher matcher() {
+    // FIXME: What if the index is integer literal 0? Should this be
+    // a safe gadget in this case?
+    return stmt(
----------------
xazax.hun wrote:
> NoQ wrote:
> > aaron.ballman wrote:
> > > xazax.hun wrote:
> > > > As per some of the discussions, in the future the compiler might be able to recognize certain safe patterns, e.g., when there is a simple for loop with known bounds, or when both the index and the array size is statically known.
> > > > 
> > > > I think here we need to make a very important design decision: Do we want the gadgets to have the right "safety" category when it is created (e.g., we have to be able to decide if a gadget is safe or not using matchers), or do we want some mechanisms to be able to promote an unsafe gadget to be a safe one? (E.g., do we want to be able to prove some unsafe gadgets safe using dataflow analysis in a later pass?)
> > > (FWIW, this is a great question and I really appreciate you asking it!)
> > My original design implies that safe gadgets are a separate hierarchy, so there will be a new gadget class for index zero, and this gadget will be changed to skip index zero. But I don't immediately see why such early separation is strictly necessary, other than for a bit of extra type safety (extra virtual methods of the `UnsafeGadget` class don't make sense on safe gadgets). We *do* have time to make this distinction later, before we get to emitting warnings.
> > 
> > So maybe eventually we'll end up replacing `isSafe()` with a pure virtual method and deprecate `SafeGadget` and `UnsafeGadget` base classes, if we see it cause too much duplication or it turns out that the extra analysis necessary to establish safety can't be nicely implemented in ASTMatchers. In this case I'll admit that I over-engineered it a bit.
> I'd strongly prefer `isSafe()` method over inheritance. While adding safe and unsafe gadgets for zero indices is not too bad, if we plan to expand this to more cases, like known array size + compile time constant index, it will get harder and harder to keep all the gadgets in sync. We can end up missing cases or detecting a code snippet both as safe and unsafe. The `isSafe` method would basically get rid of a whole class of problems, where safe and unsafe versions of the gadgets are overlapping. That being said, I am not insisting doing this change in this PR, it is OK doing the change later in a follow-up.
Ok so we've had an interesting conversation about this, it's actually harder than that and we'll probably have a much more divergence between unsafe gadgets and safe gadgets covering these operations. The problem is that most unsafe gadgets don't allow fixing code anyway, you typically need more context. For example, `ptr[i]` may represent unsafe code and `ptr[0]` may represent safe code, but that doesn't make `ptr[0]` a safe gadget, because it doesn't make it //fixable//. In particular, `&span[0]` crashes when `span` is empty, whereas `ptr[0]` doesn't crash when `ptr` points to one-past-end of the array, so blinding replacing `ptr[0]` with `span[0]` will break some code.

Of course even simple `ptr`, regardless of context, is always fixable to `span.data()`. But that's a low-quality fix that ruins libc++ runtime checks. So we'll do our best to avoid such fixes unless the extra context makes them inevitable.

So I suspect that we'll allow safe and unsafe gadgets to //partially overlap// and exclude unsafe gadgets from fixit machine entirely (i.e. move the `getFixits()` method to the `SafeGadget` class).

So, like I said in https://reviews.llvm.org/D137348?id=472990#inline-1345337, @malavikasamak is looking into this more deeply.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137379/new/

https://reviews.llvm.org/D137379