[PATCH] D136811: WIP: RFC: NFC: C++ Buffer Hardening user documentation.

Wed Nov 2 14:37:22 PDT 2022

jkorous added a comment.

Thank you for the feedback Aaron! We really appreciate the effort you put into this!

================
Comment at: clang/docs/SafeBuffers.rst:69-70
+containers such as ``std::span``, you can achieve bounds safety by
+*simply writing good modern C++ code*. This is what this solution is designed
+to help your codebase with.
+
----------------
aaron.ballman wrote:
> This makes it sound like the proposal doesn't cover C because there is no version of `std::span` there.
I read this as a question as: "Could an imaginary codebase benefit from the warnings if there were no fixits?".

I can imagine projects where it would be very useful to have 90% of the code guaranteed to be free of "potentially dangerous pointer arithmetic" and have such operations contained to the remaining 10% of the code. I can imagine a principled solution to bounds-safety in C but even just splitting the code into safe and 9x smaller "unsafe" part could be extremely useful when it comes to debugging, audits, automated and manual testing and similar. Having clang guarantee this property reduces the scope of code that needs to be processed and a deliberate use of this model by projects can unlock new possibilities.

That is why I feel that the proposal can benefit also C projects.

================
Comment at: clang/docs/SafeBuffers.rst:115
+   (TODO: Will automatic fixits be able to suggest custom containers or views?)
+   (TODO: Explain how to implement such checks in a custom container?)
+
----------------
aaron.ballman wrote:
> I think this would be a good idea, especially if you can use an example of a more involved (but still common) container. Like, how does someone harden a ring buffer?
I believe that we should stick to hardening low-level primitives - all that a ring buffer implementation has to do is to store data in `std::array` from hardened libc++ and it's all set.

================
Comment at: clang/docs/SafeBuffers.rst:128
+  ``+``, ``-``, ``+=``, ``-=``,
+      - unless that number is a compile time constant ``0``;
+      - subtraction between two pointers is also fine;
----------------
aaron.ballman wrote:
> One case I think this is going to trip up is: `this + 1` (which is not uncommon in practice: https://sourcegraph.com/search?q=context:global+-file:.*test.*+file:.*%5C.cpp+this+%2B+1&patternType=standard). How do users fix up that pointer arithmetic and do they ever get any safety or security benefits from it? (Note, this doesn't apply to `+=`, just `+`.)
That's a fair point and if we don't find a good solution we should ignore arithmetic with `this` and a constant.

================
Comment at: clang/docs/SafeBuffers.rst:133
+      - unless the pointer is a compile time constant ``0`` or ``nullptr``;
+      - a number of C/C++ standard library buffer manipulation functions
+        (such as std::memcpy() or std::next()) are hardcoded to act
----------------
aaron.ballman wrote:
> Eventually we should extend this to have an exhaustive list.
Absolutely! Is it ok if we populate the list as we land the implementation?

================
Comment at: clang/docs/SafeBuffers.rst:173
+  #pragma unsafe_buffer_usage begin
+
+Such pragmas not only enable incremental adoption with much smaller granularity,
----------------
aaron.ballman wrote:
> Have you considered allowing a short-hand form of the pragmas that are scoped to the current block scope (or file scope if applied at global scope)? e.g.,
> ```
> void func() {
>   {
>     #pragma only_safe_buffers
>     int stuff[10] = {};
>     int x = stuff[1]; // Warning: you're a bad person who should feel bad
>   }
>   int array[10] = {};
>   int y = array[1]; // this_is_fine.png
> }
> ```
We discussed other options but generally prefer simple pragmas for their flexibility.
Our current thinking is that for readability purposes we won't allow nesting of pragmas and would have end of scope to be explicit.
While this might be more verbose it would be dead simple to reason about.

================
Comment at: clang/docs/SafeBuffers.rst:181-188
+Even though similar functionality can be achieved with generic
+``#pragma clang diagnostic``, a custom pragma is preferable because
+the generic solution’s push/pop mechanism may be confusing. In particular,
+it’s not immediately obvious from the code which mode gets enabled after
+a “pop”, which makes it harder to reason about safety both for you and for
+the purposes of security audit. It is also generally valuable for such
+annotation to describe the property of the annotated code,
----------------
aaron.ballman wrote:
> > a custom pragma is preferable because the generic solution’s push/pop mechanism may be confusing.
> 
> Er, I'm not certain how much I agree with that. push/pop semantics on this have been around for a *long* time. I do agree that the user can write code such that another user may struggle to understand what the state of the system is after a pop operation, but that shouldn't be a deciding factor for whether we add a new pragma to do the same thing as another pragma. I think I'd rewrite this as:
> ```
> Similar functionality can be achieved with ``#pragma clang diagnostic``, but such use is discouraged because it
> reduces code readability due to being an imperative action rather than describing a property of the annotated
> code. That reduced readability also adds complexity to security code audits. Use of one of the dedicated
> pragmas is strongly recommended.
> ```
> or something along these lines. WDYT?
We'd love to arrive at a design that is very difficult to use incorrectly.

You are absolutely right that our preference for not using the diagnostic pragmas stems from the "untyped" pop operation.

Assuming this code is a starting point:
```
#pragma clang diagnostic push warning "-Wfoo"
// code
#pragma clang diagnostic pop //end of -Wfoo
// more code
```

if the user adds pragmas like this:
```
#pragma clang diagnostic push warning "-Wfoo"
#pragma clang diagnostic push warning "-Wsafe-buffers-usage"
// code
#pragma clang diagnostic pop //end of -Wfoo
// more code
#pragma clang diagnostic pop //end of -Wsafe-buffers-usage
```

they actually get this behavior:
```
#pragma clang diagnostic push warning "-Wfoo"
#pragma clang diagnostic push warning "-Wsafe-buffers-usage"
// code
#pragma clang diagnostic pop //end of -Wsafe-buffers-usage
// more code
#pragma clang diagnostic pop //end of -Wfoo
```

We should have the patch for pragmas ready to be put for review here in not too distant future. Maybe it'd be easier to discuss there?

================
Comment at: clang/docs/SafeBuffers.rst:213
+
+The attribute is NOT warranted when the function has runtime protection against
+overflows, assuming hardened libc++, assuming that containers constructed
----------------
aaron.ballman wrote:
> One thing I think is worth pointing out about these docs is that the first example effectively says "do add the attribute because the size passed in to the function could be wrong" and the second example says "don't add the attribute on the assumption that the container has the correct size information". The advice feels a bit conflicting to me because in one case we're assuming callers pass in incorrect values and in the other case we're assuming callers pass in correct values and the only distinction between them is a "container was used".  But a significant portion of our users don't use libc++ (they use libstdc++ or MS STL for example).
> 
> I think we should have more details on why the STL used at runtime doesn't matter, or if it still really does matter, we may want to reconsider the advice we give.
> 
> Also, we don't give similar advice for use of the pragmas. Should we? (Maybe split the advice as to when to use markings in general out into its own section and reference it from both the pragma and attribute sections?)
> One thing I think is worth pointing out about these docs is that the first example effectively says "do add the attribute because the size passed in to the function could be wrong" and the second example says "don't add the attribute on the assumption that the container has the correct size information". The advice feels a bit conflicting to me because in one case we're assuming callers pass in incorrect values and in the other case we're assuming callers pass in correct values and the only distinction between them is a "container was used". But a significant portion of our users don't use libc++ (they use libstdc++ or MS STL for example).

Our model depends on Standard Library used implementing the bounds checks.
With that the "container was used" distinction is absolutely crucial - that is what adds the bounds checks and already mitigates certain classes of bugs (even if the `span` passed in still has the same wrong size).

Let's assume we're starting with this buggy code:

```
void caller() {
  int buffer[10];
  callee(buffer, 20);
}

void callee(int* ptr, unsigned size) {
  ptr[size-1] = 42;
  ptr[size] = 42;
  ptr[100] = 42;
}
```
and transform it to (this is just a part of what we actually plan but hopefully illustrates the point):
```
void caller() {
  int buffer[10];
  callee(std::span<int>(buffer, 20), 20);
}

void callee(std::span<int> ptr, unsigned size) {
  ptr[size-1] = 42; // still unmitigated
  ptr[size] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++
  ptr[100] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++
}
```

================
Comment at: clang/docs/SafeBuffers.rst:260-263
+* The “replacement” parameter of ``[[deprecated]]``, which allows for automatic
+  fixits when the function has a drop-in replacement, becomes significantly more
+  powerful and flexible in ``[[unsafe_buffer_usage]]`` where it will allow
+  non-trivial automatic fixes.
----------------
aaron.ballman wrote:
> The deprecated attribute does not have a replacement argument, it has an argument for some kind of message to be printed when the deprecated interface is used. So I'm not certain what this is telling me either.
For better or worse the deprecated attribute is "overloaded" and this refers to the two-parameter version.

https://clang.llvm.org/docs/AttributeReference.html#deprecated

"When spelled as __attribute__((deprecated)), the deprecated attribute can have two optional string arguments. The first one is the message to display when emitting the warning; the second one enables the compiler to provide a Fix-It to replace the deprecated name with a new name."

Since it seems confusing even to you - maybe we should take this as a signal that our users would get just as confused and remove the reference to deprecated?
WDYT?

================
Comment at: clang/docs/SafeBuffers.rst:365-370
+The fixits emitted by the warning are correct modulo placeholders. Placeholders
+are the only reason why fixed code is allowed to fail to compile.
+Incorrectly resolving the placeholder is the only reason why fixed code
+will demonstrate incorrect runtime behavior compared to the original code.
+In an otherwise well-formed program it is always possible (and usually easy)
+to resolve the placeholder correctly.
----------------
aaron.ballman wrote:
> Because fixits can be applied automatically to an entire source file, we don't typically allow fix-its that can break code (or fixits where there are multiple choices for the user to pick from) unless they're attached to a *note* instead of a warning (or error).
> 
> Are you planning to follow that same pattern when placeholders are involved?
> 
> (For example, one potential use for automatic fix-its is to apply them automatically to source on commit before running precommit CI; similar to how folks sometimes run clang-format to automate "the source is different before/after the clang-format, so you should look into that".)
That's an extremely interesting question! Let me add our current thinking as a context for the discussion.

Our analysis will have its limits and will not be able to provide 100% complete and guaranteed correct code transformation in every case.

At the same time it sounds suboptimal both from our and our users perspective to not emit anything for such situations.
Imagine we have a code transformation that affects 20 lines of code and all we need is for the user to address a single function call argument. 
The thought of bailing, producing just a warning and having user to manually change all 20 lines of code sounds very unsatisfactory to me.
That's why we are thinking to use placeholders.

The next question is - can we not break the code when using placeholders?
By definition whenever we'd emit a placeholder our analysis can't provide a correct solution. That means that whatever way we'd make the code to compile can introduce a logic bug in the program. That is simply unacceptable given the overall goal of our effort.
That pretty much leaves us with only using placeholders that would break the build.

WDYT?

I wasn't aware of the precedent related to fixits attached to notes not having to lead to compilable code and we should consider that path.

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136811/new/

https://reviews.llvm.org/D136811