[PATCH] D136811: WIP: RFC: NFC: C++ Buffer Hardening user documentation.

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Nov 2 10:46:48 PDT 2022


aaron.ballman added a comment.

Thank you for getting a start on this documentation, this adds a lot of clarity to what was being proposed! I have some questions, and some minor things to fix.



================
Comment at: clang/docs/SafeBuffers.rst:19
+guidelines, and it provides runtime mitigations for the situations when
+following guidelines isn’t sufficient.
+
----------------
For consistency with the rest of our docs, you should make this change throughout.


================
Comment at: clang/docs/SafeBuffers.rst:25
+    a potentially unsafe operation is performed in your code;
+  - Pragmas ``unsafe_buffer_usage`` and ``only_safe_buffers`` and allow you to
+    annotate sections of code as either already-hardened or not-to-be-hardened,
----------------



================
Comment at: clang/docs/SafeBuffers.rst:31
+    convert large amounts of old code to conform to the warning;
+  - Attribute ``[[unsafe_buffer_usage]]`` lets you annotate custom functions as
+    unsafe, while providing a safe alternative that can often be suggested by
----------------
Hmmmm that attribute name looks problematic (more below in that section).


================
Comment at: clang/docs/SafeBuffers.rst:47
+not tied to this programming model. The warning may be desired independently,
+even in plain C code, if you’re hard-bent on isolating, annotating and auditing
+all buffer-related code in your codebase. That said, some of these guidelines
----------------



================
Comment at: clang/docs/SafeBuffers.rst:58
+operations crash on out-of-bounds accesses at runtime. And in case of C, this
+may be the only way to achieve safety. The main problem with such solution lies
+in dealing with the fact that most pointers in the code don’t have bounds
----------------



================
Comment at: clang/docs/SafeBuffers.rst:63-64
+either in a system of attributes significantly more advanced than
+the current solution, or a significant change in runtime representation
+of all pointers.
+
----------------
It would be helpful to have some details here about why a user would pick these proposed features over use of a sanitizer like asan or hwasan, which also finds out of bounds accesses.


================
Comment at: clang/docs/SafeBuffers.rst:69-70
+containers such as ``std::span``, you can achieve bounds safety by
+*simply writing good modern C++ code*. This is what this solution is designed
+to help your codebase with.
+
----------------
This makes it sound like the proposal doesn't cover C because there is no version of `std::span` there.


================
Comment at: clang/docs/SafeBuffers.rst:76
+
+Let’s start with how your code would need to look like to comply with
+the programming model.
----------------



================
Comment at: clang/docs/SafeBuffers.rst:90
+   manually ``delete[]`` it when appropriate.
+   (TODO: What about things like ``std::shared_ptr<T []>``)?
+2. When you deal with function or class interfaces that accept or return
----------------
We should mention as part of the model description that pointer-chasing structures like linked lists, trees, etc are not covered by the model.

(That said, I suspect the *implementation* of such things might be covered. e.g., a tree with `Node *Children[]`)


================
Comment at: clang/docs/SafeBuffers.rst:95
+   ``std::span``, which helps you deal with different container classes
+   uniformly.
+3. If you deal with interfaces you cannot change, such as an interface of
----------------
And what should C users do?


================
Comment at: clang/docs/SafeBuffers.rst:98
+   a third-party library you’re using, and these interfaces require you to pass
+   buffers as raw pointers, make sure you “containerize” these buffers
+   *as soon as possible*. For example, if the library returns a raw buffer
----------------
Similar to the single quote situation.


================
Comment at: clang/docs/SafeBuffers.rst:114
+   insufficient without such hardening.
+   (TODO: Will automatic fixits be able to suggest custom containers or views?)
+   (TODO: Explain how to implement such checks in a custom container?)
----------------
I would be surprised if we can find a heuristic that we'd feel confident is correct for most situations. e.g., `Foo buffer[10];` might be a flat array... or it might be a ring buffer without benefit of a wrapper class... or it may be a sparse matrix... and so on.


================
Comment at: clang/docs/SafeBuffers.rst:115
+   (TODO: Will automatic fixits be able to suggest custom containers or views?)
+   (TODO: Explain how to implement such checks in a custom container?)
+
----------------
I think this would be a good idea, especially if you can use an example of a more involved (but still common) container. Like, how does someone harden a ring buffer?


================
Comment at: clang/docs/SafeBuffers.rst:128
+  ``+``, ``-``, ``+=``, ``-=``,
+      - unless that number is a compile time constant ``0``;
+      - subtraction between two pointers is also fine;
----------------
One case I think this is going to trip up is: `this + 1` (which is not uncommon in practice: https://sourcegraph.com/search?q=context:global+-file:.*test.*+file:.*%5C.cpp+this+%2B+1&patternType=standard). How do users fix up that pointer arithmetic and do they ever get any safety or security benefits from it? (Note, this doesn't apply to `+=`, just `+`.)


================
Comment at: clang/docs/SafeBuffers.rst:132
+    attribute ``[[unsafe_buffer_usage]]``,
+      - unless the pointer is a compile time constant ``0`` or ``nullptr``;
+      - a number of C/C++ standard library buffer manipulation functions
----------------
`NULL` as well, I presume? (This matters for C where `NULL` often expands to `(void *)0`.)


================
Comment at: clang/docs/SafeBuffers.rst:133
+      - unless the pointer is a compile time constant ``0`` or ``nullptr``;
+      - a number of C/C++ standard library buffer manipulation functions
+        (such as std::memcpy() or std::next()) are hardcoded to act
----------------
Eventually we should extend this to have an exhaustive list.


================
Comment at: clang/docs/SafeBuffers.rst:138
+The warning doesn’t warn on single pointer use in general, such as
+dereference with operations like * and -> or [0]. If such operation causes
+a buffer overflow, there’s probably another unsafe operation nearby,
----------------



================
Comment at: clang/docs/SafeBuffers.rst:139-140
+dereference with operations like * and -> or [0]. If such operation causes
+a buffer overflow, there’s probably another unsafe operation nearby,
+that the warning does warn about. Pointer-based data structures
+such as linked lists or trees are allowed as they don't typically cause
----------------



================
Comment at: clang/docs/SafeBuffers.rst:173
+  #pragma unsafe_buffer_usage begin
+
+Such pragmas not only enable incremental adoption with much smaller granularity,
----------------
Have you considered allowing a short-hand form of the pragmas that are scoped to the current block scope (or file scope if applied at global scope)? e.g.,
```
void func() {
  {
    #pragma only_safe_buffers
    int stuff[10] = {};
    int x = stuff[1]; // Warning: you're a bad person who should feel bad
  }
  int array[10] = {};
  int y = array[1]; // this_is_fine.png
}
```


================
Comment at: clang/docs/SafeBuffers.rst:177
+is undesirable for a section of code (such as tight loop in performance-critical
+code, or implementation of a custom container). In such cases sections of code
+with unsafe buffer usage deserve being explicitly marked and easily auditable
----------------



================
Comment at: clang/docs/SafeBuffers.rst:181-188
+Even though similar functionality can be achieved with generic
+``#pragma clang diagnostic``, a custom pragma is preferable because
+the generic solution’s push/pop mechanism may be confusing. In particular,
+it’s not immediately obvious from the code which mode gets enabled after
+a “pop”, which makes it harder to reason about safety both for you and for
+the purposes of security audit. It is also generally valuable for such
+annotation to describe the property of the annotated code,
----------------
> a custom pragma is preferable because the generic solution’s push/pop mechanism may be confusing.

Er, I'm not certain how much I agree with that. push/pop semantics on this have been around for a *long* time. I do agree that the user can write code such that another user may struggle to understand what the state of the system is after a pop operation, but that shouldn't be a deciding factor for whether we add a new pragma to do the same thing as another pragma. I think I'd rewrite this as:
```
Similar functionality can be achieved with ``#pragma clang diagnostic``, but such use is discouraged because it
reduces code readability due to being an imperative action rather than describing a property of the annotated
code. That reduced readability also adds complexity to security code audits. Use of one of the dedicated
pragmas is strongly recommended.
```
or something along these lines. WDYT?


================
Comment at: clang/docs/SafeBuffers.rst:194
+
+The attribute ``[[unsafe_buffer_usage]]`` should be placed on functions
+that need to be avoided as they may cause buffer overflows. It is designed to
----------------
That's stealing a name from the C and C++ committees because it's not behind a vendor attribute namespace. I think it should be `[[clang::unsafe_buffer_usage]]` or `unsafe_buffer_usage` (with no specific syntax) instead.



================
Comment at: clang/docs/SafeBuffers.rst:213
+
+The attribute is NOT warranted when the function has runtime protection against
+overflows, assuming hardened libc++, assuming that containers constructed
----------------
One thing I think is worth pointing out about these docs is that the first example effectively says "do add the attribute because the size passed in to the function could be wrong" and the second example says "don't add the attribute on the assumption that the container has the correct size information". The advice feels a bit conflicting to me because in one case we're assuming callers pass in incorrect values and in the other case we're assuming callers pass in correct values and the only distinction between them is a "container was used".  But a significant portion of our users don't use libc++ (they use libstdc++ or MS STL for example).

I think we should have more details on why the STL used at runtime doesn't matter, or if it still really does matter, we may want to reconsider the advice we give.

Also, we don't give similar advice for use of the pragmas. Should we? (Maybe split the advice as to when to use markings in general out into its own section and reference it from both the pragma and attribute sections?)


================
Comment at: clang/docs/SafeBuffers.rst:214-215
+The attribute is NOT warranted when the function has runtime protection against
+overflows, assuming hardened libc++, assuming that containers constructed
+outside the function are well-formed. For example, function ``bar()`` below
+doesn’t need an attribute, because assuming buf is well-formed (has size that
----------------



================
Comment at: clang/docs/SafeBuffers.rst:216
+outside the function are well-formed. For example, function ``bar()`` below
+doesn’t need an attribute, because assuming buf is well-formed (has size that
+fits the original buffer it refers to), hardened libc++ protects this function
----------------



================
Comment at: clang/docs/SafeBuffers.rst:254
+
+Attribute ``[[unsafe_buffer_usage]]`` is similar to attribute [[deprecated]]
+but it has important differences:
----------------



================
Comment at: clang/docs/SafeBuffers.rst:257-259
+* Use of a function annotated by such attribute causes ``-Wunsafe-buffer-usage``
+  warning to appear, instead of ``-Wdeprecated``, so they can be
+  enabled/disabled independently as all four combinations make sense;
----------------
I don't think I understand what you're trying to say here.


================
Comment at: clang/docs/SafeBuffers.rst:260-263
+* The “replacement” parameter of ``[[deprecated]]``, which allows for automatic
+  fixits when the function has a drop-in replacement, becomes significantly more
+  powerful and flexible in ``[[unsafe_buffer_usage]]`` where it will allow
+  non-trivial automatic fixes.
----------------
The deprecated attribute does not have a replacement argument, it has an argument for some kind of message to be printed when the deprecated interface is used. So I'm not certain what this is telling me either.


================
Comment at: clang/docs/SafeBuffers.rst:265
+
+(TODO: Explain parameters of the attribute, how they aid automatic fixits)
+
----------------
FWIW, I think a significant amount of these attribute docs will move into AttrDocs.td (which should explain all the gory details about the attribute) and we can leave the high-level description here and link to the attribute documentation (rather than duplicate information).


================
Comment at: clang/docs/SafeBuffers.rst:271
+Every time your code preforms an unsafe operation that causes a
+``-Wunsafe-buffer-usage warning`` to appear, the warning may be accompanied
+by an automatic fix that changes types of one or more variables associated
----------------



================
Comment at: clang/docs/SafeBuffers.rst:273
+by an automatic fix that changes types of one or more variables associated
+with the unsafe operation from raw pointer or array type to safe container type.
+
----------------



================
Comment at: clang/docs/SafeBuffers.rst:322
+In spirit of our guidelines, the automatic fixit would prefer this function
+to accept a span instead of raw buffer, so that the span didn't need to be
+unwrapped. Of course, such change alone would break both source compatibility
----------------



================
Comment at: clang/docs/SafeBuffers.rst:365-370
+The fixits emitted by the warning are correct modulo placeholders. Placeholders
+are the only reason why fixed code is allowed to fail to compile.
+Incorrectly resolving the placeholder is the only reason why fixed code
+will demonstrate incorrect runtime behavior compared to the original code.
+In an otherwise well-formed program it is always possible (and usually easy)
+to resolve the placeholder correctly.
----------------
Because fixits can be applied automatically to an entire source file, we don't typically allow fix-its that can break code (or fixits where there are multiple choices for the user to pick from) unless they're attached to a *note* instead of a warning (or error).

Are you planning to follow that same pattern when placeholders are involved?

(For example, one potential use for automatic fix-its is to apply them automatically to source on commit before running precommit CI; similar to how folks sometimes run clang-format to automate "the source is different before/after the clang-format, so you should look into that".)


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136811/new/

https://reviews.llvm.org/D136811



More information about the cfe-commits mailing list