[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #78526)

Mon Jan 22 03:54:10 PST 2024

siddhesh wrote:

> Perhaps we need clarification on what GCC means by "may point to multiple objects" in this instance. To me that means either "get me the size of the largest of these multiple objects" or "size of the smallest." In my eyes, that means pointing to a union field.
> 

It's not just a union field, it could literally point to one of many potential objects, e.g.:

```
  struct A *obj1 = ...;
  char *obj2 = ...;
  int *obj3 = ...;

  void *ptr = cond1 == satisfied ? (cond2 == satisfied ? obj1 : obj2) : obj3;

  return __builtin_object_size (ptr, 1);
```
Here, `__builtin_object_size` will return the maximum estimate it can compute among the three potential objects that `ptr` could point to. `__builtin_dynamic_object_size` is special in that it can handle this case, i.e. return an expression that computes the size of the object actually selected by that condition. I nevertheless retained the 'estimate' wording for the specification in that documentation, for reasons that have already been touched upon in this discussion.
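To make the multiple-objects case concrete, here is a minimal sketch of the example above with the elided parts filled in by hypothetical stand-ins. The object sizes, the `pick` helper and the `EST_SIZE` macro are all assumptions for illustration, and the values actually returned depend on the compiler and optimization level. The only property relied on is that neither builtin reports less than the size of the object the pointer really refers to.

```c
#include <stddef.h>

/* Hypothetical stand-ins for obj1, obj2 and obj3; the sizes are arbitrary. */
struct A { char payload[64]; };
static struct A a_obj;
static char c_obj[16];
static int i_obj[4];

/* Assumption: use __builtin_dynamic_object_size only where the compiler
   advertises it through __has_builtin; otherwise fall back to the static
   builtin so the sketch still compiles. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_dynamic_object_size)
#    define EST_SIZE(p) __builtin_dynamic_object_size((p), 1)
#  endif
#endif
#ifndef EST_SIZE
#  define EST_SIZE(p) __builtin_object_size((p), 1)
#endif

/* Select one of the three objects, mirroring the nested conditional. */
static const void *pick(int cond1, int cond2)
{
    return cond1 ? (cond2 ? (const void *)&a_obj : (const void *)c_obj)
                 : (const void *)i_obj;
}

/* Static builtin: a maximum estimate over the candidates, or (size_t)-1
   when the compiler cannot determine anything. */
size_t static_estimate(int cond1, int cond2)
{
    const void *ptr = pick(cond1, cond2);
    return __builtin_object_size(ptr, 1);
}

/* Dynamic builtin: may instead yield an expression that follows the
   condition at run time. */
size_t dynamic_estimate(int cond1, int cond2)
{
    const void *ptr = pick(cond1, cond2);
    return EST_SIZE(ptr);
}

/* Ground truth for comparison: the size of the object actually selected. */
size_t true_size(int cond1, int cond2)
{
    return cond1 ? (cond2 ? sizeof a_obj : sizeof c_obj) : sizeof i_obj;
}
```

Calling `static_estimate` and `dynamic_estimate` for each combination of conditions and comparing against `true_size` shows how tight each estimate is on a given compiler; either may legitimately be looser than the true size, but not smaller.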

In general, we want to allow returning a conservative estimate whenever possible rather than failing. This is a deliberate design decision: the key usage of `__builtin_object_size` (and consequently, `__builtin_dynamic_object_size`) is exploit mitigation, and we're much better off providing a conservative return value than completely bailing out. In the above case, returning the whole object size where the subobject size is unavailable is better than bailing out. While it cannot do the ideal thing of protecting the precise bounds, it at least bounds any potential overflow to the whole object, thus limiting what can be done in an exploit that uses this overflow.
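The relationship between the subobject bound and the whole-object fallback can be sketched as follows. The `struct packet` layout and the function names are hypothetical, and the sketch only assumes the documented meaning of the second argument: type 1 asks for the closest enclosing subobject, type 0 for the whole object.

```c
#include <stddef.h>

/* Hypothetical layout: a small buffer followed by more data in the same
   object. */
struct packet {
    char name[8];
    int  tail[4];
};

static struct packet pkt;

/* Type 1: the closest enclosing subobject, ideally just pkt.name. */
size_t subobject_estimate(void)
{
    return __builtin_object_size(pkt.name, 1);
}

/* Type 0: the whole object, i.e. everything from pkt.name to the end of
   pkt. This is the conservative value to fall back to when the subobject
   bound is unavailable. */
size_t whole_object_estimate(void)
{
    return __builtin_object_size(pkt.name, 0);
}
```

A check against the whole-object value is weaker than one against the subobject value, since an overflow from `name` into `tail` goes undetected, but a write past the end of `pkt` is still caught. That is the trade-off described above.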

I obviously have no comments on the patch itself, but hopefully this gives enough context to interpret the GCC implementation and/or documentation. It would be ideal to always return a precise object size expression, but falling back to an estimate is better than simply bailing out.

https://github.com/llvm/llvm-project/pull/78526