[llvm] [clang] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #78526)

Fri Jan 19 13:59:30 PST 2024

zygoloid wrote:

> > But mode 0 and 1 are in general asking for an upper bound on the accessible bytes (that is, an N so any.access beyond N bytes is definitely out of bounds), so it seems to me that returning -1 is strictly worse than returning 48.
> 
> It's the second bit that controls whether it's an upper bound or lower bound that's returned, not the least significant bit.

Right, that's what I said: modes 0 and 1 ask for an upper bound. (Modes 2 and 3 ask for a lower bound.)

> We're trying to implement a GNU builtin, and the only defined semantics we have to go on are GNU's documentation. I can't see how we can deviate from their documentation unless it's to say "we can't determine this value" and so return `-1` instead of an answer that might be wildly wrong and potentially cause a memory leak of some sort.

We're not really deviating from the documented rules. By the time we get to LLVM IR lowering of the builtin, we have lost the precise frontend information. But we know the pointer might point into the complete object (where it would be able to write to 48 bytes) or to some subobject of that complete object (where it would be able to write to 48 bytes or less). Therefore the correct answer to return is 48.

In the same way, it would be correct to return 48 here:

```
char a[48];
char b[40];
bool always_false = false; // happens to never be modified
int n = __builtin_dynamic_object_size(always_false ? a  : b, 1);
```

... though we miscompile this and return 40 regardless of whether we're in upper bound or lower bound mode. :-(

> In my made-up example, if we said, "Yes you can write up to 48 bytes into `p->failed_devs[argc]`, then a user may overwrite the two fields after `field_devs`. If we return `-1`, they'll have to take the "slow", but ideally more secure, route.

No, that is not what we are saying when we return 48. We're saying "a write past 48 bytes is definitely bad". If you want a lower bound on the number of bytes that you can write, then you need to use mode 2 or 3 instead.

https://github.com/llvm/llvm-project/pull/78526