[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #78526)

Richard Smith via cfe-commits cfe-commits at lists.llvm.org
Fri Jan 19 14:32:07 PST 2024


zygoloid wrote:

> Perhaps we need clarification on what GCC means by "may point to multiple objects" in this instance. To me that means either "get me the size of the largest of these multiple objects" or "size of the smallest." In my eyes, that means pointing to a union field.

Per @nikic's example, it seems reasonably clear to me that GCC's intended semantics are to get either the best upper bound or the best lower bound that the compiler is able to compute. I mean sure, we can ask, and it does no harm to do so, but it is not reasonable to expect that a quantity like this coming from the vagaries of GCC's implementation details will be exactly the same for all examples when computed by a different implementation. (It's not even the same across optimization levels in GCC.)
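As a minimal sketch of the "best upper bound or best lower bound" reading, consider a pointer that may refer to either of two buffers. The names `small_buf`, `large_buf`, `upper_bound`, `lower_bound`, and `check_bounds` are hypothetical and chosen only for illustration. Mode 0 asks for a maximum and mode 2 for a minimum. The exact values returned are likely to vary with compiler and optimization level, so the checks assert only the bound property, which is the part that should hold on any implementation.

```c
#include <assert.h>
#include <stddef.h>

static char small_buf[4];
static char large_buf[16];

/* Mode 0: an upper bound on the size of whatever p points to.
   (size_t)-1 means "unknown". */
size_t upper_bound(int pick_large) {
    char *p = pick_large ? large_buf : small_buf;
    return __builtin_object_size(p, 0);
}

/* Mode 2: a lower bound on the size of whatever p points to.
   0 means "unknown". */
size_t lower_bound(int pick_large) {
    char *p = pick_large ? large_buf : small_buf;
    return __builtin_object_size(p, 2);
}

/* Only the bound property is checked, not a specific value. */
void check_bounds(void) {
    assert(upper_bound(0) >= sizeof small_buf);
    assert(upper_bound(1) >= sizeof large_buf);
    assert(lower_bound(0) <= sizeof small_buf);
    assert(lower_bound(1) <= sizeof large_buf);
}
```

A compiler that tracks both candidates may answer 16 for the maximum and 4 for the minimum, while one that gives up may answer `(size_t)-1` and 0. Both sets of answers satisfy the checks above.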

> I know that we lose precise struct information going to LLVM IR. If that's what's needed here, there are ways to pass this information along. We retain this information via DWARF. We could use similar metadata for this instance. Would that be acceptable?

In theory, yes, we could preserve enough information in metadata to compute this in the middle-end. But we would need to emit a *lot* of metadata for every object we emit, just on the off-chance that after optimization a `__builtin_object_size` query happens to point to it, so in practice I don't think there's any chance we can do this. We can't use DWARF, because it won't necessarily be available (and typically won't be available in the interesting case where we find the object size only after optimization), and in any case, the presence or absence of DWARF isn't supposed to affect the executable code. Also, we'd need to annotate things that simply don't exist at all in the LLVM IR:

```
#include <stdlib.h>  /* malloc */

typedef struct X { int a, b; } X;
int f(void) {
  X *p = malloc(sizeof(X));
  int *q = &p->a;  /* same address as p */
  return __builtin_object_size(q, 1);
}
```

Here, `f` ideally would return 4 (the size of the `a` subobject, assuming a 4-byte `int`), but at the LLVM IR level, `p` and `q` are identical values and the `&p->a` operation is a no-op. In cases like this, the best we can realistically do is to return 8, the size of the whole allocation.
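For contrast, here is a sketch of a case where the subobject is still visible at the point of the query, because the argument is written directly as `&x.a` on a local variable. The function names are hypothetical. With mode 1 a compiler can plausibly answer `sizeof(int)` here, and with mode 0 `sizeof(struct X)`, but that is an expectation rather than a guarantee, so the checks assert only that each result is a valid upper bound.

```c
#include <assert.h>
#include <stddef.h>

struct X { int a, b; };

/* Mode 0: upper bound measured to the end of the whole object x. */
size_t whole_object_bound(void) {
    struct X x = {0, 0};
    return __builtin_object_size(&x.a, 0);
}

/* Mode 1: upper bound measured to the end of the closest enclosing
   subobject, which here is the member x.a. */
size_t subobject_bound(void) {
    struct X x = {0, 0};
    return __builtin_object_size(&x.a, 1);
}

/* Each result must be at least the true size; (size_t)-1 ("unknown")
   also passes these checks. */
void check_subobject_bounds(void) {
    assert(whole_object_bound() >= sizeof(struct X));
    assert(subobject_bound() >= sizeof(int));
}
```

The difference from `f` is that the member access has not yet been erased: once the pointer has passed through `malloc` and plain pointer values, only the allocation size remains to reason about.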

https://github.com/llvm/llvm-project/pull/78526

