[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

Tue Mar 12 14:57:19 PDT 2024

================
@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 """"""""""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+    ``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+    the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+    surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.
----------------
zygoloid wrote:

> First you tell me that I can't use LLVM's IR to determine the subobject, even though I did and it worked just fine

It definitely doesn't work fine, but sure, in some simple test cases it does appear to work.

> now you're saying that I can't use the front-end's knowledge about the structure.

I'm not. I'm saying that you can't assume that the LLVM IR idea of the complete object lines up with some enclosing subobject you've found in the frontend. You're still trying to use the IR notion of complete object and subobjects, and that still doesn't work for the same reasons as before.

You can absolutely compute where the subobject is in the frontend, and pass that information on to LLVM. But you'll need to pass it on in a way that is meaningful to LLVM. For example, if you pass a pointer and size to the intrinsic describing the complete range of addresses covering the subobject, that should be fine. But an offset is not useful if the frontend and middle-end can't agree on what it's an offset relative to.

> In your example, you've explicitly lied to the compiler about the types being passed in.

Sorry, that was a typo in my example: the call in `h()` should be `f(&b.a)`.

> I have no clue what you mean by pass a "pointer to `p->n` to the intrinsic" as that's already the first argument in the intrinsic.

I mean, pass a pointer to the start of the subobject. For example, for `p->arr[i]`, you'd pass in `&p->arr`.
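
To illustrate the "pointer and size" idea, here is a minimal sketch in plain C of the arithmetic that becomes possible once the subobject is described as an address range. It is not the intrinsic or any proposed signature: `struct S` and `bytes_left_in_subobject` are hypothetical names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, used only for illustration. */
struct S {
    int n;
    char arr[16];
    int tail;
};

/*
 * Hypothetical helper: given a pointer `ptr` and the address range
 * [sub_start, sub_start + sub_size) of the enclosing subobject, return
 * how many bytes remain in that subobject starting at `ptr`.
 * Returns 0 when `ptr` lies outside the range or one past its end.
 * No offset relative to the complete object is needed.
 */
static size_t bytes_left_in_subobject(const void *ptr,
                                      const void *sub_start,
                                      size_t sub_size) {
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t s = (uintptr_t)sub_start;

    if (p < s || p - s > sub_size)
        return 0;
    return sub_size - (size_t)(p - s);
}
```

For `p->arr[i]`, the frontend would supply `&p->arr` and `sizeof(p->arr)` as the range, so with `i == 4` the helper reports 12 bytes remaining, regardless of how the middle-end views the complete object.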

https://github.com/llvm/llvm-project/pull/83204