[llvm] b1aece9 - LangRef: allocated objects can grow (#141338)

Wed Jul 23 01:36:09 PDT 2025

Author: Ralf Jung
Date: 2025-07-23T10:36:06+02:00
New Revision: b1aece90f32c0bb0685e1e79d6dc8e1a147bde37

URL: https://github.com/llvm/llvm-project/commit/b1aece90f32c0bb0685e1e79d6dc8e1a147bde37
DIFF: https://github.com/llvm/llvm-project/commit/b1aece90f32c0bb0685e1e79d6dc8e1a147bde37.diff

LOG: LangRef: allocated objects can grow (#141338)

This enables the (reasonably common) pattern of using `mmap` to reserve
a wide range of pages without making them accessible, and then making
more pages accessible only as memory is actually needed. Effectively,
that region of memory is one big allocated object for LLVM, but
crucially, that allocated object *changes its size*.
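The reserve-then-grow pattern can be sketched in C for a POSIX system. This is a minimal sketch, not code from the patch: `struct region`, `region_reserve`, and `region_grow` are hypothetical names, and it assumes that an anonymous `PROT_NONE` mapping reserves address space and that `mprotect` can later make a prefix of it readable and writable, as on Linux and macOS.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict ISO C modes (glibc) */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one growable region. */
struct region {
    unsigned char *base; /* never changes after reservation */
    size_t size;         /* bytes currently readable and writable */
    size_t max_size;     /* bytes of address space reserved up front */
};

/* Reserve max_size bytes of address space without making any of it
 * accessible: PROT_NONE pages cannot be read or written yet. */
static int region_reserve(struct region *r, size_t max_size) {
    void *p = mmap(NULL, max_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    r->size = 0;
    r->max_size = max_size;
    return 0;
}

/* Grow the accessible prefix to at least new_size bytes (rounded up to whole
 * pages). The base address never moves, so existing pointers into the region
 * stay valid; only the end of the usable range moves to the right. */
static int region_grow(struct region *r, size_t new_size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = (new_size + page - 1) / page * page;
    assert(rounded <= r->max_size); /* cannot grow past the reservation */
    if (rounded <= r->size)
        return 0; /* already large enough; this sketch never shrinks */
    if (mprotect(r->base, rounded, PROT_READ | PROT_WRITE) != 0)
        return -1;
    r->size = rounded;
    return 0;
}
```

In the terms of this patch, the whole reservation starting at `base` plays the role of one allocated object that grows to the right: `base` stays fixed while `size` increases.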

Having an allocated object grow seems entirely compatible with what LLVM
optimizations assume, *except* that when LLVM sees an `alloca` or
similar instruction, it will assume that a pointer that
`getelementptr inbounds` has offset by more than the size of the
allocated object cannot alias that `alloca`. But for allocated objects
that are created by means LLVM does not recognize, e.g. by `mmap`, LLVM
does not know their size, so it cannot draw that conclusion anyway.
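The size-based reasoning described above can be written down as a small toy model in C. This is a hypothetical sketch, not LLVM's alias-analysis code: `struct object_info` and `inbounds_gep_fits_in` are made-up names, and it only considers non-negative byte offsets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what an optimizer knows about an allocated object. */
struct object_info {
    bool size_known; /* true for e.g. an alloca of a fixed-size type */
    uint64_t size;   /* only meaningful when size_known is true */
};

/* Could a getelementptr inbounds that advances its input by `offset` bytes
 * keep both its input and its result in bounds of `obj`? Both must lie in
 * [start, start + size], and they differ by `offset`, so offset > size rules
 * the object out (the result then cannot alias it). With an unknown size,
 * e.g. memory obtained from mmap, nothing can be ruled out. */
static bool inbounds_gep_fits_in(const struct object_info *obj, uint64_t offset) {
    if (!obj->size_known)
        return true;
    return offset <= obj->size;
}
```

For a 16-byte `alloca`, an in-bounds offset of 32 bytes rules the object out, while for an object of unknown size the same offset proves nothing; that second case is why letting such objects grow does not invalidate this reasoning.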

The other main concern is a `getelementptr inbounds` that is hoisted
above an operation that grows an allocated object: this should be legal,
since `getelementptr` can be freely reordered. We achieve that by
specifying that, for allocated objects that change their size,
"inbounds" means "in bounds of their maximal size", not "in bounds of
their current size".
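The difference between the two readings of "inbounds" can be illustrated with a hypothetical C model. `struct growable_object`, `inbounds_by_max_size`, and `inbounds_by_current_size` are made-up names, and `max_size` is only a stand-in for "the maximal size the object could have while satisfying the allocated object rules"; offsets are non-negative byte offsets from the object's base.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an allocated object that can grow to the right. */
struct growable_object {
    size_t current_size; /* bytes the object spans right now */
    size_t max_size;     /* stand-in for its maximal size under the rules */
};

/* Reading adopted by the patch: an offset is in bounds if it does not pass
 * the end of the object at its maximal size (one past the end is allowed). */
static bool inbounds_by_max_size(const struct growable_object *obj, size_t offset) {
    return offset <= obj->max_size;
}

/* Stricter reading the patch does not use: compare against the current size.
 * The answer then depends on whether the GEP runs before or after growth. */
static bool inbounds_by_current_size(const struct growable_object *obj, size_t offset) {
    return offset <= obj->current_size;
}
```

With the current-size reading, a GEP by 8192 bytes that is valid after the object grows from 4096 to 16384 bytes would yield poison if hoisted above the growth. The maximal-size reading gives the same answer before and after, which is what keeps the hoist legal.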

It would be nice to also allow shrinking allocations (e.g. by
`munmap`ing pages at the end), but that is trickier. Consider an
example like this:
- load 4 bytes from `ptr`
- call some function
- load 1 byte from `ptr`

Right now, LLVM could argue that since `ptr` clearly has not been
deallocated, there must be at least 4 bytes of dereferenceable memory
behind `ptr` after the call. If allocations can shrink, this kind of
reasoning is no longer valid. I don't know whether LLVM actually reasons
like that. I think it should not, since I think it should be possible to
have allocations that shrink, but to remain conservative I am not
proposing to allow shrinking as part of this patch.
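The load/call/load example can be made concrete with a hypothetical C sketch on a POSIX system with `MAP_ANONYMOUS`. `shrink_to_one_page` and `load_call_load` are made-up names; the program deliberately performs the kind of shrinking that this patch leaves unsupported, only to show why dereferenceability known before the call need not hold after it.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict ISO C modes (glibc) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for "call some function": it shrinks the region by
 * unmapping its second page. */
static void shrink_to_one_page(unsigned char *base, size_t page) {
    (void)munmap(base + page, page);
}

/* Returns the byte read by the final 1-byte load, or -1 on setup failure. */
static int load_call_load(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    unsigned char *ptr = base + page - 1; /* a 4-byte access straddles both pages */
    ptr[0] = 7;

    uint32_t word;
    memcpy(&word, ptr, sizeof word); /* load 4 bytes from ptr: fine */
    (void)word;
    shrink_to_one_page(base, page);  /* call some function */
    int byte = ptr[0];               /* load 1 byte from ptr: still fine */
    /* A 4-byte load from ptr at this point would fault, even though the same
     * 4 bytes were dereferenceable before the call. */
    (void)munmap(base, page);
    return byte;
}
```

If an optimizer assumed the 4 bytes behind `ptr` were still dereferenceable after the call, for example to widen the final 1-byte load, the transformed program would fault where the original does not.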

Added: 
    

Modified: 
    llvm/docs/LangRef.rst

Removed: 
    


################################################################################
diff  --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index b99a96f031abd..bac13cc0424a6 100644

--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3356,6 +3356,19 @@ behavior is undefined:
 -  the size of all allocated objects must be non-negative and not exceed the
    largest signed integer that fits into the index type.
 
+Allocated objects that are created with operations recognized by LLVM (such as
+:ref:`alloca <i_alloca>`, heap allocation functions marked as such, and global
+variables) may *not* change their size. (``realloc``-style operations do not
+change the size of an existing allocated object; instead, they create a new
+allocated object. Even if the object is at the same location as the old one, old
+pointers cannot be used to access this new object.) However, allocated objects
+can also be created by means not recognized by LLVM, e.g. by directly calling
+``mmap``. Those allocated objects are allowed to grow to the right (i.e.,
+keeping the same base address, but increasing their size) while maintaining the
+validity of existing pointers, as long as they always satisfy the properties
+described above. Currently, allocated objects are not permitted to grow to the
+left or to shrink, nor can they have holes.
+
 .. _objectlifetime:
 
 Object Lifetime
@@ -11928,6 +11941,9 @@ if the ``getelementptr`` has any non-zero indices, the following rules apply:
    :ref:`based <pointeraliasing>` on. This means that it points into that
    allocated object, or to its end. Note that the object does not have to be
    live anymore; being in-bounds of a deallocated object is sufficient.
+   If the allocated object can grow, then the relevant size for being *in
+   bounds* is the maximal size the object could have while satisfying the
+   allocated object rules, not its current size.
  * During the successive addition of offsets to the address, the resulting
    pointer must remain *in bounds* of the allocated object at each step.