[llvm] [LangRef] Document accessing memory outside of object is UB. (PR #128429)
Florian Hahn via llvm-commits
llvm-commits at lists.llvm.org
Sun Feb 23 10:18:58 PST 2025
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/128429
Currently the LangRef isn't very clear on whether accessing objects out of bounds is allowed or not. Clarify that accessing memory outside an allocated object, including accesses partially outside, is undefined behavior.
This also removes the sentence stating that memory up to the alignment value can be loaded without trapping. As I read it, that sentence implies that alignment implies dereferenceability, but it is not clear whether that is intended only for the `!align` metadata.
If alignment in general implies dereferenceability, I will update the PR to clarify, although that may cause issues with various security-related HW/SW extensions.
(Includes the changes from https://github.com/llvm/llvm-project/pull/127892, which should go in first)
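As a rough illustration of the rule described above (not text from the patch), the check below models an allocated object as a byte range and treats an access as defined only when it lies entirely inside that range. The function name `access_is_in_bounds` and its parameters are hypothetical, and the model assumes flat integer addresses and a non-zero access size.

```python
def access_is_in_bounds(obj_base: int, obj_size: int,
                        addr: int, access_size: int) -> bool:
    """Return True if [addr, addr + access_size) lies fully inside the object.

    Hypothetical model: the object occupies [obj_base, obj_base + obj_size).
    Any access that is fully or partially outside that range is treated as
    undefined behavior under the proposed wording.
    """
    return obj_base <= addr and addr + access_size <= obj_base + obj_size


# An 8-byte object at address 0x1000 (illustrative values only).
fully_inside = access_is_in_bounds(0x1000, 8, 0x1004, 4)
partially_outside = access_is_in_bounds(0x1000, 8, 0x1006, 4)
fully_outside = access_is_in_bounds(0x1000, 8, 0x1008, 4)
```

Here a 4-byte access at offset 4 is accepted, while a 4-byte access at offset 6 is rejected because its last two bytes fall past the end of the object, regardless of how the pointer is aligned.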
From 7019302c37430b61947aece4a60f0af60660fdf7 Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Wed, 19 Feb 2025 21:25:48 +0100
Subject: [PATCH 1/4] [LangRef] Clarify that the pointer after an object must be
valid.
In some places, we rely on the assumption that the pointer after the
object must also be valid and not overflow, but it does not seem to be
spelled out clearly in LangRef, unless I missed a reference.
The GetElementPtr section mentions that the maximum object size is half
the pointer index type space, but then the pointer past the object may
wrap. Clarify that the pointer after the object must also be valid.
This should match Alive2's semantics: https://alive2.llvm.org/ce/z/Dk8QFL
(https://github.com/AliveToolkit/alive2/blob/master/tools/transform.cpp#L1288)
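One possible reading of these placement constraints, sketched as a small check rather than taken from the patch or from Alive2: the size must be non-negative and no larger than the largest signed value of the index type, and the one-past-the-end address must still be representable without wrapping. The name `is_valid_placement` is hypothetical, and the sketch assumes the index type is as wide as the pointer.

```python
def is_valid_placement(base: int, size: int, index_bits: int) -> bool:
    """Check a hypothetical object placement against the stated assumptions.

    Assumes addresses are unsigned integers of `index_bits` bits and that the
    index type has the same width as the pointer (an assumption of this sketch).
    """
    address_space = 1 << index_bits
    signed_max = (1 << (index_bits - 1)) - 1

    # Size must be non-negative and fit in the signed index type.
    if size < 0 or size > signed_max:
        return False

    # The pointer one past the end (base + size) must not wrap around.
    return 0 <= base and base + size < address_space


# Illustrative 16-bit address space.
ends_at_top = is_valid_placement(0xFFF0, 0x0F, 16)      # one-past-end is 0xFFFF
past_end_wraps = is_valid_placement(0xFFF0, 0x10, 16)   # one-past-end would be 0x10000
too_large = is_valid_placement(0x0000, 0x8000, 16)      # exceeds signed max 0x7FFF
```

Under this reading, an object whose last byte sits at the very top of the address space is rejected, because its one-past-the-end pointer would wrap to zero.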
---
llvm/docs/LangRef.rst | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 5356aee87b35f..c15cc6099643d 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11721,8 +11721,9 @@ As a corollary, the only pointer in bounds of the null pointer in the default
address space is the null pointer itself.
These rules are based on the assumption that no allocated object may cross
-the unsigned address space boundary, and no allocated object may be larger
-than half the pointer index type space.
+the unsigned address space boundary, the pointer after the object must be valid,
+and no allocated object may be larger than half the pointer index type space
+- 1.
If ``inbounds`` is present on a ``getelementptr`` instruction, the ``nusw``
attribute will be automatically set as well. For this reason, the ``nusw``
From 677735fb8e78071a8d1462dcf97251df3c350e41 Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Thu, 20 Feb 2025 17:34:25 +0100
Subject: [PATCH 2/4] !Fixup add allocated object section.
---
llvm/docs/LangRef.rst | 100 +++++++++++++++++++++++-------------------
1 file changed, 55 insertions(+), 45 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index c15cc6099643d..bf4f38b42746d 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -729,8 +729,8 @@ units that do not include the definition.
As SSA values, global variables define pointer values that are in scope
(i.e. they dominate) all basic blocks in the program. Global variables
always define a pointer to their "content" type because they describe a
-region of memory, and all memory objects in LLVM are accessed through
-pointers.
+region of memory, and all :ref:`allocated object<allocatedobjects>` in LLVM are
+accessed through pointers.
Global variables can be marked with ``unnamed_addr`` which indicates
that the address is not significant, only the content. Constants marked
@@ -2169,7 +2169,8 @@ For example:
A ``nofree`` function is explicitly allowed to free memory which it
allocated or (if not ``nosync``) arrange for another thread to free
memory on it's behalf. As a result, perhaps surprisingly, a ``nofree``
- function can return a pointer to a previously deallocated memory object.
+ function can return a pointer to a previously deallocated
+ :ref:`allocated object<allocatedobjects>`.
``noimplicitfloat``
Disallows implicit floating-point code. This inhibits optimizations that
use floating-point code and floating-point registers for operations that are
@@ -3280,31 +3281,41 @@ This information is passed along to the backend so that it generates
code for the proper architecture. It's possible to override this on the
command line with the ``-mtriple`` command line option.
+
+.. _allocatedobjects:
+
+Allocated Objects
+-----------------
+
+An allocated object, memory object, or simply object, is a region of a memory
+space that is reserved by a memory allocation such as :ref:`alloca <i_alloca>`,
+heap allocation calls, and global variable definitions. Once it is allocated,
+the bytes stored in the region can only be read or written through a pointer
+that is :ref:`based on <pointeraliasing>` the allocation value. If a pointer
+that is not based on the object tries to read or write to the object, it is
+undefined behavior.
+
+The following properties hold for all allocated objects:
+
+- no allocated object may cross the unsigned address space boundary (including
+ the pointer after the end of the object),
+- the size of all allocated objects must be non-negative and smaller than
+ the largest signed integer that fits into the index type,
+
.. _objectlifetime:
Object Lifetime
----------------------
-A memory object, or simply object, is a region of a memory space that is
-reserved by a memory allocation such as :ref:`alloca <i_alloca>`, heap
-allocation calls, and global variable definitions.
-Once it is allocated, the bytes stored in the region can only be read or written
-through a pointer that is :ref:`based on <pointeraliasing>` the allocation
-value.
-If a pointer that is not based on the object tries to read or write to the
-object, it is undefined behavior.
-
-A lifetime of a memory object is a property that decides its accessibility.
-Unless stated otherwise, a memory object is alive since its allocation, and
-dead after its deallocation.
-It is undefined behavior to access a memory object that isn't alive, but
-operations that don't dereference it such as
-:ref:`getelementptr <i_getelementptr>`, :ref:`ptrtoint <i_ptrtoint>` and
-:ref:`icmp <i_icmp>` return a valid result.
-This explains code motion of these instructions across operations that
-impact the object's lifetime.
-A stack object's lifetime can be explicitly specified using
-:ref:`llvm.lifetime.start <int_lifestart>` and
+A lifetime of an :ref:`allocated object<allocatedobjects>` is a property that
+decides its accessibility. Unless stated otherwise, an allocated object is alive
+since its allocation, and dead after its deallocation. It is undefined behavior
+to access an allocated object that isn't alive, but operations that don't
+dereference it such as :ref:`getelementptr <i_getelementptr>`,
+:ref:`ptrtoint <i_ptrtoint>` and :ref:`icmp <i_icmp>` return a valid result.
+This explains code motion of these instructions across operations that impact
+the object's lifetime. A stack object's lifetime can be explicitly specified
+using :ref:`llvm.lifetime.start <int_lifestart>` and
:ref:`llvm.lifetime.end <int_lifeend>` intrinsic function calls.
.. _pointeraliasing:
@@ -4484,11 +4495,10 @@ Here are some examples of multidimensional arrays:
There is no restriction on indexing beyond the end of the array implied
by a static type (though there are restrictions on indexing beyond the
-bounds of an allocated object in some cases). This means that
-single-dimension 'variable sized array' addressing can be implemented in
-LLVM with a zero length array type. An implementation of 'pascal style
-arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for
-example.
+bounds of an :ref:`allocated object<allocatedobjects>` in some cases). This
+means that single-dimension 'variable sized array' addressing can be implemented
+in LLVM with a zero length array type. An implementation of 'pascal style
+arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for example.
.. _t_struct:
@@ -11720,10 +11730,8 @@ Note that ``getelementptr`` with all-zero indices is always considered to be
As a corollary, the only pointer in bounds of the null pointer in the default
address space is the null pointer itself.
-These rules are based on the assumption that no allocated object may cross
-the unsigned address space boundary, the pointer after the object must be valid,
-and no allocated object may be larger than half the pointer index type space
-- 1.
+These rules are based on the assumption for
+:ref:`allocated object<allocatedobjects>`.
If ``inbounds`` is present on a ``getelementptr`` instruction, the ``nusw``
attribute will be automatically set as well. For this reason, the ``nusw``
@@ -26319,7 +26327,7 @@ Memory Use Markers
------------------
This class of intrinsics provides information about the
-:ref:`lifetime of memory objects <objectlifetime>` and ranges where variables
+:ref:`lifetime of allocated objects <objectlifetime>` and ranges where variables
are immutable.
.. _int_lifestart:
@@ -26387,8 +26395,8 @@ Syntax:
Overview:
"""""""""
-The '``llvm.lifetime.end``' intrinsic specifies the end of a memory object's
-lifetime.
+The '``llvm.lifetime.end``' intrinsic specifies the end of a
+:ref:`allocated object's lifetime<objectlifetime>`.
Arguments:
""""""""""
@@ -26418,7 +26426,8 @@ with ``poison``.
Syntax:
"""""""
-This is an overloaded intrinsic. The memory object can belong to any address space.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space.
::
@@ -26428,7 +26437,7 @@ Overview:
"""""""""
The '``llvm.invariant.start``' intrinsic specifies that the contents of
-a memory object will not change.
+an :ref:`allocated object<allocatedobjects>` will not change.
Arguments:
""""""""""
@@ -26449,7 +26458,8 @@ unchanging.
Syntax:
"""""""
-This is an overloaded intrinsic. The memory object can belong to any address space.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space.
::
@@ -26459,7 +26469,7 @@ Overview:
"""""""""
The '``llvm.invariant.end``' intrinsic specifies that the contents of a
-memory object are mutable.
+:ref:`allocated object<allocatedobjects>` are mutable.
Arguments:
""""""""""
@@ -26479,9 +26489,9 @@ This intrinsic indicates that the memory is mutable again.
Syntax:
"""""""
-This is an overloaded intrinsic. The memory object can belong to any address
-space. The returned pointer must belong to the same address space as the
-argument.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space. The returned pointer must belong to the same
+address space as the argument.
::
@@ -26515,9 +26525,9 @@ It does not read any accessible memory and the execution can be speculated.
Syntax:
"""""""
-This is an overloaded intrinsic. The memory object can belong to any address
-space. The returned pointer must belong to the same address space as the
-argument.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space. The returned pointer must belong to the same
+address space as the argument.
::
From 66b93da5421e8653025bb38d4e39ddde7d161edf Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Thu, 20 Feb 2025 19:49:38 +0100
Subject: [PATCH 3/4] !fixup adjust size wording
---
llvm/docs/LangRef.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index bf4f38b42746d..75cea29474589 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3299,8 +3299,8 @@ The following properties hold for all allocated objects:
- no allocated object may cross the unsigned address space boundary (including
the pointer after the end of the object),
-- the size of all allocated objects must be non-negative and smaller than
- the largest signed integer that fits into the index type,
+- the size of all allocated objects must be non-negative and not exceed the
+ largest signed integer that fits into the index type.
.. _objectlifetime:
From 4fa7b55f0586ee03bf11ffe5a39e3493c1c6dc0a Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Sun, 23 Feb 2025 18:04:05 +0000
Subject: [PATCH 4/4] [LangRef] Document accessing memory outside of object is
UB.
---
llvm/docs/LangRef.rst | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 75cea29474589..4a49b071072db 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3293,7 +3293,9 @@ heap allocation calls, and global variable definitions. Once it is allocated,
the bytes stored in the region can only be read or written through a pointer
that is :ref:`based on <pointeraliasing>` the allocation value. If a pointer
that is not based on the object tries to read or write to the object, it is
-undefined behavior.
+undefined behavior. Trying to read or write memory outside of an allocated
+object, including accesses partially outside an allocated object, is undefined
+behavior.
The following properties hold for all allocated objects:
@@ -11108,12 +11110,9 @@ operation (that is, the alignment of the memory address). It is the
responsibility of the code emitter to ensure that the alignment information is
correct. Overestimating the alignment results in undefined behavior.
Underestimating the alignment may produce less efficient code. An alignment of
-1 is always safe. The maximum possible alignment is ``1 << 32``. An alignment
-value higher than the size of the loaded type implies memory up to the
-alignment value bytes can be safely loaded without trapping in the default
-address space. Access of the high bytes can interfere with debugging tools, so
-should not be accessed if the function has the ``sanitize_thread`` or
-``sanitize_address`` attributes.
+1 is always safe. The maximum possible alignment is ``1 << 32``. Access of the
+high bytes can interfere with debugging tools, so should not be accessed if the
+function has the ``sanitize_thread`` or ``sanitize_address`` attributes.
The alignment is only optional when parsing textual IR; for in-memory IR, it is
always present. An omitted ``align`` argument means that the operation has the