[llvm] [LangRef] Specify semantics for non-byte-sized loads and stores (PR #180739)

Tue Feb 10 05:57:18 PST 2026

https://github.com/nikic created https://github.com/llvm/llvm-project/pull/180739

LangRef currently specifies that non-byte-sized stores store an unspecified bit pattern in the "padding", and that performing a subsequent load with a different bitwidth is "undefined". This means that when storing an i1 value and then loading it as i8, the result is "undefined" (whatever that is supposed to mean). These semantics are quite unusual, as they depend on the exact type a memory location has previously been accessed with. I believe that frontends often do not respect these semantics.
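For illustration, the access pattern in question might look like the following minimal sketch. The function name is hypothetical, and the comments only restate how the description above characterizes the current wording.

```llvm
; Hypothetical function, shown only to illustrate the access pattern.
define i8 @store_i1_load_i8(ptr %p) {
  ; Non-byte-sized store: one byte is written, but the current wording
  ; leaves the seven "padding" bits unspecified.
  store i1 true, ptr %p
  ; Load with a different bitwidth: per the description above, the
  ; result is "undefined" under the current wording.
  %v = load i8, ptr %p
  ret i8 %v
}
```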

This PR proposes to instead specify that non-byte-sized loads effectively act like a byte-sized `load` followed by `trunc nuw`, and non-byte-sized stores act like a `zext` followed by a byte-sized store. To the best of my knowledge, this matches the legalization behavior of SDAG.
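As a rough sketch of the proposed equivalence, using `i20` (the type the LangRef text uses as its example) and hypothetical function names, each pair of functions below would behave the same under the proposed wording. This is an illustration of the intended semantics, not a description of what any particular pass emits.

```llvm
; Non-byte-sized store ...
define void @store_i20(ptr %p, i20 %v) {
  store i20 %v, ptr %p
  ret void
}

; ... behaves like a zext to the next multiple of the byte size (i24)
; followed by a byte-sized store.
define void @store_i20_expanded(ptr %p, i20 %v) {
  %wide = zext i20 %v to i24
  store i24 %wide, ptr %p
  ret void
}

; Non-byte-sized load ...
define i20 @load_i20(ptr %p) {
  %v = load i20, ptr %p
  ret i20 %v
}

; ... behaves like a byte-sized load followed by trunc nuw. If any of
; the four truncated high bits are non-zero, the result is poison.
define i20 @load_i20_expanded(ptr %p) {
  %wide = load i24, ptr %p
  %v = trunc nuw i24 %wide to i20
  ret i20 %v
}
```

Under this reading, storing `i1 true` and then loading the same byte as `i8` would yield 1, since the store zero-extends the value before writing it.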

This does restrict possible codegen choices (e.g. a target couldn't define these as `sext`/`trunc nsw` instead anymore), but it does not appear that there is any interest in that in practice, given that SDAG does not support it to this day.

From f7985312bcc42bc83fe493c20a6a8fc0198c5daf Mon Sep 17 00:00:00 2001
From: Nikita Popov <npopov at redhat.com>
Date: Tue, 10 Feb 2026 14:42:49 +0100
Subject: [PATCH] [LangRef] Specify semantics for non-byte-sized loads and
 stores

LangRef currently specifies that non-byte-sized stores store an
unspecified bit pattern in the "padding", and that performing a
subsequent load with a different bitwidth is "undefined". This
means that when storing an i1 value and then loading it as i8, the
result is "undefined" (whatever that is supposed to mean). These
semantics are quite unusual, as they depend on the exact type a
memory location has previously been accessed with. I believe that
frontends often do not respect these semantics.

This PR proposes to instead specify that non-byte-sized loads
effectively act like a byte-sized `load` followed by
`trunc nuw`, and non-byte-sized stores act like a `zext` followed
by a byte-sized store. To the best of my knowledge, this matches
the legalization behavior of SDAG.

This does restrict possible codegen choices (e.g. a target couldn't
define these as `sext`/`trunc nsw` instead anymore), but it does
not appear that there is any interest in that in practice, given
that SDAG does not support it to this day.
---
 llvm/docs/LangRef.rst | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 28edd439b6900..5784c8e768bb5 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11735,8 +11735,11 @@ is of scalar type then the number of bytes read does not exceed the
 minimum number of bytes needed to hold all bits of the type. For
 example, loading an ``i24`` reads at most three bytes. When loading a
 value of a type like ``i20`` with a size that is not an integral number
-of bytes, the result is undefined if the value was not originally
-written using a store of the same type.
+of bytes, the load will be performed on the next larger multiple of the byte
+size (here ``i24``) and truncated. If any of the truncated bits are non-zero,
+the result is a poison value. As such, a non-byte-sized load behaves like a
+byte-sized load followed by a ``trunc nuw`` operation.
+
 If the value being loaded is of aggregate type, the bytes that correspond to
 padding may be accessed but are ignored, because it is impossible to observe
 padding from the loaded aggregate value.
@@ -11829,8 +11832,10 @@ of scalar type then the number of bytes written does not exceed the
 minimum number of bytes needed to hold all bits of the type. For
 example, storing an ``i24`` writes at most three bytes. When writing a
 value of a type like ``i20`` with a size that is not an integral number
-of bytes, it is unspecified what happens to the extra bits that do not
-belong to the type, but they will typically be overwritten.
+of bytes, the value will be zero extended to the next larger multiple of the
+byte size (here ``i24``) and then stored. As such, a non-byte-sized store
+behaves like a ``zext`` followed by a byte-sized store.
+
 If ``<value>`` is of aggregate type, padding is filled with
 :ref:`undef <undefvalues>`.
 If ``<pointer>`` is not a well-defined value, the behavior is undefined.