[PATCH] D95238: [LangRef] Update memory access ops to raise UB if ptrs are not well defined

Juneyoung Lee via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 22 07:29:06 PST 2021


aqjune created this revision.
aqjune added reviewers: reames, jdoerfert, nikic, xbolva00, efriedma.
Herald added a subscriber: JDevlieghere.
aqjune requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

In the past, it was stated in D87994 <https://reviews.llvm.org/D87994> that it is allowed to dereference a pointer that is partially undefined
if all of its possible representations fit into a dereferenceable range.
The motivation of the direction was to make a range analysis helpful for assuring dereferenceability.
Even if a range analysis concludes that its offset is within bounds, the offset could still be partially undefined; to utilize the range analysis, this relaxation was necessary.
https://groups.google.com/g/llvm-dev/c/2Qk4fOHUoAE/m/KcvYMEgOAgAJ has more context about this.

However, this is currently blocking another optimization, which is annotating the noundef attribute for library functions' arguments. D95122 <https://reviews.llvm.org/D95122> is the patch.
Currently, there are quite a few library functions which cannot have noundef attached to its pointer argument because it can be transformed from load/store.
For example, MemCpyOpt can convert stores into memset:

  store p, i32 0
  store (p+1), i32 0 // Since currently it is allowed for store to have partially undefined pointer..
  ->
  memset(p, 0, 8)    // memset cannot guarantee that its ptr argument is noundef.

A bigger problem is that this makes unclear which library functions are allowed to have 'noundef' and which functions aren't (e.g., strlen).
This makes annotating noundef almost impossible for this kind of functions.

This patch proposes that all memory operations should have well-defined pointers.
For memset/memcpy, it is semantically equivalent to running a loop until the size is met (and branching on undef is UB), so the size is also updated to be well-defined.

Strictly speaking, this again violates the implication of dereferenceability from range analysis result.
However, I think this is okay for the following reasons:

1. It seems the existing analyses in the LLVM main repo does not have conflicting implementation with the new proposal.

`isDereferenceableAndAlignedPointer` works only when the GEP offset is constant, and `isDereferenceableAndAlignedInLoop` is also fine.

2. A possible miscompilation happens only when the source has a pointer with a *partially* undefined offset (it's okay with poison because there is no 'partially poison' value).

But, at least I'm not aware of a language using LLVM as backend that has a well-defined program while allowing partially undefined pointers.
There might be such a language that I'm not aware of, but improving the performance of the mainstream languages like C and Rust is more important IMHO.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D95238

Files:
  llvm/docs/LangRef.rst


Index: llvm/docs/LangRef.rst
===================================================================
--- llvm/docs/LangRef.rst
+++ llvm/docs/LangRef.rst
@@ -9434,12 +9434,7 @@
 If the value being loaded is of aggregate type, the bytes that correspond to
 padding may be accessed but are ignored, because it is impossible to observe
 padding from the loaded aggregate value.
-
-If the pointer is not a well-defined value, all of its possible representations
-should be dereferenceable. For example, loading a byte from a pointer to an
-array of type ``[16 x i8]`` with offset ``undef & 31`` is undefined behavior.
-Loading a byte at offset ``undef & 15`` nondeterministically reads one of the
-bytes.
+If ``<pointer>`` is not a well-defined value, the behavior is undefined.
 
 Examples:
 """""""""
@@ -9533,12 +9528,7 @@
 belong to the type, but they will typically be overwritten.
 If ``<value>`` is of aggregate type, padding is filled with
 :ref:`undef <undefvalues>`.
-
-If ``<pointer>`` is not a well-defined value, all of its possible
-representations should be dereferenceable. For example, storing a byte to a
-pointer to an array of type ``[16 x i8]`` with offset ``undef & 31`` is
-undefined behavior. Storing a byte to an offset ``undef & 15``
-nondeterministically stores to one of offsets from 0 to 15.
+If ``<pointer>`` is not a well-defined value, the behavior is undefined.
 
 Example:
 """"""""
@@ -12703,11 +12693,11 @@
 to be aligned to some boundary, this can be specified as an attribute on the
 argument.
 
-If "len" is 0, the pointers may be NULL, dangling, ``undef``, or ``poison``
+If ``<len>`` is 0, the pointers may be NULL, dangling, ``undef``, or ``poison``
 pointers. However, they must still be appropriately aligned.
-If "len" isn't a well-defined value, all of its possible representations should
-make the behavior of this ``llvm.memcpy`` defined, otherwise the behavior is
-undefined.
+If ``<len>`` is not a well-defined value, the behavior is undefined.
+If ``<len>`` is not zero, both ``<dest>`` and ``<src>`` should be well-defined,
+otherwise the behavior is undefined.
 
 .. _int_memcpy_inline:
 
@@ -12762,11 +12752,9 @@
 overlap. It copies "len" bytes of memory over. If the argument is known
 to be aligned to some boundary, this can be specified as an attribute on
 the argument.
-
-If "len" is 0, the pointers may be NULL, dangling, ``undef``, or ``poison``
-pointers. However, they must still be appropriately aligned.
-
-The generated code is guaranteed not to call any external functions.
+The behavior of '``llvm.memcpy.inline.*``' is equivalent to the behavior of
+'``llvm.memcpy.*``', but the generated code is guaranteed not to call any
+external functions.
 
 .. _int_memmove:
 
@@ -12823,11 +12811,11 @@
 aligned to some boundary, this can be specified as an attribute on
 the argument.
 
-If "len" is 0, the pointers may be NULL, dangling, ``undef``, or ``poison``
+If ``<len>`` is 0, the pointers may be NULL, dangling, ``undef``, or ``poison``
 pointers. However, they must still be appropriately aligned.
-If "len" isn't a well-defined value, all of its possible representations should
-make the behavior of this ``llvm.memmove`` defined, otherwise the behavior is
-undefined.
+If ``<len>`` is not a well-defined value, the behavior is undefined.
+If ``<len>`` is not zero, both ``<dest>`` and ``<src>`` should be well-defined,
+otherwise the behavior is undefined.
 
 .. _int_memset:
 
@@ -12881,11 +12869,11 @@
 aligned to some boundary, this can be specified as an attribute on
 the argument.
 
-If "len" is 0, the pointer may be NULL, dangling, ``undef``, or ``poison``
+If ``<len>`` is 0, the pointer may be NULL, dangling, ``undef``, or ``poison``
 pointer. However, it must still be appropriately aligned.
-If "len" isn't a well-defined value, all of its possible representations should
-make the behavior of this ``llvm.memset`` defined, otherwise the behavior is
-undefined.
+If ``<len>`` is not a well-defined value, the behavior is undefined.
+If ``<len>`` is not zero, both ``<dest>`` and ``<src>`` should be well-defined,
+otherwise the behavior is undefined.
 
 '``llvm.sqrt.*``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D95238.318521.patch
Type: text/x-patch
Size: 4187 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210122/a0f365e1/attachment.bin>


More information about the llvm-commits mailing list