[PATCH] D47747: [LangRef] Clarify "undefined" for various instructions.

Tue Jun 5 08:03:12 PDT 2018

nlopes added inline comments.

================
Comment at: docs/LangRef.rst:1051
     been the most recent stack allocation that is still live, or the
-    results are undefined. It is possible to allocate additional stack
+    behavior is undefined. It is possible to allocate additional stack
     space after an argument allocation and before its call site, but it
----------------
I'm fine with all these UBs in function attributes, since that seems to be how they work today.
However, when hoisting function calls we need to make sure these attributes are dropped, which I don't think is done today. This may not be feasible if the attribute is on the function declaration rather than on the call instruction.

================
Comment at: docs/LangRef.rst:2365
    NaN. Such optimizations are required to retain defined behavior over
-   NaNs, but the value of the result is undefined.
+   NaNs, but the value of the result is unspecified.

----------------
I'm not sure this sentence is clear.
Does it mean that for NaNs the result has to be the same before and after optimization? Or does it mean that the result after the optimization can be any value, but a fixed one (no poison or undef allowed)?

================
Comment at: docs/LangRef.rst:3297
     must be scalars, or vectors of the same number of elements. If the
-    value won't fit in the integer type, the results are undefined.
+    value won't fit in the integer type, the result is ``undef``.
 ``fptosi (CST to TYPE)``
----------------
Can we have poison here instead? (and for the following ones as well)
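To make the undef/poison distinction concrete, here is a small Python model. It is an illustrative sketch under assumptions, not LLVM's implementation: `POISON`, `fptosi_i8`, `sub_i8` and the other names are hypothetical, only `i8` is modeled, and an out-of-range `fptosi` is assumed to yield poison, as proposed above. It also shows why undef is awkward to reason about: `sub i8 undef, undef` is not guaranteed to be 0, because each undef operand may take a different value.

```python
import math

class _Poison:
    """Hypothetical sentinel for an LLVM poison value (a model, not LLVM's code)."""
    def __repr__(self) -> str:
        return "poison"

POISON = _Poison()
I8_MIN, I8_MAX = -128, 127

def wrap_i8(n: int) -> int:
    """Two's-complement wrap of an arbitrary integer into the i8 range."""
    r = n & 0xFF
    return r - 256 if r > I8_MAX else r

def fptosi_i8(x: float):
    """Model of `fptosi double %x to i8`, assuming out-of-range results are poison."""
    if math.isnan(x) or math.isinf(x):
        return POISON
    t = math.trunc(x)  # fptosi rounds toward zero
    if not (I8_MIN <= t <= I8_MAX):
        return POISON
    return t

def sub_i8(a, b):
    """Model of `sub i8 %a, %b`: poison in, poison out; otherwise wrapping subtraction."""
    if a is POISON or b is POISON:
        return POISON
    return wrap_i8(a - b)

def possible_results_of_undef_minus_undef() -> set:
    """Model of `sub i8 undef, undef`: each undef operand may independently be
    any i8 value, so collect every result the subtraction could produce."""
    values = range(I8_MIN, I8_MAX + 1)
    return {sub_i8(a, b) for a in values for b in values}

if __name__ == "__main__":
    p = fptosi_i8(300.0)
    print(p, sub_i8(p, p))   # poison poison
    print(fptosi_i8(-1.5))   # -1
    print(len(possible_results_of_undef_minus_undef()))  # 256, not just {0}
```

With poison, the out-of-range conversion simply propagates until something actually depends on it, whereas the undef model has to account for every value each use might pick.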

================
Comment at: docs/LangRef.rst:4956
+returned by the called function at this call site is in. If the loaded or
+returned value is not in the specified range, the behavior is undefined. The
+ranges are represented with a flattened list of integers. The loaded value or
----------------
UB vs poison here:
 - UB: after a load with 'range' we know the memory has a value within the range.
 - Poison: after branching on the loaded value we know the memory has a value within the range.

UB has more immediate effects than poison, of course. For GVN and friends, UB is a bit easier, but when hoisting such an instruction we need to drop the range metadata. Is that done today?
For range analysis, I guess both semantics are fine, though the UB semantics may allow a better analysis, since after the load the analysis can assume the range right away instead of waiting for a branch. The UB semantics also helps with shrinking the bitwidth of arithmetic, which otherwise isn't easy to do (since you would need to check the users of the expression tree).

Bottom line: this really depends on what kind of transformations LLVM does today (and in the future) that care about this range metadata.
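The two candidate semantics for `!range` can be contrasted with a toy interpreter. This is a hedged sketch, not LLVM code: `load_with_range`, `br`, `unused_load` and `branch_on_load` are hypothetical names, memory is a Python list, and only a single non-wrapping half-open range `[lo, hi)` is modeled (real `!range` metadata is a list of pairs that may wrap).

```python
class UndefinedBehavior(Exception):
    """Raised when the modeled program executes undefined behavior."""

class _Poison:
    """Hypothetical sentinel for a poison value (a model, not LLVM's code)."""
    def __repr__(self) -> str:
        return "poison"

POISON = _Poison()

def load_with_range(memory, addr, lo, hi, semantics):
    """Model of a load annotated with !range [lo, hi) (single, non-wrapping pair).

    semantics == "ub":     an out-of-range value is immediate UB at the load.
    semantics == "poison": an out-of-range value makes the load return poison.
    """
    value = memory[addr]
    if lo <= value < hi:
        return value
    if semantics == "ub":
        raise UndefinedBehavior(f"loaded {value}, outside [{lo}, {hi})")
    return POISON

def br(cond):
    """Branching on poison is UB; otherwise return the branch direction."""
    if cond is POISON:
        raise UndefinedBehavior("branch on poison")
    return bool(cond)

def unused_load(memory, semantics):
    # %v = load i8, i8* %p, !range !0   ; !0 = !{i8 0, i8 2}; %v has no users
    load_with_range(memory, 0, 0, 2, semantics)
    return "ok"

def branch_on_load(memory, semantics):
    # %v = load i8, i8* %p, !range !0
    # %c = icmp eq i8 %v, 0             ; poison if %v is poison
    # br i1 %c, label %then, label %else
    v = load_with_range(memory, 0, 0, 2, semantics)
    c = POISON if v is POISON else (v == 0)
    return "then" if br(c) else "else"

if __name__ == "__main__":
    bad = [5]  # memory holds a value outside the annotated range
    print(unused_load(bad, "poison"))  # ok: unused poison is harmless
    for name, run in [("ub/unused", lambda: unused_load(bad, "ub")),
                      ("poison/branch", lambda: branch_on_load(bad, "poison"))]:
        try:
            run()
        except UndefinedBehavior as e:
            print(name, "->", e)
```

Under the UB reading, hoisting such a load out of a condition can introduce UB on paths that never executed it, which is why the range metadata would have to be dropped when hoisting; under the poison reading, the hoisted load is harmless on those paths as long as nothing branches on its value there.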

================
Comment at: docs/LangRef.rst:7314
 Its value is the value at position ``idx`` of ``val``. If ``idx``
-exceeds the length of ``val``, the results are undefined.
+exceeds the length of ``val``, the result is ``undef``.

----------------
Can it be poison instead?

================
Comment at: docs/LangRef.rst:7575
+reclaimed. Allocating zero bytes is legal, but the returned pointer is
+is ``undef``. The order in which memory is allocated (ie., which way the
+stack grows) is not specified.
----------------
Is undef required by the C/C++ standards or can it be poison?

================
Comment at: docs/LangRef.rst:8163
 involving memory) involving a pointer derived from a ``getelementptr`` with
-the ``inrange`` keyword is undefined, with the exception of comparisons
+the ``inrange`` keyword is ``undef``, with the exception of comparisons
 in the case where both operands are in the range of the element selected
----------------
GEP is difficult. I suggest we leave GEP inbounds/inrange discussion for a separate patch.

Repository:
  rL LLVM

https://reviews.llvm.org/D47747