[llvm] b66ca91 - [Docs] Update GEP docs for opaque pointers

Mon Jul 25 00:52:44 PDT 2022

Author: Nikita Popov
Date: 2022-07-25T09:52:14+02:00
New Revision: b66ca91fe6f91edde10510571dec5e31e2dbcfe2

URL: https://github.com/llvm/llvm-project/commit/b66ca91fe6f91edde10510571dec5e31e2dbcfe2
DIFF: https://github.com/llvm/llvm-project/commit/b66ca91fe6f91edde10510571dec5e31e2dbcfe2.diff

LOG: [Docs] Update GEP docs for opaque pointers

Update the GEP FAQ to use opaque pointers. This requires more than
a syntactic change in some place, because some of the concerns just
don't make sense anymore (trying to index past a ptr member in a
struct for example).

This also fixes uses of incorrect syntax to declare or reference
globals.

Differential Revision: https://reviews.llvm.org/D130353

Added: 
    

Modified: 
    llvm/docs/GetElementPtr.rst

Removed: 
    


################################################################################
diff  --git a/llvm/docs/GetElementPtr.rst b/llvm/docs/GetElementPtr.rst
index a25c12bef7e8..49076ab6f599 100644

--- a/llvm/docs/GetElementPtr.rst
+++ b/llvm/docs/GetElementPtr.rst
@@ -88,15 +88,15 @@ looks like:
 
 .. code-block:: llvm
 
-  define void @munge(%struct.munger_struct* %P) {
+  define void @munge(ptr %P) {
   entry:
-    %tmp = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 1, i32 0
-    %tmp1 = load i32, i32* %tmp
-    %tmp2 = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 2, i32 1
-    %tmp3 = load i32, i32* %tmp2
+    %tmp = getelementptr %struct.munger_struct, ptr %P, i32 1, i32 0
+    %tmp1 = load i32, ptr %tmp
+    %tmp2 = getelementptr %struct.munger_struct, ptr %P, i32 2, i32 1
+    %tmp3 = load i32, ptr %tmp2
     %tmp4 = add i32 %tmp3, %tmp1
-    %tmp5 = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 0, i32 0
-    store i32 %tmp4, i32* %tmp5
+    %tmp5 = getelementptr %struct.munger_struct, ptr %P, i32 0, i32 0
+    store i32 %tmp4, ptr %tmp5
     ret void
   }
 
@@ -108,11 +108,11 @@ To make this clear, let's consider a more obtuse example:
 
 .. code-block:: text
 
-  %MyVar = uninitialized global i32
+  @MyVar = external global i32
   ...
-  %idx1 = getelementptr i32, i32* %MyVar, i64 0
-  %idx2 = getelementptr i32, i32* %MyVar, i64 1
-  %idx3 = getelementptr i32, i32* %MyVar, i64 2
+  %idx1 = getelementptr i32, ptr @MyVar, i64 0
+  %idx2 = getelementptr i32, ptr @MyVar, i64 1
+  %idx3 = getelementptr i32, ptr @MyVar, i64 2
 
 These GEP instructions are simply making address computations from the base
 address of ``MyVar``.  They compute, as follows (using C syntax):
@@ -125,12 +125,12 @@ address of ``MyVar``.  They compute, as follows (using C syntax):
 
 Since the type ``i32`` is known to be four bytes long, the indices 0, 1 and 2
 translate into memory offsets of 0, 4, and 8, respectively. No memory is
-accessed to make these computations because the address of ``%MyVar`` is passed
+accessed to make these computations because the address of ``@MyVar`` is passed
 directly to the GEP instructions.
 
 The obtuse part of this example is in the cases of ``%idx2`` and ``%idx3``. They
 result in the computation of addresses that point to memory past the end of the
-``%MyVar`` global, which is only one ``i32`` long, not three ``i32``\s long.
+``@MyVar`` global, which is only one ``i32`` long, not three ``i32``\s long.
 While this is legal in LLVM, it is inadvisable because any load or store with
 the pointer that results from these GEP instructions would produce undefined
 results.
@@ -145,22 +145,21 @@ variable which is always a pointer type. For example, consider this:
 
 .. code-block:: text
 
-  %MyStruct = uninitialized global { float*, i32 }
+  %MyStruct = external global { ptr, i32 }
   ...
-  %idx = getelementptr { float*, i32 }, { float*, i32 }* %MyStruct, i64 0, i32 1
+  %idx = getelementptr { ptr, i32 }, ptr %MyStruct, i64 0, i32 1
 
-The GEP above yields an ``i32*`` by indexing the ``i32`` typed field of the
+The GEP above yields a ``ptr`` by indexing the ``i32`` typed field of the
 structure ``%MyStruct``. When people first look at it, they wonder why the ``i64
 0`` index is needed. However, a closer inspection of how globals and GEPs work
 reveals the need. Becoming aware of the following facts will dispel the
 confusion:
 
-#. The type of ``%MyStruct`` is *not* ``{ float*, i32 }`` but rather ``{ float*,
-   i32 }*``. That is, ``%MyStruct`` is a pointer to a structure containing a
-   pointer to a ``float`` and an ``i32``.
+#. The type of ``%MyStruct`` is *not* ``{ ptr, i32 }`` but rather ``ptr``.
+   That is, ``%MyStruct`` is a pointer (to a structure), not a structure itself.
 
 #. Point #1 is evidenced by noticing the type of the second operand of the GEP
-   instruction (``%MyStruct``) which is ``{ float*, i32 }*``.
+   instruction (``%MyStruct``) which is ``ptr``.
 
 #. The first index, ``i64 0`` is required to step over the global variable
    ``%MyStruct``.  Since the second argument to the GEP instruction must always
@@ -181,39 +180,31 @@ only involved in the computation of addresses. For example, consider this:
 
 .. code-block:: text
 
-  %MyVar = uninitialized global { [40 x i32 ]* }
+  @MyVar = external global { i32, ptr }
   ...
-  %idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* %MyVar, i64 0, i32 0, i64 0, i64 17
+  %idx = getelementptr { i32, ptr }, ptr @MyVar, i64 0, i32 1
+  %arr = load ptr, ptr %idx
+  %idx = getelementptr [40 x i32], ptr %arr, i64 0, i64 17
 
-In this example, we have a global variable, ``%MyVar`` that is a pointer to a
-structure containing a pointer to an array of 40 ints. The GEP instruction seems
-to be accessing the 18th integer of the structure's array of ints. However, this
-is actually an illegal GEP instruction. It won't compile. The reason is that the
-pointer in the structure *must* be dereferenced in order to index into the
-array of 40 ints. Since the GEP instruction never accesses memory, it is
-illegal.
+In this example, we have a global variable, ``@MyVar``, which is a pointer to
+a structure containing a pointer. Let's assume that this inner pointer points
+to an array of type ``[40 x i32]``. The above IR will first compute the address
+of the inner pointer, then load the pointer, and then compute the address of
+the 18th array element.
 
-In order to access the 18th integer in the array, you would need to do the
-following:
+This cannot be expressed in a single GEP instruction, because it requires
+a memory dereference in between. However, the following example would work
+fine:
 
 .. code-block:: text
 
-  %idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* %, i64 0, i32 0
-  %arr = load [40 x i32]*, [40 x i32]** %idx
-  %idx = getelementptr [40 x i32], [40 x i32]* %arr, i64 0, i64 17
-
-In this case, we have to load the pointer in the structure with a load
-instruction before we can index into the array. If the example was changed to:
-
-.. code-block:: text
-
-  %MyVar = uninitialized global { [40 x i32 ] }
+  @MyVar = external global { i32, [40 x i32 ] }
   ...
-  %idx = getelementptr { [40 x i32] }, { [40 x i32] }*, i64 0, i32 0, i64 17
+  %idx = getelementptr { [40 x i32] }, ptr @MyVal, i64 0, i32 1, i64 17
 
-then everything works fine. In this case, the structure does not contain a
-pointer and the GEP instruction can index through the global variable, into the
-first field of the structure and access the 18th ``i32`` in the array there.
+In this case, the structure does not contain a pointer and the GEP instruction
+can index through the global variable, into the second field of the structure
+and access the 18th ``i32`` in the array there.
 
 Why don't GEP x,0,0,1 and GEP x,1 alias?
 ----------------------------------------
@@ -226,17 +217,15 @@ index. Consider this example:
 
 .. code-block:: llvm
 
-  %MyVar = global { [10 x i32] }
-  %idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 0, i32 0, i64 1
-  %idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1
+  @MyVar = external global { [10 x i32] }
+  %idx1 = getelementptr { [10 x i32] }, ptr @MyVar, i64 0, i32 0, i64 1
+  %idx2 = getelementptr { [10 x i32] }, ptr @MyVar, i64 1
 
 In this example, ``idx1`` computes the address of the second integer in the
-array that is in the structure in ``%MyVar``, that is ``MyVar+4``. The type of
-``idx1`` is ``i32*``. However, ``idx2`` computes the address of *the next*
-structure after ``%MyVar``. The type of ``idx2`` is ``{ [10 x i32] }*`` and its
-value is equivalent to ``MyVar + 40`` because it indexes past the ten 4-byte
-integers in ``MyVar``. Obviously, in such a situation, the pointers don't
-alias.
+array that is in the structure in ``@MyVar``, that is ``MyVar+4``.  However,
+``idx2`` computes the address of *the next* structure after ``@MyVar``, that is
+``MyVar+40``, because it indexes past the ten 4-byte integers in ``MyVar``.
+Obviously, in such a situation, the pointers don't alias.
 
 Why do GEP x,1,0,0 and GEP x,1 alias?
 -------------------------------------
@@ -244,18 +233,16 @@ Why do GEP x,1,0,0 and GEP x,1 alias?
 Quick Answer: They compute the same address location.
 
 These two GEP instructions will compute the same address because indexing
-through the 0th element does not change the address. However, it does change the
-type. Consider this example:
+through the 0th element does not change the address. Consider this example:
 
 .. code-block:: llvm
 
-  %MyVar = global { [10 x i32] }
-  %idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1, i32 0, i64 0
-  %idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1
+  @MyVar = global { [10 x i32] }
+  %idx1 = getelementptr { [10 x i32] }, ptr @MyVar, i64 1, i32 0, i64 0
+  %idx2 = getelementptr { [10 x i32] }, ptr @MyVar, i64 1
 
-In this example, the value of ``%idx1`` is ``%MyVar+40`` and its type is
-``i32*``. The value of ``%idx2`` is also ``MyVar+40`` but its type is ``{ [10 x
-i32] }*``.
+In this example, the value of ``%idx1`` is ``MyVar+40``, and the value of
+``%idx2`` is also ``MyVar+40``.
 
 Can GEP index into vector elements?
 -----------------------------------
@@ -447,8 +434,8 @@ value") if the GEP overflows (i.e. wraps around the end of the address space).
 As such, there are some ramifications of this for inbounds GEPs: scales implied
 by array/vector/pointer indices are always known to be "nsw" since they are
 signed values that are scaled by the element size.  These values are also
-allowed to be negative (e.g. "``gep i32 *%P, i32 -1``") but the pointer itself
-is logically treated as an unsigned value.  This means that GEPs have an
+allowed to be negative (e.g. "``gep i32, ptr %P, i32 -1``") but the pointer
+itself is logically treated as an unsigned value.  This means that GEPs have an
 asymmetric relation between the pointer base (which is treated as unsigned) and
 the offset applied to it (which is treated as signed). The result of the
 additions within the offset calculation cannot have signed overflow, but when