[PATCH] D126309: [docs][OpaquePtr] Add detail to motivations behind opaque pointers

Tue May 24 10:58:22 PDT 2022

aeubanks created this revision.
Herald added a project: All.
aeubanks requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D126309

Files:
  llvm/docs/OpaquePointers.rst


Index: llvm/docs/OpaquePointers.rst
===================================================================

--- llvm/docs/OpaquePointers.rst
+++ llvm/docs/OpaquePointers.rst
@@ -12,7 +12,23 @@
 
 The opaque pointer type project aims to replace all pointer types containing
 pointee types in LLVM with an opaque pointer type. The new pointer type is
-tentatively represented textually as ``ptr``.
+represented textually as ``ptr``.
+
+Some instructions still need to know how to interpret the memory that a pointer
+points to. For example, a load needs to know how many bytes to read from memory
+and what type the resulting value has. In these cases, the instructions
+themselves carry a type argument. For example, the load instruction from older
+versions of LLVM
+
+.. code-block:: llvm
+
+  load i64* %p
+
+becomes
+
+.. code-block:: llvm
+
+  load i64, ptr %p
 
 Address spaces are still used to distinguish between different kinds of pointers
 where the distinction is relevant for lowering (e.g. data vs function pointers
@@ -29,34 +45,39 @@
 underlying type in memory. In other words, the pointee type carries no real
 semantics.
 
-Lots of operations do not actually care about the underlying type. These
-operations, typically intrinsics, usually end up taking an ``i8*``. This causes
-lots of redundant no-op bitcasts in the IR to and from a pointer with a
-different pointee type. The extra bitcasts take up space and require extra work
-to look through in optimizations. And more bitcasts increase the chances of
-incorrect bitcasts, especially in regards to address spaces.
-
-Some instructions still need to know what type to treat the memory pointed to by
-the pointer as. For example, a load needs to know how many bytes to load from
-memory. In these cases, instructions themselves contain a type argument. For
-example the load instruction from older versions of LLVM
-
-.. code-block:: llvm
-
-  load i64* %p
-
-becomes
-
-.. code-block:: llvm
-
-  load i64, ptr %p
-
-A nice analogous transition that happened earlier in LLVM is integer signedness.
-There is no distinction between signed and unsigned integer types, rather the
-integer operations themselves contain what to treat the integer as. Initially,
-LLVM IR distinguished between unsigned and signed integer types. The transition
-from manifesting signedness in types to instructions happened early on in LLVM's
-life to the betterment of LLVM IR.
+Historically, LLVM IR resembled a type-safe subset of C. Pointee types provided
+an extra layer of checking that the Clang frontend's values and operations
+matched the corresponding LLVM IR. However, as other languages such as C++ came
+to use LLVM, the community realized that pointee types were more of a hindrance
+to LLVM development and that the extra type checking they offered some
+frontends was not worth it.
+
+Many operations do not actually care about the underlying type. These
+operations, typically intrinsics, usually end up taking an ``i8*`` to stand in
+for an arbitrary pointer, and sometimes a size. This causes lots of redundant
+no-op bitcasts in the IR to and from pointers with different pointee types.
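+
+For example, with typed pointers, zeroing the four bytes behind an ``i32*``
+with the ``llvm.memset`` intrinsic requires a bitcast to ``i8*`` first, while
+with opaque pointers the pointer is passed directly (the value names ``%p`` and
+``%q`` are only illustrative):
+
+.. code-block:: llvm
+
+  ; typed pointers: cast the i32* to i8* before calling the intrinsic
+  %q = bitcast i32* %p to i8*
+  call void @llvm.memset.p0i8.i64(i8* %q, i8 0, i64 4, i1 false)
+
+  ; opaque pointers: no cast is needed
+  call void @llvm.memset.p0.i64(ptr %p, i8 0, i64 4, i1 false)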
+
+No-op bitcasts take up memory and disk space, and looking through them costs
+compile time. However, perhaps the biggest issue is the code complexity required
+to deal with bitcasts. When walking up def-use chains from a pointer, it's easy
+to forget to call ``Value::stripPointerCasts()`` to find the true underlying
+pointer hidden behind bitcasts. When walking down def-use chains, passes need to
+look through bitcasts to reach the real uses. Removing no-op pointer bitcasts
+prevents a category of missed optimizations and makes writing LLVM passes a
+little bit easier.
+
+Fewer no-op pointer bitcasts also reduce the chance of incorrect bitcasts with
+regard to address spaces. People maintaining backends that care a lot about
+address spaces have complained that frontends like Clang often bitcast pointers
+incorrectly, losing address space information.
+
+An analogous transition that happened earlier in LLVM is integer signedness.
+Currently there is no distinction between signed and unsigned integer types;
+rather, integer operations encode how to treat the integer, either in the opcode
+(e.g. ``sdiv`` vs. ``udiv``) or in flags (e.g. ``add nsw``). Previously, LLVM IR
+distinguished between signed and unsigned integer types and ran into similar
+issues with no-op casts. Moving signedness from types to instructions happened
+early in LLVM's history to make LLVM easier to work with.
 
 Opaque Pointers Mode
 ====================

