[PATCH] D124571: Avoid strict aliasing violation on type punning inside llvm::PointerIntPair

Breno Rodrigues Guimaraes via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 27 16:30:56 PDT 2022


brenoguim created this revision.
brenoguim added reviewers: chandlerc, akyrtzi.
Herald added subscribers: jeroen.dobbelaere, dexonsmith, kristof.beyls.
Herald added a project: All.
brenoguim requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

llvm::PointerIntPair has methods that when used together can invoke undefined behavior by violating strict aliasing.

`getPointer()` uses the underlying storage as it's declared: `intptr_t`
`getAddrOfPointer()` casts the underlying storage as if it was a `PointerTy`

This violates strict aliasing, so depending on how they are used, it's possible to have the compiler to optimize the code in unwanted ways.
See the unit test in the patch. We declare a `PointerIntPair` and use the `getAddrOfPointer` method to fill in the a pointer value.
Then, when we use `getPointer` the compiler is thrown off, thinking that `intptr_t` storage could not have possibly be changed, and the check fails.

I have no experience with llvm codebase, so I expect feedback on operational things like where to put the new class, how to format it and so on.

The solution is use a `char` buffer which is blessed by the standard for type punning. That alone is not enough. When accessing the `intptr_t` modeling, one also has to use std::memcpy to let the compiler know we are creating a new intptr_t using the bit pattern in the array of chars.

I recommend using the compiler explorer to play around with different solutions. There are a bunch of examples in the issue I reported in github: https://github.com/llvm/llvm-project/issues/55103

I must say, I'm not entirely sure the solution is correct, even though it works. It could be that the code still invokes undefined behavior, but has been fiddled in a way that the compiler cannot optimize based on UB. After all, you will see that the issue is fragile. Even seemingly harmless simplifications like commenting out the line with ++pairAddr will make the code work again.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D124571

Files:
  llvm/include/llvm/ADT/PointerIntPair.h
  llvm/unittests/ADT/PointerIntPairTest.cpp


Index: llvm/unittests/ADT/PointerIntPairTest.cpp
===================================================================
--- llvm/unittests/ADT/PointerIntPairTest.cpp
+++ llvm/unittests/ADT/PointerIntPairTest.cpp
@@ -106,4 +106,22 @@
       "trivially copyable");
 }
 
+TEST(PointerIntPairTest, TypePunning) {
+  int i = 0;
+  int *intPtr = &i;
+
+  int **intPtrBegin = &intPtr;
+  int **intPtrEnd = intPtrBegin + 1;
+
+  PointerIntPair<int *, 1> pair;
+  int **pairAddr = pair.getAddrOfPointer();
+
+  while (intPtrBegin != intPtrEnd) {
+    *pairAddr = *intPtrBegin;
+    ++pairAddr;
+    ++intPtrBegin;
+  }
+  EXPECT_EQ(pair.getPointer(), intPtr);
+}
+
 } // end anonymous namespace
Index: llvm/include/llvm/ADT/PointerIntPair.h
===================================================================
--- llvm/include/llvm/ADT/PointerIntPair.h
+++ llvm/include/llvm/ADT/PointerIntPair.h
@@ -19,10 +19,37 @@
 #include "llvm/Support/type_traits.h"
 #include <cassert>
 #include <cstdint>
+#include <cstring>
 #include <limits>
 
 namespace llvm {
 
+namespace detail {
+template <typename Ptr> struct PunnedPointer {
+  explicit constexpr PunnedPointer(intptr_t i = 0) { *this = i; }
+
+  intptr_t asInt() const {
+    intptr_t R = 0;
+    std::memcpy(&R, &Data, sizeof(R));
+    return R;
+  }
+
+  constexpr operator intptr_t() const { return asInt(); }
+
+  constexpr PunnedPointer &operator=(intptr_t V) {
+    std::memcpy(&Data, &V, sizeof(Data));
+    return *this;
+  }
+
+  Ptr *getPointerAddress() { return reinterpret_cast<Ptr *>(Data); }
+  const Ptr *getPointerAddress() const {
+    return const_cast<PunnedPointer *>(this)->getPointerAddress();
+  }
+
+  alignas(Ptr) char Data[sizeof(Ptr)];
+};
+} // namespace detail
+
 template <typename T, typename Enable> struct DenseMapInfo;
 template <typename PointerT, unsigned IntBits, typename PtrTraits>
 struct PointerIntPairInfo;
@@ -46,7 +73,7 @@
 class PointerIntPair {
   // Used by MSVC visualizer and generally helpful for debugging/visualizing.
   using InfoTy = Info;
-  intptr_t Value = 0;
+  detail::PunnedPointer<PointerTy> Value;
 
 public:
   constexpr PointerIntPair() = default;
@@ -86,10 +113,12 @@
     assert(Value == reinterpret_cast<intptr_t>(getPointer()) &&
            "Can only return the address if IntBits is cleared and "
            "PtrTraits doesn't change the pointer");
-    return reinterpret_cast<PointerTy *>(&Value);
+    return Value.getPointerAddress();
   }
 
-  void *getOpaqueValue() const { return reinterpret_cast<void *>(Value); }
+  void *getOpaqueValue() const {
+    return reinterpret_cast<void *>(Value.asInt());
+  }
 
   void setFromOpaqueValue(void *Val) & {
     Value = reinterpret_cast<intptr_t>(Val);


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D124571.425637.patch
Type: text/x-patch
Size: 2712 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220427/8633011d/attachment.bin>


More information about the llvm-commits mailing list