[PATCH] D98477: [ADT] Add IntrusiveVariant

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 6 16:47:48 PDT 2021


dblaikie added a subscriber: rsmith.
dblaikie added inline comments.


================
Comment at: llvm/include/llvm/ADT/IntrusiveVariant.h:159-166
+/// A helper macro to add the declarations needed to use a type as an
+/// alternative for IntrusiveVariant. Must be the first declaration of the
+/// class.
+#define DECLARE_INTRUSIVE_ALTERNATIVE                                          \
+  ::llvm::IntrusiveVariantTag IntrusiveVariantTagMember;                       \
+  template <typename...> friend class ::llvm::IntrusiveVariant;                \
+  template <std::size_t, typename, typename...>                                \
----------------
scott.linder wrote:
> dblaikie wrote:
> > Can we instead use/generalize something like the type traits used for PointerUnion that declare the number of spare bits in the type? Since we already have those in LLVM (& to macro-y).
> My understanding of the `PointerUnion` case is that the standard allows the bitwise manipulation of pointer representations so long as you jump through the appropriate casting hoops (i.e. to `(u)intptr_t` and back) and ensure you end up with the same representation before casting back. For objects of arbitrary standard-layout type I don't know that we have any neat way to ensure we don't invoke UB, as the "padding" bits in the object representation are not guaranteed to be preserved over most uses of the struct, AFAIK.
> 
> That said, if I'm just missing something I would be very happy to remove `DECLARE_INTRUSIVE_ALTERNATIVE`. I currently see it as an acceptable concession needed to meet the requirements of the "common initial sequence" rule, which seems like the only way to get the compact layout without invoking UB.
Wondering if it might be more intuitive for users to work from the back instead - bit more work in the implementation, though?

eg: if I've got a `{int, char}` struct, and I use some tool to dump the layout and find I have 3 bytes of tail padding I could create an array of 3 chars at the end (`{int, char, char[3]}`) and specialize some template to say "I have 3 bytes of tail padding char array" - guess that still doesn't provide compatibility/foundation to implement pointer union on. Would have to declare the spare space in bits, and the aliasing would still be a problem for the pointer case 

I guess the system you've got means you don't have to specialize another template, etc - the existence of a friend-accessed member with a specific name you can check for is enough evidence the type has agreed to be intrusively varianted.

I'd love to here @rsmith's or other C++-y folks thoughts on this, as it's some non-trivial design with interesting overlap on existing constructs.


================
Comment at: llvm/include/llvm/ADT/IntrusiveVariant.h:276-280
+// case limbs still have to resolve to some overload, but we won't have an
+// overload available for the cases where the index is too large (i.e. we want
+// there to be no such overload, so we get compile-time errors for those cases
+// everywhere else). As a workaround, we make the body of the out-of-bounds
+// overloads be llvm_unreachable.
----------------
scott.linder wrote:
> dblaikie wrote:
> > Not sure I follow why this ends up as a runtime error - what if the overload with unreachable (down at 307) wasn't defined? Wouldn't that produce a compile-time error only when the index is out of bounds?
> AFAICT the only way to get that behavior is to have a separate overload for every size of `IntrusiveVariant` we support, as there is no way to recursively generate the switch itself, and if we just declare a switch with some arbitrary number of `case` limbs (in my patch this is 32) then it is a compile-time error to use any `IntrusiveVariant` with fewer than 32 alternatives.
> 
> In the simplest case:
> 
> ```
> visit(Callable C, IntrusiveVariant<A> IV) { // i.e. toy example of a monomorphized visit for a variant with a single alternative type A
>   switch (IV.index()) {
>   case 0: C(IV.Union.getMember(InPlaceIndex{0})); break;
>   case 1: C(IV.Union.getMember(InPlaceIndex{1})); break; // This does not resolve to any overload of getMember, and is a compile-time error.
>   }
> }
> ```
> 
> In the above, if we only wrote one template for `visit` and it has two `case` limbs in the `switch`, then it will only compile for variants with exactly two alternative types. If we could recursively generate the body of `visit` we could do this without the `unreachable` workaround, but I don't see how to.
I've missed a step in the logic here.

I'm still trying to wrap my head around: What happens if the function defined below with llvm_unreachable was instead not defined at all? Where is the code that's creating a dead call to this code such that it needs to be defined, but is guaranteed never to be called?

Separately, as for the switches: Presumably some recursion could be used instead. Yeah, it doesn't scale well especially at -O0 codegen.

Oh, maybe an array of function pointers could be used - a function template templated on the index? Would avoid the recursion.


================
Comment at: llvm/include/llvm/ADT/IntrusiveVariant.h:429-432
+  union {
+    detail::CommonInitialSequenceT CommonInitialSequence;
+    UnionT Union;
+  };
----------------
scott.linder wrote:
> dblaikie wrote:
> > Technically C++ doesn't allow access to a common initial sequence except for types with standard layout, I think? So types with non-trivial copy ops, etc, wouldn't be usable here. (I wonder what PointerUnion does? I guess it just stuffs everything in void* or uintptr_t, which isn't exactly applicable here - but maybe some aligned storage would be suitable here)
> Yes, that's my understanding too, and I do `static_assert` the `standard-layout`-ness of all alternative types and mention this restriction in the file header comment.
> 
> > maybe some aligned storage would be suitable here
> 
> I think this is the point I may be missing up above, but if I understand what you're getting at the idea would be to basically:
> 
> * Allocate an array of `char` with the maximum alignment of any of the alternative types
> * If possible, put our index tag in any padding left over after the value representation of all the alternative types (as I mentioned above, this is the part I'm unsure about concerning UB)
> * Manage all of the construction, copying, moving, destruction of alternative types in the array of `char`
> 
> If there is a way to do this without relying on UB, then it seems reasonable to me. I do worry it would end up being a lot of code, in particular I think we would need to do something like the `libcxx` implementation of `std::variant` to support all of the possible combinations of levels of trivial-ness of construction, copying, destruction, etc. which was something I wanted to avoid doing purely to make the code simpler.
Ah. Rather than using the common initial sequence rule - could you instead take a pointer and reinterpret_cast it - knowing that the member is at the start of the object? I would've thought that would be valid even without the common initial sequence rule - but maybe there's some pointer providence that makes that approach invalid?

(eg: if I know `struct x { int; };` starts with an int, and I have `x *y = &some_x;` can I not reinterpret_cast `y` to `int*` and dereference it?)

but, yeah - partly I was thinking of putting it at the end rather than the beginning - and, yeah, I'm not sure what happens if you have some of these chars at the end of the object, then some trailing after the object (if some of the objects are of different sizes) it may be fussy to correctly address all those bytes (struct-internal padding, and external padding) in a simple/uniform way.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98477/new/

https://reviews.llvm.org/D98477



More information about the llvm-commits mailing list