[PATCH] D43112: [WebAssembly] Use Symbol class heirarchy. NFC.

Nicholas Wilson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 14 04:36:49 PST 2018


ncw added a comment.

In https://reviews.llvm.org/D43112#1006394, @ruiu wrote:

> > As to the union type, we don't really use it as a union. That is a convenient way to allocate a memory that is large enough to hold any Symbol-derived type, but we don't access any member of it. As you wrote, maybe we should make it explicit by changing (Symbol *)make<SymbolUnion>() to reinterpret_cast<Symbol *>(make<SymbolUnion>()) (I actually wrote that cast with reinterpret_cast in mind, without noticing that there's other way of interpreting it), or if the existence of the union causes confusion like that, we should remove SymbolUnion and define char[maximum symbol size] for a symbol instead. The point is that the union isn't important. We just want to allocate blank memory for symbols. We identify object type using LLVM's classof mechanism, and I think that works just fine.
>
> On second thought, it is already interpreted as `interpret_cast<>` and not a UB, no? SymbolUnion is not a union of Symbols, but is a union of char[].


I'm happy with the assertions that the types are trivially-destructible, that makes good sense.

I'm still a bit uneasy that there might be UB. You're assuming that the base class is positioned right at the front of the class: you take the address of the union and reinterpret it as a `Symbol*`. Then, when you actually construct the object using placement new, you're assuming that the base-class pointer is unadjusted:

  T *replaceSymbol(Symbol *S, ArgT &&... Arg) {
    T *Derived = new (S) T(std::forward<ArgT>(Arg)...);
    assert(static_cast<Symbol*>(Derived) == S); // This is what you are requiring to be true
    return Derived;
  }

And yet: the compiler's (implementation-defined) rules for object layout might not put the base class there at the same address as the start of the memory. (And possibly the compile environment could redefine placement new to use padding, possibly by some kind of sanitizer guards?)

I think that for "standard layout" class, the object layout //has// to put the base class first in memory (http://en.cppreference.com/w/cpp/language/data_members). Unfortunately, the Symbol classes are //not// "standard layout", so I can see any standards-provided guarantee that the base class is where you want it to be.

I don't want to be a pesky language lawyer. Would it be OK to add the assertion above?



================
Comment at: wasm/Symbols.h:231-238
+T *replaceSymbol(Symbol *S, ArgT &&... Arg) {
+  static_assert(sizeof(T) <= sizeof(SymbolUnion), "Symbol too small");
+  static_assert(alignof(T) <= alignof(SymbolUnion),
+                "SymbolUnion not aligned enough");
+  assert(static_cast<Symbol *>(static_cast<T *>(nullptr)) == nullptr &&
+         "Not a Symbol");
+  return new (S) T(std::forward<ArgT>(Arg)...);
----------------
sbc100 wrote:
> sbc100 wrote:
> > ncw wrote:
> > > Yikes, this is scary!
> > > 
> > > 1. For one, doing `make<SymbolUnion>` means that the actual symbol's destructor won't be called at the program end. We're relying on the Symbol classes being "simple" with trivial destructors, but what if someone forgets and sticks a std::vector in there as a member...? It's not ideal, quite fragile.
> > > 2. And similarly, replaceSymbol doesn't deallocate the previous data in the union, in just writes straight over it. The new object will be OK, but the previous one will leak any members. Again not a problem as long as the Symbols all have trivial dtors, but it feels like an accident waiting to happen.
> > > 3. Is it technically Undefined Behaviour? You seem to be relying on the exact details of the base-to-derived pointer adjustments. First we allocate the union, then reinterpret-cast it to a `Symbol*`, then placement-construct an UndefinedFunction symbol. Then the first bit of UB happens, we assume that the newly-constructed UndefinedFunction, when cast to a Symbol, has the same address. That is, we assume the following:
> > > 
> > > `(void*)(BaseClass*)(new (addr) DerivedClass) == (void*)addr`
> > > 
> > > Then the second bit of UB happens when we do replaceSymbol and make the same assumption again, but in replaceSymbol we further assume that all derived classes will construct their Symbol base at the same offset within the memory block we're placement-constructing them in (it's just another assumption about object layout).
> > > 
> > > The basic assumption is that base classes are constructed before the derived class, and the base class is at offset zero within the memory when placement-constructed at a specific address.
> > > 
> > > I can see it's just copying the existing LLD code. What you're trying to achieve is to dynamically change the derived type of an object, so that previously-created pointers to the base class remain valid as pointers to the new object's base class.
> > > 
> > > There is a solution I can think of that's "safe". Instead of doing `reinterpret_cast<Symbol*>(make<SymbolUnion>())`, why not instead put the Kind member next to the union, something like this: `struct SymbolHandle { int Kind; SymbolUnion U; }` and then store pointers to the union-with-kind. You can then safely destruct the union using a switch, and safely replace the union too; and finally, you can add automatic casting operators that allow casting a SymbolHandle to an `UndefinedFunction&` etc with an assertion on Kind and a cast on the appropriate member of the union.
> > I think perhaps you raise some good points, but I see this as more of an lld wide discussion.   The other ports have always working in this way AFAICT, and it was always my intention to have the wasm port do the same thing.
> > 
> > Perhaps @ruiu can add some documentation about this technique, why it is actually safe in the this context, and what the motivating factors are for using it.
> > 
> > In any case I don't think we should block this change in this design discussion.   Lets make the linkers consistent and iterate (together) from there.
> FWIW, it looks it should be easy to protect against (1) and (2) by adding:
> 
> ```
>   static_assert(std::is_trivially_destructible<T>(),                             
>                 "Symbol types must be trivially destructible");  
> ```
> 
> The spec on this explicitly says `Storage occupied by trivially destructible objects may be reused without calling the destructor.`.
> 
> That still leaves the concerns you raise in (3) of course.
"have always worked this way" - I think it's actually recent, looks like the change dates to 31 Oct 2017.

Agreed that it's OK to change Wasm to match.


Repository:
  rLLD LLVM Linker

https://reviews.llvm.org/D43112





More information about the llvm-commits mailing list