[LLVMdev] Structure Types and ABI sizes

Tue Feb 15 14:14:13 PST 2011

Renato Golin <renato.golin at arm.com> writes:

>> There are ways to do that without losing too much information.  For
>> example, we render the above without using arrays at all:
>>
>> %I = type { i32, i8, i16 }
>> %J = type { %I, i8, i16 }
>
> Not if you follow the Itanium C++ ABI.
>
> In your example it works because { i8, i16 } pads nicely to 4 bytes,

That's why we use { i8, i16 }.  It's not by accident.  We do adhere to
the Itanium ABI.

> so there is no tail padding. If there is tail padding, the size of the
> Base class is different from the size of the Base class inside the
> Derived one.

Yes, that's true for non-POD types.  In that case the base class really
has two different representations which need to be two different types
in LLVM.  This is really ugly stuff.  I fixed a whole slew of bugs
around just this issue last year.  :)

In these cases you may have to resort to arrays or at least a bunch of
consecutive i8s.  I'm sure we do so, though I would have to verify that.
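
To make the two representations concrete, here is a small C++ sketch.  The
names NonPodA, NonPodB and nonpod_offset_of_d are made up for illustration,
and the numbers in the comments assume an Itanium-ABI target with a 4-byte,
4-byte-aligned int (for example x86-64 Linux).  Other ABIs may lay this out
differently.

```cpp
#include <cstddef>

// Hypothetical types for illustration; not taken from any test case.
struct NonPodA {
  NonPodA() : i(0), c(0) {}  // user-provided constructor, so not a POD
  int i;                     // bytes 0..3
  char c;                    // byte 4; bytes 5..7 are tail padding
};

struct NonPodB : NonPodA {
  NonPodB() : d(0) {}
  char d;                    // assumed to land in NonPodA's tail padding
};

// Byte offset of NonPodB::d from the start of a NonPodB object,
// measured with pointer arithmetic rather than offsetof.
inline std::size_t nonpod_offset_of_d() {
  NonPodB b;
  return static_cast<std::size_t>(
      reinterpret_cast<const char*>(&b.d) -
      reinterpret_cast<const char*>(&b));
}
```

Under those assumptions sizeof(NonPodA) is 8 when the class stands alone, but
as a base of NonPodB it only occupies 5 bytes, so d sits at offset 5 and
sizeof(NonPodB) is still 8.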

> So, in my example "B : public A { char }":
>
> %A = type { i32, i8 }
> %B = type { %A, i8 }
>
> A has 8 bytes, as it should, but inside B it has only 5, so B's first
> field offset is 5, not 8. This is why we have to do:
>
> %B = { [5 x i8], i8, [3 x i8] }

Wait, that's not what you showed before:

// CHECK: %struct.J = type { [8 x i8], i8, [3 x i8] }
struct J : I {
  char c;
};

%B = { [5 x i8], i8, [3 x i8] } is not correct for the Itanium ABI
because tail padding cannot be overlaid in "POD for the purposes of
layout" types (secs. 1.1 and 2.2).  You had it right the first time.  :)
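
A quick way to check the POD case is to measure it.  This is only a sketch:
PodA, PodB and pod_offset_of_d are hypothetical names, and the expected
numbers assume a 4-byte, 4-byte-aligned int.

```cpp
#include <cstddef>

// Hypothetical types mirroring the "B : public A { char }" example.
struct PodA {
  int i;   // bytes 0..3
  char c;  // byte 4; bytes 5..7 are tail padding
};

struct PodB : PodA {
  char d;  // a POD base keeps its full size, so this follows the padding
};

// Byte offset of PodB::d from the start of a PodB object,
// measured with pointer arithmetic rather than offsetof.
inline std::size_t pod_offset_of_d() {
  PodB b = PodB();
  return static_cast<std::size_t>(
      reinterpret_cast<const char*>(&b.d) -
      reinterpret_cast<const char*>(&b));
}
```

With those assumptions the base occupies all 8 bytes, d is at offset 8 and
sizeof(PodB) is 12, which lines up with { [8 x i8], i8, [3 x i8] }.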

> Adding the 3 bytes at the end is NOT the problem, but revoking the
> type (and its natural alignment) from %A is.

What do you mean by "revoking?"  Do you mean inferring the type of %A
within %B given %B's layout?  Why do you need to get the alignment
information anyway?  The byte offsets are fixed by the ABI so in the
end, bits is bits and addresses is addresses.  Ugly casts may be
necessary but nothing too drastic that will seriously prevent
optimization.
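
As a rough illustration of what I mean by "ugly casts," here is the same idea
written in C++ rather than IR.  The helpers load_at and store_at are
hypothetical, and the offsets used with them are whatever the ABI dictates
for the type in question; nothing here depends on the struct's declared
field types.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: read a T located `offset` bytes into a raw byte blob.
template <typename T>
T load_at(const unsigned char* base, std::size_t offset) {
  T value;
  std::memcpy(&value, base + offset, sizeof(T));  // no alignment assumed
  return value;
}

// Hypothetical helper: write a T at `offset` bytes into a raw byte blob.
template <typename T>
void store_at(unsigned char* base, std::size_t offset, T value) {
  std::memcpy(base + offset, &value, sizeof(T));
}
```

For a 12-byte object rendered as { [8 x i8], i8, [3 x i8] }, the inherited
int would be reached with load_at<int>(p, 0) and the derived char with
load_at<char>(p, 8), assuming those are the offsets the ABI assigns.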

> My idea was that StructLayout could have more (optional) sources of
> information, to do a better job at figuring out sizes and offsets. We
> even thought about creating a Pass that will transform from natural
> structures, unions and bitfields to the horrible mess it results in
> when lowered, but that's avoiding the problem, not solving it.

I'm still not exactly sure what problem you're trying to solve.  Is it a
correctness issue in your code generator?

That said, I have thought along similar lines to make frontends easier
to construct.  I imagined metadata on struct types to indicate layout
requirements but the current metadata system is not appropriate since it
does not consider metadata to be semantically important for correctness.

But even with that solution, the frontend would still need to add the
metadata to struct types.  There's really no way around the frontend
needing to understand the ABI at some level.  It has to convey the
language semantics to LLVM, which is by design language-agnostic.

> The IR was designed for type safety and we have far too many hacks in
> the type system that all C++ front-ends have to do. Maybe the original
> design wasn't followed so closely, or we need a new design document
> that clearly states what the goals are, because the way it is, it's
> not clear, and it's definitely not good for C++.

I'm not sure the IR was designed for type safety.  The original
designers can speak to that.  But any language that has things like
inttoptr and ptrtoint is inherently not type-safe.  The typing helps
certain classes of analysis and transformation but in the case of C++
inheritance there's not a whole lot that applies.  You need a
higher-level IR to take care of that stuff.

                               -Dave