[LLVMdev] PROPOSAL: IR representation of detailed struct assignment information (new version)

Mon Sep 10 11:29:37 PDT 2012

On Thu, Sep 6, 2012 at 4:24 PM, Dan Gohman <gohman at apple.com> wrote:

> Hello,
>
> Pursuant to feedback,
>
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052927.html
>
> here is a new proposal for detailed struct assignment information.
>
> Here's the example showing the basic problem:
>
> struct bar {
>  char x;
>  float y;
>  double z;
> };
> void copy_bar(struct bar *a, struct bar *b) {
>  *a = *b;
> }
>
> The solution I now propose here is to have front-ends describe the copy
> using metadata. For example:
>
>  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8, i1 false), !tbaa.struct !4
>  […]
>  !0 = metadata !{metadata !"Simple C/C++ TBAA"}
>  !1 = metadata !{metadata !"omnipotent char", metadata !0}
>  !2 = metadata !{metadata !"float", metadata !1}
>  !3 = metadata !{metadata !"double", metadata !1}
>  !4 = metadata !{metadata !5, i64 3, metadata !6, metadata !7}
>  !5 = metadata !{i64 1, metadata !1}
>  !6 = metadata !{i64 4, metadata !2}
>  !7 = metadata !{i64 8, metadata !3}
>
> Metadata nodes !0 through !3 are regular TBAA nodes as are already in use.
>
> Metadata node !4 here is a top-level description of the memcpy. It holds a
> list of virtual members. An integer represents a padding field of that
> size. A metadata tuple represents an actual data field. The tuple's members
> are an integer size and a TBAA tag for the field.
>

Hey Dan, I've talked with you about this in person and on IRC, but I've not
yet laid out my thoughts in a single place, so I'll put them here.

TL;DR: I really like the idea of using metadata to tag each member of a
struct with TBAA, and re-using the TBAA metadata nodes we already have. I'm
not as fond of the description of padding in the metadata node.

Currently padding is really hard to represent because there is sometimes a
member of an LLVM struct which represents padding (packed structs and cases
where the frontend type requires more alignment than the datalayout string
specifies) and other times there isn't. The current proposal doesn't
entirely fix this because we still will need some way to annotate the
members of structs inserted purely for the purpose of padding.

Further, we have the problem that sometimes what is needed is a
representation of a "hole", that is a region which is neither padding nor
part of the struct itself. The canonical example is the tail padding of a
base class where the derived class's first member has low alignment
constraints.

I would propose that we solve these problems by a somewhat more invasive
change, but one which will significantly simplify both LLVM and frontends
(at least Clang, and I suspect other frontends as well):

Remove non-packed struct types completely. Make LLVM structs represent a
contiguous sequence of bytes, explicitly partitioned into fields with
particular primitive types.

The idea would be to make all struct types be packed[1], and to represent
padding as explicit members of the struct. These could in turn have a
"padding" TBAA metadata node which would specify that member as being
padding. This would simplify the metadata representation because there
would *always* be a member to hang the padding tag off of. It would
simplify struct layout analysis in LLVM because the difference between
alloc-size and type-size would be irrelevant. It would dramatically
simplify Clang's record layout building, which already has to fall back to
packed LLVM structs in many cases because normal structs produce offsets
that conflict with the ABI's layout requirements.

Essentially, LLVM is trying to simplify ABI layout by providing a
datalayout summary description of target alignments, and building structs
with that algorithm. But unless this *exactly* matches the ABI in question,
it actually makes the job harder because now we have to try, potentially
fail, and end up with all the code to use the packed mode anyway. My
theory is that there are too many ABIs in the world (and too many weird
rules within them) for us to ever really get this right at the LLVM layer.
Instead, we should force the frontend to explicitly lay out the bytes as it
sees fit.

Ok, now to the "how does this all work" part:

- No more alignment needed in the datalayout string[2].
- In other places where alignment is optional today, an omitted alignment
will mean '1' instead of '0'. This will essentially require alignment to
be specified in more places.
- Array elements are packed[3]. If the elements of an array must be padded
out to a particular alignment, the array should be of a struct containing
the element and a padding member of the appropriate size. This will allow
us to tag that member with metadata as padding as well.
- Auto-upgrade uses old datalayout with alignments to synthesize necessary
align specifiers on instructions etc.
- TBAA metadata will identify members of a struct type which are padding
and hold no interesting data.

This would at least remove one dimension of complexity from Clang's record
layout building by removing the need to try non-packed structs and fall back
to packed. It should even allow us to retain the struct type for a base
class with derived class members packed into previously "padding" bytes at
the end. As it stands, even the current proposal doesn't seem to support
retaining the LLVM struct type for the base class in this case, or easily
annotating the fields of that base class with TBAA information.

Thoughts?
-Chandler

Some points of clarification:
[1]: I say "packed" repeatedly but never "bit packed" or "byte packed". My
inclination is to make the rule within LLVM "byte packed" and fix the idea
of a byte as an i8. I think it's hopeless to support non-8-bit-bytes in
LLVM, and we should just move past that illusion. However, it would
certainly be possible to make this be "bit packed" and add bit padding with
appropriate metadata. I might even like that if it gives us a cleaner
semantic model, or helps tag certain bits as undef.

[2]: We could potentially keep some of this information here if there are
other parts of LLVM that use it... I'm not deeply familiar with all the
consumers of the datalayout string.

[3]: I'm torn on this one. It might be nice to have arrays get an optional
alignment that establishes the stride of the elements, particularly if we
want the semantics to be that between array elements we have a "hole"
rather than padding. However, I'm not aware of any place where this is a
practical or important constraint, and it seems to add complexity that we
don't need. If needed, it could always be added later.