[LLVMdev] PROPOSAL: IR representation of detailed struct assignment information

Mon Aug 27 22:22:10 PDT 2012

<moving this to llvmdev now that the lists are back up!>

On Aug 23, 2012, at 4:37 PM, Dan Gohman <gohman at apple.com> wrote:
> On Aug 23, 2012, at 4:05 PM, Chris Lattner <clattner at apple.com> wrote:
>> On Aug 23, 2012, at 3:59 PM, Dan Gohman <gohman at apple.com> wrote:
>>> On Aug 23, 2012, at 3:31 PM, Chris Lattner <clattner at apple.com> wrote:
>>>> Interesting approach.  The IR type for a struct may or may not be enough to describe holes (think unions and other cases), have you considered a more explicit MDNode that describes the ranges of any holes?
>>> 
>>> What's the issue with unions? Do you mean unions containing structs
>>> containing holes?
>> 
>> Unions don't lower to a unique or useful IR type.  In general, I'm skeptical of anything that uses IR types to reason about source level types (except primitives like integers and floats).
> 
> I'm confused. It seems a big difference here between your expectations
> and my understanding is that you're expecting to see source level types
> here, whereas it hadn't even occurred to me that we should try to represent
> source level types.

My point here is that the frontend reasons about two things: 1) a source level construct of a type, and 2) LLVM IR types.  The LLVM IR type lowering is not guaranteed to cover all fields in the source type (e.g. in the case of unions).

Let me give you a dumb example.  Consider:

union x {
  struct { char b;  int c; } a;
  short b;
} u;

On my system, Clang lowers this to:

%union.x = type { %struct.anon }
%struct.anon = type { i8, i32 }

This isn't a safe IR type to use to describe a memcpy, because it wouldn't copy all of the union member "b": the second byte of the short lands in the padding between the i8 and the i32.  Implementing your proposal would therefore require implementing yet another conversion from AST types to LLVM types, one that *is* guaranteed to cover all the fields.

Instead of implementing this, it would be a lot easier for clang to walk a type and produce a mask describing all the holes in a type, using a simple recursive algorithm (where a union intersects its members' "hole sets", finding that only the third and fourth bytes of the union are a hole).
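
A minimal sketch of that recursive walk, to make the idea concrete. It uses a made-up type model (struct type, struct member, hole_ranges and friends are all hypothetical names, not Clang's AST or any LLVM API) and hard-codes the layout of the union above, assuming a 1-byte char, a 2-byte short, and a 4-byte int aligned to 4 bytes:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified type model; this is not Clang's AST. */
enum kind { KIND_SCALAR, KIND_STRUCT, KIND_UNION };

struct type;
struct member { const struct type *type; size_t offset; };

struct type {
    enum kind kind;
    size_t size;                  /* sizeof, including padding */
    size_t nmembers;              /* 0 for scalars */
    const struct member *members;
};

struct range { size_t begin, end; };   /* half-open byte range [begin, end) */

/* Clear hole[] for every byte of t (placed at offset base) that holds data. */
static void clear_data_bytes(const struct type *t, size_t base, bool *hole)
{
    if (t->kind == KIND_SCALAR) {
        for (size_t i = 0; i < t->size; i++)
            hole[base + i] = false;
        return;
    }
    /* Struct members sit at distinct offsets; union members all sit at
       offset 0. Either way a byte stays a hole only if no member stores
       data there, which for a union is exactly the intersection of the
       members' hole sets. */
    for (size_t i = 0; i < t->nmembers; i++)
        clear_data_bytes(t->members[i].type, base + t->members[i].offset, hole);
}

/* Write the hole ranges of t into out[] and return how many there are.
   hole[] is scratch space of at least t->size entries; out[] needs room
   for at most (t->size + 1) / 2 ranges. */
static size_t hole_ranges(const struct type *t, bool *hole, struct range *out)
{
    size_t n = 0;
    for (size_t i = 0; i < t->size; i++)
        hole[i] = true;                   /* start with everything a hole */
    clear_data_bytes(t, 0, hole);
    for (size_t i = 0; i < t->size; i++) {
        if (!hole[i])
            continue;
        if (n > 0 && out[n - 1].end == i)
            out[n - 1].end = i + 1;       /* extend the current range */
        else
            out[n++] = (struct range){ i, i + 1 };
    }
    return n;
}

/* Assumed layout of "union x" from the example above. */
static const struct type char_t  = { KIND_SCALAR, 1, 0, NULL };
static const struct type short_t = { KIND_SCALAR, 2, 0, NULL };
static const struct type int_t   = { KIND_SCALAR, 4, 0, NULL };
static const struct member anon_members[] = { { &char_t, 0 }, { &int_t, 4 } };
static const struct type anon_t = { KIND_STRUCT, 8, 2, anon_members };
static const struct member x_members[] = { { &anon_t, 0 }, { &short_t, 0 } };
static const struct type x_t = { KIND_UNION, 8, 2, x_members };
```

Under those layout assumptions, hole_ranges(&x_t, ...) reports a single range [2, 4), i.e. the third and fourth bytes of the union, while the inner struct on its own has the hole [1, 4).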

Given this, it makes a lot more sense to explicitly model this hole set in an MDNode (e.g. by using a list of byte ranges?) instead of representing the holes with a null pointer constant of some IR type.

Does this make sense?

-Chris