[LLVMdev] structs get decomposed when shouldn't

Wed May 2 02:15:17 PDT 2012

Hi Tim,

On 02/05/12 10:51, Tim Northover wrote:
> On Wednesday 02 May 2012 09:12:16 Duncan Sands wrote:
>>> As I can understand, LLVM is trying to decompose datatypes into smaller
>>> components in some circumstances.
>>
>> Can you please explain more what you are referring to here.  LLVM itself
>> shouldn't be changing function parameters or return types unless the
>> function has local (internal) linkage (since in that case ABI requirements
>> don't matter).
>
> This is in the backend of LLVM itself. When converting the LLVM IR to its DAG
> representation prior to selection, CodeGen asks the target to take care of
> function parameters. Unfortunately the only interface it presents for the
> target code to make that decision is a sequence of MVTs: iN, float, double,
> vNiM, vNfM. Structs are split into their component members with no indication
> that they were originally more than that.

Yup, front-ends have to take care of the more complicated ABI details.  For
example, the front-end should currently use "byval" for any (parts of) structs
that need to be passed on the stack, and explicit scalars for the struct bits
that should go in registers.

>
> This has affected a couple more people recently (including me):
>
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048203.html
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120326/055577.html
>
> If this interface could be improved, I believe clang could simply apply a
> function to its QualType and produce an LLVM type which does the right thing.

I don't think this is possible; for example, I doubt you can handle the x86-64
ABI in a context-free way.
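
To make the context dependence concrete, here is a minimal C sketch of the
x86-64 System V rule that an argument which no longer fits in the remaining
registers goes to memory as a whole.  It is a simplified model, not clang or
LLVM code: it assumes only INTEGER-class arguments and six integer argument
registers, and the names assign_arg and struct_in_regs_after are made up for
illustration.

```c
#include <assert.h>

/* Assumption: simplified model with only the six integer argument
   registers of the x86-64 System V ABI; SSE and x87 classes are ignored. */
enum { NUM_INT_REGS = 6 };

/* Try to place an argument that needs `eightbytes` integer registers.
   Returns 1 and updates *used if it fits; returns 0 (the whole argument
   goes to memory, no partial split) otherwise. */
int assign_arg(int eightbytes, int *used)
{
    assert(eightbytes > 0 && used != 0);
    if (*used + eightbytes <= NUM_INT_REGS) {
        *used += eightbytes;
        return 1;
    }
    return 0;
}

/* Where does a 16-byte struct of two integer eightbytes land when it is
   preceded by `leading_scalars` register-sized scalar arguments?
   Returns 1 for registers, 0 for memory. */
int struct_in_regs_after(int leading_scalars)
{
    int used = 0;
    for (int i = 0; i < leading_scalars; i++)
        assign_arg(1, &used);   /* scalars beyond the sixth spill to memory */
    return assign_arg(2, &used);
}
```

In this model the same struct type is passed in registers after four leading
scalars but in memory after five, so its lowering cannot be computed from the
type alone.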

> Without
> that improvement clang will have to use a context-sensitive model to map the
> whole sequence of arguments.
>
> At least, that's the ARM situation. I'm not sure Ivan's can even be solved
> without an improved interface (well, he could probably co-opt byval pointers
> too, but that's Just Wrong).

I must have missed that discussion, since I don't know what Ivan's problem is.

> This most recent one, I'm not sure about. Whether a struct can be mapped to a
> sane sequence of iN types probably hinges on the various alignment constraints
> and whether an argument can be split between regs and memory. (If a split is
> allowed then you can probably use [N x iM] where the struct has size N*M and
> alignment M (assuming iM has alignment M), otherwise that would be wrong).
>
> And Juhasz David wrote:
>> the problem can be mitigated by using a
>> pointer tagged with byval attribute and catch such an argument in a
>> custom CC function.
>
> That's the approach I've currently adopted for some of my work, but it's
> incomplete for my needs and I'm rather concerned about the performance of what
> does work: unless we reimplement mem2reg in the backend too, it introduces
> what amounts to an argument alloca with associated load/store very late on.

Byval is designed for the situation in which the callee takes the address of
the struct.  Thus it provides a pointer to a block of memory.  However there
is also the situation in which the struct is not addressable (just like a
virtual register) and just needs to have bits of it passed on the stack because
the ABI says so (also like virtual registers: the first ones are passed in
registers, the rest on the stack).  To make this easier, maybe there should be
an "onstack" parameter attribute (kind of the opposite to "inreg"), which says
that an argument should be passed on the stack.  Then you can break your struct
up into bits that should be passed in registers ("inreg" attribute), bits that
should be passed transparently (i.e. not addressably) on the stack ("onstack"
attribute) and bits that should be passed addressably on the stack ("byval").
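
As a rough illustration of that split, here is a minimal C sketch of how a
front-end might tag the word-sized pieces of a struct.  Everything in it is
hypothetical: "onstack" is only the attribute suggested above, the piece_kind
names and classify_pieces are invented for the example, and it assumes an ABI
that allows one struct to be split between registers and stack, with the whole
struct treated as addressable or not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the three cases: "inreg", the proposed "onstack",
   and "byval". */
enum piece_kind { PIECE_INREG, PIECE_ONSTACK, PIECE_BYVAL };

/* Tag each of the `n` word-sized pieces of a struct.
   Assumptions: `regs_free` argument registers are still available, the ABI
   permits splitting the struct between registers and stack, and
   `address_taken` applies to the struct as a whole. */
void classify_pieces(enum piece_kind *out, size_t n,
                     size_t regs_free, int address_taken)
{
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (address_taken)
            out[i] = PIECE_BYVAL;    /* callee needs a pointer to memory */
        else if (i < regs_free)
            out[i] = PIECE_INREG;    /* first pieces go in registers */
        else
            out[i] = PIECE_ONSTACK;  /* the rest go on the stack, not addressable */
    }
}
```

With four pieces and two free registers, the first two pieces are tagged
PIECE_INREG and the last two PIECE_ONSTACK; if the address is taken, all four
become PIECE_BYVAL.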

Ciao, Duncan.