[cfe-commits] Fix handling of ARM homogeneous aggregates

Fri Mar 30 12:58:10 PDT 2012

Hi,

(Forwarded from cfe-commits, where some backend issues have come up.)

This is an issue I've been thinking about quite a bit recently, and I agree that the biggest problem is the one below:

> * The big thing still missing here is that there is no logic to check how many VFP registers have already been used for other arguments.  When deciding whether to pass an argument as a homogeneous aggregate, one of the criteria is that the entire aggregate has to fit into the remaining unused argument registers, right?
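
To make that check concrete, here is a rough C++ sketch of the allocation rule as I read the VFP variant of the AAPCS. All names in it (VfpState, Hfa, allocateHfa) are hypothetical and for illustration only; none of this is existing LLVM or Clang code, and it simply models s0-s15 as a 16-bit mask.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of the VFP argument registers s0-s15
// (d0-d7 overlay them in pairs).
struct VfpState {
    uint16_t used = 0;  // bit i set => s<i> is allocated or unavailable
};

// Hypothetical description of a homogeneous aggregate:
// 1 to 4 members, all float or all double.
struct Hfa {
    unsigned members;
    bool isDouble;
};

// Returns the index of the first s-register assigned to the aggregate,
// or nullopt if it has to go on the stack.
std::optional<unsigned> allocateHfa(VfpState &st, const Hfa &h) {
    unsigned unit = h.isDouble ? 2 : 1;  // s-registers per member
    unsigned need = h.members * unit;    // at most 8 under the assumptions above
    // Look for the lowest suitably aligned run of free registers.
    for (unsigned base = 0; base + need <= 16; base += unit) {
        uint16_t mask = static_cast<uint16_t>(((1u << need) - 1u) << base);
        if ((st.used & mask) == 0) {
            st.used |= mask;
            return base;
        }
    }
    // Assumption: once one VFP candidate spills to the stack, the
    // remaining VFP registers are unavailable to later arguments.
    st.used = 0xFFFF;
    return std::nullopt;
}
```

Starting from a fresh VfpState, a lone float lands in s0, a following double in s2-s3 (d1), and a second float back-fills s1. This is the bookkeeping every front-end would currently have to reproduce.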

I tend to think that if every front-end has to implement the entire VFP PCS to decide how to pass an HFA, something has gone wrong. So I've come to the conclusion that the real flaw is LLVM not exposing enough information to the target-dependent backend code for it to do the right thing. By the time the target is involved, all that remains of any composite type is:
  * The fields, completely separated, if it was naturally passed by value: {float, float} just gives you two "float" parameters, for example.
  * An i32 together with ByValSize and ByValAlign if it was a byval pointer, e.g. "{float, float}* byval".

Even in the first case there's no indication of where a composite type begins and ends. The latter could be bludgeoned to mean "this is an HFA, put it in VFP regs", but it would be unspeakably ugly.
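
As a toy illustration of the first case (plain C++, not LLVM code; ArgType and flatten are made-up names), two different signatures become indistinguishable once the fields are split out:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an IR-level argument: a scalar is a
// one-element list, {float, float} is a two-element list.
using ArgType = std::vector<std::string>;

// Roughly what the target sees after by-value aggregates are split up:
// one entry per scalar, with no marker for where each original
// argument started or ended.
std::vector<std::string> flatten(const std::vector<ArgType> &args) {
    std::vector<std::string> out;
    for (const ArgType &a : args)
        out.insert(out.end(), a.begin(), a.end());
    return out;
}
```

Here flatten applied to ({float, float}, float) and to (float, {float, float}) yields the same three "float" entries, which is exactly the information the PCS needs and the backend no longer has.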

I believe that if the original LLVM Type* is exposed to TargetLowering (perhaps as part of InputArg/OutputArg), then LLVM itself can decide what to do with both Small Structures and HFAs in a sane manner, and writing a front-end which adheres to the PCS would be much easier for any source language. The worry is the apparent layering violation of passing a Type* further down. But I'd argue that the TargetLowering functions involved are constructing a DAG from nothing rather than transforming an existing DAG, so giving them LLVM source-level information is justifiable.

Given that, the simpler implementation is via byval pointers, but they have some efficiency problems (passes like ScalarRepl can't get to work replacing getelementptrs with extracts, since the implicit alloca only appears during DAG construction -- just look at what happens to MIPS small structs now). With more work, the truly natural form would be possible: a front-end could simply emit "call void @foo({float, float} %val)" and everything would work.

Of course, while the second approach is nice in isolation, it may not exactly fit in with what other backends do.

Any thoughts?

Tim.
