[llvm-dev] [RFC] The future of the va_arg instruction

Mon Aug 14 13:12:03 PDT 2017

On 8/14/2017 2:26 AM, Alex Bradbury wrote:
> On 9 August 2017 at 19:38, Friedman, Eli <efriedma at codeaurora.org> wrote:
>> On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
>>> Option 3: Teach va_arg to handle aggregates
>>>     * In this option, va_arg might reasonably be expected to handle a struct,
>>>     but would not be expected to have detailed ABI-specific knowledge. e.g. it
>>>     won't automagically know whether a value of a certain size/type is passed
>>>     indirectly or not. In a sense, this would put support for aggregates passed
>>>     as varargs on par with aggregates passed in named arguments.
>>>     * Casting would be necessary in the same cases casting is required for
>>>     named args
>>>     * Support for aggregates could be implemented via a new module-level pass,
>>>     much like PNaCl.
>>>     * Alternatively, the conversion from the va_arg instruction to
>>>     SelectionDAG could be modified. It might be desirable to convert the vaarg
>>>     instruction to a number of loads and a new node that is responsible only for
>>>     manipulating the va_list struct.
>>
>> We could automatically split va_arg on an LLVM struct type into a series of
>> va_arg calls for each of the elements of the struct.  Not sure that actually
>> helps anyone much, though.
>>
>> Anything more requires full type information, which isn't currently encoded
>> into IR; for example, on x86-64, to properly lower va_arg on a struct, you
>> need to figure out whether the struct would be passed in integer registers,
>> floating-point registers, or memory.
> I've been thinking more about this. Firstly, if anyone has insight into
> any cases where the va_arg instruction actually provides better
> optimisation opportunities, please do share. The va_arg IR instruction
> has been supported in LLVM for over a decade, but Clang doesn't
> generate it for the vast majority of the "top tier" targets. I'm
> trying to determine if it just needs more love, or if perhaps it
> wasn't really the right thing to express at the IR level. Is the main
> motivation of va_arg to allow such argument access to be specified
> concisely in IR, or is there a particular way it makes life easier for
> optimisations or analysis (and if so, which ones and at which point in
> compilation?).

We don't have any optimizations that touch va_arg, as far as I know.  
It's an instruction mostly because it got added when LLVM was first 
written, and nobody has bothered to try to get rid of it.

> va_arg really does three things:
> * Calculates how to load a value of the given type
> * Increments the appropriate fields in the va_list struct
> * Loads a value of the given type
>
> The problem I see is it's fairly difficult to specialise its behaviour
> depending on the target. In one of the many previous threads about ABI
> lowering, I think someone commented that in LLVM it happens both too
> early and too late (in the frontend, and on the SelectionDAG). This
> seems to be the case here: to support targets with a more complex
> va_list struct featuring separate save areas for GPRs and FPRs,
> splitting a va_arg into multiple operations (one per element of an
> aggregate) doesn't seem like it could work without heroic gymnastics
> in the backend.
>
> Converting the va_arg instruction to a new GETVAARG SelectionDAG node
> plus a series of LOADs seems like it may provide a straightforward
> path to supporting aggregates on targets that use a pointer for
> va_list. Of course this ends up exposing loads plus offset generation
> in the SelectionDAG, just hiding the va_list increment behind
> GETVAARG. For such an approach to work, you must be able to load the
> given type from a contiguous region of memory, which won't always be
> true for targets with a more complex va_list struct.
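
To make the non-contiguous case concrete, here is a rough C model of
fetching a { i64, double } aggregate from a va_list with separate GPR and
FPR save areas. The field names and the 48/176 limits loosely follow the
x86-64 System V layout; split_va_list, pair, and va_arg_pair are
hypothetical names for illustration, and the classification is hard-coded
rather than derived from the type, so treat it as a sketch and not as
what any backend actually emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of a va_list with separate register save areas
   (assumption: loosely follows the x86-64 System V layout). */
typedef struct {
    unsigned gp_offset;               /* byte offset of next unused GPR slot */
    unsigned fp_offset;               /* byte offset of next unused FPR slot */
    unsigned char *overflow_arg_area; /* next stack-passed argument */
    unsigned char *reg_save_area;     /* GPR slots first, FPR slots after */
} split_va_list;

enum { GP_END = 48, FP_END = 176 };   /* 6 x 8-byte GPRs, 8 x 16-byte FPRs */

/* One integer eightbyte followed by one floating-point eightbyte. */
typedef struct { int64_t i; double d; } pair;

static pair va_arg_pair(split_va_list *ap) {
    pair p;
    assert(ap->gp_offset <= GP_END && ap->fp_offset <= FP_END);
    if (ap->gp_offset + 8 <= GP_END && ap->fp_offset + 16 <= FP_END) {
        /* Both halves are in registers: they live in two different
           regions of the save area, so this is two separate loads. */
        memcpy(&p.i, ap->reg_save_area + ap->gp_offset, 8);
        memcpy(&p.d, ap->reg_save_area + ap->fp_offset, 8);
        ap->gp_offset += 8;
        ap->fp_offset += 16;
    } else {
        /* Otherwise the whole aggregate was passed in memory and is
           contiguous in the overflow area. */
        memcpy(&p, ap->overflow_arg_area, sizeof p);
        ap->overflow_arg_area += sizeof p;
    }
    return p;
}
```

The register path is the part a "GETVAARG plus contiguous LOADs" scheme
can't express: the two fields come from different offsets and bump
different counters.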

Really, IMO, we shouldn't have a va_arg instruction at all, but 
deprecating it is too much work to be worthwhile. :)

If we are going to keep it around, though, we should really do the 
lowering in IR, before we hit SelectionDAG.  Like you explained, it's 
just a bunch of load and store operations, so there isn't any reason to 
wait, and transforming IR is much easier than lowering in SelectionDAG.
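
For the simple case, a minimal sketch of what that lowering amounts to,
written in C rather than IR. It assumes a va_list that is just a pointer
into a contiguous argument area; ptr_va_list, align_up, and lowered_va_arg
are made-up names, and the caller supplies size and alignment explicitly,
so this is not any particular target's ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: va_list is a single pointer into a contiguous
   argument area. */
typedef struct { const unsigned char *next; } ptr_va_list;

/* Round p up to a multiple of align (align must be a power of two). */
static const unsigned char *align_up(const unsigned char *p, size_t align) {
    uintptr_t v = (uintptr_t)p;
    return (const unsigned char *)((v + align - 1) & ~(uintptr_t)(align - 1));
}

/* The three steps from the quoted list, as plain pointer arithmetic,
   a store to the va_list, and a load of the value. */
static void lowered_va_arg(ptr_va_list *ap, void *out,
                           size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const unsigned char *slot = align_up(ap->next, align); /* where to load */
    ap->next = slot + size;                                 /* bump va_list  */
    memcpy(out, slot, size);                                /* load value    */
}
```

Nothing here needs SelectionDAG; an IR pass could emit the equivalent
pointer arithmetic, store, and load directly.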

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project