[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions

Tue Jan 7 16:14:11 PST 2014

> -----Original Message-----
> From: aaron.ballman at gmail.com [mailto:aaron.ballman at gmail.com] On
> Behalf Of Aaron Ballman
> Sent: Tuesday, December 24, 2013 7:02 AM
> To: Rafael Espíndola
> Cc: Gao, Yunzhong; cfe-dev at cs.uiuc.edu Developers (cfe-dev at cs.uiuc.edu);
> LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)
> Subject: Re: [cfe-dev] [LLVMdev] [Proposal] function attribute to reduce
> emission of vzeroupper instructions
> 
> On Tue, Dec 24, 2013 at 7:50 AM, Rafael Espíndola
> <rafael.espindola at gmail.com> wrote:
> >> In general, I'm not too keen on adding more calling conventions
> >> unless there's a really powerful need for one from an ABI
> >> perspective. This sounds more like an optimization than an ABI need.
> >
> > I think that is the case.
> >
> >> What's more, I
> >> worry (a little bit) about confusion that could be caused with the
> >> __vectorcall calling convention (which we do not currently support,
> >> but will need to at some point for MSVC compatibility).
> >
> > What does __vectorcall do?
> 
> http://msdn.microsoft.com/en-us/library/dn375768.aspx
> 
> It's different from the proposed attribute, but it still relates to SIMD
> instruction optimizations.
> 
> >
> >> What should happen with this code?
> >>
> >> int foo() __attribute__((avx));
> >>
> >> void bar(int (*fp)()) {
> >>   int i = fp();
> >> }
> >>
> >> void baz(void) {
> >>   bar(foo);
> >> }
> >>
> >> Based on your description, this code is valid, but not as performant
> >> as it could be. The vzeroupper would be inserted before fp() is
> >> called, but there's no incompatibility happening. So I guess this
> >> feels more like a regular function attribute than a calling
> >> convention.
> >
> > It is not a calling convention. The question is more whether it is a
> > type or a decl attribute. Given that putting the attribute on the
> > function decls is the simplest option and should cover most cases, I
> > think we can start with that and revisit if we still see too many
> > vzeroupper instructions being inserted. What do you think?
> 
> That seems reasonable to me.
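To make the fp() case above concrete, here is a source-level sketch of what happens at the indirect call. It is an illustration only: foo() and bar() mirror the example, and the proposed avx attribute is left off because it does not exist yet. When the file is built with -mavx, `_mm256_zeroupper()` from <immintrin.h> emits the same vzeroupper instruction that the backend would normally insert on its own before the call; without -mavx the guard compiles it out. Because the attribute would live on foo's declaration rather than on the type of fp, the call site in bar() has no way to know the target is AVX-aware.

```c
#if defined(__AVX__)
#include <immintrin.h>  /* _mm256_zeroupper() */
#endif

/* Stand-in for foo() from the example; under the proposal it would carry
   the (not yet existing) avx attribute on its declaration. */
int foo(void) { return 42; }

/* Indirect call as in bar() above: a decl attribute on foo is not visible
   through the function pointer fp. */
int bar(int (*fp)(void)) {
#if defined(__AVX__)
    /* Explicit equivalent of the vzeroupper the backend already inserts
       before a call whose target might execute legacy SSE code. */
    _mm256_zeroupper();
#endif
    return fp();
}
```

Calling `bar(foo)` simply returns 42; the only difference between the -mavx and non-AVX builds is the extra vzeroupper before the call. Hand-written code would not normally need the explicit intrinsic, since the compiler inserts it automatically.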
> 
> >
> >>>
> >>> Function definitions in a translation unit compiled with -mavx
> >>> architecture will implicitly have this attribute.
> >>
> >> Can you safely do that? What about code that uses inline assembly
> >> to issue legacy SSE instructions in a TU compiled with -mavx, for
> >> instance?
> >
> > I think it would take a performance penalty, but I don't expect that
> > to be common.
> 
> Hmm, I was worried about the situation where:
> 
> extern int foo(); // compiled without -mavx
> 
> void bar() {  // compiled in a TU with -mavx
>   ...
>   // no vzeroupper is inserted before the call instruction because it is
>   // implicit due to -mavx
>   foo();
>   ...
> }
> 
> I'm not certain whether this sort of pattern could cause problems or not. If
> there's no way for it to be problematic, then implicitly attaching the attribute
> is reasonable enough. It does mean we're straying farther from the as-
> written attributes for the function, but that's just an unfortunate situation
> we're already in today and wouldn't block this feature.
> 
> ~Aaron

Hi Aaron,
Many thanks for your feedback!

I do not have any opinion right now on how this attribute should interact with
the __vectorcall calling convention. I will need to revisit it later.

Regarding the implicit attachment of this attribute, my intention is to imply
the avx attribute only on function definitions. Since the backend can see which
instructions are generated in the callee's body, it can decide whether a
vzeroupper is needed before the call instruction. In your example above, foo()
would not implicitly carry the avx attribute because the compiler sees only its
declaration, so a vzeroupper would still be emitted before the call.

- Gao