[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions

Aaron Ballman aaron at aaronballman.com
Sat Dec 21 09:56:58 PST 2013


On Thu, Dec 19, 2013 at 2:31 PM, Gao, Yunzhong
<yunzhong_gao at playstation.sony.com> wrote:
> Hi all,
>
> I would like to find out whether anyone will find it useful to add an
> x86-specific calling convention for reducing emission of vzeroupper
> instructions.
>
> Current implementation:
> vzeroupper is inserted into any function that uses AVX instructions. The
> insertion points are:
> 1) before a call instruction;
> 2) before a return instruction;
>
> Background:
> vzeroupper is an AVX instruction; it is inserted to avoid a performance
> penalty when transitioning between x86 AVX mode and legacy SSE mode, e.g.,
> when an AVX function calls an SSE function. However, vzeroupper is a slow
> instruction; it adds to register pressure and hurts performance for
> AVX-to-AVX calls.
>
> My proposal:
> 1) (LLVM part) Add an x86-specific calling convention to the LLVM IR which
> specifies that an external function will be compiled with AVX support and
> its function definition does not use any legacy SSE instructions, e.g.,
>
>   declare x86_avxcc i32 @foo()
>
> 2) (Clang part) Add a function attribute to the clang front-end which
> specifies this calling convention, e.g.,
>
>   extern int foo() __attribute__((avx));

In general, I'm not too keen on adding more calling conventions unless
there's a really powerful need for one from an ABI perspective. This
sounds more like an optimization than an ABI need. What's more, I
worry (a little bit) about confusion that could be caused with the
__vectorcall calling convention (which we do not currently support,
but will need to at some point for MSVC compatibility).

What should happen with this code?

int foo() __attribute__((avx));

void bar(int (*fp)()) {
  int i = fp();
}

void baz(void) {
  bar(foo);
}

Based on your description, this code is valid, but not as performant
as it could be. The vzeroupper would be inserted before fp() is
called, but there's no incompatibility happening. So I guess this
feels more like a regular function attribute than a calling
convention.
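
To make that concrete, here is a rough, compilable sketch of the same
snippet. AVX_ONLY is a hypothetical macro standing in for the proposed
__attribute__((avx)); it expands to nothing because I am not assuming any
compiler implements the attribute. The point is only that the pointer type
seen inside bar() carries no trace of the annotation:

```c
/* Hypothetical stand-in for the proposed __attribute__((avx)).
   It expands to nothing; no existing compiler support is assumed. */
#define AVX_ONLY

AVX_ONLY int foo(void);

/* fp has type int (*)(void). Nothing in that type records that the callee
   was declared AVX_ONLY, so the call must be treated as an ordinary one
   (which, per the proposal, means vzeroupper is still emitted before it). */
int bar(int (*fp)(void)) {
    return fp();
}

/* Passing foo needs no conversion and produces no diagnostic, which is how
   a plain function attribute behaves rather than a calling convention. */
int baz(void) {
    return bar(foo);
}

int foo(void) {
    return 42;
}
```

Calling baz() goes through the pointer and still reaches foo(); the only
thing lost is the optimization opportunity, not correctness.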

>
> Function definitions in a translation unit compiled with -mavx architecture
> will implicitly have this attribute.

Can you safely do that? What about code that uses inline assembly to
execute legacy SSE instructions in a TU compiled with -mavx, for
instance?
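
A minimal sketch of the kind of function I have in mind, assuming GCC or
Clang extended inline assembly with the default AT&T syntax. The helper
name legacy_sse_add and the HAVE_SSE_ASM macro are made up for
illustration. Because the mnemonic is written by hand without the "v"
prefix, I would expect the assembler to emit the legacy SSE encoding by
default even when the rest of the TU is built with -mavx:

```c
/* The "x" (SSE register) constraint is only usable on x86 with SSE on. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__SSE__)
#define HAVE_SSE_ASM 1
#else
#define HAVE_SSE_ASM 0
#endif

/* Hypothetical helper: adds two floats with a hand-written addss. */
float legacy_sse_add(float a, float b) {
#if HAVE_SSE_ASM
    /* AT&T operand order: addss src, dst. Here dst is a, src is b. */
    __asm__("addss %1, %0" : "+x"(a) : "x"(b));
    return a;
#else
    /* Portable fallback so the sketch still builds on non-x86 targets. */
    return a + b;
#endif
}
```

The compiler treats the asm string as opaque text, so it cannot tell from
the function definition alone that a legacy SSE instruction is present.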

~Aaron


