[LLVMdev] Proposal to improve vzeroupper optimization strategy

Thu Sep 19 12:15:51 PDT 2013

Great! Glad to see you are working on this.

On Thu, Sep 19, 2013 at 3:04 PM, Manny Ko <Manny.Ko at imgtec.com> wrote:

> Great idea.  I reported this problem before and am glad to see someone
> trying to tackle it.
>
> cheers.
>
> ________________________________________
> From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on behalf
> of Gao, Yunzhong [yunzhong_gao at playstation.sony.com]
> Sent: Thursday, September 19, 2013 11:53 AM
> To: llvmdev at cs.uiuc.edu
> Subject: [LLVMdev] Proposal to improve vzeroupper optimization strategy
>
> Hi all,
>
> I would like to make a proposal about changing the optimization strategy
> regarding when to insert a vzeroupper instruction in the x86 backend.
>
> Current implementation:
> vzeroupper is inserted into any function that uses AVX instructions. The
> insertion points are:
> 1) before a call instruction;
> 2) before a return instruction;
>
> Rationale:
> vzeroupper is an AVX instruction; it is inserted to avoid the performance
> penalty when switching between x86 AVX mode and SSE mode, e.g., when an
> AVX function calls an SSE function.
>
> My proposal:
> Default to not inserting a vzeroupper instruction unless a function uses
> legacy SSE instructions. By a legacy SSE instruction, I mean any vector
> instruction that does not have a v- prefix and writes an XMM register but
> not a YMM register. If a legacy SSE instruction is spotted, then insert a
> vzeroupper instruction:
> 1) before a call instruction;
> 2) before a return instruction;
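
To make the difference between the two policies concrete, here is a minimal
sketch of a toy model in C++. It is not the X86 backend's actual pass: `Kind`,
`Policy`, `insertVZeroUpper`, and `countVZeroUpper` are hypothetical names, a
function is modeled as a flat list of instruction kinds, and control flow and
any register-state tracking a real pass would need are ignored.

```cpp
#include <algorithm>
#include <vector>

// Toy instruction kinds (hypothetical; not LLVM's MachineInstr).
enum class Kind { AvxOp, LegacySseOp, Call, Ret, VZeroUpper, Other };

// A function body modeled as a flat instruction list, ignoring control flow.
using Function = std::vector<Kind>;

enum class Policy {
  Current,  // trigger: the function uses any AVX instruction
  Proposed  // trigger: the function uses a legacy SSE instruction
};

// Returns a copy of `f` with a vzeroupper before every call and return,
// or `f` unchanged when the policy's trigger instruction is absent.
Function insertVZeroUpper(const Function &f, Policy p) {
  const Kind trigger =
      (p == Policy::Current) ? Kind::AvxOp : Kind::LegacySseOp;
  if (std::find(f.begin(), f.end(), trigger) == f.end())
    return f;

  Function out;
  for (Kind k : f) {
    if (k == Kind::Call || k == Kind::Ret)
      out.push_back(Kind::VZeroUpper);
    out.push_back(k);
  }
  return out;
}

// Counts the vzeroupper instructions in `f`.
int countVZeroUpper(const Function &f) {
  return static_cast<int>(std::count(f.begin(), f.end(), Kind::VZeroUpper));
}
```

In this model, a function that uses only AVX instructions and contains one
call and one return gets two vzeroupper instructions under the current policy
and none under the proposed one.
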
>
> Explanation:
> If all applications and libraries are compiled with the same toolchain,
> then with this proposal, a function can assume that incoming AVX registers
> have their top 128 bits either specified or zeroed. Assuming that legacy
> SSE instructions will seldom be generated, it should be rare to have to
> emit a vzeroupper instruction, which is itself a slow instruction.
>
> Possible problem:
> This proposal is biased towards the situation where all applications and
> libraries are compiled with the same toolchain. If it is common to mix and
> match applications built with different toolchains, this approach might
> lead to situations where a vzeroupper instruction is missing when calling
> from an LLVM-compiled AVX function to a foreign-compiled SSE function,
> hence a transition penalty. One possible solution to this issue is to add
> a function attribute which specifies whether the caller and callee have
> the same architecture, e.g.,
> extern int foo(void) __attribute__((nolegacy));
> would declare an external function that does not use legacy SSE
> instructions.
>
> Any thoughts?
> - Gao.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev