[LLVMdev] Proposal to improve vzeroupper optimization strategy
    Gao, Yunzhong 
    yunzhong_gao at playstation.sony.com
       
    Thu Sep 19 11:53:57 PDT 2013
    
    
  
Hi all,
I would like to make a proposal about changing the optimization strategy
regarding when to insert a vzeroupper instruction in the x86 backend.
Current implementation:
vzeroupper is inserted into any function that uses AVX instructions. The
insertion points are:
1) before a call instruction;
2) before a return instruction;
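
To make the current behaviour concrete, here is a minimal sketch of the policy
as a toy model. It is not the actual x86 backend pass: the Kind enum, the Inst
struct and insertVZeroUpperCurrent are hypothetical names made up for this
illustration, and a real pass would operate on machine instructions rather
than on strings.

```cpp
#include <string>
#include <vector>

// Toy instruction categories (hypothetical; not LLVM's own representation).
enum class Kind { AVX, Scalar, Call, Ret, VZeroUpper };

struct Inst {
    Kind kind;
    std::string text;  // assembly text, kept only for readability
};

// Current policy as described above: if the function uses any AVX
// instruction, place a vzeroupper before every call and every return.
std::vector<Inst> insertVZeroUpperCurrent(const std::vector<Inst> &fn) {
    bool usesAVX = false;
    for (const Inst &i : fn) {
        if (i.kind == Kind::AVX)
            usesAVX = true;
    }

    std::vector<Inst> out;
    for (const Inst &i : fn) {
        bool isInsertionPoint = (i.kind == Kind::Call || i.kind == Kind::Ret);
        if (usesAVX && isInsertionPoint)
            out.push_back(Inst{Kind::VZeroUpper, "vzeroupper"});
        out.push_back(i);
    }
    return out;
}
```

In this model, a function containing one AVX instruction, one call and one
return gets two vzeroupper instructions, while a function with no AVX
instructions is left unchanged.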
Rationale:
vzeroupper is an AVX instruction; it is inserted to avoid the performance penalty
of switching between x86 AVX mode and SSE mode, e.g., when an AVX function
calls an SSE function.
My proposal:
By default, do not insert a vzeroupper instruction unless the function uses legacy
SSE instructions. By a legacy SSE instruction, I mean any vector instruction
that does not have a v- prefix and that writes an XMM register but not a YMM
register. If a legacy SSE instruction is spotted, then insert a vzeroupper instruction:
1) before a call instruction;
2) before a return instruction;
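
The proposed rule can be sketched in the same toy style. Everything here is an
assumption made for illustration: the Instr struct, isLegacySSE and
insertVZeroUpperProposed are hypothetical names, and testing the first letter
of the mnemonic is only a crude stand-in for checking the instruction's actual
encoding.

```cpp
#include <string>
#include <vector>

// Toy instruction: a mnemonic plus the register it writes
// (empty string if it writes no vector register). Hypothetical model.
struct Instr {
    std::string mnemonic;
    std::string dest;
};

// Legacy SSE per the definition above: no v- prefix, and the instruction
// writes an XMM register rather than a YMM register.
bool isLegacySSE(const Instr &i) {
    bool hasVPrefix = !i.mnemonic.empty() && i.mnemonic[0] == 'v';
    bool writesXMM = i.dest.rfind("xmm", 0) == 0;  // dest starts with "xmm"
    return !hasVPrefix && writesXMM;
}

bool isCallOrRet(const Instr &i) {
    return i.mnemonic == "call" || i.mnemonic == "ret";
}

// Proposed policy: insert nothing by default; only when the function
// contains a legacy SSE instruction, place a vzeroupper before every
// call and every return.
std::vector<Instr> insertVZeroUpperProposed(const std::vector<Instr> &fn) {
    bool usesLegacySSE = false;
    for (const Instr &i : fn) {
        if (isLegacySSE(i))
            usesLegacySSE = true;
    }

    std::vector<Instr> out;
    for (const Instr &i : fn) {
        if (usesLegacySSE && isCallOrRet(i))
            out.push_back(Instr{"vzeroupper", ""});
        out.push_back(i);
    }
    return out;
}
```

Under this model a function that only uses v-prefixed instructions is emitted
without any vzeroupper, which is the common case the proposal is aiming for.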
Explanation:
If all applications and libraries are compiled with the same toolchain, then
with this proposal, a function can assume that incoming AVX registers have
their top 128 bits either specified or zeroed. Assuming that legacy SSE
instructions will seldom be generated, it should be rare to have to emit a
vzeroupper instruction, which is itself a slow instruction.
Possible problem:
This proposal is biased towards the situation where all applications and
libraries are compiled with the same toolchain. If it is common to mix and
match applications built with different toolchains, this approach might lead to
situations where a vzeroupper instruction is missing when calling from an
LLVM-compiled AVX function to a foreign-compiled SSE function, and hence to a
transition penalty. One possible way around this issue is to add a
function attribute that specifies whether the caller and callee have the
same architecture, e.g.,
extern int foo(void) __attribute__((nolegacy));
would declare an external function that does not use legacy SSE instructions.
Any thoughts?
- Gao.