[llvm-dev] Aligned vector spills and variably sized stack frames

Fri Aug 28 16:00:50 PDT 2015

I've run into a problem that I'm trying to figure out how to address and 
would welcome ideas and feedback.

Today, the vectorizer will nicely vectorize loops using the widest legal 
vector type for the target.  On a reasonably recent machine, this will 
often end up using AVX2 (YMM) registers, which are 32 bytes wide.

If we decide to spill one of these registers during register 
allocation, we use the VMOVAPS instruction, which requires the memory 
address it accesses to be 32 byte aligned.  So far, so good.

However, the C ABI generally only provides 16 bytes of alignment for the 
stack on entry to the function.  To work around this, the backend will 
create a variably sized frame, inserting a dynamic amount of padding 
where required to ensure that a 32 byte aligned spill slot is available.
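
As a rough illustration of the padding arithmetic (a simplified model, 
not the backend's actual code; the function names are made up for this 
sketch, and the return address and saved registers are ignored):

```c
#include <assert.h>
#include <stdint.h>

/* Round a stack address down to a power-of-two alignment.
   The x86 stack grows toward lower addresses, so realigning
   means moving the stack pointer down. */
uintptr_t align_down(uintptr_t addr, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Bytes of padding needed to bring sp to the requested alignment.
   The result depends on the incoming sp, which is why the frame
   size is no longer a compile-time constant. */
uintptr_t realign_padding(uintptr_t sp, uintptr_t align) {
    return sp - align_down(sp, align);
}
```

In this model, a 16 byte aligned incoming stack pointer needs either 0 
or 16 bytes of padding to reach 32 byte alignment, depending on the 
caller.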

The problem I have is that my runtime's ABI really doesn't like variably 
sized frames.  In particular, the assumption that stack frames are fixed 
size - except during prologue and epilogue - is fairly baked in.

I'm weighing a couple of options for addressing this and want to gather 
feedback on the perceived difficulty of each.  If someone has another 
approach, I'm also very open to that.

Option 1 - Fix my runtime to not expect mostly fixed size frames. This 
isn't a small change to make, but given it's a strictly internal ABI, I 
can probably get away with doing it.  Given things like shrink-wrapping 
are coming down the pipe, it might also have secondary benefits.  
However, this is a relatively risky change to make for what is a fairly 
rare corner case.

Option 1a - I could change my ABI to use a 32 byte aligned frame. This 
has many of the same problems as (1).

Option 2 - Don't compile things which need to spill vector registers.  
This is actually what we do today, and it has worked out fairly well in 
practice, but it is what I'm hoping to move away from.

Option 3 - Add an option in the x86 backend to not require aligned spill 
slots for AVX2 registers.  In particular, the VMOVUPS instruction can be 
used to spill vector registers into an 8 or 16 byte aligned spill slot 
without requiring dynamic frame realignment. This seems like it might be 
useful in other contexts as well, but I can't name any at the moment.

One thing that occurs to me is that many spills are on rarely executed 
paths.  Maybe it would make sense to only do dynamic alignment for hot 
spill/reloads?  We could then simply override the heuristic to always 
use unaligned spills.
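
A minimal sketch of what such a heuristic with an override might look 
like. Everything here is hypothetical: the names, the threshold, and the 
per-spill decision are assumptions for illustration, not existing LLVM 
interfaces. Since realignment affects the whole frame, a real 
implementation would presumably have to combine these decisions per 
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: how a 32 byte vector register gets spilled. */
enum spill_kind {
    SPILL_ALIGNED,    /* VMOVAPS; needs a 32 byte aligned slot */
    SPILL_UNALIGNED   /* VMOVUPS; works with a 16 byte aligned slot */
};

/* Hypothetical policy knobs. */
struct spill_policy {
    bool   force_unaligned; /* runtime cannot tolerate realigned frames */
    double hot_threshold;   /* relative frequency at which a spill is hot */
};

/* Choose the spill form for one spill/reload, given its estimated
   execution frequency relative to function entry. */
enum spill_kind choose_spill_kind(const struct spill_policy *policy,
                                  double relative_freq) {
    assert(policy != NULL);
    if (policy->force_unaligned)
        return SPILL_UNALIGNED;   /* the override always wins */
    return relative_freq >= policy->hot_threshold ? SPILL_ALIGNED
                                                  : SPILL_UNALIGNED;
}
```

With force_unaligned set, no spill ever asks for a 32 byte aligned slot, 
so the frame can stay fixed size.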

I don't really have a sense for how hard (3) would be to implement. 
Anyone have an intuition?

Philip