[llvm-dev] Aligned vector spills and variably sized stack frames
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Fri Aug 28 16:00:50 PDT 2015
I've run into a problem that I'm trying to figure out how to address and
would welcome ideas and feedback.
Today, the vectorizer will nicely vectorize loops using the widest legal
vector type for the target. On a reasonably recent machine, this will
often end up using AVX2 registers, which are 32 bytes wide.
If, during register allocation, we decide to spill one of these
registers, we use the vmovaps instruction, which requires the memory
address accessed to be 32 byte aligned. So far, so good.
However, the C ABI generally only provides 16 bytes of alignment for the
stack on entry to the function. To work around this, the backend will
create a variable sized frame with a dynamic amount of padding inserted
if required to ensure that a 32 byte aligned spill slot is available.
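
To make the "dynamic amount of padding" concrete, here is a minimal sketch of the
arithmetic involved. It is only a model, not backend code: the helper names
realign_down and realign_padding are hypothetical, and the stack pointer values
are assumed to be plain 16 byte aligned addresses. Rounding such an address down
to a 32 byte boundary inserts either 0 or 16 bytes, so the frame size depends on
the incoming stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Round a stack pointer down to `align`, which must be a power of two.
   This models the masking step a realigning prologue performs. */
static uintptr_t realign_down(uintptr_t sp, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}

/* Bytes of padding that the realignment inserts for a given incoming sp.
   For a 16 byte aligned sp and align == 32 this is either 0 or 16. */
static uintptr_t realign_padding(uintptr_t sp, uintptr_t align) {
    return sp - realign_down(sp, align);
}
```

Two callers that enter the same function with stack pointers differing by 16
bytes therefore see frames of different sizes, which is exactly what a runtime
assuming fixed size frames cannot tolerate.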
The problem I have is that my runtime's ABI really doesn't like variably
sized frames. In particular, the assumption that stack frames are fixed
size - except during prologue and epilogue - is fairly baked in.
I'm weighing a couple of options for addressing this and want to gather
feedback on the perceived difficulty of each. If someone has another
approach, I'm also very open to that.
Option 1 - Fix my runtime to not expect mostly fixed size frames. This
isn't a small change to make, but given it's a strictly internal ABI, I
can probably get away with doing it. Given things like shrink-wrapping
are coming down the pipe, it might also have secondary benefits.
However, this is a relatively risky change to make for a fairly narrow corner case.
Option 1a - I could change my ABI to use a 32 byte aligned frame. This
has many of the same problems as (1).
Option 2 - Don't compile things which need to spill vector registers.
This is actually what we do today and has worked out fairly well in
practice. This is what I'm hoping to move away from.
Option 3 - Add an option in the x86 backend to not require aligned spill
slots for AVX2 registers. In particular, the VMOVUPS instruction can be
used to spill vector registers into an 8 or 16 byte aligned spill slot
and not require dynamic frame realignment. This seems like it might be
useful in other contexts as well, but I can't name any at the moment.
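
As a rough illustration of why Option 3 avoids realignment, the sketch below
models a fixed size frame with a spill slot at a constant offset below the frame
base. The function name slot_is_aligned and the example addresses are
hypothetical, chosen only for illustration. With a 16 byte aligned frame base,
the slot meets a 32 byte requirement for only some incoming stack pointers, but
it always meets a 16 byte requirement, which is enough for an unaligned store
such as VMOVUPS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does a spill slot at `slot_offset` bytes below `frame_base` satisfy the
   given alignment? The frame layout is fixed; only frame_base varies. */
static bool slot_is_aligned(uintptr_t frame_base, uintptr_t slot_offset,
                            uintptr_t align) {
    return ((frame_base - slot_offset) % align) == 0;
}
```

For example, a slot 32 bytes below a frame base of 0x7000 is 32 byte aligned,
while the same slot below a frame base of 0x7010 is only 16 byte aligned. An
aligned store would need the frame to be realigned in the second case; an
unaligned store works in both with no change to the frame size.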
One thing that occurs to me is that many spills are down rare paths.
Maybe it would make sense to only do dynamic alignment for hot
spill/reloads? We could then simply override the heuristic to always use
I don't really have a sense for how hard (3) would be to implement.
Anyone have an intuition?