[llvm-dev] Aligned vector spills and variably sized stack frames
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Fri Aug 28 16:23:38 PDT 2015
----- Original Message -----
> From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, August 28, 2015 6:00:50 PM
> Subject: [llvm-dev] Aligned vector spills and variably sized stack frames
> I've run into a problem that I'm trying to figure out how to address, and
> would welcome ideas and feedback.
> Today, the vectorizer will nicely vectorize loops using the widest
> vector type for the target. On a reasonably recent machine, this will
> often end up using AVX2 registers, which are 32 bytes wide.
> If, during register allocation, we decide to spill one of these
> registers, we use the vmovaps instruction, which requires the address of the
> memory accessed to be 32 byte aligned. So far, so good.
> However, the C ABI generally only provides 16 bytes of alignment for the
> stack on entry to the function. To work around this, the backend will
> create a variably sized frame with a dynamic amount of padding
> if required to ensure that a 32 byte aligned spill slot is available.
> The problem I have is that my runtime's ABI really doesn't like variably
> sized frames. In particular, the assumption that stack frames are fixed
> size - except during prolog and epilogue - is fairly baked in.
> I'm weighing a couple of options for addressing this and want to get
> feedback on the perceived difficulty of each. If someone has another
> approach, I'm also very open to that.
> Option 1 - Fix my runtime to not expect mostly fixed size frames. This
> isn't a small change to make, but given it's a strictly internal ABI, I
> can probably get away with doing it. Given things like
> are coming down the pipe, it might also have secondary benefits.
> However, this is a relatively risky change to make for a fairly
> corner case.
> Option 1a - I could change my ABI to use a 32 byte aligned frame. This
> has many of the same problems as (1).
> Option 2 - Don't compile things which need to spill vector registers.
> This is actually what we do today and has worked out fairly well in
> practice. This is what I'm hoping to move away from.
> Option 3 - Add an option in the x86 backend to not require aligned spill
> slots for AVX2 registers. In particular, the VMOVUPS instruction can be
> used to spill vector registers into an 8 or 16 byte aligned spill slot
> and not require dynamic frame realignment. This seems like it might be
> useful in other contexts as well, but I can't name any at the moment.
> One thing that occurs to me is that many spills are down rare paths.
> Maybe it would make sense to only do dynamic alignment for hot
> spill/reloads? We could then simply override the heuristic to always use
> unaligned spills.
> I don't really have a sense for how hard (3) would be to implement.
> Anyone have an intuition?
I suspect that implementing this would not be too difficult. There are essentially two things that need to be changed:
1. Change the code in X86InstrInfo::storeRegToStackSlot / X86InstrInfo::loadRegFromStackSlot to do the right thing for underaligned stack slots (or, in general, under the control of some target feature, option, etc.) [specifically, you need to change the code in those functions to pass false to the isStackAligned parameter of getStoreRegOpcode and getLoadRegOpcode].
2. The alignment necessary for register spills is generically specified in the target's *RegisterInfo.td file (it's the third parameter of the RegisterClass TableGen type). You'd need to specify a way to override that based on some target feature, option, etc., if one does not already exist.
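
To make the first point concrete, here is a rough standalone sketch of the decision involved. It is not LLVM code: SpillPolicy, treatSlotAsAligned, pickVectorSpillMove and realignPadding are hypothetical names invented for illustration, and the flag stands in for whatever target feature or option would actually be used. The idea is only that the value passed as isStackAligned decides between the aligned (vmovaps) and unaligned (vmovups) form, and that forcing it to false removes the need for the padding computed in the last function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two move forms discussed in this thread.
enum class SpillMove {
  Aligned,   // vmovaps: faults if the address is not suitably aligned
  Unaligned  // vmovups: accepts any address
};

// Hypothetical knob standing in for "some target feature, option, etc."
struct SpillPolicy {
  bool forceUnalignedVectorSpills = false;
};

// Plays the role of the isStackAligned argument: true only when the slot
// is known to meet the required alignment and the policy does not
// override that knowledge.
inline bool treatSlotAsAligned(unsigned slotAlign, unsigned requiredAlign,
                               const SpillPolicy &policy) {
  if (policy.forceUnalignedVectorSpills)
    return false;
  return slotAlign >= requiredAlign;
}

// Picks the move form from the aligned/unaligned decision.
inline SpillMove pickVectorSpillMove(bool isStackAligned) {
  return isStackAligned ? SpillMove::Aligned : SpillMove::Unaligned;
}

// Bytes to subtract from a downward-growing stack pointer to reach the
// next lower multiple of align. This is the dynamic padding that a
// realigned frame pays for and that unaligned spills avoid.
inline std::uint64_t realignPadding(std::uint64_t sp, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "power of two expected");
  return sp & (align - 1);
}
```

With a 16 byte aligned slot and a 32 byte requirement, treatSlotAsAligned returns false and the unaligned move is chosen, so no realignment is needed. The second point (the spill alignment recorded in the register class) would still have to be relaxed separately, otherwise the frame would be realigned regardless of which move is emitted.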
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory