[llvm-dev] MCRegisterClass mandatory vs preferred alignment?

Mon Aug 31 16:25:15 PDT 2015

It would certainly be interesting to run some benchmarks (llvm-testsuite/SPEC) to confirm this. I could imagine that a smaller stack footprint improves performance (or at least does not degrade it).

- Matthias

> On Aug 31, 2015, at 4:15 PM, Philip Reames <listmail at philipreames.com> wrote:
> 
> 
> 
> On 08/31/2015 03:59 PM, Matthias Braun wrote:
>> Looks to me like the alignment is specified in tablegen. From Target.td:
>> 
>> class RegisterClass<string namespace, list<ValueType> regTypes, int alignment,
>>                     dag regList, RegAltNameIndex idx = NoRegAltName>
>> 
>> X86RegisterInfo.td:
>> 
>> def VR256 : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
>>                           256, (sequence "YMM%u", 0, 15)>;
>> def VR256X : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
>>                           256, (sequence "YMM%u", 0, 31)>;
>> 
>> So the alignment seems to be 256 bits (32 bytes).
> Yeah, don't know how I missed that.  :)
>> 
>> I don't know why the alignment was specified the way it is. My guess would be because memory accesses are faster that way (because they do not cross cache lines for example).
> This is certainly true on older cores, but is it actually true on newer ones?  Looking through Agner's instruction tables, it looks like the aligned and unaligned versions perform essentially the same on newer Intel and AMD cores.
> 
> I was originally imagining that I'd need a custom hook or flag, but would it make sense to just use the unaligned versions if the appropriate feature flag (IsUAMem32Slow) is unset?  This would result in slightly smaller code on newer architectures without (seemingly, I have no direct experience here) a performance hit.
>> 
>> - Matthias
>> 
>>> On Aug 31, 2015, at 3:21 PM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> Looking around today, it appears that TargetRegisterClass and MCRegisterClass only include a single alignment.  This is documented as being the minimum legal alignment, but it appears to often be greater than this in practice.  For instance, on x86 the alignment of %ymm0 is listed as 32, not 1.  Does anyone know why this is?
>>> 
>>> Additionally, where are these alignments actually defined?  I don't see them appearing in the X86RegisterInfo.td files as I would naively expect.
>>> 
>>> The background for my question is that I'm looking into adding a function attribute which uses unaligned loads and stores for register spilling on x86 to avoid the need for dynamic frame realignment (see the previous thread "Aligned vector spills and variably sized stack frames").  The key difference w.r.t. the existing "no-realign-stack" attribute is that situations which *require* a stack realignment will generate a fatal_error rather than silently miscompiling.  The current mechanism works by essentially ignoring the alignment criteria and just hoping everything works out in practice.
>>> 
>>> Philip
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
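
The policy discussed in this thread can be sketched as a small standalone C++ model. This is only an illustration under stated assumptions, not LLVM code: FrameInfo, FramePlan, SpillOp and planFrame are hypothetical names, and UnalignedIsSlow merely stands in for a feature flag such as the IsUAMem32Slow mentioned above. It combines the two ideas from the thread: spill with unaligned moves when only the spill slots want extra alignment, and raise an error (here an exception standing in for a fatal error) when some other stack object genuinely requires realignment that the attribute forbids.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model for illustration only; none of these names are LLVM APIs.
enum class SpillOp { AlignedMove, UnalignedMove };

struct FrameInfo {
  unsigned StackAlignBytes;     // alignment guaranteed on function entry, e.g. 16
  unsigned MaxObjectAlignBytes; // strictest alignment of non-spill stack objects
  unsigned SpillAlignBytes;     // alignment listed for the register class, e.g. 32
  bool NoRealign;               // models the proposed function attribute
  bool UnalignedIsSlow;         // stands in for a flag such as IsUAMem32Slow
};

struct FramePlan {
  bool RealignStack; // whether dynamic frame realignment is emitted
  SpillOp Op;        // which kind of move is used for vector spills
};

FramePlan planFrame(const FrameInfo &F) {
  assert(F.StackAlignBytes != 0 && "stack alignment must be nonzero");

  // A non-spill object needs more alignment than the stack guarantees, so
  // realignment is required for correctness. Fail loudly if it is disallowed.
  if (F.MaxObjectAlignBytes > F.StackAlignBytes) {
    if (F.NoRealign)
      throw std::runtime_error("stack realignment required but disallowed");
    return {true, SpillOp::AlignedMove};
  }

  // The stack already satisfies the spill slot alignment.
  if (F.SpillAlignBytes <= F.StackAlignBytes)
    return {false, SpillOp::AlignedMove};

  // Only the spill slots want extra alignment: use unaligned moves when
  // realignment is forbidden or when unaligned accesses are not slow.
  if (F.NoRealign || !F.UnalignedIsSlow)
    return {false, SpillOp::UnalignedMove};

  // Otherwise pay for realignment and keep the aligned moves.
  return {true, SpillOp::AlignedMove};
}
```

With a 16-byte stack and 32-byte spill slots, planFrame({16, 16, 32, true, true}) asks for unaligned spills and no realignment, while raising MaxObjectAlignBytes to 64 with NoRealign still set throws instead of silently producing a misaligned frame.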