[PATCH] D48193: [LoopVectorizer] Use an interleave count of 1 when using a vector library call

Mon Jul 2 09:49:10 PDT 2018

rob.lougher added a comment.

In https://reviews.llvm.org/D48193#1134868, @craig.topper wrote:

> This patch tries to improve the SVML calling convention. https://reviews.llvm.org/D47188 Maybe it will help this code?

Hi Craig,

Thanks for the link.

I had an idea for an alternative approach before sending the patch.  This involved changing the register usage calculation to record the number of live values at the point of a call.  Then, if we knew how many vector registers were preserved across the call, we could estimate register pressure, and potentially allow interleaving.  For example, if there was 1 live value at the call, and 4 registers were preserved, we could allow an interleave count (IC) of 4 (1*4 means there would be 4 live registers after interleaving, minus 1 for the value dead after the call).
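To make the estimate concrete, here is a rough sketch of the calculation.  The function and parameter names are hypothetical (nothing like this exists in LLVM today), it assumes the IC should stay a power of two, and it ignores the adjustment for the value that is dead after the call:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not an existing LLVM API: estimate the largest
// interleave count (IC) for which every value live across a call still
// fits in a call-preserved vector register.
unsigned estimateMaxInterleaveCount(unsigned LiveValuesAtCall,
                                    unsigned PreservedVectorRegs,
                                    unsigned MaxIC) {
  assert(MaxIC >= 1 && "interleave count must be at least 1");

  // Nothing is live across the call, so the call adds no constraint.
  if (LiveValuesAtCall == 0)
    return MaxIC;

  // Each interleaved copy of the loop body needs LiveValuesAtCall
  // preserved registers at the call site.
  unsigned IC = PreservedVectorRegs / LiveValuesAtCall;

  // Round down to a power of two (assumption), never going below 1.
  unsigned Pow2 = 1;
  while (Pow2 * 2 <= IC)
    Pow2 *= 2;

  return std::min(Pow2, MaxIC);
}
```

With 1 live value and 4 preserved registers this gives an IC of 4, matching the example above; with no preserved registers it falls back to an IC of 1, which is what the current patch does unconditionally.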

The problem was finding out how many registers are preserved.  The TargetTransformInfo pass exposes codegen information to IR-level passes.  So, for example, it provides the loop vectorizer with the number of registers (scalar or vector).  However, this is just a simple target-specific number.  In contrast, the number of preserved registers depends on the calling-convention/word-size/instruction-set, etc.  A vector-library call could use any calling-convention, and as far as I can see, there's nothing to prevent anybody from providing an SVML-like library on, say, ARM, so this would also need to be implemented for all targets.

Of course, this sort of information is needed by the register allocator.  TargetRegisterInfo provides an interface to find out information about the target registers, e.g. getCalleeSavedRegs() and getCallPreservedMask().  The TargetRegisterInfo is normally obtained via the subtarget attached to the machine function (this subtarget is created during codegen prepare).  However, the TargetTransformInfo also has a subtarget object (as part of the TTIImpl), which means the TargetRegisterInfo could potentially be queried by the loop-vectorizer.  The difficulty is that both getCalleeSavedRegs() and getCallPreservedMask() take a MachineFunction pointer (which obviously doesn't exist when the loop-vectorizer is run).  The functions are also much more low-level than we require (we would need to convert the return value into a number, based on register class, etc.).
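To illustrate the kind of conversion I mean, here is a standalone sketch that turns a preserved-register mask into a count for one register class.  It assumes the bit layout used by register masks (one bit per physical register, packed into 32-bit words, with a set bit meaning "preserved").  The helper names and the register numbers in the usage note are made up for illustration, and treating a null mask as "nothing preserved" is my assumption:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Test one register against a mask laid out as one bit per physical
// register, packed into 32-bit words; a set bit means "preserved".
bool isPreservedByMask(const uint32_t *Mask, unsigned PhysReg) {
  assert(Mask && "expected a non-null register mask");
  return (Mask[PhysReg / 32] & (1u << (PhysReg % 32))) != 0;
}

// Hypothetical helper: count how many registers of one class (given as a
// list of physical register numbers) are preserved across the call.
unsigned countPreservedRegs(const uint32_t *Mask,
                            const std::vector<unsigned> &ClassRegs) {
  // Assumption: a null mask is treated as "nothing is preserved".
  if (!Mask)
    return 0;

  unsigned Count = 0;
  for (unsigned Reg : ClassRegs)
    if (isPreservedByMask(Mask, Reg))
      ++Count;
  return Count;
}
```

For instance, with a two-word mask {0x30, 0x1} (registers 4, 5 and 32 preserved) and a class containing registers {3, 4, 5, 6, 32, 33}, the count is 3.  The real work would be in obtaining the mask and the class membership without a MachineFunction, which this sketch does not address.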

https://reviews.llvm.org/D47188 is interesting for two reasons.  Firstly, it provides an explicit calling-convention for SVML.  Secondly, only an X86 implementation of the calling-convention is provided.  So, if this were to land, implementing the number of preserved registers would become trivial.  However, on re-reading the previous comments, there's a reluctance to only handle vector-library calls (unfortunately, handling all calls is problematic, as an intrinsic call may end up as a sequence of instructions, or it may be lowered back into scalar calls to libm).  Also, is it true that the SVML framework is X86-only?

So I'm still rather stuck as to what to do for the preserved registers.  I don't want to go to the trouble of providing a full implementation for every target (duplicating the logic in getCalleeSavedRegs/getCallPreservedMask) if it isn't necessary.  If people think the approach outlined above sounds promising, I could provide an initial patch that just handles the default CCs on X86?

Thanks,
Rob.

Repository:
  rL LLVM

https://reviews.llvm.org/D48193