[llvm] r184870 - Fix SROA to avoid unnecessary scalar conversions for 1-element vectors.

Wed Jun 26 11:18:43 PDT 2013

On Jun 26, 2013, at 3:19 AM, Chandler Carruth <chandlerc at google.com> wrote:

> On Tue, Jun 25, 2013 at 12:09 PM, Bob Wilson <bob.wilson at apple.com> wrote:
> Author: bwilson
> Date: Tue Jun 25 14:09:50 2013
> New Revision: 184870
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=184870&view=rev
> Log:
> Fix SROA to avoid unnecessary scalar conversions for 1-element vectors.
> 
> When a 1-element vector alloca is promoted, a store instruction can often be
> rewritten without converting the value to a scalar and using an insertelement
> instruction to stuff it into the new alloca.  This patch just adds a check
> to skip that conversion when it is unnecessary.  This turns out to be really
> important for some ARM Neon operations where <1 x i64> is used to get around
> the fact that i64 is not a legal type.
> 
> I have a few questions about this patch, mostly to better understand what use cases you're attacking.
> 
> First, you mention the <1 x i64> type being important on ARM -- in what precise way is it important? Is it important in arguments and return values for ABI reasons (and to deal with the fact that there isn't an i64 GPR)? Is it important throughout the code? The reason I ask this is that it seems almost like we should teach instcombine to canonicalize single element vectors to be just scalars inside the function body, as that will simplify any number of other aspects of this problem... And if the actual generated code needs to wrap the i64 into a vector to make it legal and efficient, it feels like that should happen in codegen rather than in the middle end. None of that makes me dislike the patch here of course, it still makes perfect sense.

The source code that motivated this is:

#include <arm_neon.h>
uint64x1_t test_vshl_n_u64(uint64x1_t a) {
  return vshl_n_u64(a, 4);
}

ARM's definitions of Neon intrinsics that operate on 64-bit integers use the int64x1_t and uint64x1_t types, which are defined in arm_neon.h.  So, that's one reason that it matters.  The other reason is that code-gen's type legalization will do horrible things to any i64 types in the IR.  For the long term, I agree with you that it would be good to fix that.  We have internally had some discussions about introducing a more relaxed notion of "legal types" that would help, but we're still quite a ways from having anything like that.  In the meantime, it is critical that the <i64 x 1> types for ARM do not get changed into scalars, since that will almost always produce terrible code.

> 
> My second question is if this is really specific to single element vectors. Can you trigger something similar when SROA tries to split a 4 element vector alloca into two 2 element vector allocas if there happen to be some 2 element vector stores? If so, it would be good to add some testing around that.

I suppose you could probably construct a case like that, but offhand it doesn't seem like something that would come up very often.  It doesn't seem to have anything to do with my change, though.  This isn't about splitting up a vector alloca.

> 
> Generally, the vector rewriting in SROA is the least well tested, so the more testing you can do to try to ensure we get this right in all inputs the better... I'm always worried about corner cases slipping through.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130626/50e6a8ae/attachment.html>