[llvm-commits] [PATCH][ARM] Vext Lowering was missing opportunities

Quentin Colombet qcolombet at apple.com
Thu Nov 1 09:53:24 PDT 2012


Thanks!

It should be better now (note that I have removed a few other tabs in that file).

Quentin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ARMVextLowering.patch
Type: application/octet-stream
Size: 4517 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121101/06f2c624/attachment.obj>
-------------- next part --------------


On Nov 1, 2012, at 5:11 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

> Hi Quentin,
> 
> On Thu, Nov 1, 2012 at 6:55 AM, Quentin Colombet <qcolombet at apple.com> wrote:
>> +    if (V2->getOpcode() == ISD::UNDEF &&
>> +	isSingletonVEXTMask(ShuffleMask, VT, Imm)) {
>> +      return DAG.getNode(ARMISD::VEXT, dl, VT, V1, V1,
>> +			 DAG.getConstant(Imm, MVT::i32));
>> +    }
> 
> No tabs in sources, please.
> 
> Dmitri
> 
> -- 
> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/

On Oct 31, 2012, at 9:55 PM, Quentin Colombet <qcolombet at apple.com> wrote:

> Hi,
> 
> When working with ARM neon intrinsics, I have encountered a hole in how vext is mapped into the assembly.
> Basically, something like this:
> 
> vext a, b, imm
> 
> is perfectly lowered, whereas something like this:
> 
> vext a, a, imm
> 
> generates more or less bad code (depending on whether the type of a is 128-bit or 64-bit) and definitely not the expected vext instruction.
> 
> The short story (the long story is at the very end of the mail for those interested) is that the attached patch fixes this; you can find the new test cases in the patch.
> 
> Cheers,
> 
> Quentin
> 
> ----------
> The long story
> 
> ARM doc: the vext instruction extracts elements from the bottom end of the second operand vector and the top end of the first, concatenates them, and places the result in the destination vector.
> 
> Now, the problem when writing something like:
> 
> vext a, a, imm
> 
> clang translates that into a shufflevector instruction with a sequence of integers (I simplify a bit) representing the order in which each element of both operands should appear in the result vector.
> Assuming 'a' has 8 elements, they would be numbered from 0 to 7 for the first operand and from 8 to 15 for the second operand.
> For this kind of shuffle, the sequence of integers has the following pattern: the (i+1)-th element equals the i-th plus one (e.g. 2, 3, 4, 5).
> This is the pattern that is matched in ARMISelLowering.
> 
> However, when both operands are the same, an instruction combine optimization (visitShuffleVectorInst) during the late emit pass breaks this pattern.
> It transforms  a, a into a, undef and updates the sequence of integer accordingly i.e. all integers point to the first operand (e.g. 2, 3, 4, 5 => 2, 3, 0, 1).
> This pattern was not recognized as a VEXT.
> 
> Note that vrev and vext (with an undef argument) are equivalent for some patterns. Thus, I placed the new pattern matching after the vrev matching, instead of directly after the two-operand vext matching, to avoid changing the existing output, in particular in the vrev.ll tests.
> <ARMVextLowering.patch>


