[PATCH][AArch64] updated patches with initial implementation of Neon scalar instructions

Kevin Qin kevinqindev at gmail.com
Tue Sep 17 23:07:19 PDT 2013

Hi Ana,

I tested your patch on trunk and got one failure in the regression tests.
Could you check your patch?

FAIL: Clang :: CodeGen/aarch64-neon-intrinsics.c (2056 of 15433)
******************** TEST 'Clang :: CodeGen/aarch64-neon-intrinsics.c'
FAILED ********************
/home/kevin/llvm_trunk/build/bin/./clang -cc1 -internal-isystem
/home/kevin/llvm_trunk/build/bin/../lib/clang/3.4/include -triple
aarch64-none-linux-gnu -target-feature +neon    -ffp-contract=fast -S -O2
-o -
| FileCheck
Exit Code: 1

Command Output (stderr):
llvm::DAGTypeLegalizer::PromoteIntOp_BUILD_VECTOR(llvm::SDNode*): Assertion
`!(NumElts & 1) && "Legal vector of one illegal element?"' failed.

2013/9/18 Ana Pazos <apazos at codeaurora.org>

> Hi folks,
> I have rebased my patches now that dependent pending patches are merged.
> I have also made these additional changes:
> 1) Adopted the v1ix and v1if solution.
>    I will revisit it when the "global instruction selection" is in place.
>    Tim, can you say more about this upcoming LLVM change?
>         a) Will it still be SelectionDAG based?
>         b) How will having whole-function knowledge help me distinguish
>            when to create integer and scalar Neon operations without
>            adding the v1ix and v1if types?
> 2) Introduced a new operator OP_SCALAR_ALIAS to allow creating AArch64
>    scalar intrinsics that are aliases of legacy ARM intrinsics.
>    Example:
> __ai int64_t vaddd_s64(int64_t __a, int64_t __b) {
>   return (int64_t)vadd_s64((int64x1_t)__a, (int64x1_t)__b);
> }
>    Note that even with this change, the AArch64 intrinsic vaddd_s64 will
>    NOT generate "add d0, d1, d0" but the optimized code "add x0, x1, x0",
>    because of the casts to int64_t.
>    I experimented with compiling aarch64-neon-intrinsics.c with -O0
>    instead of -O3, but the instruction combining pass still makes this
>    optimization, so we are really dependent on the compiler optimizations
>    here.
>    But note that directly calling the legacy ARM intrinsic vadd_s64
>    produces "add d0, d1, d0", since the inputs are of type v1i64 and I
>    have the proper instruction selection pattern defined.
> 3) Got rid of the int_aarch64_sisd_add(u,s)64 and
>    int_aarch64_sisd_sub(u,s)64 intrinsics, as a side effect of
>    implementing (2).
>    With these intrinsics removed, we cannot guarantee that vaddd_(s,u)64
>    and vsubd_(s,u)64 will produce "add/sub d0, d1, d0".
>    I am allowing these intrinsics to generate integer code, which is the
>    best implementation of these intrinsics, as Tim pointed out.
>    I updated the tests accordingly.
> 4) Used FMOV instead of UMOV to move values between the Neon and integer
>    units when possible.
>    For types of size 32 and 64, I tried to use FMOV instructions. For
>    types of size 8 and 16, I use UMOV instructions.
> Let me know if you have any more comments on these patches.
> Thanks,
> Ana.
> -----Original Message-----
> From: Tim Northover [mailto:t.p.northover at gmail.com]
> Sent: Friday, September 13, 2013 2:02 AM
> To: Kevin Qin
> Cc: Ana Pazos; rajav at codeaurora.org; llvm-commits; cfe-commits at cs.uiuc.edu
> Subject: Re: [PATCH][AArch64]RE: patches with initial implementation of Neon scalar instructions
> Hi Kevin,
> > From my perspective, the DAG should only hold operations with value
> > types, not a particular register class. Which register class is used is
> > decided by the compiler after some cost calculation. If we bind v1i32
> > and v1i64 to FPR, then it's hard for the compiler to make this
> > optimization.
> In an ideal world, I completely agree. Unfortunately the SelectionDAG
> infrastructure just doesn't make these choices intelligently. It looks at
> each node in isolation and chooses an instruction based on the types
> involved. If there were two "(add i64:$Rn, i64:$Rm)" patterns then only one
> of them would ever match.
> I view this v1iN nonsense as an unfortunate but necessary temporary
> measure, until we get our global instruction selection.
> I think the only way you could get LLVM to produce both an "add x, x, x"
> and an "add d, d, d" from sensible IR without it would be a separate
> (MachineInstr) pass which goes through afterwards and patches things up.
> The number of actually duplicated instructions is small enough that this
> might be practical, but it would have its own ugliness even if it worked
> flawlessly (why v1i8, v1i16 but i32 and i64? There's a good reason, but
> it's not pretty).
> I'm not implacably opposed to the approach, but I think you'd find
> implementing it quite a bit of work. Basically, the main thing I want to
> avoid is an int_aarch64_sisd_add intrinsic. That seems like it's the worst
> of all possible worlds.
> Cheers.
> Tim.
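
To make the scalar-alias idea from item 2 of the quoted message concrete, here is a minimal, portable sketch in plain C. It is not the arm_neon.h implementation: every name prefixed with model_ is hypothetical, and the one-lane vector type is modelled as a struct so the code compiles on any host. It only shows the shape of the alias, namely reinterpret the scalars as one-lane vectors, call the vector operation, and reinterpret the result back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the one-lane vector type int64x1_t. */
typedef struct { int64_t lane[1]; } model_int64x1_t;

/* Hypothetical stand-in for the legacy vector intrinsic vadd_s64. */
static model_int64x1_t model_vadd_s64(model_int64x1_t a, model_int64x1_t b) {
    model_int64x1_t r;
    /* Add through uint64_t so the addition wraps instead of being undefined;
       converting back to int64_t is implementation-defined for out-of-range
       values (two's complement on common compilers). */
    r.lane[0] = (int64_t)((uint64_t)a.lane[0] + (uint64_t)b.lane[0]);
    return r;
}

/* Scalar alias: reinterpret the scalars as one-lane vectors, call the
   vector operation, and reinterpret the result back as a scalar. */
static int64_t model_vaddd_s64(int64_t a, int64_t b) {
    model_int64x1_t va, vb, vr;
    int64_t r;
    memcpy(&va, &a, sizeof a);
    memcpy(&vb, &b, sizeof b);
    vr = model_vadd_s64(va, vb);
    memcpy(&r, &vr, sizeof r);
    return r;
}

/* Small self-check: the alias behaves like a plain 64-bit add. */
static void model_self_check(void) {
    assert(model_vaddd_s64(2, 3) == 5);
    assert(model_vaddd_s64(-7, 7) == 0);
}
```

Because the round trip through the one-lane type is a pure reinterpretation, an optimizer is free to reduce the whole function to a plain integer add, which is consistent with the "add x0, x1, x0" result described in the quoted message.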

Best Regards,

Kevin Qin