[llvm] r192908 - [AArch64] Add support for NEON scalar three register different instruction

Mon Oct 21 13:30:10 PDT 2013

Hi Chad,

> I'm not sure I follow
> your point about replacing the mla/mls with "(vqadd accum, (vqdmull lhs,
> rhs))."  Are you suggesting a combine?  Please expound on your point when
> you have a moment.  Thanks again for the feedback.

Looking at the ARM ARM pseudocode, I think these operations basically do:

   tmp1 = saturate(2*lhs*rhs)
   res = saturate(accum + tmp1)

If so, then we already have intrinsics for both of these operations
(vqdmull and vqadd/vqsub respectively), and it would probably be more
efficient if Clang emitted them separately and LLVM matched the pair,
rather than using a dedicated vqdmlal/vqdmlsl intrinsic which can't
possibly be optimised by generic code.
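
To make that reading concrete, here is a minimal scalar sketch in C,
assuming 16-bit inputs and a 32-bit accumulator. The helper names
(sat32, sqdmull_s16, sqadd_s32, sqdmlal_s16) are invented for
illustration and are not LLVM or ACLE names; it only models the
two-step pseudocode above, not any particular lowering.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate into the signed 32-bit range. */
static int32_t sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* tmp1 = saturate(2*lhs*rhs): models the vqdmull step (hypothetical name). */
static int32_t sqdmull_s16(int16_t lhs, int16_t rhs) {
    return sat32(2 * (int64_t)lhs * (int64_t)rhs);
}

/* res = saturate(a + b): models the vqadd step (hypothetical name). */
static int32_t sqadd_s32(int32_t a, int32_t b) {
    return sat32((int64_t)a + (int64_t)b);
}

/* The accumulate form expressed purely as vqadd(accum, vqdmull(lhs, rhs)). */
static int32_t sqdmlal_s16(int32_t accum, int16_t lhs, int16_t rhs) {
    return sqadd_s32(accum, sqdmull_s16(lhs, rhs));
}
```

The only input pair where the doubling multiply itself saturates is
lhs == rhs == INT16_MIN; everywhere else only the final add can clamp.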

You should be able to write a pattern directly rather than faffing
around with a Combine. Something like (though I've not checked the
types compile or anything):

def : Pat<(int_arm_neon_vqadd
              (v1i32 accum:$Rd),
              (v1i32 (int_arm_neon_vqdmull (v1i16 lhs:$Rn),
                                           (v1i16 rhs:$Rm)))),
          (SQDMLALL FPR32:$Rd, FPR16:$Rn, FPR16:$Rm)>;

Cheers.

Tim.