[PATCH][AArch64] implement aarch64 neon instruction class AdvSIMD (3 diff)

Mon Aug 26 02:37:33 PDT 2013

Hi Jiangning,

I've just looked at the LLVM patch for now, since the comments may
drastically change the Clang patch.

+  def _8h8b
[...]
+    def _8H

It would be nice to settle on a single naming convention for these
instructions. Personally, I think I prefer the first, but I don't have
a strong opinion either way.

+defm SADDWvvv :  NeonI_3VDW_s<0b0, 0b0001, "saddw", add, 1>;
+defm UADDWvvv :  NeonI_3VDW_u<0b1, 0b0001, "uaddw", add, 1>;
+
+defm SADDW2vvv :  NeonI_3VDW2_s<0b0, 0b0001, "saddw2", add, 1>;
+defm UADDW2vvv :  NeonI_3VDW2_u<0b1, 0b0001, "uaddw2", add, 1>;

I don't think any of the wide ("W") instructions are commutable. The
addition itself is, but only Rm gets widened before the add, so you
can't swap Rn and Rm on the instructions and get the same result.
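
To make that concrete, here's a rough scalar model of lane 0 of
"saddw Vd.8h, Vn.8h, Vm.8b" in C. The function name and the idea of
passing the raw low 16 bits of each register are my own simplification
for illustration, not anything in the patch:

```c
#include <stdint.h>

/* Hypothetical model of lane 0 only. Both arguments are the raw low
   16 bits of the source registers Rn and Rm. */
static uint16_t saddw_lane0(uint16_t rn_bits, uint16_t rm_bits) {
    /* Rm's lane is just the low byte; sign-extend it to 16 bits. */
    uint16_t low = (uint16_t)(rm_bits & 0xffu);
    uint16_t ext = (low & 0x80u) ? (uint16_t)(0xff00u | low) : low;
    /* Rn's lane is already 16 bits wide; the add wraps modulo 2^16. */
    return (uint16_t)(rn_bits + ext);
}
```

With Rn = 0x0100 and Rm = 0x0001 this gives 0x0101, but with the
registers swapped the top byte of the old Rn is never read and the
result is 0x0001.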

+defm ADDHNvvv  : NeonI_3VDN_2Op<0b0, 0b0100, "addhn", int_arm_neon_vaddhn, 1>;
+defm RADDHNvvv : NeonI_3VDN_2Op<0b1, 0b0100, "raddhn", int_arm_neon_vraddhn, 1>;

Don't these have reasonably simple LLVM IR representations? For example:

define <2 x i32> @addhn(<2 x i64> %lhs, <2 x i64> %rhs) {
  %sum = add <2 x i64> %lhs, %rhs
  ; Move the high half down before truncating (logical shift right).
  %shift = lshr <2 x i64> %sum, <i64 32, i64 32>
  %trunc = trunc <2 x i64> %shift to <2 x i32>
  ret <2 x i32> %trunc
}

define <2 x i32> @raddhn(<2 x i64> %lhs, <2 x i64> %rhs) {
  %sum = add <2 x i64> %lhs, %rhs
  ; Rounding constant: 2147483648 = 0x80000000 = 1 << 31.
  %rounded = add <2 x i64> %sum, <i64 2147483648, i64 2147483648>
  %shift = lshr <2 x i64> %rounded, <i64 32, i64 32>
  %trunc = trunc <2 x i64> %shift to <2 x i32>
  ret <2 x i32> %trunc
}
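
As a cross-check on the semantics I have in mind, here's a per-lane
scalar model in C for the 64-bit to 32-bit case. The function names
are made up for illustration; the point is only that the result is the
high half of the (optionally rounded) wrapping sum:

```c
#include <stdint.h>

/* addhn, one lane: add modulo 2^64, keep the high 32 bits. */
static uint32_t addhn_lane(uint64_t lhs, uint64_t rhs) {
    return (uint32_t)((lhs + rhs) >> 32);
}

/* raddhn, one lane: same, but add 1 << 31 first so the high half is
   rounded to nearest rather than truncated. */
static uint32_t raddhn_lane(uint64_t lhs, uint64_t rhs) {
    return (uint32_t)((lhs + rhs + UINT64_C(0x80000000)) >> 32);
}
```

For an input sum of 0x80000000, addhn_lane yields 0 while raddhn_lane
yields 1, which is the only difference between the two.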

+defm SMULLvvv :  NeonI_3VDL_2Op<0b0, 0b1100, "smull", int_arm_neon_vmulls, 1>;
+defm UMULLvvv :  NeonI_3VDL_2Op<0b1, 0b1100, "umull", int_arm_neon_vmullu, 1>;

Aren't these even simpler than addhn and friends? An extend followed
by a multiply? They're also always commutable so it probably doesn't
need to be a template parameter (same for sabdl and uabdl).
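
Roughly what I mean, as a per-lane scalar sketch in C for the 8-bit to
16-bit forms (the function names are hypothetical, not from the
patch). Both operands are extended the same way, which is why the
operation is symmetric:

```c
#include <stdint.h>

/* smull, one lane: sign-extend both inputs, then multiply. */
static int16_t smull_lane(int8_t a, int8_t b) {
    return (int16_t)((int16_t)a * (int16_t)b);
}

/* umull, one lane: zero-extend both inputs, then multiply. */
static uint16_t umull_lane(uint8_t a, uint8_t b) {
    return (uint16_t)((uint16_t)a * (uint16_t)b);
}
```

The 16-bit result can always hold the product of two 8-bit inputs, so
no wrapping or saturation is involved.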

+defm SQDMLALvvv : NeonI_3VDL_3Op_v2<0b0, 0b1001, "sqdmlal",
+                                    int_arm_neon_vqdmlal>;

The qdmlals are just qdmulls with an extra (saturating) addition, I think.
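
A per-lane sketch of that decomposition in C, for the 16-bit to 32-bit
form. The helper names are my own, and I'm assuming the accumulate
step saturates as well as the doubling multiply:

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* sqdmull, one lane: saturating doubling multiply long. */
static int32_t sqdmull_lane(int16_t a, int16_t b) {
    return sat32(2 * (int64_t)a * (int64_t)b);
}

/* sqdmlal, one lane: sqdmull followed by a saturating accumulate. */
static int32_t sqdmlal_lane(int32_t acc, int16_t a, int16_t b) {
    return sat32((int64_t)acc + sqdmull_lane(a, b));
}
```

If that holds, the pattern could match the existing qdmull node feeding
a saturating add instead of needing its own intrinsic.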

Cheers.

Tim.