patches with initial implementation of Neon scalar instructions

Ana Pazos apazos at
Tue Sep 10 23:11:51 PDT 2013

Hello folks,


I prepared a patch with an initial implementation of Neon scalar instructions.


The patch was built on top of Kevin’s SIMD copy patch still under review
(UMOV, SMOV, INS instructions). Once his patch is merged, I will update mine
and re-submit it.


In the meantime I would like to discuss a couple of implementation
decisions made in the patches before we proceed with implementation of the
other Neon scalar instructions.


I implemented some of the Scalar Arithmetic and Scalar Reduce Pairwise
instructions to illustrate the change. Complete implementation of these
instructions will come in subsequent patches, once we agree on the items


1)      Extended support in the backend for one-element vectors of size <=
64 bits, e.g. v1i8, v1i16, v1i32, v1f32, v1f64, v1i64.

This is a change at the SelectionDAG (or TableGen) level, made by adding new
types to include/llvm/CodeGen/

This lets us distinguish code generation for, say, code that uses
float32_t from code that uses float32x1_t.


2)      Associated types v1i8, v1i16, v1i32 with FPR8, FPR16, FPR32 register


I associated these types with the FPR register classes because they are
printed as scalar registers “b”, “h”, “s”, “d” in the Neon scalar


In situations where these types have to be mapped to instructions that use
the vector form “v”, I make use of EXTRACT_SUBREG and SUBREG_TO_REG
operations to construct the patterns.



I can also remove v1i64 from VPR64 register class and associate it with

I have not done so in this patch because it would be a lot of changes at

I have to update the patterns already defined to add EXTRACT_SUBREG and
SUBREG_TO_REG operations.

If we agree this is preferred, I can provide this change as an additional


3)      The ACLE scalar intrinsics (i.e., the C-level intrinsics for end users,
defined in arm_neon.h) use Clang builtins with scalar types.

In CGBuiltin the Clang builtins are translated to LLVM IR intrinsics with
v1ix and v1if vector types as input/return types.

It does not look like I can translate these builtins into simple IR
operations in CGBuiltin. 


Because the Clang builtins use scalar types, if I transform the code below
into an IR operation using CreateAdd in CGBuiltin, the code is optimized and
results in the machine instruction ‘add X0, X1, X0’, which is not what we
want.


     int64_t vaddd_s64(int64_t a, int64_t b) {
       /* becomes: add x0, x1, x0 */
       return (int64_t) vadd_s64((int64x1_t)(a), (int64x1_t)(b));
     }



But in the .td file we can still add instruction selection patterns that use
the IR operations we want.
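
To make the problem in item 3 concrete, here is a minimal portable C sketch.
It does not use arm_neon.h: int64x1_model_t, vadd_s64_model and
vaddd_s64_model are hypothetical stand-ins, and the struct is only an
assumption about how a one-lane vector could be modelled. Wrapping and
unwrapping a single lane preserves the value, so the whole function is
equivalent to a plain 64-bit integer add. That is presumably why nothing
stops the optimizer from picking the general-purpose ‘add’ once the builtin
has been lowered to a simple IR add.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for int64x1_t: a one-lane "vector" of int64_t.
   This is NOT the arm_neon.h definition, only a portable model. */
typedef struct { int64_t lane[1]; } int64x1_model_t;

/* Models vadd_s64: lane-wise add on the one-lane type. */
int64x1_model_t vadd_s64_model(int64x1_model_t a, int64x1_model_t b) {
    /* Add through uint64_t so that overflow wraps instead of being undefined. */
    int64x1_model_t r = { { (int64_t)((uint64_t)a.lane[0] + (uint64_t)b.lane[0]) } };
    return r;
}

/* Models vaddd_s64: scalar in, scalar out, routed through the one-lane type. */
int64_t vaddd_s64_model(int64_t a, int64_t b) {
    int64x1_model_t va = { { a } };
    int64x1_model_t vb = { { b } };
    return vadd_s64_model(va, vb).lane[0];
}

/* Sanity check: the one-lane route gives the same value as a plain scalar add. */
void check_model_matches_scalar_add(int64_t a, int64_t b) {
    assert(vaddd_s64_model(a, b) == (int64_t)((uint64_t)a + (uint64_t)b));
}
```

For example, vaddd_s64_model(40, 2) evaluates to 42, exactly as 40 + 2 does
on plain int64_t. Nothing at the value level records that a one-element
vector was involved.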


Let me know what you think about these decisions and if you prefer another
way to handle Neon Scalar instructions and intrinsics.




-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang-scalar-arith-scalar-reduce-pairwise
Type: application/octet-stream
Size: 24514 bytes
Desc: not available
URL: <>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-scalar-arith-scalar-reduce-pairwise-after-UMOV-SMOV-INS
Type: application/octet-stream
Size: 123799 bytes
Desc: not available
URL: <>

More information about the cfe-commits mailing list