[PATCH][AArch64]RE: patches with initial implementation of Neon scalar instructions

Thu Sep 12 15:13:58 PDT 2013

Hi Tim and Jiangning,

The patches bring up a couple of discussion points:

1) Type of code generated by ACLE Neon intrinsics

From what I have experimented with, the only way to guarantee that the ACLE
Neon intrinsics generate Neon code is to implement them as builtins and
translate those builtins into LLVM intrinsics.

Otherwise you depend on what the compiler happens to do (e.g., current or
future optimizations, data layout changes), and it might not generate the
expected Neon instructions.

If this is not a requirement, then the way we generate tests for the ACLE
Neon intrinsics in NeonCodeEmitter needs to be fixed: we cannot auto-generate
"//CHECK" strings with the Neon instructions.
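To make the concern concrete, a generated test for one scalar intrinsic would look roughly like the sketch below. The function name and the exact RUN line are illustrative assumptions, not taken from the patches. If the optimizer lowers vaddd_s64 to a general-purpose "add x0, x1, x0", the CHECK line fails even though the computed result is correct.

```c
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +neon -S -O3 -o - %s | FileCheck %s
#include <arm_neon.h>

// Hypothetical test function: the CHECK pattern insists on the Neon
// scalar form (D registers), not the general-purpose X-register add.
int64_t test_vaddd_s64(int64_t a, int64_t b) {
  // CHECK: test_vaddd_s64
  // CHECK: add {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}
  return vaddd_s64(a, b);
}
```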

2) Using v1ix and v1fx to represent Neon scalar data types in the backend.

This is an important decision that we need to make soon.

ARMv8 supports 64-, 32-, 16- and 8-bit scalar operations in Neon.
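Some of these scalar operations have no direct general-purpose equivalent. As an illustration, here is a small reference model in portable C of the 8-bit saturating add ("sqadd b0, b1, b2") next to an ordinary wrapping 8-bit add. Both helper names are hypothetical and are not ACLE intrinsics; the point is only that the two results differ for the same inputs.

```c
#include <stdint.h>

/* Hypothetical reference model of the Neon scalar SQADD on 8-bit B
 * registers: add in a wider type, then clamp to [INT8_MIN, INT8_MAX]. */
static int8_t model_sqadd_b(int8_t a, int8_t b) {
    int sum = (int)a + (int)b;            /* cannot overflow an int */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Hypothetical model of a plain 8-bit add on a general-purpose register:
 * the sum is truncated to 8 bits. Converting an out-of-range value to
 * int8_t is implementation-defined in C; GCC and Clang wrap modulo 2^8. */
static int8_t model_wrapping_add_b(int8_t a, int8_t b) {
    return (int8_t)(uint8_t)((uint8_t)a + (uint8_t)b);
}
```

For example, with a = b = 100 the saturating model yields 127 while the wrapping model yields -56 (on GCC/Clang).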

I think the compiler should be able to decide when to generate Neon scalar
operations rather than non-Neon scalar operations.

How can we achieve that without defining different data types? Only by
using Neon intrinsics?

Regarding the impact on the effectiveness of middle-end optimizations:

This is my understanding; Tim and others, please correct me if I got it wrong.

The data layout string defined for AArch64 lists only 32 and 64 as native
integer widths ("n32:64"; see AArch64TargetInfo::DescriptionString in
tools\clang\lib\Basic\Targets.cpp).

The middle end uses this data layout information when performing its
optimizations.
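To show what "native" means here, the following sketch checks whether a given integer width appears in the "n" component of a data layout string. The helper is_native_int_width is hypothetical. Inside LLVM, the DataLayout class parses the string and answers this question through DataLayout::isLegalInteger. Under "n32:64", 8- and 16-bit integers are not native; under "n8:16:32:64" they would be.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: given the "n" component of an LLVM data layout
 * string (e.g. "n32:64"), report whether `bits` is listed as a native
 * integer width. Entries are decimal widths separated by ':'. */
static bool is_native_int_width(const char *n_spec, unsigned bits) {
    if (n_spec[0] != 'n') return false;
    const char *p = n_spec + 1;
    while (*p != '\0') {
        char *end;
        unsigned long w = strtoul(p, &end, 10);
        if (end == p) return false;       /* malformed entry */
        if (w == bits) return true;
        if (*end != ':') break;           /* that was the last entry */
        p = end + 1;
    }
    return false;
}
```

For example, is_native_int_width("n32:64", 8) is false, while is_native_int_width("n8:16:32:64", 8) is true.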

Right now sub-word data types are promoted to 32 bits; you can see the
"sext" IR operations when you emit LLVM code. I do not see it performing
sub-word optimizations.
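A minimal C example of where those "sext" operations come from is below. The helper names are hypothetical. In C, int8_t operands are promoted (sign-extended) to int before the addition, so Clang emits "sext i8 ... to i32" in the front end. Whether later passes narrow the operation back to 8 bits may then depend on which integer widths the data layout marks as native.

```c
#include <stdint.h>

/* Both int8_t operands are sign-extended to int before the add
 * (C integer promotion), so the sum is computed in 32 bits. */
static int add_promoted(int8_t a, int8_t b) {
    return a + b;              /* 100 + 100 == 200, no 8-bit overflow */
}

/* The 32-bit result is only narrowed when converted back to 8 bits.
 * The out-of-range conversion is implementation-defined in C;
 * GCC and Clang wrap it modulo 2^8 (200 becomes -56). */
static int8_t add_narrowed(int8_t a, int8_t b) {
    return (int8_t)(a + b);
}
```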

If this data layout is changed in the future to n8:16:32:64 and we use ixx
and fxx for Neon scalar types, we will get more mixing of Neon and non-Neon
code and more copies between Neon and non-Neon registers, which can hurt
performance.

I hope Tim and the community can give me more guidance in this area.

Thanks,

Ana.

From: Jiangning Liu [mailto:liujiangning1 at gmail.com] 
Sent: Wednesday, September 11, 2013 11:32 PM
To: Ana Pazos
Cc: Tim Northover; rajav at codeaurora.org; llvm-commits; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH][AArch64]RE: patches with initial implementation of Neon scalar instructions

Ana,

I personally think ACLE functions for Neon should be expected to generate
Neon instructions, because they give the programmer a way to ask the
compiler to generate special instructions that support complex
functionality.

The test case given by Tim should still be able to generate "add d0, d1, d0"
if you define vaddd_s64 using vadd_s64 rather than an IR intrinsic.
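A sketch of that suggestion in C: the hypothetical wrapper my_vaddd_s64 expresses the scalar add through the one-lane vector intrinsic vadd_s64 on int64x1_t, so the operands are intended to stay in D registers. Whether the backend actually selects "add d0, d1, d0" still depends on the optimizer, which is exactly the concern raised above. The #else branch is only a portable fallback so the sketch also builds on non-Neon hosts.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical wrapper: scalar 64-bit add expressed via vadd_s64. */
static int64_t my_vaddd_s64(int64_t a, int64_t b) {
#if defined(__ARM_NEON)
    int64x1_t va = vdup_n_s64(a);               /* a as a one-lane vector */
    int64x1_t vb = vdup_n_s64(b);
    return vget_lane_s64(vadd_s64(va, vb), 0);  /* extract lane 0 */
#else
    return a + b;                               /* non-Neon fallback */
#endif
}
```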

Since most middle-end optimizations are based on scalar data types, if we
use v1ixx instead of ixx, is there any scenario in which we lose
optimization opportunities in the middle end? Or do we not care about that
at all, because these types are only introduced by ACLE intrinsics? I'm
also fine with that conclusion.

Thanks,

-Jiangning
