[PATCH] Aarch64 Neon ACLE scalar instrinsic name mangling with BHSD suffix

Thu Aug 22 10:46:10 PDT 2013

Hi Kevin,

 

Thanks, it should work too. We do that in EmitIntrinsic and genTargetTest. 

 

One more request:

Can you clarify on the naming convention for Scalar Reduce Pairwise (FMAXP,
FMAXNMP, FMINP, FMINNMP) intrinsic names. Maybe ARM Ltd. has updated names.

 

Gcc's arm_neon.h adds a 'q' character to the intrinsic name.

 

But another instruction from the same instruction class, FADDP, does not
have the additional 'q' in its intrinsic name. That is weird since the
prototypes are the same:

 

float64_t vpmaxnmqd_f64 (float64x2_t a)

float64_t vpmaxqd_f64 (float64x2_t a)

float64_t vpaddd_f64 (float64x2_t a)

 

If the 'q' is needed, then we need to further change the patch to generate
the correct name. This can be done now or when we check in the Scalar Reduce
Pairwise implementation.

 

You can reproduce the issue by adding these declarations:

def SCALAR_FMAXP : SInst<"vpmax", "sd", "SfSQd">;

def SCALAR_FMAXNMP : SInst<"vpmaxnm", "sd", "SfSQd">;

 

Thanks,

Ana.

 

From: Kevin Qin [mailto:kevinqindev at gmail.com] 
Sent: Wednesday, August 21, 2013 10:51 PM
To: Ana Pazos
Cc: Joey Gouly; Jiangning Liu; Hao Liu; cfe-commits at cs.uiuc.edu;
mcrosier at codeaurora.org
Subject: Re: [PATCH] Aarch64 Neon ACLE scalar instrinsic name mangling with
BHSD suffix

 

Hi Ana,

 

Thanks for your feedback. I can reproduce this problem and fix it in
attached patch. My solution is record both intrinsic name and its prototypes
in map. So only a definition with same name and same prototypes will be
treated as redefinition. I think this solution is more reasonable and fit
more scenarios. I have tested this patch by comparing arm_neon.h,
arm_neon.inc before and after this patch. They are all the same. Also I test
defining SInst<"vqadd", "sss", "ScSsSiSlSUcSUsSUiSUl">. There is no compile
error now.

 

2013/8/22 Ana Pazos <apazos at codeaurora.org>

Hi Kevin,

 

I tried to use your patch for defining the Scalar Arithmetic and and Scalar
Reduce Pairwise intrinsics I am working on.

 

I encountered an issue with Scalar Saturating Add which I defined as

 

def SCALAR_QADD   : SInst<"vqadd", "sss", "ScSsSiSlSUcSUsSUiSUl"> 

 

and when enabling tools/clang/lib/Headers/Makefile to build the
auto-generated headers arm_neon.h., arm_neon_test.h and arm_neon_sema.h.

 

Because I am reusing the intrinsic name "vqadd", the ARM v7 legacy builtins
associated with this same intrinsic name ended up not being added to AArch64
headers as they should.

 

Below is one possible solution (simply do not add AArch64 Scalar intrinsics
to the AArch64 Map structure; changed arm_neon.td and NeonEmitter.cpp).

 

See if you can reproduce the issue, and if you agree with the change below,
please add it to your patch.

diff --git a/utils/TableGen/NeonEmitter.cpp b/utils/TableGen/NeonEmitter.cpp

index 8924278..fece66a 100644

--- a/utils/TableGen/NeonEmitter.cpp

+++ b/utils/TableGen/NeonEmitter.cpp

@@ -2396,6 +2396,8 @@ void NeonEmitter::runHeader(raw_ostream &OS) {

   std::vector<Record *> RV = Records.getAllDerivedDefinitions("Inst");

 

   // build a map of AArch64 intriniscs to be used in uniqueness checks.

+  // The map does not include AArch64 scalar intrinsics because their name

+  // mangling and non-overloaded builtins prevent name conflict.

   StringMap<ClassKind> A64IntrinsicMap;

   for (unsigned i = 0, e = RV.size(); i != e; ++i) {

     Record *R = RV[i];

@@ -2404,6 +2406,10 @@ void NeonEmitter::runHeader(raw_ostream &OS) {

     if (!isA64)

       continue;

 

+       bool isScalar = R->getValueAsBit("isScalar");

+    if (isScalar)

+      continue;

+

 

diff --git a/include/clang/Basic/arm_neon.td
b/include/clang/Basic/arm_neon.td

index 6918f0a..0a98320 100644

--- a/include/clang/Basic/arm_neon.td

+++ b/include/clang/Basic/arm_neon.td

@@ -79,6 +79,7 @@ class Inst <string n, string p, string t, Op o> {

   bit isShift = 0;

   bit isVCVT_N = 0;

   bit isA64 = 0;

+  bit isScalar = 0;

 

   // Certain intrinsics have different names than their representative

   // instructions. This field allows us to handle this correctly when we

@@ -564,15 +565,46 @@ def SHLL_HIGH_N    : SInst<"vshll_high_n", "ndi",
"HcHsHiHUcHUsHUi"

// Converting vectors

def VMOVL_HIGH   : SInst<"vmovl_high", "nd", "HcHsHiHUcHUsHUi">;

 

+}

 

+let isA64 = 1, isScalar = 1 in {

////////////////////////////////////////////////////////////////////////////
////

// Scalar Arithmetic

 

 

Thanks,

Ana.

 

 

Some more background on the issue.

 

Currently we allow defining AArch64 intrinsics with the same name as ARM
intrinsics.

Usually this happens when the AArch64 intrinsic can handle more types than
the ARM intrinsic. It is seen as a superset of ARM intrinsics.

So the Prototype string might differ, but they have the same Type string,
e.g., "ddd", and Name string.

 

Example:

def VADD    : IOpInst<"vadd", "ddd", "csilfUcUsUiUlQcQsQiQlQfQUcQUsQUiQUl",
OP_ADD>; --> used by ARM NEON

...

let isA64 = 1 in {

def ADD : IOpInst<"vadd", "ddd", "csilfUcUsUiUlQcQsQiQlQfQUcQUsQUiQUlQd",
OP_ADD>; --> with additional Qd type, used by AArch64 NEON

...

}

 

When generating the intrinsics definitions and builtin definitions we check
for such conflicts using Maps structures.

We only include ARM NEON legacy intrinsics and builtin definitions into
AArch64 headers/tests if AArch64 does not redefine them. Otherwise the
definitions marked as isA64 prevail (they are the superset).

 

Now we have a new situation with scalar intrinsics whose Type strings will
be different.

For example, Scalar Saturating Add, which requires a different Types string
"sss".

It would be nice to define it as: SInst<"vqadd", "sss",
"ScSsSiSlSUcSUsSUiSUl">. But it will fail to compile.

There are a couple of possible solutions.

This can be address in your patch or in the next one by whoever needs this
fix.

 

From: Kevin Qin [mailto:kevinqindev at gmail.com] 

Sent: Tuesday, August 20, 2013 7:02 PM
To: Ana Pazos

Cc: Joey Gouly; Jiangning Liu; Hao Liu; llvm-commits at cs.uiuc.edu;
cfe-commits at cs.uiuc.edu


Subject: Re: [PATCH] Aarch64 Neon ACLE scalar instrinsic name mangling with
BHSD suffix

 

Hi Ana,

 

Sorry for that mistake. I merged and tested my patch on our daily updated
internal repo. So it may have latency with truck and just missed Hao's patch
at the time I merged my patch. Here is the patch rebased on truck now.
Please try again. Thanks.

 

2013/8/21 Ana Pazos <apazos at codeaurora.org>

Hi Kevin,

 

I am trying your patch now.

The first thing I noticed is that the patch does not merge cleanly.

It seems you do not have Hao's previous clang commit
(http://llvm.org/viewvc/llvm-project?view=revision
<http://llvm.org/viewvc/llvm-project?view=revision&revision=188452>
&revision=188452 -Clang and AArch64 backend patches to support shll/shl and
vmovl instructions and ACLE function).

Can you rebase and repost your patch?

 

Thanks,

Ana.

 

 

From: cfe-commits-bounces at cs.uiuc.edu
[mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Kevin Qin
Sent: Monday, August 19, 2013 2:43 AM
To: Joey Gouly
Cc: llvm-commits at cs.uiuc.edu; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH] Aarch64 Neon ACLE scalar instrinsic name mangling with
BHSD suffix

 

Hi Joey,

 

   Thanks a lot for your suggestions. I improved my patch as your good
advice. I wish to implement some of ACLE intrinsic based on my patch as
test, but the full implementation of one ACLE intrinsic need to commit on
both llvm and clang. More important thing is,  all scalar ACLE intrinsic are
classified to different tables, and these tables are implemented in parallel
but none of them is accomplished.  There are complex dependence among
intrinsic and may be rewrite frequently.  The main purpose posting this
patch at moment is to decrease overlap on such base module and make our
parallel development more efficient.

 

Best Regards,

Kevin Qin

 

2013/8/16 Joey Gouly <joey.gouly at arm.com>

Hi Kevin,

Clang patches should be sent to cfe-commits. I cc'd them in for you.

You could test this patch by including an ACLE function that you have
implemented, that way we can see it works. Or if you are going to submit
those soon, maybe it can go in without a test for now. Let's see what others
think about that.

Just two minor points.

> +static std::string Insert_BHSD_Suffix(StringRef typestr){

This should just return a 'char'.

> +/// Insert proper 'b' 'h' 's' 'd' behind function name if prefix 'S' is
used.
Can you change that to something like
  Insert proper 'b', 'h', 's', 'd' suffix if 'S' prefix is used.

Thanks

From: llvm-commits-bounces at cs.uiuc.edu
[mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Kevin Qin
Sent: 16 August 2013 10:46
To: llvm-commits at cs.uiuc.edu
Subject: [PATCH] Aarch64 Neon ACLE scalar instrinsic name mangling with BHSD
suffix


This patch is used to add 'bhsd' suffix in Neon ACLE scalar function name.
1. A new type prefix 'S' is defined in
tools/clang/include/clang/Basic/arm_neon.td, which is used to mark whether
enable 'bhsd' suffix mangle.
2. If prefix 'S' is found before data type, proper suffix will be added
according below rule:
Data Type  Suffix
  i8 -> b
  i16 -> h
  i32 f32 -> s
  i64 f64 -> d
 
For example,
If we define a new ACLE function in arm_neon.td like:
 
def FABD : SInst<"vabd", "sss",  "ScSsSiSlSfSd">
 
Then, rebuild llvm. We would see below in
INSTALL_PATH/lib/clang/3.4/include/arm_neon.h
 
__ai int8_t vabdb_s8(int8_t __a, int8_t __b) {
  return (int8_t)__builtin_neon_vabdb_s8(__a, __b); }
__ai int16_t vabdh_s16(int16_t __a, int16_t __b) {
  return (int16_t)__builtin_neon_vabdh_s8(__a, __b); }
.
 
Proper suffix is inserted in both ACLE intrinsics and builtin functions.
 
Because this patch only works on llvm building time, so I have no idea on
writing test case for this patch. Fortunately, this patch won't effect on
any already defined ACLE functions, for its function depending on the new
defined prefix 'S'. So It is safe for current system and is developed to
help implement new scalar ACLE intrinsics in parallel.
 
You can see each scalar ACLE function corresponds to an unique builtin
function. At beginning, I planned to let series of ACLE functions with the
same semantics reuse one builtin function. But this would introduce extra
unnecessary convert expression in IR, for different data types have
different data width. If I promote all data types to i64 and use single
builtin function, then there will be only an i64 version builtin function
existing in IR. Though I can add extra arg in builtin function to record the
original data type, but extra data convert IR(sign extension before call and
truncation after call) still exists. This will increase the difficulty in
coding lower patterns in backend.
 





 

-- 

Best Regards,

 

Kevin Qin





 

-- 

Best Regards,

 

Kevin Qin





 

-- 

Best Regards,

 

Kevin Qin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130822/3849f01d/attachment.html>