[PATCH] D10964: [Codegen] Add intrinsics 'hadd*' and corresponding SDNodes for horizontal sum operation.

Tue Jul 7 08:35:02 PDT 2015

Hi James,

Pls find the response below.

Shahid

From: James Molloy [mailto:james at jamesmolloy.co.uk]
Sent: Tuesday, July 07, 2015 7:09 PM
To: reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org; Shahid, Asghar-ahmad; hfinkel at anl.gov; renato.golin at linaro.org; james.molloy at arm.com
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH] D10964: [Codegen] Add intrinsics 'hadd*' and corresponding SDNodes for horizontal sum operation.

Hi,

> Ah, I did not think about it.Instead of restricting it to fast-math I would prefer to have an order such as "add each element of vector, starting from element 0 to n-1, to an accumulated sum which is initialized to zero". Does it make sense?

That would mean you wouldn't be able to lower it using a lg(n)-shuffles algorithm, as that does it in the wrong order. You'd have to use a linear algorithm which would perform quite poorly.
That’s right.

It would also stop horizontal add instructions being used on architectures that support them (I don't know of any that do for FP types - probably for this reason!). I'd probably go with the fast-math version personally.
In that case I would support fast-math version.

>In that case what should be the return type of intrinsic?

llvm_any_ty ?
Initially I thought of this, however did not use because it will allow any other type also. Hence provided the float intrinsic separately.
If that is not a concern I would use llvm_any_ty.

On Tue, 7 Jul 2015 at 11:59 Shahid <Asghar-ahmad.Shahid at amd.com<mailto:Asghar-ahmad.Shahid at amd.com>> wrote:
Hi James,

Thanks for your comments.I will do the needful. Pls see my response below.

Regards,
Shahid


================
Comment at: docs/LangRef.rst:9593
@@ +9592,3 @@
+      declare <4 x integer> @llvm.hadd.v4i32(<4 x integer> %a)
+      declare <4 x float> @llvm.hadd.v4f32(<4 x float> %a)
+
----------------
jmolloy wrote:
> You need to be very explicit about the behaviour of this intrinsic with floating point arguments. What order, if any, does it perform the adds in? If there is no guaranteed order, it can only be used in fast-math mode.
Ah, I did not think about it.Instead of restricting it to fast-math I would prefer to have an order such as "add each element of vector, starting from element 0 to n-1, to an accumulated sum which is initialized to zero". Does it make sense?

================
Comment at: include/llvm/IR/Intrinsics.td:599
@@ +598,3 @@
+// Calculate the horizontal/reduction sum across the elements of input vector.
+def int_hadd_int : Intrinsic<[llvm_anyint_ty], [llvm_anyvector_ty], [IntrNoMem]>;
+def int_hadd_float : Intrinsic<[llvm_anyfloat_ty], [llvm_anyvector_ty], [IntrNoMem]>;
----------------
jmolloy wrote:
> Just having one intrinsic here would be good; there's no need for a separate int and float version.
In that case what should be the return type of intrinsic?

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp:712
@@ -707,1 +711,3 @@
+  case ISD::FHADD:
+    return UnrollHADD(Op);
   default:
----------------
jmolloy wrote:
> Can't you just call ExpandHADD() here? or at least share the unroll and expand code?
Yes, probably I can share.


Repository:
  rL LLVM

http://reviews.llvm.org/D10964




_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150707/dc5d1a70/attachment.html>