[PATCH] D10964: [Codegen] Add intrinsics 'hadd*' and corresponding SDNodes for horizontal sum operation.
Shahid
Asghar-ahmad.Shahid at amd.com
Wed Jul 15 01:49:12 PDT 2015
ashahid added inline comments.
================
Comment at: docs/LangRef.rst:9620
@@ +9619,3 @@
+ call i32 @llvm.hadd.v4i32(<4 x i32> %a)
+
+is equivalent to::
----------------
bruno wrote:
> I don't know if this discussion already happened, but I've been thinking about this and I'm wondering whether we should have a vector result instead of a scalar one; the result in the first element of the vector type and the other elements undef. Then an extractelement follows to get the scalar result.
>
> IMO, this is more natural given the way architectures implement variants of HADD, they usually leave the results on vectors. One advantage of doing this is that we can also use this ISD::HADD while lowering other vector operations (the CTPOP case) and don't have to write a DAGCombine or any other extra logic to recognise the vector back from an extract. I might be biased on one side of the history here though, I'd appreciate hearing the other side :-)
No, this discussion did not happen earlier.
IMO, the scalar version is more natural with respect to the HADD operation itself, and also more canonical. Also, as you mentioned, the vector version needs an extra extractelement to get the scalar result, which may have some performance impact.
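To make the two candidate result shapes concrete, here is a minimal C++ model of them. It is only a sketch of the semantics, not LLVM code: the names `hadd_scalar`, `hadd_vector` and `extract_lane0` are hypothetical, the lanes are unsigned so that the addition wraps the way an IR `add` does, and the undef lanes of the vector form are simply modelled as 0.

```cpp
#include <array>
#include <cstdint>
#include <numeric>

// Four 32-bit lanes, standing in for <4 x i32>.
using V4u32 = std::array<std::uint32_t, 4>;

// Scalar form, as in the patch: the reduction yields a single i32.
std::uint32_t hadd_scalar(const V4u32 &a) {
    return std::accumulate(a.begin(), a.end(), std::uint32_t{0});
}

// Vector form, as suggested in the review: lane 0 holds the sum and the
// remaining lanes are undef (modelled here as 0, which is an assumption).
V4u32 hadd_vector(const V4u32 &a) {
    V4u32 r{};
    r[0] = hadd_scalar(a);
    return r;
}

// The extractelement that the vector form needs to recover the scalar.
std::uint32_t extract_lane0(const V4u32 &v) {
    return v[0];
}
```

Under this model, `extract_lane0(hadd_vector(a))` and `hadd_scalar(a)` always agree; the two proposals differ only in where the extract appears.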
In fact, on X86 we need a DAGCombine of *ABSDIFF* and *HADD* to generate the PSAD instruction, which is our main objective in adding the two intrinsics. I feel a vector version of HADD would complicate this DAGCombine.
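As a rough illustration of why the ABSDIFF + HADD pair maps onto a sum-of-absolute-differences instruction, here is a small C++ model. The function names are hypothetical, and the choice of eight unsigned 8-bit lanes summed into a 32-bit result is an assumption made for the sketch, not something the patch specifies.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Eight unsigned 8-bit lanes (an assumed shape for this sketch).
using V8u8 = std::array<std::uint8_t, 8>;

// Lane-wise |a[i] - b[i]|, a stand-in for the ABSDIFF node.
V8u8 absdiff(const V8u8 &a, const V8u8 &b) {
    V8u8 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = static_cast<std::uint8_t>(a[i] > b[i] ? a[i] - b[i]
                                                     : b[i] - a[i]);
    }
    return r;
}

// Horizontal add with a scalar result, a stand-in for the HADD node.
// The sum is accumulated in 32 bits so that it cannot overflow a lane.
std::uint32_t hadd(const V8u8 &v) {
    std::uint32_t sum = 0;
    for (std::uint8_t x : v) {
        sum += x;
    }
    return sum;
}

// The composed pattern the DAGCombine would look for: hadd(absdiff(a, b)).
std::uint32_t sad(const V8u8 &a, const V8u8 &b) {
    return hadd(absdiff(a, b));
}
```

With a scalar HADD the pattern is a direct two-node chain; with a vector result, the combine would also have to look through the extractelement that follows.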
Repository:
rL LLVM
http://reviews.llvm.org/D10964