[PATCH] D10964: [Codegen] Add intrinsics 'hadd*' and corresponding SDNodes for horizontal sum operation.

Shahid Asghar-ahmad.Shahid at amd.com
Wed Jul 15 01:49:12 PDT 2015


ashahid added inline comments.

================
Comment at: docs/LangRef.rst:9620
@@ +9619,3 @@
+    call i32 @llvm.hadd.v4i32(<4 x i32> %a)
+
+is equivalent to::
----------------
bruno wrote:
> I don't know if this discussion already happened, but I've been thinking about this and I'm wondering whether we should have a vector result instead of a scalar one: the result in the first element of the vector type and the other elements undef. Then an extractelement follows to get the scalar result.
> 
> IMO, this is more natural given the way architectures implement variants of HADD: they usually leave the results in vectors. One advantage of doing this is that we can also use this ISD::HADD while lowering other vector operations (the CTPOP case) and don't have to write a DAGCombine or any other extra logic to recognise the vector back from an extract. I might be biased toward one side of the story here, though; I'd appreciate hearing the other side :-)
No, this discussion has not happened before.

IMO, the scalar version is more natural w.r.t. the HADD operation itself, and also more canonical. Also, as you mentioned, the vector version needs an extra extractelement, which may have some performance impact.
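To make the two result shapes concrete, here is a small C model. It is only an illustrative sketch, not LLVM code. The function names are hypothetical. LLVM's signless i32 is modeled as uint32_t, so sums wrap modulo 2^32. The undef lanes of the vector form are modeled as zeros because C has no undef.

```c
#include <stdint.h>

/* Hypothetical C models of the two result shapes under discussion; neither
   is an LLVM API. IR integers are signless, so uint32_t is used and the
   additions wrap modulo 2^32, like a plain IR 'add'. */

/* Scalar form (as proposed): the reduction yields the sum directly. */
static uint32_t hadd_scalar_v4i32(const uint32_t a[4]) {
    uint32_t sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += a[i];
    return sum;
}

/* Vector form (as suggested in review): the sum lands in lane 0 and the
   other lanes are undef in IR. C has no undef, so 0 is written as a
   placeholder; callers must treat lanes 1..3 as meaningless. */
static void hadd_vector_v4i32(const uint32_t a[4], uint32_t out[4]) {
    out[0] = hadd_scalar_v4i32(a);
    for (int i = 1; i < 4; ++i)
        out[i] = 0; /* placeholder for undef */
}

/* The vector form needs an explicit extract, mirroring
   'extractelement <4 x i32> %v, i32 0' in IR. */
static uint32_t extract_lane0(const uint32_t v[4]) {
    return v[0];
}
```

For an input of {1, 2, 3, 4}, both forms yield 10. The vector form only gets there through the extra extract_lane0 step, which corresponds to the extractelement in the IR.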

In fact, on X86 we need a DAGCombine of *ABSDIFF* and *HADD* to generate the PSAD instruction, which is our main objective in adding these two intrinsics. I feel a vector version of HADD would complicate this DAGCombine.
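For reference, the PSAD pattern can be modeled in C as a horizontal sum of per-byte absolute differences. This is a hedged sketch, not the DAGCombine itself. The helper names are hypothetical, and the ABSDIFF node is assumed to have unsigned per-lane semantics. The differences are also assumed to be zero-extended before the sum, which is what lets the reduction fit PSADBW's 16-bit result.

```c
#include <stdint.h>

/* Element-wise absolute difference of unsigned bytes: a sketch of what an
   ABSDIFF node would compute per lane (unsigned semantics assumed). */
static void absdiff_v8u8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

/* Horizontal sum of the eight differences, widened so it cannot overflow:
   8 * 255 = 2040 fits easily in 16 bits. */
static uint16_t hadd_v8u8_to_u16(const uint8_t v[8]) {
    uint16_t sum = 0;
    for (int i = 0; i < 8; ++i)
        sum = (uint16_t)(sum + v[i]);
    return sum;
}

/* Reference model of one 64-bit half of x86 PSADBW, expressed as the
   scalar result of hadd(absdiff(a, b)). */
static uint16_t psadbw_half(const uint8_t a[8], const uint8_t b[8]) {
    uint8_t d[8];
    absdiff_v8u8(a, b, d);
    return hadd_v8u8_to_u16(d);
}
```

PSADBW performs this for each 64-bit half of its operands and writes the 16-bit sum into the low bits of that half, zeroing the rest. The scalar hadd result maps directly onto that value, with no extract needed in between.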


Repository:
  rL LLVM

http://reviews.llvm.org/D10964






