[PATCH] D36454: [X86] Changes to extract Horizontal addition operation for AVX-512.

Fri Sep 29 07:47:44 PDT 2017

jbhateja added a comment.

In https://reviews.llvm.org/D36454#884252, @RKSimon wrote:

> I'm not sure about this approach; I don't think most of this needs to be done in the X86 backend at all, and much of it shouldn't even be done in the DAG. Most of the code appears to be better handled in a mixture of SimplifyDemandedVectorElts and SLPVectorizer.
>
> PR33758 was about improving codegen for horizontal reductions, so we'd probably be better off having the backend optimize for @llvm.experimental.vector.reduce.add.* (or the legalized patterns it produces), and then getting the vectorizers to create these properly.

Hi Simon,

Thanks for pointing me to code references, I tired a simple case, which was not optimized by InstCombiner::SimpliefyDemandedVectorElts. 
It works over knownbits mechanism. In fact none of the test cases provided in avx512-hadd-hsub.ll were optimized by InstCombener.

define float @fhsub_16(<16 x float> %x225) {
;define <16 x float> @fhsub_16(<16 x float> %x225) {

  %x226 = shufflevector <16 x float> %x225, <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %x227 = fadd <16 x float> %x225, %x226
  %x228 = shufflevector <16 x float> %x227, <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %x229 = fsub <16 x float> %x227, %x228
  %x230 = extractelement <16 x float> %x229, i32 0
  ret float %x230

}

This path provided two generic routines which try to scale down operation in denominations of X86 vector register sizes, patch https://reviews.llvm.org/D36650 is also suggesting for similar effort.

PR 33758 is specially about infrence of horizontal operations for AVX512 vector types.

Kindly elaborate how add reduction is helful here for cases in provided in testcase avx512-hadd-hsub.ll.

Thanks

https://reviews.llvm.org/D36454