[PATCH] D151184: [AArch64] Adjust costs of i1 and/or/xor reductions
    David Sherwood via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Tue May 23 01:29:08 PDT 2023
    
    
  
david-arm added inline comments.
================
Comment at: llvm/test/Analysis/CostModel/AArch64/reduce-xor.ll:20
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V8i8 = call i8 @llvm.vector.reduce.xor.v8i8(<8 x i8> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %V16i8 = call i8 @llvm.vector.reduce.xor.v16i8(<16 x i8> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 18 for instruction: %V32i8 = call i8 @llvm.vector.reduce.xor.v32i8(<32 x i8> undef)
----------------
Interestingly, when SVE is available we can also do much better for xor reductions on types such as v16i8, v8i16, etc. For a v8i16 xor reduction we can just do:
  ptrue   p0.h, vl8
  eorv    h0, p0, z0.h
  fmov    w0, s0
whereas I see we currently generate:
  ext     v1.16b, v0.16b, v0.16b, #8
  eor     v0.8b, v0.8b, v1.8b
  fmov    x8, d0
  eor     x8, x8, x8, lsr #32
  lsr     x9, x8, #16
  eor     w0, w8, w9
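For anyone reading the NEON sequence above, here is a rough Python model of what the six instructions compute. It is only a sketch for illustration, not LLVM code: the helper names (pack_lanes, folded_xor_v8i16, reference_xor) are made up, and it assumes lane 0 sits in the lowest 16 bits of the register.

```python
from functools import reduce
import operator

MASK64 = (1 << 64) - 1


def pack_lanes(lanes):
    """Pack eight 16-bit lanes into one 128-bit integer (lane 0 in the low bits)."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & 0xFFFF) << (16 * i)
    return value


def folded_xor_v8i16(lanes):
    """Model the ext/eor/fmov/eor/lsr/eor sequence as three halving folds."""
    v0 = pack_lanes(lanes)
    lo, hi = v0 & MASK64, v0 >> 64
    # ext v1.16b, v0.16b, v0.16b, #8 ; eor v0.8b, v0.8b, v1.8b ; fmov x8, d0
    x8 = lo ^ hi                    # 128 -> 64 bits
    # eor x8, x8, x8, lsr #32
    x8 ^= x8 >> 32                  # 64 -> 32 bits (in the low word)
    # lsr x9, x8, #16 ; eor w0, w8, w9
    x9 = x8 >> 16
    w0 = (x8 ^ x9) & 0xFFFFFFFF     # 32 -> 16 bits (in the low halfword)
    return w0 & 0xFFFF              # only the low 16 bits are the i16 result


def reference_xor(lanes):
    """Straightforward xor of all lanes, for comparison."""
    return reduce(operator.xor, lanes, 0) & 0xFFFF
```

Each step halves the live width (128 to 64 to 32 to 16 bits), which is why the NEON version needs a chain of dependent instructions, while the SVE eorv does the whole reduction in a single instruction.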
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151184/new/
https://reviews.llvm.org/D151184
    
    