[PATCH] D151184: [AArch64] Adjust costs of i1 and/or/xor reductions

Tue May 23 01:29:08 PDT 2023

david-arm added inline comments.

================
Comment at: llvm/test/Analysis/CostModel/AArch64/reduce-xor.ll:20
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V8i8 = call i8 @llvm.vector.reduce.xor.v8i8(<8 x i8> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %V16i8 = call i8 @llvm.vector.reduce.xor.v16i8(<16 x i8> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 18 for instruction: %V32i8 = call i8 @llvm.vector.reduce.xor.v32i8(<32 x i8> undef)
----------------
Interestingly, we can also do much better for xor reductions such as v16i8 and v8i16 by using SVE when it is available. For a v8i16 xor reduction we can just do:

  ptrue p0.h, vl8
  eorv h0, p0, z0.h
  fmov w0, s0

whereas I see we currently generate:

  ext     v1.16b, v0.16b, v0.16b, #8
  eor     v0.8b, v0.8b, v1.8b
  fmov    x8, d0
  eor     x8, x8, x8, lsr #32
  lsr     x9, x8, #16
  eor     w0, w8, w9
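
To make the comparison concrete, here is a minimal C++ sketch of what the longer sequence computes, checked against a plain lane-by-lane xor. It is a scalar model only, not LLVM code. The names `V8i16`, `reduce_xor_reference`, `reduce_xor_neon_model` and `models_agree` are made up for illustration, and I am assuming little-endian lane packing and that the i16 result is the low 16 bits of w0.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a <8 x i16> vector value.
using V8i16 = std::array<uint16_t, 8>;

// Reference semantics: xor all eight lanes together. This is also what a
// single predicated reduction over 8 active halfword lanes would produce.
uint16_t reduce_xor_reference(const V8i16 &v) {
  uint16_t acc = 0;
  for (uint16_t lane : v)
    acc ^= lane;
  return acc;
}

// Scalar model of the ext/eor/fmov/eor/lsr/eor sequence.
uint16_t reduce_xor_neon_model(const V8i16 &v) {
  // Pack lanes 0..3 and 4..7 into two 64-bit halves (assumed little-endian).
  uint64_t lo = 0, hi = 0;
  for (int i = 0; i < 4; ++i) {
    lo |= static_cast<uint64_t>(v[i]) << (16 * i);
    hi |= static_cast<uint64_t>(v[i + 4]) << (16 * i);
  }
  uint64_t x8 = lo ^ hi;   // ext #8 + eor v0.8b, then fmov x8, d0
  x8 ^= x8 >> 32;          // eor x8, x8, x8, lsr #32
  uint64_t x9 = x8 >> 16;  // lsr x9, x8, #16
  uint32_t w0 = static_cast<uint32_t>(x8) ^ static_cast<uint32_t>(x9); // eor w0, w8, w9
  return static_cast<uint16_t>(w0); // keep the low 16 bits as the i16 result
}

// Convenience check that both models agree on one input.
bool models_agree(const V8i16 &v) {
  return reduce_xor_reference(v) == reduce_xor_neon_model(v);
}
```

For example, `models_agree({1, 2, 4, 8, 16, 32, 64, 128})` should hold, with both functions returning 255. The model says nothing about instruction latency or throughput; it only shows that the six-instruction sequence is a halving tree over the same lanes.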

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151184/new/

https://reviews.llvm.org/D151184