[PATCH] D151184: [AArch64] Adjust costs of i1 and/or/xor reductions

Wed May 31 05:49:51 PDT 2023

dmgreen added a comment.

ping

================
Comment at: llvm/test/Analysis/CostModel/AArch64/reduce-xor.ll:20
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V8i8 = call i8 @llvm.vector.reduce.xor.v8i8(<8 x i8> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %V16i8 = call i8 @llvm.vector.reduce.xor.v16i8(<16 x i8> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 18 for instruction: %V32i8 = call i8 @llvm.vector.reduce.xor.v32i8(<32 x i8> undef)
----------------
david-arm wrote:
> Interestingly, we can also do much better for xor reductions like v16i8, v8i16, etc. by using SVE if available too. For a v8i16 xor reduction we can just do:
> 
>   ptrue p0.h, vl8
>   eorv h0, p0, z0.h
>   fmov w0, s0
> 
> whereas I see we currently do
> 
>   ext     v1.16b, v0.16b, v0.16b, #8
>   eor     v0.8b, v0.8b, v1.8b
>   fmov    x8, d0
>   eor     x8, x8, x8, lsr #32
>   lsr     x9, x8, #16
>   eor     w0, w8, w9
> 
OK cool.
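
For reference, here is a rough Python model of what the quoted NEON sequence computes for a v8i16 xor reduction, checked against a plain lane-by-lane xor. It is only a sketch of the folding steps, not the backend's lowering code. The helper names (`pack_lanes`, `neon_style_xor_reduce`, `reference_xor_reduce`) are made up for illustration, and lane 0 is assumed to sit in the low bits of the register.

```python
from functools import reduce
from operator import xor

MASK64 = (1 << 64) - 1


def pack_lanes(lanes):
    """Pack eight 16-bit lanes into one 128-bit integer (lane 0 in the low bits)."""
    assert len(lanes) == 8
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & 0xFFFF) << (16 * i)
    return value


def neon_style_xor_reduce(lanes):
    """Model the ext/eor/fmov/eor/lsr/eor sequence from the comment above."""
    v0 = pack_lanes(lanes)
    # ext v1.16b, v0.16b, v0.16b, #8 ; eor v0.8b, v0.8b, v1.8b ; fmov x8, d0
    # Fold the high 64 bits onto the low 64 bits: 8 lanes -> 4 lanes.
    x8 = (v0 ^ (v0 >> 64)) & MASK64
    # eor x8, x8, x8, lsr #32: 4 lanes -> 2 lanes (in the low 32 bits).
    x8 = x8 ^ (x8 >> 32)
    # lsr x9, x8, #16 ; eor w0, w8, w9: 2 lanes -> 1 lane (in the low 16 bits).
    x9 = x8 >> 16
    w0 = (x8 ^ x9) & 0xFFFFFFFF
    # Only the low 16 bits hold the i16 result; the rest is leftover data.
    return w0 & 0xFFFF


def reference_xor_reduce(lanes):
    """Straightforward xor of all lanes, i.e. the value of vector.reduce.xor."""
    return reduce(xor, (lane & 0xFFFF for lane in lanes), 0)


if __name__ == "__main__":
    sample = [0x1234, 0xFFFF, 0x0001, 0x8000, 0x00FF, 0xABCD, 0x0F0F, 0x7777]
    print(hex(neon_style_xor_reduce(sample)), hex(reference_xor_reduce(sample)))
```

The three folds are why the NEON version needs six instructions, while the SVE `eorv` form does the whole reduction in one instruction after the `ptrue`.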

https://reviews.llvm.org/D151184