[PATCH] D106239: [AArch64] Expand the SVE min/max reduction costs to NEON

Tue Jul 20 01:23:38 PDT 2021

david-arm added inline comments.

================
Comment at: llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll:190
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8f16 = call half @llvm.vector.reduce.fmax.v8f16(<8 x half> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 73 for instruction: %V16f16 = call half @llvm.vector.reduce.fmax.v16f16(<16 x half> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2f32 = call float @llvm.vector.reduce.fmax.v2f32(<2 x float> undef)
----------------
dmgreen wrote:
> david-arm wrote:
> > Hi @dmgreen, something strange is going on for v16f16 here with a cost of 73. I ran llc for this intrinsic and got:
> > 
> >   fmaxnm  v0.8h, v0.8h, v1.8h
> >   fmaxnmv h0, v0.8h
> > 
> > so a cost of 3, in line with umax.v16i16, seems reasonable here.
> Yeah, I saw that. It is an odd one. This test is run without fullfp16, so I think the costs of any half min/max should be higher. The original version of this patch didn't include FP, and I hadn't noticed when rebasing over the tests.
> 
> I'll look at correcting that properly.
OK, thanks. Yeah, I see now. I ran the llc command with "-mattr=+sve", which enabled fullfp16 automatically. That explains the efficient codegen. :)

https://reviews.llvm.org/D106239