[PATCH] D131118: [LV] Add generic scalarization support for unpredicated scalable vectors

Thu Aug 11 01:02:59 PDT 2022

david-arm added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll:28
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x i16>, <vscale x 4 x i16>* [[TMP7]], align 2
-; CHECK-NEXT:    call void @llvm.masked.scatter.nxv4i16.nxv4p0i16(<vscale x 4 x i16> [[WIDE_LOAD]], <vscale x 4 x i16*> [[BROADCAST_SPLAT]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
-; CHECK-NEXT:    [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
----------------
Hi @reames, we definitely don't want to be doing this for SVE as it will likely hurt performance - scatters are still likely to be better. We made a conscious effort to avoid scalarising this way for SVE because we managed to find other ways of solving the same problems using existing instructions. Also, if we ever encountered a situation where we had to scalarise a scalable vector, performance would likely be terrible, so we choose to return an Invalid cost from the cost model and skip that VF. In that case we may as well just use NEON or not vectorise, because that is almost certainly better.

What use cases are you trying to solve here? This patch doesn't seem to fix any actual bugs, so I'm assuming this is for performance reasons. It looks like this change only affects one test (`@uniform_store_of_loop_varying`) and I guess the IR in this test is not a common idiom. Have you tested performance before and after to see if this is worthwhile?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131118/new/

https://reviews.llvm.org/D131118