[PATCH] D116664: [AArch64] Improve codegen for get.active.lane.mask when SVE is available

Wed Jan 5 07:52:35 PST 2022

david-arm created this revision.
david-arm added reviewers: sdesmalen, kmclaughlin, CarolineConcatto, dmgreen.
Herald added subscribers: ctetreau, hiraditya, kristof.beyls, tschuett.
david-arm requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

When lowering the get.active.lane.mask intrinsic with a fixed-width
predicate vector result, we can use the SVE whilelo instruction when
SVE is enabled. We do this by choosing a suitable scalable VT for the
whilelo instruction, then promoting it to an integer vector, i.e.
nxv16i1 -> nxv16i8. We can then extract a v16i8 subvector and truncate
back to the original return type, i.e. v16i1.
This leads to a significant improvement in code quality. Tests such as
lane_mask_v8i1_i32 also show that, by choosing the right scalable VT
for the whilelo instruction, we no longer emit the
"xtn v0.8b, v0.8h" instruction. This is because for NEON v8i1 gets
promoted to v8i8, rather than v8i16, so the natural element type
to choose is i8.
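
As a rough illustration of the sequence described above (whilelo on a
scalable predicate, promote to i8 elements, extract the fixed-width
subvector, truncate to i1), here is a scalar C++ model. It is only a
sketch and not the code in the patch: the function names, the vscale
parameter and the use of plain arrays in place of vector registers are
all assumptions made for illustration. It assumes the intrinsic's
documented semantics, where lane i is active iff base + i < n as an
unsigned comparison with a non-wrapping addition.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of lanes in the fixed-width result (v16i1 in the description).
constexpr std::size_t kLanes = 16;

// Reference semantics of get.active.lane.mask: lane i is active iff
// base + i < n (unsigned). The sum is widened to 64 bits so it cannot wrap.
std::array<bool, kLanes> activeLaneMaskRef(std::uint32_t base, std::uint32_t n) {
  std::array<bool, kLanes> mask{};
  for (std::size_t i = 0; i < kLanes; ++i)
    mask[i] = static_cast<std::uint64_t>(base) + i < n;
  return mask;
}

// Hypothetical scalar model of whilelo producing an nxv16i1 predicate,
// i.e. 16 * vscale lanes. vscale is an illustrative parameter.
std::vector<bool> whileloModel(std::uint32_t base, std::uint32_t n,
                               unsigned vscale) {
  assert(vscale >= 1);
  std::vector<bool> pred(kLanes * vscale);
  for (std::size_t i = 0; i < pred.size(); ++i)
    pred[i] = static_cast<std::uint64_t>(base) + i < n;
  return pred;
}

// Model of the lowering steps: whilelo, promote, extract, truncate.
std::array<bool, kLanes> activeLaneMaskViaWhilelo(std::uint32_t base,
                                                  std::uint32_t n,
                                                  unsigned vscale) {
  const std::vector<bool> pred = whileloModel(base, n, vscale);

  // Promote nxv16i1 -> nxv16i8: one byte per predicate lane.
  std::vector<std::uint8_t> wide(pred.size());
  for (std::size_t i = 0; i < pred.size(); ++i)
    wide[i] = pred[i] ? 1 : 0;

  // Extract the low v16i8 subvector from the scalable vector.
  std::array<std::uint8_t, kLanes> sub{};
  for (std::size_t i = 0; i < kLanes; ++i)
    sub[i] = wide[i];

  // Truncate v16i8 -> v16i1 by keeping only the low bit of each element.
  std::array<bool, kLanes> mask{};
  for (std::size_t i = 0; i < kLanes; ++i)
    mask[i] = (sub[i] & 1) != 0;
  return mask;
}
```

In this model the result does not depend on vscale, since only the
low 16 lanes of the scalable predicate are ever extracted.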

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D116664

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/test/CodeGen/AArch64/active_lane_mask.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D116664.397584.patch
Type: text/x-patch
Size: 12502 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220105/bbff544e/attachment.bin>