[PATCH] D141693: [AArch64] turn extended vecreduce bigger than v16i8 into udot/sdot

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 23 00:54:12 PST 2023


dmgreen added a comment.

Thanks - the patch looks pretty good to me now. The widths that are a multiple of 8 but not of 16 (like 24 and 40), whilst they won't be super common, are not doing as well as they could be. Could they instead generate v16 "chunks" until there is a v8 remainder? So split v40 into v16+v16+v8. It should hopefully reduce the amount of shuffling and extra instructions in the test_udot_v24i8_nomla-like cases, as well as needing fewer s/udots in total.
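
To illustrate the suggested split, here is a minimal standalone sketch. It is not code from the patch: the helper name splitIntoDotChunks is hypothetical, and it only computes the chunk widths, assuming the lane count is a multiple of 8. The actual lowering would build the corresponding subvectors and udot/sdot nodes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (an assumption for illustration, not from the patch):
// split NumElts i8 lanes into 16-lane chunks, followed by at most one
// 8-lane remainder.
std::vector<unsigned> splitIntoDotChunks(unsigned NumElts) {
  assert(NumElts % 8 == 0 && "expected a multiple of 8 lanes");
  std::vector<unsigned> Chunks;
  // Emit as many full v16 chunks as fit.
  for (unsigned I = 0; I + 16 <= NumElts; I += 16)
    Chunks.push_back(16);
  // Whatever is left is either nothing or a single v8.
  if (NumElts % 16 != 0)
    Chunks.push_back(8);
  return Chunks;
}
```

With this scheme v40 becomes v16+v16+v8 (three dot products) and v24 becomes v16+v8 (two), rather than five and three v8 pieces respectively.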


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141693/new/

https://reviews.llvm.org/D141693


