[PATCH] D141693: [AArch64] turn extended vecreduce bigger than v16i8 into udot/sdot
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 27 04:52:32 PST 2023
dmgreen added a comment.
Thanks. The results are looking better now. If we can clean up the code a little, then this looks good to me.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15257
+ // Generate Dot instructions that are multiple of 16.
+ unsigned VecReduce16Num = floor(Op0VT.getVectorNumElements() / 16);
+ SmallVector<SDValue, 4> SDotVec16;
----------------
I don't think this needs `floor`.
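For context, unsigned integer division in C++ already discards the remainder, so wrapping it in `floor` only round-trips the value through `double`. A minimal standalone sketch, where the helper name `numV16Chunks` is hypothetical and not part of the patch:

```cpp
// Hypothetical helper, not from the patch: the number of whole
// 16-element chunks in a vector with NumElts elements.
constexpr unsigned numV16Chunks(unsigned NumElts) {
  // Unsigned integer division truncates, so no floor() is required.
  return NumElts / 16;
}

static_assert(numV16Chunks(16) == 1, "exactly one chunk");
static_assert(numV16Chunks(24) == 1, "the remainder of 8 is dropped");
static_assert(numV16Chunks(40) == 2, "two chunks plus a remainder of 8");
```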
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15274
+ // Generate the remainder of the Dot operations that are multiple of 8.
+ for (unsigned I = 0; I < VecReduce8Num; I += 1) {
+ SDValue Zeros = DAG.getConstant(0, DL, MVT::v2i32);
----------------
This can only ever be 0 or 1, so it probably doesn't need the loop. Hopefully this can simplify things a little, as we won't need to concat v8 vectors.
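A rough sketch of why the remainder count is at most one, assuming (this is not stated in the patch excerpt) that the combine only sees element counts that are multiples of 8. The `DotSplit` struct and `splitForDot` function are hypothetical names used for illustration only:

```cpp
#include <cassert>

// Hypothetical bookkeeping, not from the patch: split an element count into
// whole 16-element chunks plus an optional trailing 8-element chunk.
struct DotSplit {
  unsigned NumV16; // number of 16-element dot operations
  bool HasV8;      // whether a single 8-element dot operation remains
};

DotSplit splitForDot(unsigned NumElts) {
  // Assumption: only multiples of 8 reach this point.
  assert(NumElts % 8 == 0 && "expected a multiple of 8 elements");
  // After removing all 16-element chunks, the leftover is either 0 or 8,
  // so a boolean is enough and no loop over v8 chunks is needed.
  return {NumElts / 16, (NumElts % 16) == 8};
}
```

Under that assumption, `splitForDot(24)` gives one 16-element chunk and one 8-element chunk, while `splitForDot(32)` gives two 16-element chunks and no remainder.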
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15297
+ SDotVec8);
+ // Append Undef vector to v2i32 Dot vectors in order to concatenate them with
+ // v4i32 vectors.
----------------
They would need to be zeros, I think. Would it be better and simpler to just return `vecreduce.add(v16s) + vecreduce.add(v8)`?
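The difference between the two approaches can be modelled on plain scalars. The sketch below is not SelectionDAG code: `reduceAdd`, `reducePadded`, and `reduceSeparately` are hypothetical stand-ins, lanes are modelled as `uint32_t` so that addition wraps like an i32 add, and an undef lane is modelled as an arbitrary value.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Scalar stand-in for vecreduce.add over 32-bit lanes.
uint32_t reduceAdd(const std::vector<uint32_t> &Lanes) {
  return std::accumulate(Lanes.begin(), Lanes.end(), uint32_t{0});
}

// Option A: widen the two-lane partial sums to four lanes with padding,
// concatenate with the four-lane partial sums, and reduce once.
// The padding lanes take part in the sum, so only Pad == 0 is safe.
uint32_t reducePadded(std::vector<uint32_t> V4,
                      const std::vector<uint32_t> &V2, uint32_t Pad) {
  assert(V4.size() == 4 && V2.size() == 2);
  V4.insert(V4.end(), V2.begin(), V2.end());
  V4.push_back(Pad);
  V4.push_back(Pad);
  return reduceAdd(V4);
}

// Option B: reduce each part on its own and add the scalar results.
// No padding lanes exist, so there is nothing to get wrong.
uint32_t reduceSeparately(const std::vector<uint32_t> &V4,
                          const std::vector<uint32_t> &V2) {
  return reduceAdd(V4) + reduceAdd(V2);
}
```

With partial sums `{1, 2, 3, 4}` and `{5, 6}`, option B and zero-padded option A agree, while any nonzero padding value changes the result of option A.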
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D141693/new/
https://reviews.llvm.org/D141693