[PATCH] D67645: [aarch64] add def-pats for dot product
Sebastian Pop via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 18 07:49:18 PDT 2019
sebpop added a comment.
To catch more dot product cases, we need to fix the passes that run before instruction selection.
I looked at the basic dot product loop:
  int dot_product1(char *a, char *b, int sum) {
    for (int i = 0; i < 16 * K; i += 1)
      sum += a[i] * b[i];
    return sum;
  }
for different values of K:
- for K = 1, we do generate a dot instruction
- for K = 2 and K = 3:
  - the loop is unrolled
  - SLP vectorizes the straight-line code with a vectorization factor of 32
  - type legalization kicks in and destroys the pattern
  - we end up generating very poor code
- for K >= 4: no unrolling, no SLP, and no loop vectorization, so we get scalar byte loop code.
It looks like catching more dot product patterns will require fixing the SLP and loop vectorizers.
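To make the K = 1 case concrete, here is a minimal scalar sketch of what a 16-byte dot instruction accumulates and why it matches the loop above. It assumes unsigned bytes (the udot form) and models the vector lanes in plain C; the names udot_model, dot_product_scalar, dot_product_lanes and check_k1 are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-byte unsigned dot step: each of the four
   32-bit lanes accumulates the four byte products of its own group. */
void udot_model(uint32_t acc[4], const uint8_t a[16], const uint8_t b[16]) {
    for (int lane = 0; lane < 4; lane++)
        for (int j = 0; j < 4; j++)
            acc[lane] += (uint32_t)a[4 * lane + j] * b[4 * lane + j];
}

/* The K = 1 loop from the comment, written with unsigned bytes. */
uint32_t dot_product_scalar(const uint8_t *a, const uint8_t *b, uint32_t sum) {
    for (int i = 0; i < 16; i++)
        sum += (uint32_t)a[i] * b[i];
    return sum;
}

/* Same computation through the lane model plus a horizontal add. */
uint32_t dot_product_lanes(const uint8_t *a, const uint8_t *b, uint32_t sum) {
    uint32_t acc[4] = {0, 0, 0, 0};
    udot_model(acc, a, b);
    return sum + acc[0] + acc[1] + acc[2] + acc[3];
}

/* Hypothetical helper: checks that both formulations agree on one input. */
void check_k1(const uint8_t *a, const uint8_t *b, uint32_t sum) {
    assert(dot_product_lanes(a, b, sum) == dot_product_scalar(a, b, sum));
}
```

For K = 2 and K = 3 the same reduction would need two or three such 16-byte steps, which is the shape that is lost once the 32-wide vector code goes through type legalization.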
I am also looking at some code that comes from TVM, a higher-level compiler that generates LLVM IR.
I have seen that there is a missing pattern in the interleaved load pass and a missing instruction on arm64: an ld8.
That would be an interleaved load for an 8 by 8 byte matrix.
I think we can generate an i16 ld4 instead and then extract the low and high bytes of each lane.
This will simplify the DAG on which we do instruction selection and enable generation of the dot product.
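The ld4 idea can be checked with a small scalar model. The sketch below assumes a little-endian layout and an 8 by 8 byte block stored row by row (8 interleaved byte streams); it models an ld4 on 16-bit elements followed by low/high byte extraction and compares the result against a direct 8-way deinterleave. All function names here are hypothetical and only illustrate the equivalence; this is not the lowering code itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ROWS = 8, STREAMS = 8 };

/* Reference: what an 8-way byte deinterleave (the missing "ld8") would give. */
void deinterleave8_ref(const uint8_t mem[64], uint8_t out[STREAMS][ROWS]) {
    for (int e = 0; e < ROWS; e++)
        for (int k = 0; k < STREAMS; k++)
            out[k][e] = mem[STREAMS * e + k];
}

/* Scalar model of ld4 on i16: register r, lane e receives halfword 4*e + r. */
void ld4_h_model(const uint8_t mem[64], uint16_t regs[4][ROWS]) {
    for (int e = 0; e < ROWS; e++)
        for (int r = 0; r < 4; r++) {
            const uint8_t *p = mem + 2 * (4 * e + r);
            regs[r][e] = (uint16_t)(p[0] | (p[1] << 8)); /* little-endian */
        }
}

/* ld4 on i16, then split every lane into its low and high byte. */
void deinterleave8_via_ld4(const uint8_t mem[64], uint8_t out[STREAMS][ROWS]) {
    uint16_t regs[4][ROWS];
    ld4_h_model(mem, regs);
    for (int r = 0; r < 4; r++)
        for (int e = 0; e < ROWS; e++) {
            out[2 * r][e]     = (uint8_t)(regs[r][e] & 0xff); /* even stream */
            out[2 * r + 1][e] = (uint8_t)(regs[r][e] >> 8);   /* odd stream  */
        }
}

/* Hypothetical helper: both routes must produce the same 8 byte streams. */
void check_ld4_equivalence(const uint8_t mem[64]) {
    uint8_t ref[STREAMS][ROWS], via[STREAMS][ROWS];
    deinterleave8_ref(mem, ref);
    deinterleave8_via_ld4(mem, via);
    assert(memcmp(ref, via, sizeof ref) == 0);
}
```

In this model, register r of the ld4 holds byte streams 2r and 2r+1 packed into each 16-bit lane, so the low-byte and high-byte extracts recover all eight streams from four registers.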
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D67645/new/
https://reviews.llvm.org/D67645