[flang-commits] [flang] [flang] Inline hlfir.matmul[_transpose]. (PR #122821)

Tue Jan 14 09:55:26 PST 2025

vzakhari wrote:

> I wonder if we could improve the array accesses order with the hlfir.elemental case in bufferization by re-ordering the loops, but that does not sound trivial.

Basically, we want to get from this:
```
//   DO 1 I = 1, NROWS
//    DO 1 J = 1, NCOLS
//     RES(I,J) = 0
//     DO 1 K = 1, N
//   1  RES(I,J) = RES(I,J) + X(I,K)*Y(K,J)
```
to this:
```
//   DO 1 I = 1, NROWS
//    DO 1 J = 1, NCOLS
//   1 RES(I,J) = 0
//   DO 2 K = 1, N
//    DO 2 J = 1, NCOLS
//     DO 2 I = 1, NROWS
//   2  RES(I,J) = RES(I,J) + X(I,K)*Y(K,J)
```

It seems to me that it is too much to do in hlfir.elemental bufferization :)
Loop distribution followed by loop interchange should be able to do it, but not in their current state in LLVM. Maybe the MLIR loop optimizations will be able to do it at some point.
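
For reference, the two loop orders can be sketched in C++ roughly like this. It is only an illustration, not what the pass generates: the names `matmulIJK`, `matmulKJI`, `Matrix` and `idx` are made up here, and the arrays are assumed to be stored column-major (as in Fortran) with 0-based indices.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;

// Hypothetical helper: 0-based column-major offset of element (i, j)
// in a matrix that has `rows` rows.
static std::size_t idx(std::size_t i, std::size_t j, std::size_t rows) {
  return i + j * rows;
}

// First form: K is the innermost loop, so X(I,K) is read with stride nrows.
Matrix matmulIJK(const Matrix &x, const Matrix &y, std::size_t nrows,
                 std::size_t n, std::size_t ncols) {
  Matrix res(nrows * ncols);
  for (std::size_t i = 0; i < nrows; ++i)
    for (std::size_t j = 0; j < ncols; ++j) {
      res[idx(i, j, nrows)] = 0.0;
      for (std::size_t k = 0; k < n; ++k)
        res[idx(i, j, nrows)] += x[idx(i, k, nrows)] * y[idx(k, j, n)];
    }
  return res;
}

// Second form: the zero-initialization is distributed into its own nest and
// the accumulation is interchanged to K, J, I, so the innermost loop walks
// RES(I,J) and X(I,K) with unit stride.
Matrix matmulKJI(const Matrix &x, const Matrix &y, std::size_t nrows,
                 std::size_t n, std::size_t ncols) {
  Matrix res(nrows * ncols);
  for (std::size_t i = 0; i < nrows; ++i)
    for (std::size_t j = 0; j < ncols; ++j)
      res[idx(i, j, nrows)] = 0.0;
  for (std::size_t k = 0; k < n; ++k)
    for (std::size_t j = 0; j < ncols; ++j)
      for (std::size_t i = 0; i < nrows; ++i)
        res[idx(i, j, nrows)] += x[idx(i, k, nrows)] * y[idx(k, j, n)];
  return res;
}
```

For each `(I,J)` both forms add the products in the same `K` order, so only the memory access pattern differs.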

Another point to investigate is what we should do for parallel execution models. The straightforward implementation can be parallelized across rows and columns, but each thread's accesses to `X` will then be sparse (strided rather than contiguous).
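
To make the access pattern concrete, here is a small C++ sketch that lists the linear offsets one thread would read in its `K` loop. It assumes column-major storage, 0-based indices, and one thread per `(I,J)` element; that mapping is an assumption for illustration, and `KLoopOffsets` / `kLoopOffsets` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Offsets into X and Y touched by the K loop for one fixed (i, j).
struct KLoopOffsets {
  std::vector<std::size_t> x;
  std::vector<std::size_t> y;
};

// X is nrows-by-n and Y is n-by-ncols, both column-major and 0-based.
KLoopOffsets kLoopOffsets(std::size_t i, std::size_t j, std::size_t nrows,
                          std::size_t n) {
  KLoopOffsets offsets;
  for (std::size_t k = 0; k < n; ++k) {
    offsets.x.push_back(i + k * nrows); // X(i,k): consecutive reads are nrows apart
    offsets.y.push_back(k + j * n);     // Y(k,j): consecutive reads are adjacent
  }
  return offsets;
}
```

With `nrows = 4` and `n = 3`, the thread for `(i, j) = (1, 2)` reads `X` at offsets 1, 5, 9 and `Y` at offsets 6, 7, 8.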

That is why I would like to keep the option of inlining it both ways and continue experimenting.

https://github.com/llvm/llvm-project/pull/122821