[llvm-dev] [RFC] Matrix support (take 2)

Thu Feb 7 19:57:16 PST 2019

On Tue Dec 18 20:45:12 PST 2018, Chris wrote:

> Since layout and padding information is important, it seems most
> logical to put this into the type.  Doing so would make it available
> in all these places.

> That said, I still don’t really understand why you *need* it.

for large vectors and matrices that simply will not fit into the register
file, LD/ST and MV operations (in the form of gather/scatter, or a
vectorised MVX [1]) are the clear and obvious requirement.
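a minimal sketch of what a vectorised MVX per footnote [1] could look
like, modelling the register file as a Python list.  the vector length
`vl`, the element-wise stepping through consecutive registers, and the
read-everything-then-write ordering are all assumptions for illustration,
not a specified ISA behaviour.  the point is only that the data gets
scattered *within* the register file, without any LD/ST.

```python
def mvx(regfile, rd, rs, vl=1):
    """Hypothetical vectorised MVX following footnote [1]:
    for each element i < vl, regfile[regfile[rs + i]] = regfile[rd + i]."""
    # Assumption: all source values and index values are read before any
    # write, so the op behaves as a single vector op even if registers overlap.
    values = [regfile[rd + i] for i in range(vl)]
    targets = [regfile[rs + i] for i in range(vl)]
    for target, value in zip(targets, values):
        regfile[target] = value


# Usage: reverse four elements while moving them, entirely within registers.
regfile = [0] * 32
regfile[4:8] = [23, 22, 21, 20]    # index vector in r4..r7 (destinations)
regfile[8:12] = [10, 11, 12, 13]   # data vector in r8..r11
mvx(regfile, rd=8, rs=4, vl=4)
# r20..r23 now hold [13, 12, 11, 10]
```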

however, the penalty for using LD/ST is the power-consumption hit of
going through the L1/L2 cache barrier.

for a low-power, cost-competitive 3D GPU, for example, a 100% increase in
power consumption caused by being forced to move data back and forth
through the L1/L2 cache multiple times would be completely unacceptable.

hence the natural solution for small vectors and matrices: process them
*in-place*, within the register file.

that in turn means having, at the *architectural* level, a way to re-order
the sequence of an otherwise straight, linear 1D array of elements.  with
the right re-ordering capability it even becomes possible to transpose the
order of elements arbitrarily and in-place, such that a matrix multiply
may be done *in-place*, without MV operations.
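to make the re-ordering idea concrete, here is a minimal sketch, again
modelling the register file as a flat Python list.  `remap` and
`matmul_regs` are hypothetical names and the row-major layout is an
assumption; this is not any particular ISA's re-ordering scheme.  a
matrix is stored as a plain 1D run of registers, and its transpose is
obtained purely by changing the index computation, so no MV is needed to
physically transpose it before the multiply.

```python
def remap(i, j, stored_cols, transposed=False):
    """Hypothetical index remap: flat register offset of logical element
    (i, j) of a matrix stored row-major with `stored_cols` columns.
    With transposed=True the logical view is the transpose; nothing moves."""
    if transposed:
        i, j = j, i
    return i * stored_cols + j


def matmul_regs(regfile, a, b, c, n, m, p, a_transposed=False):
    """C (n x p) = A' (n x m) @ B (m x p), all held row-major in regfile at
    offsets a, b, c.  If a_transposed, A is physically stored as m x n and
    A' is its transpose, read purely through remap() with no MV."""
    a_cols = n if a_transposed else m  # row length of A as actually stored
    for i in range(n):
        for j in range(p):
            acc = 0
            for k in range(m):
                acc += (regfile[a + remap(i, k, a_cols, a_transposed)]
                        * regfile[b + remap(k, j, p)])
            regfile[c + remap(i, j, p)] = acc


# Usage: A is stored as 2x3, B is 2x2; compute A^T @ B (3x2) without
# ever transposing A in the register file.
regfile = [0] * 32
regfile[0:6] = [1, 2, 3, 4, 5, 6]   # A = [[1, 2, 3], [4, 5, 6]]
regfile[8:12] = [1, 2, 3, 4]        # B = [[1, 2], [3, 4]]
matmul_regs(regfile, a=0, b=8, c=16, n=3, m=2, p=2, a_transposed=True)
# regfile[16:22] == [13, 18, 17, 24, 21, 30]; regfile[0:6] is unchanged
```

the design choice here is that the re-ordering is folded into index
generation rather than data movement, so the elements of A never move;
how cheap that is in real hardware depends on how the remap is
implemented.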

this practice is extremely common in 3D GPUs, as there tend to be a lot
of 3x4 matrices.  ARM's Mali actually added a special, hard-coded set of
operations just to deal with 3x4 matrix data.

l.

[1] regfile[regfile[rs]] = regfile[rd]