[llvm-dev] [RFC] Matrix support (take 2)
lkcl via llvm-dev
llvm-dev at lists.llvm.org
Thu Feb 7 19:57:16 PST 2019
On Tue Dec 18 20:45:12 PST 2018, Chris wrote:
> Since layout and padding information is important, it seems most
> logical to put this into the type. Doing so would make it available
> in all these places.
> That said, I still don’t really understand why you *need* it.
for large vectors and matrices that simply will not fit into the register
file, LD/ST and MV etc. in the form of gather/scatter or vectorised MVX 
is the clear and obvious requirement.
however the penalty for use of LD/ST is the power consumption hit of
going through the L1/L2 cache barrier.
for a low-power cost-competitive 3D GPU, for example, a 100% increase in
power consumption due to the penalty of being forced to move data back
and forth multiple times through the L1/L2 cache would be completely
unacceptable.
hence the natural solution, for small vectors and matrices, is to be able
to process them *in-place*.
that in turn means having, at the *architectural* level, a way to re-order
the sequence of an otherwise straight linear 1D array of elements. with
the right re-ordering capability, it even becomes possible to do arbitrary
in-place transposition of the order of elements, such that matrix multiply
may be done *in-place*, without MV operations.
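a minimal sketch of the idea, in python, assuming a flat list standing in
for the register file. the names remap and matmul_bt are hypothetical and
purely illustrative: they are not taken from any real ISA or from LLVM.
the point is only that a change in the index sequence lets the multiply
read the transposed operand where it already sits.

```python
# hypothetical model: a flat list stands in for the register file.
# remap() and matmul_bt() are illustrative names, not a real API.

def remap(base, row, col, ncols, transpose=False):
    """Map a 2D (row, col) element to a flat register index.

    With transpose=True, (row, col) addresses the transposed view of a
    matrix stored row-major with ncols columns: element (row, col) of the
    transpose is element (col, row) of the stored matrix, so no element
    has to move.
    """
    if transpose:
        row, col = col, row
    return base + row * ncols + col


def matmul_bt(regfile, a, b, c, n, k, m):
    """C (n x m) = A (n x k) * B^T, with B stored as m x k row-major.

    a, b and c are base register numbers. Both operands are read in
    place; only the order in which elements are visited changes.
    """
    for i in range(n):
        for j in range(m):
            acc = 0
            for x in range(k):
                lhs = regfile[remap(a, i, x, k)]
                # transposed view of B: (x, j) resolves to stored B[j][x]
                rhs = regfile[remap(b, x, j, k, transpose=True)]
                acc += lhs * rhs
            regfile[remap(c, i, j, m)] = acc
    return regfile
```

in this model the transposed operand is never copied or moved: the only
writes are to the result registers starting at c.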
this practice is extremely common in 3D GPUs, as there tend to be a lot
of 3x4 matrices. ARM MALI actually added a special hard-coded set of
operations just to deal with 3x4 matrix data.
 regfile[regfile[rs]] = regfile[rd]
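
the indirect move above could be modelled across a whole vector roughly
as follows. this is a sketch under assumptions: the per-element extension
over a vector length vl, the name mvx_scatter, and the choice to read all
indices and values before writing are illustrative and say nothing about
how any actual hardware orders the operations.

```python
# hypothetical model of a vectorised indirect move (MVX).
# regfile is a flat list of integers; vl is the vector length.

def mvx_scatter(regfile, rs, rd, vl):
    """For each element i: regfile[regfile[rs + i]] = regfile[rd + i].

    The registers starting at rs hold destination register *numbers*;
    the registers starting at rd hold the values to be moved.
    """
    # read every index and value first so that a write cannot disturb
    # a later read (a modelling choice, not a claim about hardware)
    targets = [regfile[rs + i] for i in range(vl)]
    values = [regfile[rd + i] for i in range(vl)]
    for target, value in zip(targets, values):
        regfile[target] = value
    return regfile
```

the data stays inside the register file throughout: no LD/ST, and hence
no trip through the L1/L2 cache, is involved in the re-ordering.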