[llvm-dev] Matrix Multiplication not Vectorized using double pointers
Fabian Giesen via llvm-dev
llvm-dev at lists.llvm.org
Sun Oct 20 16:23:44 PDT 2019
Compile with optimization remarks on ("-Rpass-analysis=.*", or
"-Rpass-missed=.*") to get some optimization remarks for why the
optimizer is or is not handling certain constructs. (Warning, this is a
lot, and it will often be hard to decode if you don't already know how
the relevant optimization passes work.)
There are several problems here, but zooming out, there are just way too
many pointers here that could all potentially alias. Proving statically
that none of these pointers alias is easier said than done, especially
when variable-sized arrays are involved. Even if you could handle that
within a function (not trivial), the next problem after that is crossing
function boundaries (if there's code between allocation and use), and
that gets messy really quick.
If you have 2D arrays, then if at all possible, allocating them
contiguously and use a single base pointer to access all elements of a
given array. That's a form the loop analysis passes know how to deal
with and it's going to go better, although you will probably still need
some annotations to promise the compiler there's no aliasing. (Easier
said than done; you can flag the result pointer as __restrict, but let's
just say that this path is fraught with peril.)
After writing the whole thing using single base pointers, manual 2D
array access and, I can get Clang 9.0.0 to vectorize the loop (for x86
at -march=skylake which has cheap enough gathers), but it's still a very
direct translation of the original code, which is not what you actually
want.
Efficient vectorization of matrix multiplies is a well-studied problem,
but that doesn't make it easy. Turning the basic three nested loops into
a form that achieves high performance requires _many_ different
transform steps. (Among other things, almost certainly turning the three
nested loops into at least five, to process the data in small blocks,
getting rid of the gathers and turning them into in-register
transpositions.)
Getting state-of-the-art loop optimizations into LLVM has been an
ongoing project for many years (-> "Polly", though that seems to have
gotten quiet recently?). I believe LLVM+Polly would do a lot better on
this particular example, but it's not enabled (or compiled in) for
normal compiler releases.
-Fabian
On 10/20/2019 12:31 AM, hameeza ahmed via llvm-dev wrote:
> Hello,
> My matrix multiplication code has variables allocated via double
> pointers on heap. The code is not getting vectorized. Following is the code.
>
> int **buffer_A = (int **)malloc(vecsize * sizeof(int *));
> int **buffer_B = (int **)malloc(vecsize * sizeof(int *));
> for(p = 0; p < vecsize; p++)
> {
> buffer_A[p] = (int *)malloc(vecsize * sizeof(int));
> }
> for(p = 0; p < vecsize; p++)
> {
> buffer_B[p] = (int *)malloc(vecsize * sizeof(int));
> }
> long i, j, k;
> int **result = (int **)malloc(vecsize * sizeof(int *));
> for(p = 0; p < vecsize; p++)
> {
> result[p] = (int *)malloc(vecsize * sizeof(int));
> }
> for(i = 0; i < vecsize; i++)
> {
> for(j = 0; j < vecsize; j++)
> {
> result[i][j]=0;
> for(k = 0; k < vecsize; k++)
> {
> result[i][j] += buffer_A[i][k] * buffer_B[k][j];
> }
> }
> }
> what is the issue? how to resolve it?
>
> Thank You
>
> Regards
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
More information about the llvm-dev
mailing list