[llvm-dev] Matrix Multiplication not Vectorized using double pointers
Johannes Doerfert via llvm-dev
llvm-dev at lists.llvm.org
Tue Apr 21 11:33:22 PDT 2020
On 10/20/19 6:23 PM, Fabian Giesen via llvm-dev wrote:
> Compile with optimization remarks on ("-Rpass-analysis=.*", or
"-Rpass-missed=.*") to get some optimization remarks for why the
optimizer is or is not handling certain constructs. (Warning, this is a
lot, and it will often be hard to decode if you don't already know how
the relevant optimization passes work.)
>
> There are several problems here, but zooming out, there are just way
too many pointers here that could all potentially alias. Proving
statically that none of these pointers alias is easier said than done,
especially when variable-sized arrays are involved. Even if you could
handle that within a function (not trivial), the next problem after that
is crossing function boundaries (if there's code between allocation and
use), and that gets messy really quick.
>
> If you have 2D arrays, then if at all possible, allocating them
contiguously and use a single base pointer to access all elements of a
given array. That's a form the loop analysis passes know how to deal
with and it's going to go better, although you will probably still need
some annotations to promise the compiler there's no aliasing. (Easier
said than done; you can flag the result pointer as __restrict, but let's
just say that this path is fraught with peril.)
>
> After writing the whole thing using single base pointers, manual 2D
array access and, I can get Clang 9.0.0 to vectorize the loop (for x86
at -march=skylake which has cheap enough gathers), but it's still a very
direct translation of the original code, which is not what you actually
want.
>
> Efficient vectorization of matrix multiplies is a well-studied
problem, but that doesn't make it easy. Turning the basic three nested
loops into a form that achieves high performance requires _many_
different transform steps. (Among other things, almost certainly turning
the three nested loops into at least five, to process the data in small
blocks, getting rid of the gathers and turning them into in-register
transpositions.)
>
> Getting state-of-the-art loop optimizations into LLVM has been an
ongoing project for many years (-> "Polly", though that seems to have
gotten quiet recently?). I believe LLVM+Polly would do a lot better on
this particular example, but it's not enabled (or compiled in) for
normal compiler releases.
FWIW, Polly performs pattern matching for matrix multiplications (only)
and replaces them with pretty good code. Without that the tiling Polly
applies is usually the most beneficial factor for code like this. That
said, the code below, with the indirection and missing local restrict
keywords* in the 3D loop, will not be optimized by Polly.
* Support to actually use local restrict keywords is still under
development in LLVM.
Cheers,
Johannes
> -Fabian
>
> On 10/20/2019 12:31 AM, hameeza ahmed via llvm-dev wrote:
>> Hello,
>> My matrix multiplication code has variables allocated via double
pointers on heap. The code is not getting vectorized. Following is the code.
>>
>> int **buffer_A = (int **)malloc(vecsize * sizeof(int *));
>> int **buffer_B = (int **)malloc(vecsize * sizeof(int *));
>> for(p = 0; p < vecsize; p++)
>> {
>> buffer_A[p] = (int *)malloc(vecsize * sizeof(int));
>> }
>> for(p = 0; p < vecsize; p++)
>> {
>> buffer_B[p] = (int *)malloc(vecsize * sizeof(int));
>> }
>> long i, j, k;
>> int **result = (int **)malloc(vecsize * sizeof(int *));
>> for(p = 0; p < vecsize; p++)
>> {
>> result[p] = (int *)malloc(vecsize * sizeof(int));
>> }
>> for(i = 0; i < vecsize; i++)
>> {
>> for(j = 0; j < vecsize; j++)
>> {
>> result[i][j]=0;
>> for(k = 0; k < vecsize; k++)
>> {
>> result[i][j] += buffer_A[i][k] * buffer_B[k][j];
>> }
>> }
>> }
>> what is the issue? how to resolve it?
>>
>> Thank You
>>
>> Regards
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
More information about the llvm-dev
mailing list