[llvm-dev] which pass can do following optimization? gvn-sink?

Michael Kruse via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 30 11:04:34 PDT 2021


This kind of optimization is done by the LICM pass. Look for
promoteLoopAccessesToScalars in LICM.cpp. However, it requires the
loop code to be executed unconditionally (or to pass
isSafeToExecuteUnconditionally). See the justification in the comment
for promoteLoopAccessesToScalars.
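
To make the condition concrete, here is a hand-written, source-level sketch
of the two forms. It is not LLVM output, and the names column_max_in_memory,
column_max_promoted and forms_agree are made up for illustration. In the
first form the store to max[i] sits under the 'if', so it is only
conditionally executed in the inner loop; the second form keeps the running
value in a local and stores once after the loop.

```c
#define NY 10
#define BATCH_SIZE 10

/* Original shape: the store to max[i] is guarded by the comparison,
   so inside the inner loop it executes only conditionally. */
void column_max_in_memory(float in[NY][BATCH_SIZE], float max[BATCH_SIZE]) {
  for (int i = 0; i < BATCH_SIZE; i++) {
    max[i] = 0;
    for (int j = 0; j < NY; j++) {
      if (in[j][i] > max[i])
        max[i] = in[j][i];
    }
  }
}

/* Hand-promoted shape: the running maximum lives in a local scalar and
   is written back to max[i] exactly once, after the inner loop. */
void column_max_promoted(float in[NY][BATCH_SIZE], float max[BATCH_SIZE]) {
  for (int i = 0; i < BATCH_SIZE; i++) {
    float m = 0;
    for (int j = 0; j < NY; j++) {
      if (in[j][i] > m)
        m = in[j][i];
    }
    max[i] = m;
  }
}

/* Hypothetical helper: returns 1 if both shapes produce the same result
   on a small deterministic input, 0 otherwise. */
int forms_agree(void) {
  float in[NY][BATCH_SIZE];
  float a[BATCH_SIZE];
  float b[BATCH_SIZE];

  for (int j = 0; j < NY; j++)
    for (int i = 0; i < BATCH_SIZE; i++)
      in[j][i] = (float)((j * 7 + i * 3) % 11) - 5.0f;

  column_max_in_memory(in, a);
  column_max_promoted(in, b);

  for (int i = 0; i < BATCH_SIZE; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Both functions compute the same values; the only difference is where the
store to max[i] happens.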

Michael

On Fri, Jul 30, 2021 at 12:23, Fangqing Du via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> Dear all,
>
> Imagine we have the following code:
>
> #define ny 10
> #define Batch_Size 10
>
> typedef float data_t;
>
> void foo(data_t out[ny][Batch_Size], data_t max[Batch_Size]);
>
> void Softmax_Activation(data_t l_Z2[ny][Batch_Size],
>                         data_t out[ny][Batch_Size]) {
>
>   data_t max[Batch_Size];
>
> SA_MAX2:
>   for (int i = 0; i < Batch_Size; i++) {
>     max[i] = 0;
>   SA_MAX1:
>     for (int j = 0; j < ny; j++) {
>       if (l_Z2[j][i] > max[i])
>         max[i] = l_Z2[j][i];
>     }
>   }
>   foo(out, max);
> }
>
> We can see that 'max[i]' is invariant with respect to the loop 'SA_MAX1', so I want to know which pass can perform the following transformation/optimization:
>
> #define ny 10
> #define Batch_Size 10
>
> typedef float data_t;
>
> void foo(data_t out[ny][Batch_Size], data_t max[Batch_Size]);
>
> void Softmax_Activation(data_t l_Z2[ny][Batch_Size],
>                         data_t out[ny][Batch_Size]) {
>
>   data_t max[Batch_Size];
>
> SA_MAX2:
>   for (int i = 0; i < Batch_Size; i++) {
>     data_t Max = 0;
>   SA_MAX1:
>     for (int j = 0; j < ny; j++) {
>       if (l_Z2[j][i] > Max)
>         Max = l_Z2[j][i];
>     }
>     max[i] = Max;
>   }
>   foo(out, max);
> }
>
> This uses a local scalar 'Max' in place of the original 'max[i]' and sinks the original write out of the loop 'SA_MAX1'.
>
> I did some experiments with godbolt, and it looks like we currently don't have this kind of optimization.
> https://godbolt.org/z/9PK3hYvPs
>
> Do you know which pass can do this? Or is it not necessary for CPUs?
>
> Thanks,
> Fangqing
> Xilinx Inc.

