<div dir="ltr">Thank you, Michael!<div>This info is very useful!</div><div><br></div><div>Fangqing</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jul 30, 2021 at 11:05 AM Michael Kruse <<a href="mailto:llvmdev@meinersbur.de">llvmdev@meinersbur.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">This kind of optimization is done by the LICM pass. Look for<br>
promoteLoopAccessesToScalars in LICM.cpp. However, it requires the<br>
loop code to be executed unconditionally (or to satisfy<br>
isSafeToExecuteUnconditionally). See the justification in the comment<br>
for promoteLoopAccessesToScalars.<br>
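<br>
As a rough illustration of why the unconditional-execution requirement matters, here is a hand-written sketch. It is not taken from LICM.cpp: the function names and the `stores` counter are made up for this example, and it only shows one general concern, namely that the promoted form writes to memory even on runs where the original code never would.<br>
<br>
```c
/* Original shape: the store to *dst happens only when the condition holds.
   `stores` is a hypothetical counter used to make the writes observable. */
void max_conditional(const float *src, int n, float *dst, int *stores) {
    for (int j = 0; j < n; j++) {
        if (src[j] > *dst) {
            *dst = src[j];
            (*stores)++;
        }
    }
}

/* Hand-promoted shape: one load before the loop, a scalar inside it,
   and one unconditional store after it. */
void max_promoted(const float *src, int n, float *dst, int *stores) {
    float tmp = *dst;
    for (int j = 0; j < n; j++) {
        if (src[j] > tmp)
            tmp = src[j];
    }
    *dst = tmp;   /* executed even if the condition was never true */
    (*stores)++;
}
```
<br>
When no element of `src` exceeds the initial `*dst`, both functions leave the same value in `*dst`, but `max_conditional` performs no store while `max_promoted` performs one. That extra store is only acceptable if writing to the location is known to be safe.<br>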
<br>
Michael<br>
<br>
On Fri, Jul 30, 2021 at 12:23 PM Fangqing Du via llvm-dev<br>
<<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>
><br>
> Dear all,<br>
><br>
> Imagine we have the following code:<br>
><br>
> #define ny 10<br>
> #define Batch_Size 10<br>
><br>
> typedef float data_t;<br>
><br>
> void foo(data_t out[ny][Batch_Size], data_t max[Batch_Size]);<br>
><br>
> void Softmax_Activation(data_t l_Z2[ny][Batch_Size],<br>
> &nbsp;&nbsp;&nbsp;&nbsp;data_t out[ny][Batch_Size]) {<br>
><br>
> &nbsp;&nbsp;data_t max[Batch_Size];<br>
><br>
> SA_MAX2:<br>
> &nbsp;&nbsp;for (int i = 0; i < Batch_Size; i++) {<br>
> &nbsp;&nbsp;&nbsp;&nbsp;max[i] = 0;<br>
> SA_MAX1:<br>
> &nbsp;&nbsp;&nbsp;&nbsp;for (int j = 0; j < ny; j++) {<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (l_Z2[j][i] > max[i])<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;max[i] = l_Z2[j][i];<br>
> &nbsp;&nbsp;&nbsp;&nbsp;}<br>
> &nbsp;&nbsp;}<br>
> &nbsp;&nbsp;foo(out, max);<br>
> }<br>
><br>
> We can see that 'max[i]' is invariant with respect to loop 'SA_MAX1', so I want to know which pass can perform the following transformation/optimization:<br>
><br>
> #define ny 10<br>
> #define Batch_Size 10<br>
><br>
> typedef float data_t;<br>
><br>
> void foo(data_t out[ny][Batch_Size], data_t max[Batch_Size]);<br>
><br>
> void Softmax_Activation(data_t l_Z2[ny][Batch_Size],<br>
> &nbsp;&nbsp;&nbsp;&nbsp;data_t out[ny][Batch_Size]) {<br>
><br>
> &nbsp;&nbsp;data_t max[Batch_Size];<br>
><br>
> SA_MAX2:<br>
> &nbsp;&nbsp;for (int i = 0; i < Batch_Size; i++) {<br>
> &nbsp;&nbsp;&nbsp;&nbsp;data_t Max = 0;<br>
> SA_MAX1:<br>
> &nbsp;&nbsp;&nbsp;&nbsp;for (int j = 0; j < ny; j++) {<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (l_Z2[j][i] > Max)<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Max = l_Z2[j][i];<br>
> &nbsp;&nbsp;&nbsp;&nbsp;}<br>
> &nbsp;&nbsp;&nbsp;&nbsp;max[i] = Max;<br>
> &nbsp;&nbsp;}<br>
> &nbsp;&nbsp;foo(out, max);<br>
> }<br>
><br>
> This uses a local scalar 'Max' in place of the original 'max[i]' and sinks the original write out of the loop 'SA_MAX1'.<br>
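<br>
For anyone who wants to check the rewrite by hand, a minimal sketch follows. It puts both forms side by side and compares their results. The names `max_in_memory`, `max_in_scalar` and `forms_agree` are hypothetical and not part of the original code. The sketch also assumes that `l_Z2` and `max` do not overlap, which holds in the original because `max` is a local array.<br>
<br>
```c
#define ny 10
#define Batch_Size 10

typedef float data_t;

/* Form with max[i] read and written in memory inside the inner loop. */
void max_in_memory(data_t l_Z2[ny][Batch_Size], data_t max[Batch_Size]) {
    for (int i = 0; i < Batch_Size; i++) {
        max[i] = 0;
        for (int j = 0; j < ny; j++) {
            if (l_Z2[j][i] > max[i])
                max[i] = l_Z2[j][i];
        }
    }
}

/* Form with a local scalar in the inner loop and a single store after it. */
void max_in_scalar(data_t l_Z2[ny][Batch_Size], data_t max[Batch_Size]) {
    for (int i = 0; i < Batch_Size; i++) {
        data_t Max = 0;
        for (int j = 0; j < ny; j++) {
            if (l_Z2[j][i] > Max)
                Max = l_Z2[j][i];
        }
        max[i] = Max;
    }
}

/* Returns 1 when both forms produce identical results for the given input. */
int forms_agree(data_t l_Z2[ny][Batch_Size]) {
    data_t a[Batch_Size], b[Batch_Size];
    max_in_memory(l_Z2, a);
    max_in_scalar(l_Z2, b);
    for (int i = 0; i < Batch_Size; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}
```
<br>
Agreement on sample inputs only shows that the two forms compute the same values. Whether a compiler is allowed to make the change on its own is a separate question.<br>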
><br>
> I did some experiments with godbolt, and it looks like we currently don't have this kind of optimization.<br>
> <a href="https://godbolt.org/z/9PK3hYvPs" rel="noreferrer" target="_blank">https://godbolt.org/z/9PK3hYvPs</a><br>
><br>
> Do you know which pass can do this? Or is it not necessary for CPUs?<br>
><br>
> Thanks,<br>
> Fangqing<br>
> Xilinx Inc.<br>
</blockquote></div>