<div class="socmaildefaultfont" dir="ltr" style="font-family:Arial, Helvetica, sans-serif;font-size:10pt" ><div dir="ltr" style="font-family:Arial, Helvetica, sans-serif;font-size:10pt" ><div dir="ltr" >Hi Jeroen,<br><br><font size="2" >Thanks for the analysis.</font></div>
<div dir="ltr" ><br><font size="2" >We think your C++ code can represent the Fortran code for the most part. One crucial difference is that there should not be a restrict qualifier for tgtA0[123]. In Fortran, a variable with the target attribute can be pointed to by any pointers with the same type. Hence, if we comment out the restrict qualifier for tgtA0[123], the load/store instructions stay in the loop body.</font><br><br><font size="2" >We also slightly modify the original code as well as the corresponding C++ code to further illustrate the issue. We add two statements, one before the loop and one after. The statements are marked as "NEW: stmt-[12]" in the attached t1.cpp</font><br><br><font size="2" >We observe that, with the restrict qualifier on tgtA0[123], there is no reload of N.p_1_setA after the loop (</font><i><font size="2" >stmt-2</font></i><font size="2" >). (In Fortran, it is possible that p_1_setA is modified in the loop.)</font><br><br><font size="2" >//-----------------------------------------</font><br><font size="2" > %2 = load i32*, i32** getelementptr inbounds (%"struct.m::N", %"struct.m::N"* @_ZN1m1NE, i64 0, i32 3), align 8, !tbaa !2, !noalias !14</font><br><font size="2" > %3 = load i32, i32* %2, align 4, !tbaa !8, !noalias !14</font><br><font size="2" > %add = add nsw i32 %3, 20</font><br><font size="2" > ...</font><br><br><font size="2" >for.end: ; preds = %for.cond.for.end_crit_edge, %entry</font><br><font size="2" > %add5 = add nsw i32 %3, 50</font><br><font size="2" > store i32 %add5, i32* %2, align 4, !tbaa !8, !noalias !14</font><br><font size="2" > ret void</font><br><font size="2" >//-----------------------------------------</font><br><br><font size="2" >If the example is compiled with -DNO_RESTRICT_QUALIFIER, we see that N.p_1_SetA is reloaded after the loop.</font><br><br><font size="2" >//-----------------------------------------</font><br><font size="2" > %2 = load i32*, i32** getelementptr inbounds (%"struct.m::N", %"struct.m::N"* @_ZN1m1NE, i64 0, i32 3), align 8, !tbaa !2, !noalias !11</font><br><font size="2" > %3 = load i32, i32* %2, align 4, !tbaa !8, !noalias !11</font><br><font size="2" > %add = add nsw i32 %3, 20</font><br><font size="2" > store i32 %add, i32* %2, align 4, !tbaa !8, !noalias !11</font><br><font size="2" > ...</font><br><font size="2" >for.end.loopexit: ; preds = %for.body</font><br><font size="2" > %.pre14 = load i32, i32* %2, align 4, !tbaa !8, !noalias !11</font><br><font size="2" > br label %for.end</font><br><br><font size="2" >for.end: ; preds = %for.end.loopexit, %entry</font><br><font size="2" > %12 = phi i32 [ %.pre14, %for.end.loopexit ], [ %add, %entry ]</font><br><font size="2" > %add5 = add nsw i32 %12, 30</font><br><font size="2" > store i32 %add5, i32* %2, align 4, !tbaa !8, !noalias !11</font><br><font size="2" >//-----------------------------------------</font><br><br><font size="2" >We also attach the Fortran code (t1.f and t1-f.ll) for your reference.</font></div>
<div dir="ltr" ><br>Thanks,<br>Kelvin</div>
<div dir="ltr" > </div>
<div dir="ltr" > </div>
<blockquote data-history-content-modified="1" dir="ltr" style="border-left:solid #aaaaaa 2px; margin-left:5px; padding-left:5px; direction:ltr; margin-right:0px" >----- Original message -----<br>From: Jeroen Dobbelaere <Jeroen.Dobbelaere@synopsys.com><br>To: Kelvin Li <kli@ca.ibm.com>, "llvm-dev@lists.llvm.org" <llvm-dev@lists.llvm.org><br>Cc: Tarique Islam <tislam@ca.ibm.com>, Johannes Doerfert <johannesdoerfert@gmail.com><br>Subject: [EXTERNAL] RE: [AA] Question on the full restrict patch<br>Date: Mon, Nov 30, 2020 11:02<br> <br><!--Notes ACF
<meta http-equiv="Content-Type" content="text/html; charset=utf8" >--> <!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit" >
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div><p style="margin: 0px;" >Hi Kelvin, Tarique,<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >I finally found some time to look into your example in more depth.<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >As Johannes mentioned in one of the LLVM Alias Analysis Technical calls, mapping a 'generic table' that <o:p></o:p></p>
<p style="margin: 0px;" >tracks and provides alias information for any two pointers would not be an efficient way of keeping track of the aliasing<o:p></o:p></p>
<p style="margin: 0px;" >in llvm-ir.<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >Reconstructing the (minimal) necessary scopes and intrinsics for a such a 'generic table' is also not an easy problem<o:p></o:p></p>
<p style="margin: 0px;" >(and not fast to solve as well).<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >I have been wondering on how to tackle the example that you provided. Reconstructing the optimal or a sensible 'noalias' set<o:p></o:p></p>
<p style="margin: 0px;" >by hand did not seem easy. So I went for a different approach: I created a c++ program that should represent as close as possible<o:p></o:p></p>
<p style="margin: 0px;" >what you are describing in fortran (and the resulting llvm-ir code). There is space for improvement here, Fortran seems to be able<o:p></o:p></p>
<p style="margin: 0px;" >to provide extra alias information. But it should be a good start to show how a (hierarchical based) mapping can be done.<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >------- test03.cpp<o:p></o:p></p>
<p style="margin: 0px;" >// COMPILE: ../bin/clang++ -c test03.cpp -S -O3 -ffull-restrict -emit-llvm -mllvm -unroll-max-count=0 -mllvm --interleave-loops=false<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >// Roughly equivalent c++ structure:<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >#define restrict __restrict<o:p></o:p></p>
<p style="margin: 0px;" >namespace m {<o:p></o:p></p>
<p style="margin: 0px;" > struct N {<o:p></o:p></p>
<p style="margin: 0px;" > int* restrict tgtA01;<o:p></o:p></p>
<p style="margin: 0px;" > int* restrict tgtA02;<o:p></o:p></p>
<p style="margin: 0px;" > int* restrict tgtA03;<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" > int* p_1_SetA;<o:p></o:p></p>
<p style="margin: 0px;" > int* p_2_SetA;<o:p></o:p></p>
<p style="margin: 0px;" > int* p_3_SetA;<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" > // invisible to others ! But not easy to represent in c++, so there is room for improvement here<o:p></o:p></p>
<p style="margin: 0px;" > int tgtB01;<o:p></o:p></p>
<p style="margin: 0px;" > int tgtB02;<o:p></o:p></p>
<p style="margin: 0px;" > int tgtB03;<o:p></o:p></p>
<p style="margin: 0px;" > } N;<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" > extern void init_var(int&);<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" > void init_ptr() {<o:p></o:p></p>
<p style="margin: 0px;" > N.p_1_SetA = nullptr;<o:p></o:p></p>
<p style="margin: 0px;" > N.p_2_SetA = nullptr;<o:p></o:p></p>
<p style="margin: 0px;" > N.p_3_SetA = nullptr;<o:p></o:p></p>
<p style="margin: 0px;" > }<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" > void compute(int & __restrict n) { // the 'n' parameters seems to be passed as a 'i32* noalias'<o:p></o:p></p>
<p style="margin: 0px;" > int idx;<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" > init_var(N.tgtB01);<o:p></o:p></p>
<p style="margin: 0px;" > init_var(N.tgtB02);<o:p></o:p></p>
<p style="margin: 0px;" > init_var(N.tgtB03);<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" > {<o:p></o:p></p>
<p style="margin: 0px;" > auto &assoc_tgtA01 = *N.tgtA01;<o:p></o:p></p>
<p style="margin: 0px;" > auto &assoc_tgtA02 = *N.tgtA02;<o:p></o:p></p>
<p style="margin: 0px;" > auto &assoc_tgtB02 = N.tgtB02;<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" > for (idx = 1; idx <= n; ++idx) {<o:p></o:p></p>
<p style="margin: 0px;" > assoc_tgtA01 = assoc_tgtA01 * N.tgtB03 + assoc_tgtA02 * (*N.tgtA03) + N.tgtB01 * assoc_tgtB02;<o:p></o:p></p>
<p style="margin: 0px;" > }<o:p></o:p></p>
<p style="margin: 0px;" > }<o:p></o:p></p>
<p style="margin: 0px;" > }<o:p></o:p></p>
<p style="margin: 0px;" >}<o:p></o:p></p>
<p style="margin: 0px;" >------<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >Compiling this code with the full restrict patches results in similar code as you provided (no loads and stores inside the loop).<o:p></o:p></p>
<p style="margin: 0px;" >Notes:<o:p></o:p></p>
<p style="margin: 0px;" >- only two scopes are used:<o:p></o:p></p>
<p style="margin: 0px;" >-- one for 'n' (a local scope)<o:p></o:p></p>
<p style="margin: 0px;" >-- one for 'tgtA01',..,'tgtA03' (aka, the 'unknown scope', as they reside outside the function). <o:p></o:p></p>
<p style="margin: 0px;" > Because those are three different pointers, they are treated as pointing to separate objects,<o:p></o:p></p>
<p style="margin: 0px;" > although they do share the same scope.<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >Greetings,<o:p></o:p></p>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" >Jeroen Dobbelaere<o:p></o:p></p>
<p style="margin: 0px;" ><o:p></o:p></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt" ><div><div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm" ><p style="margin: 0px;" ><b>From:</b> Kelvin Li <kli@ca.ibm.com><br><b>Sent:</b> Monday, November 23, 2020 17:32<br><b>To:</b> Jeroen Dobbelaere <dobbel@synopsys.com>; llvm-dev@lists.llvm.org<br><b>Cc:</b> Tarique Islam <tislam@ca.ibm.com><br><b>Subject:</b> [AA] Question on the full restrict patch<o:p></o:p></p></div></div>
<p style="margin: 0px;" ><o:p> </o:p></p>
<p style="margin: 0px;" ><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >Hi Jeroen,</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >In working on a scheme to represent Fortran alias information precisely,</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >we encounter a situation in which two alias sets partially overlap. We would</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >like to seek any idea of representing such alias sets, particular, in the scheme of</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >the full restrict alias approach in D68484.</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >Consider the following Fortran code, </span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >module m</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > implicit none</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >!set A: variables *with* the TARGET attribute</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > integer, allocatable, target :: tgtA01, tgtA02, tgtA03</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > integer, pointer :: p_1_SetA, p_2_SetA, p_3_SetA</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >!set B: variables *without* the TARGET attribute</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > integer :: tgtB01, tgtB02, tgtB03</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > contains</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > subroutine init_ptr()</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > p_1_SetA = 0 ; p_2_SetA = 0 ; p_3_SetA = 0</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > end subroutine init_ptr</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > subroutine compute(n)</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > integer :: idx, n</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > call init_var(tgtB01)</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > call init_var(tgtB02)</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > call init_var(tgtB03)</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > associate(assoc_tgtA01 => tgtA01, assoc_tgtA02 => tgtA02, assoc_tgtB02 => tgtB02)</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > do idx = 1, n</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > assoc_tgtA01 = assoc_tgtA01 * tgtB03 + assoc_tgtA02 * tgtA03 + tgtB01 * assoc_tgtB02</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > end do</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > end associate</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > end subroutine compute</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >end module</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >The alias analysis in frontend can generate the following alias sets for</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >"tgtA01" and "tgtA02".</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > -------------------------------</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > "tgtA01" is aliased to:</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > "__m_NMOD__&&_m" "__m_NMOD_init_ptr" "__m_NMOD_compute" "init_var"</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > "p_1_SetA" "p_2_SetA" "p_3_SetA"</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > "assoc_tgtA01" "tgtA01"</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > -------------------------------</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > "tgtA02" is aliased to:</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > "__m_NMOD__&&_m" "__m_NMOD_init_ptr" "__m_NMOD_compute" "init_var"</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > "p_1_SetA" "p_2_SetA" "p_3_SetA"</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > "assoc_tgtA02" "tgtA02"</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" > -------------------------------</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >If we express this alias info with the scheme that we presented in the last LLVM</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >Developers' Meeting (</span><a href="https://youtu.be/Vp4GklTXDys" target="_blank"><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >https://youtu.be/Vp4GklTXDys</span></a><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >), the optimizer manages to</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >move the loop invariant code out of the loop body. We think that the optimizer</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >can make use of the alias information even though there is an overlap in the two</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >alias sets. With the D69542 patch (</span><a href="https://reviews.llvm.org/D69542" target="_blank"><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >https://reviews.llvm.org/D69542</span></a><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >), we would</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >like to know if there is any way to express that with the intrinsics proposed in</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >the full restrict patch.</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >We attach the ll files for reference.</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >Note: The options --unroll-max-count=0 --interleave-loops=false</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >-passes=default<O3> -aa-pipeline=default are used in opt.</span><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >Attachments</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >0. alias-sets-from-fe.txt - full alias sets generated from Fortran frontend</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >1. before-opt-without-alias-from-fe.ll - IR before opt (NO alias info is emitted)</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >2. after-opt-without-alias-from-fe.ll - IR after opt (NO alias info is emitted)</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >3. before-opt-with-alias-from-fe.ll - IR before opt (noalias/alias.scope metadata is emitted)</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >4. after-opt-with-alias-from-fe.ll - IR after opt (noalias/alias.scope metadata is emitted)</span><br><br><br><br><br><br><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >Thanks,</span><br><span style="font-size:10.0pt;font-family:"Arial",sans-serif" >Kelvin</span><o:p></o:p></p></div></div></blockquote>
<div dir="ltr" > </div></div></div>
<BR>