<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from rtf -->
<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<font face="Verdana" size="2"><span style="font-size:10pt;">
<div><font color="purple">Hi Andrew,</font></div>
<div><font color="purple">The issue below refers to LSR, so I’ll appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others that might be interested you are more than welcome to add them.</font></div>
<div><font color="purple">Thanks, Anat </font></div>
<div><font face="Calibri" size="2" color="purple"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2" color="purple"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Tahoma">_____________________________________________<br>
<b>From:</b> Shemer, Anat <br>
<b>Sent:</b> Tuesday, February 18, 2014 15:07<br>
<b>To:</b> 'llvmdev@cs.uiuc.edu'<br>
<b>Subject:</b> better code for IV</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>Hi,</div>
<div> </div>
<div>I will be glad if you can give me some advice on how to make LLVM generate better code for IV in a simple loop case. There is IR sample below, I know of one way to do it, and I’m looking for advice on another way.</div>
<div> </div>
<div>When compiling a simple C loop (c[i]=a[i]+b[i], a, b, c are float*), the IR starts as follows:</div>
<div> </div>
<div><font face="Courier New"> define void @ArrayAdd1(float* nocapture %a, float* nocapture %b, float* nocapture %c, i64 %iNumElements) {</font></div>
<div><font face="Courier New"> Entry:</font></div>
<div><font face="Courier New"> br label %L_pre_head</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_pre_head: ; preds = %Entry</font></div>
<div><font face="Courier New"> br label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_entry: ; preds = %L_entry, %L_pre_head</font></div>
<div><font face="Courier New"> %L_ind_var = phi i64 [ 0, %L_pre_head ], [ %L_inc_ind_var, %L_entry ]</font></div>
<div><font face="Courier New"> %L_tid = phi i64 [ 0, %L_pre_head ], [ %L_inc_tid, %L_entry ]</font></div>
<div><font face="Courier New"> %trunc = trunc i64 %L_tid to i32</font></div>
<div><font face="Courier New"> %idxprom = sext i32 %trunc to i64</font></div>
<div><font face="Courier New"> %arrayidx = getelementptr inbounds float* %a, i64 %idxprom</font></div>
<div><font face="Courier New"> %0 = load float* %arrayidx, align 4</font></div>
<div><font face="Courier New"> %arrayidx2 = getelementptr inbounds float* %b, i64 %idxprom</font></div>
<div><font face="Courier New"> %1 = load float* %arrayidx2, align 4</font></div>
<div><font face="Courier New"> %add = fadd float %0, %1</font></div>
<div><font face="Courier New"> %arrayidx4 = getelementptr inbounds float* %c, i64 %idxprom</font></div>
<div><font face="Courier New"> store float %add, float* %arrayidx4, align 4</font></div>
<div><font face="Courier New"> %L_inc_ind_var = add nuw nsw i64 %L_ind_var, 1</font></div>
<div><font face="Courier New"> %L_cmp.to.max = icmp eq i64 %L_inc_ind_var, %iNumElements</font></div>
<div><font face="Courier New"> %L_inc_tid = add nuw nsw i64 %L_tid, 1</font></div>
<div><font face="Courier New"> br i1 %L_cmp.to.max, label %L_exit, label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_exit: ; preds = %L_entry</font></div>
<div><font face="Courier New"> br label %2</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> ; <label>:2 ; preds = %L_exit</font></div>
<div><font face="Courier New"> ret void</font></div>
<div><font face="Courier New"> }</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>And after going through all passes before code generation it becomes:</div>
<div> </div>
<div><font face="Courier New"> define void @ArrayAdd1(float * nocapture %a, float * nocapture %b, float * nocapture %c, i64 %iNumElements)</font></div>
<div><font face="Courier New"> {</font></div>
<div><font face="Courier New"> Entry:</font></div>
<div><font face="Courier New"> br label %L_pre_head</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_pre_head: ; preds = %Entry</font></div>
<div><font face="Courier New"> br label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_entry: ; preds = %L_entry, %L_pre_head</font></div>
<div><font face="Courier New"> %L_ind_var = phi i64 [ 0, %L_pre_head ], [ %L_inc_ind_var, %L_entry ]</font></div>
<div><font face="Courier New"> %L_tid = phi i64 [ 0, %L_pre_head ], [ %L_inc_tid, %L_entry ]</font></div>
<div><font face="Courier New"> %sext = shl i64 %L_tid, 32</font></div>
<div><font face="Courier New"> %idxprom = ashr exact i64 %sext, 32</font></div>
<div><font face="Courier New"> %arrayidx = getelementptr inbounds float * %a, i64 %idxprom</font></div>
<div><font face="Courier New"> %0 = load float * %arrayidx, align 4</font></div>
<div><font face="Courier New"> %arrayidx2 = getelementptr inbounds float * %b, i64 %idxprom</font></div>
<div><font face="Courier New"> %1 = load float * %arrayidx2, align 4</font></div>
<div><font face="Courier New"> %add = fadd float %0, %1</font></div>
<div><font face="Courier New"> %arrayidx4 = getelementptr inbounds float * %c, i64 %idxprom</font></div>
<div><font face="Courier New"> store float %add, float * %arrayidx4, align 4</font></div>
<div><font face="Courier New"> %L_inc_ind_var = add nuw nsw i64 %L_ind_var, 1</font></div>
<div><font face="Courier New"> %L_cmp.to.max = icmp eq i64 %L_inc_ind_var, %iNumElements</font></div>
<div><font face="Courier New"> %L_inc_tid = add nuw nsw i64 %L_tid, 1</font></div>
<div><font face="Courier New"> br i1 %L_cmp.to.max, label %L_exit, label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_exit: ; preds = %L_entry</font></div>
<div><font face="Courier New"> br label %2</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> ; <label>:2 ; preds = %L_exit</font></div>
<div><font face="Courier New"> ret void</font></div>
<div><font face="Courier New"> }</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>On the way to code gen the IR goes through some additional passes, among them the LSR in LoopStrengthReduce.cpp. This pass promotes the shl operation out of the loop. As a result the generated code is awkward and uses 4 registers for IV management instead
of 2 (see transformed IR and result asm below). </div>
<div> </div>
<div>One way to deal with this is to prevent the transformation of </div>
<div><font face="Courier New"> %trunc = trunc i64 %L_tid to i32</font></div>
<div><font face="Courier New"> %idxprom = sext i32 %trunc to i64</font></div>
<div>to</div>
<div><font face="Courier New"> %sext = shl i64 %L_tid, 32</font></div>
<div><font face="Courier New"> %idxprom = ashr exact i64 %sext, 32</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>Sext(trunk()) remains nice until code gen and one asm instruction of the sort “move %eax, %rax” is generated, as I want (see asm sample at the end of this message). I implemented this locally in InstCombineCasts.cpp, in InstCombiner::visitSExt(). The code
looks for the specific pattern of sext32to64(trunk64to32()) and doesn’t let it become ashr(shl()).</div>
<div> </div>
<div>Another way that I can think of, is to keep ashr32(shl32()) together and then it’s possible to replace both instructions with one asm instruction (a la “move %eax, %rax”) and avoid an additional register for monitoring end of loop. I’m not sure what is
the right point in LSRInstance::LSRInstance to avoid this optimization. </div>
<div> </div>
<div>Do you think that the first way is ok? If you think that this should be done in the second way or a third way I will appreciate your guidance. </div>
<div> </div>
<div>Thanks, Anat</div>
<div> </div>
<div> </div>
<div>The IR after LSR:</div>
<div><font face="Courier New"> define void @ArrayAdd1(float* nocapture %a, float* nocapture %b, float* nocapture %c, i64 %iNumElements) {</font></div>
<div><font face="Courier New"> Entry:</font></div>
<div><font face="Courier New"> br label %L_pre_head</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_pre_head: ; preds = %Entry</font></div>
<div><font face="Courier New"> br label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_entry: ; preds = %L_entry, %L_pre_head</font></div>
<div><font face="Courier New"> %lsr.iv1 = phi i64 [ %lsr.iv.next2, %L_entry ], [ 0, %L_pre_head ]</font></div>
<div><font face="Courier New"> %lsr.iv = phi i64 [ %lsr.iv.next, %L_entry ], [ %iNumElements, %L_pre_head ]</font></div>
<div><font face="Courier New"> %idxprom = ashr exact i64 %lsr.iv1, 32</font></div>
<div><font face="Courier New"> %arrayidx = getelementptr inbounds float* %a, i64 %idxprom</font></div>
<div><font face="Courier New"> %0 = load float* %arrayidx, align 4</font></div>
<div><font face="Courier New"> %arrayidx2 = getelementptr inbounds float* %b, i64 %idxprom</font></div>
<div><font face="Courier New"> %1 = load float* %arrayidx2, align 4</font></div>
<div><font face="Courier New"> %add = fadd float %0, %1</font></div>
<div><font face="Courier New"> %arrayidx4 = getelementptr inbounds float* %c, i64 %idxprom</font></div>
<div><font face="Courier New"> store float %add, float* %arrayidx4, align 4</font></div>
<div><font face="Courier New"> %lsr.iv.next = add i64 %lsr.iv, -1</font></div>
<div><font face="Courier New"> %lsr.iv.next2 = add i64 %lsr.iv1, 4294967296</font></div>
<div><font face="Courier New"> %L_cmp.to.max = icmp eq i64 %lsr.iv.next, 0</font></div>
<div><font face="Courier New"> br i1 %L_cmp.to.max, label %L_exit, label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_exit: ; preds = %L_entry</font></div>
<div><font face="Courier New"> br label %2</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> ; <label>:2 ; preds = %L_exit</font></div>
<div><font face="Courier New"> ret void</font></div>
<div><font face="Courier New"> }</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>Asm code:</div>
<div><font face="Courier New"> ArrayAdd1: # @ArrayAdd1</font></div>
<div><font face="Courier New"> .cfi_startproc</font></div>
<div><font face="Courier New"> # BB#0: # %Entry</font></div>
<div><font face="Courier New"> xorl %r9d, %r9d</font></div>
<div><font face="Courier New"> movabsq $4294967296, %r8 # imm = 0x100000000</font></div>
<div><font face="Courier New"> .align 16, 0x90</font></div>
<div><font face="Courier New"> .LBB0_1: # %L_entry</font></div>
<div><font face="Courier New"> # =>This Inner Loop Header: Depth=1</font></div>
<div><font face="Courier New"> movq %r9, %rax</font></div>
<div><font face="Courier New"> sarq $32, %rax</font></div>
<div><font face="Courier New"> movss (%rdi,%rax,4), %xmm0</font></div>
<div><font face="Courier New"> addss (%rsi,%rax,4), %xmm0</font></div>
<div><font face="Courier New"> movss %xmm0, (%rdx,%rax,4)</font></div>
<div><font face="Courier New"> addq %r8, %r9</font></div>
<div><font face="Courier New"> decq %rcx</font></div>
<div><font face="Courier New"> jne .LBB0_1</font></div>
<div><font face="Courier New"> # BB#2:</font></div>
<div><font face="Courier New"> Ret</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>This is what I want to get:</div>
<div><font face="Courier New"> ArrayAdd2: # @ArrayAdd2</font></div>
<div><font face="Courier New"> .cfi_startproc</font></div>
<div><font face="Courier New"> # BB#0: # %Entry</font></div>
<div><font face="Courier New"> xorl %eax, %eax</font></div>
<div><font face="Courier New"> .align 16, 0x90</font></div>
<div><font face="Courier New"> .LBB1_1: # %L_entry</font></div>
<div><font face="Courier New"> # =>This Inner Loop Header: Depth=1</font></div>
<div><font face="Courier New"> movslq %eax, %r8</font></div>
<div><font face="Courier New"> movss (%rdi,%r8,4), %xmm0</font></div>
<div><font face="Courier New"> addss (%rsi,%r8,4), %xmm0</font></div>
<div><font face="Courier New"> movss %xmm0, (%rdx,%r8,4)</font></div>
<div><font face="Courier New"> incq %rax</font></div>
<div><font face="Courier New"> cmpq %rax, %rcx</font></div>
<div><font face="Courier New"> jne .LBB1_1</font></div>
<div><font face="Courier New"> # BB#2:</font></div>
<div><font face="Courier New"> Ret</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
</span></font>
<p>---------------------------------------------------------------------<br>
Intel Israel (74) Limited</p>
<p>This e-mail and any attachments may contain confidential material for<br>
the sole use of the intended recipient(s). Any review or distribution<br>
by others is strictly prohibited. If you are not the intended<br>
recipient, please contact the sender and delete all copies.</p></body>
</html>