<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from rtf -->
<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<font face="Verdana" size="2"><span style="font-size:10pt;">
<div>Hi,</div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>I will be glad if you can give me some advice on how to make LLVM generate better code for IV in a simple loop case. There is IR sample below, I know of one way to do it, and I’m looking for advice on another way.</div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>When compiling a simple C loop (c[i]=a[i]+b[i], a, b, c are float*), the IR starts as follows:</div>
<div> </div>
<div><font face="Courier New"> define void @ArrayAdd1(float* nocapture %a, float* nocapture %b, float* nocapture %c, i64 %iNumElements) {</font></div>
<div><font face="Courier New"> Entry:</font></div>
<div><font face="Courier New"> br label %L_pre_head</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_pre_head: ; preds = %Entry</font></div>
<div><font face="Courier New"> br label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_entry: ; preds = %L_entry, %L_pre_head</font></div>
<div><font face="Courier New"> %L_ind_var = phi i64 [ 0, %L_pre_head ], [ %L_inc_ind_var, %L_entry ]</font></div>
<div><font face="Courier New"> %L_tid = phi i64 [ 0, %L_pre_head ], [ %L_inc_tid, %L_entry ]</font></div>
<div><font face="Courier New"> %trunc = trunc i64 %L_tid to i32</font></div>
<div><font face="Courier New"> %idxprom = sext i32 %trunc to i64</font></div>
<div><font face="Courier New"> %arrayidx = getelementptr inbounds float* %a, i64 %idxprom</font></div>
<div><font face="Courier New"> %0 = load float* %arrayidx, align 4</font></div>
<div><font face="Courier New"> %arrayidx2 = getelementptr inbounds float* %b, i64 %idxprom</font></div>
<div><font face="Courier New"> %1 = load float* %arrayidx2, align 4</font></div>
<div><font face="Courier New"> %add = fadd float %0, %1</font></div>
<div><font face="Courier New"> %arrayidx4 = getelementptr inbounds float* %c, i64 %idxprom</font></div>
<div><font face="Courier New"> store float %add, float* %arrayidx4, align 4</font></div>
<div><font face="Courier New"> %L_inc_ind_var = add nuw nsw i64 %L_ind_var, 1</font></div>
<div><font face="Courier New"> %L_cmp.to.max = icmp eq i64 %L_inc_ind_var, %iNumElements</font></div>
<div><font face="Courier New"> %L_inc_tid = add nuw nsw i64 %L_tid, 1</font></div>
<div><font face="Courier New"> br i1 %L_cmp.to.max, label %L_exit, label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_exit: ; preds = %L_entry</font></div>
<div><font face="Courier New"> br label %2</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> ; <label>:2 ; preds = %L_exit</font></div>
<div><font face="Courier New"> ret void</font></div>
<div><font face="Courier New"> }</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>And after going through all passes before code generation it becomes:</div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Courier New"> define void @ArrayAdd1(float * nocapture %a, float * nocapture %b, float * nocapture %c, i64 %iNumElements)</font></div>
<div><font face="Courier New"> {</font></div>
<div><font face="Courier New"> Entry:</font></div>
<div><font face="Courier New"> br label %L_pre_head</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_pre_head: ; preds = %Entry</font></div>
<div><font face="Courier New"> br label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_entry: ; preds = %L_entry, %L_pre_head</font></div>
<div><font face="Courier New"> %L_ind_var = phi i64 [ 0, %L_pre_head ], [ %L_inc_ind_var, %L_entry ]</font></div>
<div><font face="Courier New"> %L_tid = phi i64 [ 0, %L_pre_head ], [ %L_inc_tid, %L_entry ]</font></div>
<div><font face="Courier New"> %sext = shl i64 %L_tid, 32</font></div>
<div><font face="Courier New"> %idxprom = ashr exact i64 %sext, 32</font></div>
<div><font face="Courier New"> %arrayidx = getelementptr inbounds float * %a, i64 %idxprom</font></div>
<div><font face="Courier New"> %0 = load float * %arrayidx, align 4</font></div>
<div><font face="Courier New"> %arrayidx2 = getelementptr inbounds float * %b, i64 %idxprom</font></div>
<div><font face="Courier New"> %1 = load float * %arrayidx2, align 4</font></div>
<div><font face="Courier New"> %add = fadd float %0, %1</font></div>
<div><font face="Courier New"> %arrayidx4 = getelementptr inbounds float * %c, i64 %idxprom</font></div>
<div><font face="Courier New"> store float %add, float * %arrayidx4, align 4</font></div>
<div><font face="Courier New"> %L_inc_ind_var = add nuw nsw i64 %L_ind_var, 1</font></div>
<div><font face="Courier New"> %L_cmp.to.max = icmp eq i64 %L_inc_ind_var, %iNumElements</font></div>
<div><font face="Courier New"> %L_inc_tid = add nuw nsw i64 %L_tid, 1</font></div>
<div><font face="Courier New"> br i1 %L_cmp.to.max, label %L_exit, label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_exit: ; preds = %L_entry</font></div>
<div><font face="Courier New"> br label %2</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> ; <label>:2 ; preds = %L_exit</font></div>
<div><font face="Courier New"> ret void</font></div>
<div><font face="Courier New"> }</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>On the way to code gen the IR goes through some additional passes, among them the LSR in LoopStrengthReduce.cpp. This pass promotes the shl operation out of the loop. As a result the generated code is awkward and uses 4 registers for IV management instead
of 2 (see transformed IR and result asm below). </div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>One way to deal with this is to prevent the transformation of </div>
<div><font face="Courier New"> %trunc = trunc i64 %L_tid to i32</font></div>
<div><font face="Courier New"> %idxprom = sext i32 %trunc to i64</font></div>
<div>to</div>
<div><font face="Courier New"> %sext = shl i64 %L_tid, 32</font></div>
<div><font face="Courier New"> %idxprom = ashr exact i64 %sext, 32</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>Sext(trunk()) remains nice until code gen and one asm instruction of the sort “move %eax, %rax” is generated, as I want (see asm sample at the end of this message). I implemented this locally in InstCombineCasts.cpp, in InstCombiner::visitSExt(). The code
looks for the specific pattern of sext32to64(trunk64to32()) and doesn’t let it become ashr(shl()).</div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>Another way that I can think of, is to keep ashr32(shl32()) together and then it’s possible to replace both instructions with one asm instruction (a la “move %eax, %rax”) and avoid an additional register for monitoring end of loop. I’m not sure what is
the right point in LSRInstance::LSRInstance to avoid this optimization. </div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>Do you think that the first way is ok? If you think that this should be done in the second way or a third way I will appreciate your guidance. </div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>Thanks, Anat</div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>The IR after LSR:</div>
<div><font face="Courier New"> define void @ArrayAdd1(float* nocapture %a, float* nocapture %b, float* nocapture %c, i64 %iNumElements) {</font></div>
<div><font face="Courier New"> Entry:</font></div>
<div><font face="Courier New"> br label %L_pre_head</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_pre_head: ; preds = %Entry</font></div>
<div><font face="Courier New"> br label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_entry: ; preds = %L_entry, %L_pre_head</font></div>
<div><font face="Courier New"> %lsr.iv1 = phi i64 [ %lsr.iv.next2, %L_entry ], [ 0, %L_pre_head ]</font></div>
<div><font face="Courier New"> %lsr.iv = phi i64 [ %lsr.iv.next, %L_entry ], [ %iNumElements, %L_pre_head ]</font></div>
<div><font face="Courier New"> %idxprom = ashr exact i64 %lsr.iv1, 32</font></div>
<div><font face="Courier New"> %arrayidx = getelementptr inbounds float* %a, i64 %idxprom</font></div>
<div><font face="Courier New"> %0 = load float* %arrayidx, align 4</font></div>
<div><font face="Courier New"> %arrayidx2 = getelementptr inbounds float* %b, i64 %idxprom</font></div>
<div><font face="Courier New"> %1 = load float* %arrayidx2, align 4</font></div>
<div><font face="Courier New"> %add = fadd float %0, %1</font></div>
<div><font face="Courier New"> %arrayidx4 = getelementptr inbounds float* %c, i64 %idxprom</font></div>
<div><font face="Courier New"> store float %add, float* %arrayidx4, align 4</font></div>
<div><font face="Courier New"> %lsr.iv.next = add i64 %lsr.iv, -1</font></div>
<div><font face="Courier New"> %lsr.iv.next2 = add i64 %lsr.iv1, 4294967296</font></div>
<div><font face="Courier New"> %L_cmp.to.max = icmp eq i64 %lsr.iv.next, 0</font></div>
<div><font face="Courier New"> br i1 %L_cmp.to.max, label %L_exit, label %L_entry</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> L_exit: ; preds = %L_entry</font></div>
<div><font face="Courier New"> br label %2</font></div>
<div><font face="Courier New"> </font></div>
<div><font face="Courier New"> ; <label>:2 ; preds = %L_exit</font></div>
<div><font face="Courier New"> ret void</font></div>
<div><font face="Courier New"> }</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>Asm code:</div>
<div><font face="Courier New"> ArrayAdd1: # @ArrayAdd1</font></div>
<div><font face="Courier New"> .cfi_startproc</font></div>
<div><font face="Courier New"> # BB#0: # %Entry</font></div>
<div><font face="Courier New"> xorl %r9d, %r9d</font></div>
<div><font face="Courier New"> movabsq $4294967296, %r8 # imm = 0x100000000</font></div>
<div><font face="Courier New"> .align 16, 0x90</font></div>
<div><font face="Courier New"> .LBB0_1: # %L_entry</font></div>
<div><font face="Courier New"> # =>This Inner Loop Header: Depth=1</font></div>
<div><font face="Courier New"> movq %r9, %rax</font></div>
<div><font face="Courier New"> sarq $32, %rax</font></div>
<div><font face="Courier New"> movss (%rdi,%rax,4), %xmm0</font></div>
<div><font face="Courier New"> addss (%rsi,%rax,4), %xmm0</font></div>
<div><font face="Courier New"> movss %xmm0, (%rdx,%rax,4)</font></div>
<div><font face="Courier New"> addq %r8, %r9</font></div>
<div><font face="Courier New"> decq %rcx</font></div>
<div><font face="Courier New"> jne .LBB0_1</font></div>
<div><font face="Courier New"> # BB#2:</font></div>
<div><font face="Courier New"> Ret</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div>This is what I want to get:</div>
<div><font face="Courier New"> ArrayAdd2: # @ArrayAdd2</font></div>
<div><font face="Courier New"> .cfi_startproc</font></div>
<div><font face="Courier New"> # BB#0: # %Entry</font></div>
<div><font face="Courier New"> xorl %eax, %eax</font></div>
<div><font face="Courier New"> .align 16, 0x90</font></div>
<div><font face="Courier New"> .LBB1_1: # %L_entry</font></div>
<div><font face="Courier New"> # =>This Inner Loop Header: Depth=1</font></div>
<div><font face="Courier New"> movslq %eax, %r8</font></div>
<div><font face="Courier New"> movss (%rdi,%r8,4), %xmm0</font></div>
<div><font face="Courier New"> addss (%rsi,%r8,4), %xmm0</font></div>
<div><font face="Courier New"> movss %xmm0, (%rdx,%r8,4)</font></div>
<div><font face="Courier New"> incq %rax</font></div>
<div><font face="Courier New"> cmpq %rax, %rcx</font></div>
<div><font face="Courier New"> jne .LBB1_1</font></div>
<div><font face="Courier New"> # BB#2:</font></div>
<div><font face="Courier New"> Ret</font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
</span></font>
<p>---------------------------------------------------------------------<br>
Intel Israel (74) Limited</p>
<p>This e-mail and any attachments may contain confidential material for<br>
the sole use of the intended recipient(s). Any review or distribution<br>
by others is strictly prohibited. If you are not the intended<br>
recipient, please contact the sender and delete all copies.</p></body>
</html>