Hi folks,<div><br></div><div>I couldn't find a specific XOR (OR and AND) optimization on llvm, and therefore I am about to implement it.</div><div>But first I would like to check with you guys that it really does not exist.</div>
<div><br></div><div>For a simple loop like this:</div><div><br></div><div><div><span class="Apple-tab-span" style="white-space:pre"> </span>nbits = 128;</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>bit_addr = 0;</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>while(nbits--)</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>{<span class="Apple-tab-span" style="white-space:pre"> </span></div>
<div>
<span class="Apple-tab-span" style="white-space:pre"> </span>bindex=bit_addr>>5; /* Index is number /32 */</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>bitnumb=bit_addr % 32; /* Bit number in longword */</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>bitmap[bindex]^=(1L<<bitnumb);</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>bit_addr++;</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>}</div>
</div><div><br></div><div><br></div><div>The -O3 set of optimizations generates a code like this:</div><div><br></div><div><br></div><div><div>entry:</div><div> br label %while.body</div><div><br></div><div>while.body: ; preds = %while.body, %entry</div>
<div> %0 = phi i32 [ 0, %entry ], [ %inc.3, %while.body ]</div><div> %shr = lshr i32 %0, 5</div><div> %rem = and i32 %0, 28</div><div> %shl = shl i32 1, %rem</div><div> %arrayidx = getelementptr inbounds i32* %bitmap, i32 %shr</div>
<div> %tmp6 = load i32* %arrayidx, align 4</div><div> %xor = xor i32 %tmp6, %shl</div><div> %rem.1 = or i32 %rem, 1</div><div> %shl.1 = shl i32 1, %rem.1</div><div> %xor.1 = xor i32 %xor, %shl.1</div><div> %rem.2 = or i32 %rem, 2</div>
<div> %shl.2 = shl i32 1, %rem.2</div><div> %xor.2 = xor i32 %xor.1, %shl.2</div><div> %rem.3 = or i32 %rem, 3</div><div> %shl.3 = shl i32 1, %rem.3</div><div> %xor.3 = xor i32 %xor.2, %shl.3</div><div> store i32 %xor.3, i32* %arrayidx, align 4</div>
<div> %inc.3 = add i32 %0, 4</div><div> %exitcond.3 = icmp eq i32 %inc.3, 128</div><div> br i1 %exitcond.3, label %while.end, label %while.body</div><div><br></div><div>while.end: ; preds = %while.body</div>
<div> ret void</div></div><div><br></div><div><br></div><div><br></div><div>It is clear that we are able to fold all XORs into a single XOR, and the same happens to all SHLs and ORs.</div><div>I am using -O3, but the code is not optimized, so I am assuming there is no optimization for this case. Am I correct?</div>
<div><br></div><div>If yes, I have a few other questions:</div><div><br></div><div> - Do you know of any other similar optimization that could do something here but is not being triggered for some reason??</div><div> - Do you know why a OR instruction is used for increments? instead of using a INC or ADD?</div>
<div> - Is there a straight forward way to know if an instruction belongs to a loop? (just curiosity)</div><div><br></div><div><br></div><div>Thanks very much</div><div><br></div><div>Daniel Nicacio</div><div><br></div><div>
<br></div><div><br></div>