<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Dec 22, 2021 at 8:43 PM Nuno Lopes <<a href="mailto:nunoplopes@sapo.pt">nunoplopes@sapo.pt</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The non-technical question is whether this matters at all. Do we expect any <br>
perf benefit from improving alias analysis for this case?<br></blockquote><div><br></div><div>I've seen a number of very hot functions in our code where alias analysis was at fault because of this.</div><div>The most impressive issue was in a compression algorithm, and the code was essentially:</div><div><br></div><div>```</div><div>struct Histogram {</div><div> Histogram();<br> int total;<br> int values[256];<br>};<br><br>Histogram DoIt(const int* image, int size) {<br> Histogram histogram;<br> for (int i = 0; i < size; ++i) {<br> ++histogram.values[image[i]];<br> ++histogram.total;<br> }<br> return histogram;<br>}<br></div><div>```</div><div><br class="gmail-Apple-interchange-newline">Because alias analysis is unable to tell that `histogram.total` and `histogram.values[*]` do not alias, the total has to be incremented one by one.<br></div><div>It was easy to fix by manually moving `histogram.total` outside of the loop. And this made compression 1% faster overall, so it does actually matter quite a lot.</div><div>Of course one might argue that this was not optimally written, but it was written like this, and I've seen other cases where it's not as obvious.</div><div>This is the generated code right now:</div><div><br></div><div>```</div><div>0000000000000000 <_Z4DoItPKii>:<br> 0: 48 89 f8 mov %rdi,%rax<br> 3: 85 d2 test %edx,%edx<br> 5: 7e 4d jle 54 <_Z4DoItPKii+0x54><br> 7: 41 89 d0 mov %edx,%r8d<br> a: 83 fa 01 cmp $0x1,%edx<br> d: 75 04 jne 13 <_Z4DoItPKii+0x13><br> f: 31 d2 xor %edx,%edx<br> 11: eb 2f jmp 42 <_Z4DoItPKii+0x42><br> 13: 44 89 c7 mov %r8d,%edi<br> 16: 83 e7 fe and $0xfffffffe,%edi<br> 19: 31 d2 xor %edx,%edx<br> 1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) ; loop (unrolled 2 times)<br> 20: 48 63 0c 96 movslq (%rsi,%rdx,4),%rcx ; image[i]<br> 24: 83 44 88 04 01 addl $0x1,0x4(%rax,%rcx,4) ; <a class="gmail_plusreply" id="plusReplyChip-0">++</a>histogram.values[image[i]]</div><div> 29: 83 00 01 addl $0x1,(%rax) ; <a class="gmail_plusreply" id="plusReplyChip-1">++</a>histogram.total<br> 2c: 48 63 4c 96 04 movslq 0x4(%rsi,%rdx,4),%rcx<br> 31: 83 44 88 04 01 addl $0x1,0x4(%rax,%rcx,4)<br> 36: 83 00 01 addl $0x1,(%rax)<br> 39: 48 83 c2 02 add $0x2,%rdx<br> 3d: 48 39 d7 cmp %rdx,%rdi<br> 40: 75 de jne 20 <_Z4DoItPKii+0x20> ; end loop<br> 42: 41 f6 c0 01 test $0x1,%r8b<br> 46: 74 0c je 54 <_Z4DoItPKii+0x54><br> 48: 48 63 0c 96 movslq (%rsi,%rdx,4),%rcx<br> 4c: 83 44 88 04 01 addl $0x1,0x4(%rax,%rcx,4)<br> 51: 83 00 01 addl $0x1,(%rax)<br> 54: c3 ret <br></div><div>```</div><div><br></div><div>When I let clang emit range information (note: this was with a previous iteration of this proposal, but the results are the same), LLVM can now hoist `histogram.total` out of the loop.</div><div><br></div><div>```</div><div>0000000000000000 <_Z4DoItPKii>:<br> 0: 48 89 f8 mov %rdi,%rax<br> 3: 85 d2 test %edx,%edx<br> 5: 0f 8e 7d 00 00 00 jle 88 <_Z4DoItPKii+0x88><br> b: 41 89 d1 mov %edx,%r9d<br> e: 49 8d 49 ff lea -0x1(%r9),%rcx<br> 12: 45 89 c8 mov %r9d,%r8d<br> 15: 41 83 e0 03 and $0x3,%r8d<br> 19: 48 83 f9 03 cmp $0x3,%rcx<br> 1d: 73 04 jae 23 <_Z4DoItPKii+0x23><br> 1f: 31 c9 xor %ecx,%ecx<br> 21: eb 3d jmp 60 <_Z4DoItPKii+0x60><br> 23: 41 83 e1 fc and $0xfffffffc,%r9d<br> 27: 31 c9 xor %ecx,%ecx<br> 29: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) ; loop (unrolled 4 times)<br> 30: 48 63 3c 8e movslq (%rsi,%rcx,4),%rdi ; image[i]<br> 34: 83 44 b8 04 01 addl $0x1,0x4(%rax,%rdi,4) ; <a class="gmail_plusreply" id="gmail-plusReplyChip-0">++</a>histogram.values[image[i]]<br> 39: 48 63 7c 8e 04 movslq 0x4(%rsi,%rcx,4),%rdi <br> 3e: 83 44 b8 04 01 addl $0x1,0x4(%rax,%rdi,4)<br> 43: 48 63 7c 8e 08 movslq 0x8(%rsi,%rcx,4),%rdi<br> 48: 83 44 b8 04 01 addl $0x1,0x4(%rax,%rdi,4)<br> 4d: 48 63 7c 8e 0c movslq 0xc(%rsi,%rcx,4),%rdi<br> 52: 83 44 b8 04 01 addl $0x1,0x4(%rax,%rdi,4)<br> 57: 48 83 c1 04 add $0x4,%rcx<br> 5b: 49 39 c9 cmp %rcx,%r9<br> 5e: 75 d0 jne 30 <_Z4DoItPKii+0x30><br> 60: 44 8b 08 mov (%rax),%r9d<br> 63: 4d 85 c0 test %r8,%r8<br> 66: 74 1a je 82 <_Z4DoItPKii+0x82><br> 68: 48 8d 0c 8e lea (%rsi,%rcx,4),%rcx<br> 6c: 31 f6 xor %esi,%esi<br> 6e: 66 90 xchg %ax,%ax<br> 70: 48 63 3c b1 movslq (%rcx,%rsi,4),%rdi<br> 74: 83 44 b8 04 01 addl $0x1,0x4(%rax,%rdi,4)<br> 79: 48 83 c6 01 add $0x1,%rsi<br> 7d: 49 39 f0 cmp %rsi,%r8<br> 80: 75 ee jne 70 <_Z4DoItPKii+0x70><br> 82: 41 01 d1 add %edx,%r9d<br> 85: 44 89 08 mov %r9d,(%rax) ; histogram.total = iter count<br> 88: c3 ret <br></div><div>```</div><div> </div></div></div>