<div dir="ltr">That sounds perfect, thanks. Indeed, my pass currently improves performance only for small power-of-two sizes, and I'm waiting for the CGP approach to be enabled to make it work for all sizes!<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 7, 2017 at 7:03 PM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>Hi Clement -<br></div><br>I started looking at CGP memcmp expansion for x86 more closely yesterday with:<br><a href="https://reviews.llvm.org/D33963" target="_blank">https://reviews.llvm.org/D3396<wbr>3</a><br><br></div><div>And just made another change here:<br><a href="https://reviews.llvm.org/rL304923" rel="noreferrer" target="_blank">https://reviews.llvm.org/rL304<wbr>923</a></div><div><br>This is part of solving:<br><a href="https://bugs.llvm.org/show_bug.cgi?id=33325" target="_blank">https://bugs.llvm.org/show_bug<wbr>.cgi?id=33325</a><br><a href="https://bugs.llvm.org/show_bug.cgi?id=33329" target="_blank">https://bugs.llvm.org/show_bug<wbr>.cgi?id=33329</a><br><br></div>So we want to enable the CGP expansion without regressing the optimal x86 memcmp codegen for the power-of-2 cases that are currently handled by the SDAG builder. 
If this works out, we'll abandon the memcmp SDAG transforms for x86 (and hopefully other targets too) because we'll take care of all memcmp expansion in CGP.<br><br></div>I didn't look closely at your new pass proposal, but I think you'll see bigger improvements once we have the optimal x86 memcmp expansion in place for all sizes.<br><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Wed, Jun 7, 2017 at 10:00 AM, Clement Courbet via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="ltr"><span id="m_2610070616199643685m_3247612252964599782gmail-docs-internal-guid-4704402e-8346-3f61-8049-e8977231128c"><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;white-space:pre-wrap">Hi all,</span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;white-space:pre-wrap"><br></span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;white-space:pre-wrap">I'm working on a new pass to optimize comparison chains.</span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;font-weight:700;white-space:pre-wrap"><br></span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;font-weight:700;white-space:pre-wrap">Motivation</span><br></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> 
</p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Clang currently generates inefficient code when dealing with contiguous member-by-member structural equality. Consider:</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">struct A {</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  bool operator==(const A& o) const { return i == o.i && j == o.j; }</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  uint32 i;</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  uint32 j;</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">};</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">This generates:</span></p><p dir="ltr" 
style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  mov     eax, dword ptr [rdi]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  cmp     eax, dword ptr [rsi]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  jne     .LBB0_1</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  mov     eax, dword ptr [rdi + 4]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  cmp     eax, dword ptr [rsi + 4]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  sete    al</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  ret</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span 
style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">.LBB0_1:</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  xor     eax, eax</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  ret</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">I’ve been working on an LLVM pass that detects this pattern at IR level and turns it into a memcmp() call. 
This generates more efficient code:</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  mov     rax, qword ptr [rdi]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  cmp     rax, qword ptr [rsi]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  sete    al</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  ret</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">And thanks to </span><a href="https://reviews.llvm.org/D28637" style="text-decoration-line:none" target="_blank"><span style="font-size:11pt;font-family:Arial;background-color:transparent;text-decoration-line:underline;vertical-align:baseline;white-space:pre-wrap">recent improvements</span></a><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"> in the memcmp codegen, this can be made to work for all sizes.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" 
style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;vertical-align:baseline;white-space:pre-wrap">Impact of the change</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">I’ve measured the change on std::pair/std::tuple. The pass typically makes the code 2-3x faster and 2-3x smaller.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">A more detailed description can be found </span><a href="https://docs.google.com/document/d/1CKp8cIfURXbPLSap0jFio7LW4suzR10u5gX4RBV0k7c/edit#" style="text-decoration-line:none" target="_blank"><span style="font-size:11pt;font-family:Arial;background-color:transparent;text-decoration-line:underline;vertical-align:baseline;white-space:pre-wrap">here</span></a><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"> and a proof of concept can be seen </span><a href="https://reviews.llvm.org/D33987" style="text-decoration-line:none" target="_blank"><span style="font-size:11pt;font-family:Arial;background-color:transparent;text-decoration-line:underline;vertical-align:baseline;white-space:pre-wrap">here</span></a><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p 
dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Do you see any aspect of this that I may have missed?</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">For now, I’ve implemented this as a separate pass. Would there be a better way to integrate it?</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><br></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);font-family:Arial;font-size:14.6667px;white-space:pre-wrap">Thanks!</span></p><div><br></div></span></div>
<br></div></div>______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br></blockquote></div><br></div>
</blockquote></div><br></div>