<div dir="ltr"><span id="gmail-docs-internal-guid-4704402e-8346-3f61-8049-e8977231128c"><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;white-space:pre-wrap">Hi all,</span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;white-space:pre-wrap"><br></span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;white-space:pre-wrap">I'm working on a new pass to optimize comparison chains.</span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;font-weight:700;white-space:pre-wrap"><br></span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;color:rgb(0,0,0);font-family:Arial;font-size:11pt;font-weight:700;white-space:pre-wrap">Motivation</span><br></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Clang currently generates inefficient code when dealing with contiguous member-by-member structural equality. 
Consider:</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">struct A {</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  bool operator==(const A& o) const { return i == o.i && j == o.j; }</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  uint32_t i;</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  uint32_t j;</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">};</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">This generates:</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  mov     eax, dword ptr [rdi]</span></p><p dir="ltr" 
style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  cmp     eax, dword ptr [rsi]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  jne     .LBB0_1</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  mov     eax, dword ptr [rdi + 4]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  cmp     eax, dword ptr [rsi + 4]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  sete    al</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  ret</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">.LBB0_1:</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  xor     eax, eax</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span 
style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  ret</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">I’ve been working on an LLVM pass that detects this pattern at IR level and turns it into a memcmp() call. This generates more efficient code:</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  mov     rax, qword ptr [rdi]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  cmp     rax, qword ptr [rsi]</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  sete    al</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10pt;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">  ret</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">And thanks to </span><a href="https://reviews.llvm.org/D28637" 
style="text-decoration-line:none"><span style="font-size:11pt;font-family:Arial;background-color:transparent;text-decoration-line:underline;vertical-align:baseline;white-space:pre-wrap">recent improvements</span></a><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"> in the memcmp codegen, this can be made to work for all sizes.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;vertical-align:baseline;white-space:pre-wrap">Impact of the change</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">I’ve measured the change on std::pair/std::tuple. 
The pass typically makes the code 2-3 times faster, and the generated code is typically 2-3 times smaller.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">A more detailed description can be found </span><a href="https://docs.google.com/document/d/1CKp8cIfURXbPLSap0jFio7LW4suzR10u5gX4RBV0k7c/edit#" style="text-decoration-line:none"><span style="font-size:11pt;font-family:Arial;background-color:transparent;text-decoration-line:underline;vertical-align:baseline;white-space:pre-wrap">here</span></a><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"> and a proof of concept can be seen </span><a href="https://reviews.llvm.org/D33987" style="text-decoration-line:none"><span style="font-size:11pt;font-family:Arial;background-color:transparent;text-decoration-line:underline;vertical-align:baseline;white-space:pre-wrap">here</span></a><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"> </p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Do you see any aspect of this that I may have missed?</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">For now I’ve implemented this as a separate pass. 
Would there be a better way to integrate it?</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><br></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);font-family:Arial;font-size:14.6667px;white-space:pre-wrap">Thanks!</span></p><div><br></div></span></div>