[PATCH] D60318: [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.

Thu Apr 11 13:50:14 PDT 2019

spatel added reviewers: efriedma, chandlerc, hfinkel, echristo.
spatel added a comment.

In D60318#1462811 <https://reviews.llvm.org/D60318#1462811>, @courbet wrote:

> There is still some test fixing to do for Power, but before I do that I'd like to get your opinion on the approach, in particular regarding the pass placement (I pretty much placed it randomly here).

It's next to the memcpy optimization pass, so that seems reasonable to me. But I don't claim any expertise on pass management/placement, so let's add more potential reviewers.
Inline summary for those folks: we'd like to move memcmp expansion from codegen to late in the IR pipeline (still under the control of a target hook) because that can unlock follow-on optimizations for CSE and instcombine. Examples:
https://bugs.llvm.org/show_bug.cgi?id=36421
https://bugs.llvm.org/show_bug.cgi?id=34032#c13

> The nice thing here is that this discovered many more optimization opportunities (e.g. the `length2` tests). I attached some benchmark data + fixture to the patch.
>  There is only one regression for N==24 in the equality case (this can be seen in the `length24_eq` test).
> 
> I've added a test for this one, but note that it still requires `-extra-vectorizer-passes` because there is not EarlyCSE pass after the function simplification pipeline. I could leave it like this or add one non-optionally, WDYT ?

I haven't seen any complaints about the cost of EarlyCSE, so my initial guess is just add it within addMemcmpPasses(). If there's a way to predicate running it on successful memcmp expansion, that would be nice.

Finally (and this could be a follow-on change), we should make the corresponding change to the new pass manager too, so we don't lose these optimizations when we flip that setting.

================
Comment at: llvm/test/CodeGen/X86/memcmp.ll:1
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=cmov   | FileCheck %s --check-prefix=X86 --check-prefix=X86-NOSSE
-; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse   | FileCheck %s --check-prefix=X86 --check-prefix=SSE --check-prefix=X86-SSE1
-; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse2  | FileCheck %s --check-prefix=X86 --check-prefix=SSE --check-prefix=X86-SSE2
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown             | FileCheck %s --check-prefix=X64 --check-prefix=X64-SSE2
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx  | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX --check-prefix=X64-AVX1
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2 | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX --check-prefix=X64-AVX2
+; RUN: opt -O3 -mtriple=i686-unknown-unknown -mattr=cmov < %s | llc -mtriple=i686-unknown-unknown -mattr=cmov   | FileCheck %s --check-prefix=X86 --check-prefix=X86-NOSSE
+; RUN: opt -O3 -mtriple=i686-unknown-unknown -mattr=+sse < %s | llc -mtriple=i686-unknown-unknown -mattr=+sse   | FileCheck %s --check-prefix=X86 --check-prefix=SSE --check-prefix=X86-SSE1
----------------
It's great to see the end-to-end improvements here in the review, but I think we should not include this in the final commit (assuming we proceed).

We should have IR-only regression tests for the passes in question. We could also include 'opt -O2' phase ordering tests within 'test/Transforms/PhaseOrdering/'.

To make sure we have the complete opt+llc wins, we could add some small memcmp/bcmp benchmarks to test-suite. (I've never done that, so we can ask others for advice on how to do that.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60318/new/

https://reviews.llvm.org/D60318