[llvm] r374662 - [LoopIdiomRecognize] Recommit: BCmp loop idiom recognition
Mikael Holmén via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 17 02:25:06 PDT 2019
Hi Roman,
I found a case triggering an assert that was added in this patch.
I wrote
https://bugs.llvm.org/show_bug.cgi?id=43687
about it.
Regards,
Mikael
On Sat, 2019-10-12 at 15:35 +0000, Roman Lebedev via llvm-commits
wrote:
> Author: lebedevri
> Date: Sat Oct 12 08:35:32 2019
> New Revision: 374662
>
> URL: http://llvm.org/viewvc/llvm-project?rev=374662&view=rev
> Log:
> [LoopIdiomRecognize] Recommit: BCmp loop idiom recognition
>
> Summary:
> This is a recommit. The change originally landed in rL370454 but was
> subsequently reverted in rL370788 due to
> https://bugs.llvm.org/show_bug.cgi?id=43206
> The reduced testcase was added to bcmp-negative-tests.ll
> as @pr43206_different_loops: we must ensure that the SCEVs
> we got are both for the same loop we are currently investigating.
>
> Original commit message:
>
> @mclow.lists brought this issue up on IRC.
> It is a reasonably common problem to compare two values for
> equality.
> Those may be just some integers, strings or arrays of integers.
>
> In C, there are the `memcmp()` and `bcmp()` functions.
> In C++, there is the `std::equal()` algorithm.
> One can also write that function manually.
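
As a minimal sketch of the three options listed above (not code from the patch; the names `equal_loop`, `equal_memcmp`, `equal_std` and `check_variants_agree` are made up for illustration), the hand-written loop, the `memcmp()` call and `std::equal()` all answer the same yes/no question for arrays of fixed-width integers:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hand-written element-wise equality: the kind of loop one writes manually.
bool equal_loop(const std::uint32_t *a, const std::uint32_t *b, std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    if (a[i] != b[i])
      return false;
  return true;
}

// Same question answered by the C library, comparing n elements as raw bytes.
bool equal_memcmp(const std::uint32_t *a, const std::uint32_t *b,
                  std::size_t n) {
  return std::memcmp(a, b, n * sizeof(std::uint32_t)) == 0;
}

// Same question answered by the standard algorithm.
bool equal_std(const std::uint32_t *a, const std::uint32_t *b, std::size_t n) {
  return std::equal(a, a + n, b);
}

// Illustrative self-check: all three agree on equal and on unequal inputs.
bool check_variants_agree() {
  const std::uint32_t x[4] = {1, 2, 3, 4};
  const std::uint32_t y[4] = {1, 2, 3, 4};
  const std::uint32_t z[4] = {1, 2, 9, 4};
  const bool same = equal_loop(x, y, 4) && equal_memcmp(x, y, 4) &&
                    equal_std(x, y, 4);
  const bool differ = !equal_loop(x, z, 4) && !equal_memcmp(x, z, 4) &&
                      !equal_std(x, z, 4);
  assert(same && differ);
  return same && differ;
}
```

The byte-wise form is only interchangeable with the loop for types whose equality is bitwise, such as fixed-width integers; it would not hold for floating-point values or for structs with padding.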
>
> libstdc++'s `std::equal()` is specialized to directly call `memcmp()`
> for various types, but not for `std::byte` from C++2a.
> https://godbolt.org/z/mx2ejJ
>
> libc++ does not do anything like that; it simply relies on plain
> C++ `operator==()`.
> https://godbolt.org/z/er0Zwf
> (GOOD!)
>
> So there is likely a performance opportunity here.
> Let's compare the performance of a naive `std::equal()` (no `memcmp()`)
> with one that uses `memcmp()` (in this case, compiled with a modified
> compiler).
> {F8768213}
>
> ```
> #include <algorithm>
> #include <cassert>
> #include <cmath>
> #include <cstdint>
> #include <iterator>
> #include <limits>
> #include <random>
> #include <type_traits>
> #include <utility>
> #include <vector>
>
> #include "benchmark/benchmark.h"
>
> template <class T>
> bool equal(T* a, T* a_end, T* b) noexcept {
> for (; a != a_end; ++a, ++b) {
> if (*a != *b) return false;
> }
> return true;
> }
>
> template <typename T>
> std::vector<T> getVectorOfRandomNumbers(size_t count) {
> std::random_device rd;
> std::mt19937 gen(rd());
> std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
> std::numeric_limits<T>::max());
> std::vector<T> v;
> v.reserve(count);
> std::generate_n(std::back_inserter(v), count,
> [&dis, &gen]() { return dis(gen); });
> assert(v.size() == count);
> return v;
> }
>
> struct Identical {
> template <typename T>
> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count)
> {
> auto Tmp = getVectorOfRandomNumbers<T>(count);
> return std::make_pair(Tmp, std::move(Tmp));
> }
> };
>
> struct InequalHalfway {
> template <typename T>
> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count)
> {
> auto V0 = getVectorOfRandomNumbers<T>(count);
> auto V1 = V0;
> V1[V1.size() / size_t(2)]++; // just change the value.
> return std::make_pair(std::move(V0), std::move(V1));
> }
> };
>
> template <class T, class Gen>
> void BM_bcmp(benchmark::State& state) {
> const size_t Length = state.range(0);
>
> const std::pair<std::vector<T>, std::vector<T>> Data =
> Gen::template Gen<T>(Length);
> const std::vector<T>& a = Data.first;
> const std::vector<T>& b = Data.second;
> assert(a.size() == Length && b.size() == a.size());
>
> benchmark::ClobberMemory();
> benchmark::DoNotOptimize(a);
> benchmark::DoNotOptimize(a.data());
> benchmark::DoNotOptimize(b);
> benchmark::DoNotOptimize(b.data());
>
> for (auto _ : state) {
> const bool is_equal = equal(a.data(), a.data() + a.size(),
> b.data());
> benchmark::DoNotOptimize(is_equal);
> }
> state.SetComplexityN(Length);
> state.counters["eltcnt"] =
> benchmark::Counter(Length,
> benchmark::Counter::kIsIterationInvariant);
> state.counters["eltcnt/sec"] =
> benchmark::Counter(Length,
> benchmark::Counter::kIsIterationInvariantRate);
> const size_t BytesRead = 2 * sizeof(T) * Length;
> state.counters["bytes_read/iteration"] =
> benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
> benchmark::Counter::OneK::kIs1024);
> state.counters["bytes_read/sec"] = benchmark::Counter(
> BytesRead, benchmark::Counter::kIsIterationInvariantRate,
> benchmark::Counter::OneK::kIs1024);
> }
>
> template <typename T>
> static void CustomArguments(benchmark::internal::Benchmark* b) {
> const size_t L2SizeBytes = []() {
> for (const benchmark::CPUInfo::CacheInfo& I :
> benchmark::CPUInfo::Get().caches) {
> if (I.level == 2) return I.size;
> }
> return 0;
> }();
> // What is the largest range we can check to always fit within
> // given L2 cache?
> const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
> /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
> b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
> }
>
> BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
> ->Apply(CustomArguments<uint8_t>);
> BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
> ->Apply(CustomArguments<uint16_t>);
> BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
> ->Apply(CustomArguments<uint32_t>);
> BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
> ->Apply(CustomArguments<uint64_t>);
>
> BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
> ->Apply(CustomArguments<uint8_t>);
> BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
> ->Apply(CustomArguments<uint16_t>);
> BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
> ->Apply(CustomArguments<uint32_t>);
> BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
> ->Apply(CustomArguments<uint64_t>);
> ```
> {F8768210}
> ```
> $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
> RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
> 2019-04-25 21:17:11
> Running build-old/test/llvm-bcmp-bench
> Run on (8 X 4000 MHz CPU s)
> CPU Caches:
> L1 Data 16K (x8)
> L1 Instruction 64K (x4)
> L2 Unified 2048K (x4)
> L3 Unified 8192K (x1)
> Load Average: 0.65, 3.90, 4.14
> ---------------------------------------------------------------------------------------------------
> Benchmark                                       Time          CPU  Iterations UserCounters...
> ---------------------------------------------------------------------------------------------------
> <...>
> BM_bcmp<uint8_t, Identical>/512000         432131 ns    432101 ns        1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
> BM_bcmp<uint8_t, Identical>_BigO             0.86 N       0.86 N
> BM_bcmp<uint8_t, Identical>_RMS                 8 %          8 %
> <...>
> BM_bcmp<uint16_t, Identical>/256000        161408 ns    161409 ns        4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
> BM_bcmp<uint16_t, Identical>_BigO            0.67 N       0.67 N
> BM_bcmp<uint16_t, Identical>_RMS               25 %         25 %
> <...>
> BM_bcmp<uint32_t, Identical>/128000         81497 ns     81488 ns        8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
> BM_bcmp<uint32_t, Identical>_BigO            0.71 N       0.71 N
> BM_bcmp<uint32_t, Identical>_RMS               42 %         42 %
> <...>
> BM_bcmp<uint64_t, Identical>/64000          50138 ns     50138 ns       10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
> BM_bcmp<uint64_t, Identical>_BigO            0.84 N       0.84 N
> BM_bcmp<uint64_t, Identical>_RMS               27 %         27 %
> <...>
> BM_bcmp<uint8_t, InequalHalfway>/512000    192405 ns    192392 ns        3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
> BM_bcmp<uint8_t, InequalHalfway>_BigO        0.38 N       0.38 N
> BM_bcmp<uint8_t, InequalHalfway>_RMS            3 %          3 %
> <...>
> BM_bcmp<uint16_t, InequalHalfway>/256000   127858 ns    127860 ns        5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
> BM_bcmp<uint16_t, InequalHalfway>_BigO       0.50 N       0.50 N
> BM_bcmp<uint16_t, InequalHalfway>_RMS           0 %          0 %
> <...>
> BM_bcmp<uint32_t, InequalHalfway>/128000    49140 ns     49140 ns       14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
> BM_bcmp<uint32_t, InequalHalfway>_BigO       0.40 N       0.40 N
> BM_bcmp<uint32_t, InequalHalfway>_RMS          18 %         18 %
> <...>
> BM_bcmp<uint64_t, InequalHalfway>/64000     32101 ns     32099 ns       21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
> BM_bcmp<uint64_t, InequalHalfway>_BigO       0.50 N       0.50 N
> BM_bcmp<uint64_t, InequalHalfway>_RMS           1 %          1 %
> RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
> 2019-04-25 21:19:29
> Running build-new/test/llvm-bcmp-bench
> Run on (8 X 4000 MHz CPU s)
> CPU Caches:
> L1 Data 16K (x8)
> L1 Instruction 64K (x4)
> L2 Unified 2048K (x4)
> L3 Unified 8192K (x1)
> Load Average: 1.01, 2.85, 3.71
> ---------------------------------------------------------------------------------------------------
> Benchmark                                       Time          CPU  Iterations UserCounters...
> ---------------------------------------------------------------------------------------------------
> <...>
> BM_bcmp<uint8_t, Identical>/512000          18593 ns     18590 ns       37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
> BM_bcmp<uint8_t, Identical>_BigO             0.04 N       0.04 N
> BM_bcmp<uint8_t, Identical>_RMS                37 %         37 %
> <...>
> BM_bcmp<uint16_t, Identical>/256000         18950 ns     18948 ns       37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
> BM_bcmp<uint16_t, Identical>_BigO            0.08 N       0.08 N
> BM_bcmp<uint16_t, Identical>_RMS               34 %         34 %
> <...>
> BM_bcmp<uint32_t, Identical>/128000         18627 ns     18627 ns       37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
> BM_bcmp<uint32_t, Identical>_BigO            0.16 N       0.16 N
> BM_bcmp<uint32_t, Identical>_RMS               35 %         35 %
> <...>
> BM_bcmp<uint64_t, Identical>/64000          18855 ns     18855 ns       37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
> BM_bcmp<uint64_t, Identical>_BigO            0.32 N       0.32 N
> BM_bcmp<uint64_t, Identical>_RMS               33 %         33 %
> <...>
> BM_bcmp<uint8_t, InequalHalfway>/512000      9570 ns      9569 ns       73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
> BM_bcmp<uint8_t, InequalHalfway>_BigO        0.02 N       0.02 N
> BM_bcmp<uint8_t, InequalHalfway>_RMS           29 %         29 %
> <...>
> BM_bcmp<uint16_t, InequalHalfway>/256000     9547 ns      9547 ns       74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
> BM_bcmp<uint16_t, InequalHalfway>_BigO       0.04 N       0.04 N
> BM_bcmp<uint16_t, InequalHalfway>_RMS          29 %         29 %
> <...>
> BM_bcmp<uint32_t, InequalHalfway>/128000     9396 ns      9394 ns       73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
> BM_bcmp<uint32_t, InequalHalfway>_BigO       0.08 N       0.08 N
> BM_bcmp<uint32_t, InequalHalfway>_RMS          30 %         30 %
> <...>
> BM_bcmp<uint64_t, InequalHalfway>/64000      9499 ns      9498 ns       73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
> BM_bcmp<uint64_t, InequalHalfway>_BigO       0.16 N       0.16 N
> BM_bcmp<uint64_t, InequalHalfway>_RMS          28 %         28 %
> Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
> Benchmark                                     Time      CPU   Time Old   Time New    CPU Old    CPU New
> -------------------------------------------------------------------------------------------------------
> <...>
> BM_bcmp<uint8_t, Identical>/512000         -0.9570  -0.9570     432131      18593     432101      18590
> <...>
> BM_bcmp<uint16_t, Identical>/256000        -0.8826  -0.8826     161408      18950     161409      18948
> <...>
> BM_bcmp<uint32_t, Identical>/128000        -0.7714  -0.7714      81497      18627      81488      18627
> <...>
> BM_bcmp<uint64_t, Identical>/64000         -0.6239  -0.6239      50138      18855      50138      18855
> <...>
> BM_bcmp<uint8_t, InequalHalfway>/512000    -0.9503  -0.9503     192405       9570     192392       9569
> <...>
> BM_bcmp<uint16_t, InequalHalfway>/256000   -0.9253  -0.9253     127858       9547     127860       9547
> <...>
> BM_bcmp<uint32_t, InequalHalfway>/128000   -0.8088  -0.8088      49140       9396      49140       9394
> <...>
> BM_bcmp<uint64_t, InequalHalfway>/64000    -0.7041  -0.7041      32101       9499      32099       9498
> ```
>
> What can we tell from the benchmark?
> * Performance of the naive equality check somewhat improves with element
>   size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or
>   bytes_read/sec=19.0209G/s for uint64_t. I think that instability
>   implies performance problems.
> * Performance of the `memcmp()`-aware benchmark always maxes out at
>   around bytes_read/sec=51.2991G/s for every type. That is 2.6x the
>   throughput of the naive variant!
> * The eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
>   eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x)
>   and linearly decreases with element size.
>   For uint64_t, it's ~4x+ the elements/second.
> * The call is obviously more expensive than the loop for small element
>   counts. As can be seen from the full output {F8768210}, `memcmp()`
>   is almost universally worse, independent of the element size (and thus
>   buffer size), when the element count is less than 8.
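
One way to read the last point, as a hand-written sketch only: a helper could keep the plain loop for tiny inputs and pay for the library call only above some cutoff. The cutoff of 8 elements below is taken from the observation above and is an assumption, not a tuned value; `equal_hybrid`, `kSmallCount` and `check_equal_hybrid` are hypothetical names, and this is not something the patch itself does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed cutoff: below this many elements the call overhead dominated
// in the benchmark output quoted above.
constexpr std::size_t kSmallCount = 8;

bool equal_hybrid(const std::uint8_t *a, const std::uint8_t *b, std::size_t n) {
  if (n < kSmallCount) {
    // Tiny input: a simple loop avoids the function-call overhead.
    for (std::size_t i = 0; i != n; ++i)
      if (a[i] != b[i])
        return false;
    return true;
  }
  // Larger input: let the library routine do the bulk comparison.
  return std::memcmp(a, b, n) == 0;
}

// Illustrative self-check covering both sides of the cutoff.
bool check_equal_hybrid() {
  std::uint8_t x[16], y[16], z[16];
  for (std::size_t i = 0; i != 16; ++i) {
    x[i] = static_cast<std::uint8_t>(i);
    y[i] = x[i];
    z[i] = x[i];
  }
  z[12] = 0xFF; // differs only past the first 8 elements
  const bool ok = equal_hybrid(x, y, 16) && !equal_hybrid(x, z, 16) &&
                  equal_hybrid(x, z, 4) && equal_hybrid(x, y, 0);
  assert(ok);
  return ok;
}
```

Whether such a cutoff pays off in practice depends on the target and the libc in use; the numbers above come from a single machine.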
>
> So all in all, the bcmp idiom does indeed offer untapped performance
> headroom.
> This diff implements said idiom recognition. I think reasonable test
> coverage is present, but do tell if there is anything obvious
> missing.
>
> Now, quality. This does succeed to build and pass the test-suite, at
> least without any non-bundled elements. {F8768216} {F8768217}
> This transform fires 91 times:
> ```
> $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
> Tests: 1149
> Metric: loop-idiom.NumBCmp
>
> Program result-new
>
> MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00
> MultiSource/Applications/d/make_dparser 3.00
> SingleSource/UnitTests/vla 2.00
> MultiSource/Applications/Burg/burg 1.00
> MultiSourc.../Applications/JM/lencod/lencod 1.00
> MultiSource/Applications/lemon/lemon 1.00
> MultiSource/Benchmarks/Bullet/bullet 1.00
> MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00
> MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00
> MultiSourc...Prolangs-C/simulator/simulator 1.00
> ```
> The size changes are below.
> I have not yet looked into what is going on with
> SingleSource/UnitTests/vla.test.
> ```
> $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
> Tests: 1149
> Same hash: 907 (filtered out)
> Remaining: 242
> Metric: size..text
>
> Program                                       result-old  result-new   diff
> test-suite...ingleSource/UnitTests/vla.test       753.00      833.00  10.6%
> test-suite...marks/7zip/7zip-benchmark.test   1001697.00   966657.00  -3.5%
> test-suite...ngs-C/simulator/simulator.test     32369.00    32321.00  -0.1%
> test-suite...plications/d/make_dparser.test     89585.00    89505.00  -0.1%
> test-suite...ce/Applications/Burg/burg.test     40817.00    40785.00  -0.1%
> test-suite.../Applications/lemon/lemon.test     47281.00    47249.00  -0.1%
> test-suite...TimberWolfMC/timberwolfmc.test    250065.00   250113.00   0.0%
> test-suite...chmarks/MallocBench/gs/gs.test    149889.00   149873.00  -0.0%
> test-suite...ications/JM/lencod/lencod.test    769585.00   769569.00  -0.0%
> test-suite.../Benchmarks/Bullet/bullet.test    770049.00   770049.00   0.0%
> test-suite...HMARK_ANISTROPIC_DIFFUSION/128          NaN         NaN   nan%
> test-suite...HMARK_ANISTROPIC_DIFFUSION/256          NaN         NaN   nan%
> test-suite...CHMARK_ANISTROPIC_DIFFUSION/64          NaN         NaN   nan%
> test-suite...CHMARK_ANISTROPIC_DIFFUSION/32          NaN         NaN   nan%
> test-suite...ENCHMARK_BILATERAL_FILTER/64/4          NaN         NaN   nan%
> Geomean difference                                                     nan%
> result-old result-new diff
> count 1.000000e+01 10.00000 10.000000
> mean 3.152090e+05 311695.40000 0.006749
> std 3.790398e+05 372091.42232 0.036605
> min 7.530000e+02 833.00000 -0.034981
> 25% 4.243300e+04 42401.00000 -0.000866
> 50% 1.197370e+05 119689.00000 -0.000392
> 75% 6.397050e+05 639705.00000 -0.000005
> max 1.001697e+06 966657.00000 0.106242
> ```
>
> I don't have timings though.
>
> And now to the code. The basic idea is to completely replace the
> whole loop.
> If we can't fully kill it, don't transform.
> I have left one or two comments in the code, so hopefully it can be
> understood.
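
To illustrate the "replace the whole loop or nothing" rule with a source-level sketch (the pass itself works on LLVM IR; `all_equal`, `all_equal_call`, `first_mismatch` and `check_loop_results` are hypothetical names): a loop whose only result is equal/unequal can be expressed as a single `memcmp()`/`bcmp()`-style call, while a loop that also exposes the mismatch position cannot, because neither call reports where the buffers differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Only a yes/no answer leaves this loop.
bool all_equal(const unsigned char *a, const unsigned char *b, std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    if (a[i] != b[i])
      return false;
  return true;
}

// The same yes/no answer as one library call; the loop is gone entirely.
bool all_equal_call(const unsigned char *a, const unsigned char *b,
                    std::size_t n) {
  return std::memcmp(a, b, n) == 0;
}

// Here the index of the first mismatch also leaves the loop. A byte
// comparison call cannot produce that index, so the loop would have to stay.
std::size_t first_mismatch(const unsigned char *a, const unsigned char *b,
                           std::size_t n) {
  std::size_t i = 0;
  while (i != n && a[i] == b[i])
    ++i;
  return i;
}

// Illustrative self-check on a pair of buffers differing at index 2.
bool check_loop_results() {
  const unsigned char x[4] = {10, 20, 30, 40};
  const unsigned char y[4] = {10, 20, 31, 40};
  const bool ok = all_equal(x, x, 4) == all_equal_call(x, x, 4) &&
                  all_equal(x, y, 4) == all_equal_call(x, y, 4) &&
                  first_mismatch(x, y, 4) == 2 && first_mismatch(x, x, 4) == 4;
  assert(ok);
  return ok;
}
```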
>
> Also, there are a few TODOs that I have left for follow-ups:
> * widening of `memcmp()`/`bcmp()`
> * step smaller than the comparison size
> * Metadata propagation
> * more than two blocks as long as there is still a single backedge?
> * ???
>
> Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper,
> courbet
>
> Reviewed By: courbet
>
> Subscribers: miyuki, hiraditya, xbolva00, nikic, jfb, gchatelet,
> courbet, llvm-commits, mclow.lists
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D61144
>
> Modified:
> llvm/trunk/docs/ReleaseNotes.rst
> llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> llvm/trunk/test/Transforms/LoopIdiom/bcmp-basic.ll
> llvm/trunk/test/Transforms/LoopIdiom/bcmp-debugify-remarks.ll
> llvm/trunk/test/Transforms/LoopIdiom/bcmp-widening.ll
>
> Modified: llvm/trunk/docs/ReleaseNotes.rst
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/ReleaseNotes.rst?rev=374662&r1=374661&r2=374662&view=diff
> =====================================================================
> =========
> --- llvm/trunk/docs/ReleaseNotes.rst (original)
> +++ llvm/trunk/docs/ReleaseNotes.rst Sat Oct 12 08:35:32 2019
> @@ -66,6 +66,9 @@ Non-comprehensive list of changes in thi
> Undefined Behaviour Sanitizer ``-fsanitize=pointer-overflow``
> check
> will now catch such cases.
>
> +* The Loop Idiom Recognition (``-loop-idiom``) pass has learned to
> recognize
> + ``bcmp`` pattern, and convert it into a call to ``bcmp`` (or
> ``memcmp``)
> + function.
>
> Changes to the LLVM IR
> ----------------------
>
> Modified: llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp?rev=374662&r1=374661&r2=374662&view=diff
> =====================================================================
> =========
> --- llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> (original)
> +++ llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp Sat Oct
> 12 08:35:32 2019
> @@ -41,6 +41,7 @@
> #include "llvm/ADT/ArrayRef.h"
> #include "llvm/ADT/DenseMap.h"
> #include "llvm/ADT/MapVector.h"
> +#include "llvm/ADT/STLExtras.h"
> #include "llvm/ADT/SetVector.h"
> #include "llvm/ADT/SmallPtrSet.h"
> #include "llvm/ADT/SmallVector.h"
> @@ -77,16 +78,20 @@
> #include "llvm/IR/LLVMContext.h"
> #include "llvm/IR/Module.h"
> #include "llvm/IR/PassManager.h"
> +#include "llvm/IR/PatternMatch.h"
> #include "llvm/IR/Type.h"
> #include "llvm/IR/User.h"
> #include "llvm/IR/Value.h"
> #include "llvm/IR/ValueHandle.h"
> +#include "llvm/IR/Verifier.h"
> #include "llvm/Pass.h"
> #include "llvm/Support/Casting.h"
> #include "llvm/Support/CommandLine.h"
> #include "llvm/Support/Debug.h"
> #include "llvm/Support/raw_ostream.h"
> #include "llvm/Transforms/Scalar.h"
> +#include "llvm/Transforms/Scalar/LoopPassManager.h"
> +#include "llvm/Transforms/Utils/BasicBlockUtils.h"
> #include "llvm/Transforms/Utils/BuildLibCalls.h"
> #include "llvm/Transforms/Utils/Local.h"
> #include "llvm/Transforms/Utils/LoopUtils.h"
> @@ -102,6 +107,7 @@ using namespace llvm;
>
> STATISTIC(NumMemSet, "Number of memset's formed from loop stores");
> STATISTIC(NumMemCpy, "Number of memcpy's formed from loop
> load+stores");
> +STATISTIC(NumBCmp, "Number of memcmp's formed from loop 2xload+eq-
> compare");
>
> static cl::opt<bool> UseLIRCodeSizeHeurs(
> "use-lir-code-size-heurs",
> @@ -111,6 +117,26 @@ static cl::opt<bool> UseLIRCodeSizeHeurs
>
> namespace {
>
> +// FIXME: reinventing the wheel much? Is there a cleaner solution?
> +struct PMAbstraction {
> + virtual void markLoopAsDeleted(Loop *L) = 0;
> + virtual ~PMAbstraction() = default;
> +};
> +struct LegacyPMAbstraction : PMAbstraction {
> + LPPassManager &LPM;
> + LegacyPMAbstraction(LPPassManager &LPM) : LPM(LPM) {}
> + virtual ~LegacyPMAbstraction() = default;
> + void markLoopAsDeleted(Loop *L) override {
> LPM.markLoopAsDeleted(*L); }
> +};
> +struct NewPMAbstraction : PMAbstraction {
> + LPMUpdater &Updater;
> + NewPMAbstraction(LPMUpdater &Updater) : Updater(Updater) {}
> + virtual ~NewPMAbstraction() = default;
> + void markLoopAsDeleted(Loop *L) override {
> + Updater.markLoopAsDeleted(*L, L->getName());
> + }
> +};
> +
> class LoopIdiomRecognize {
> Loop *CurLoop = nullptr;
> AliasAnalysis *AA;
> @@ -120,6 +146,7 @@ class LoopIdiomRecognize {
> TargetLibraryInfo *TLI;
> const TargetTransformInfo *TTI;
> const DataLayout *DL;
> + PMAbstraction &LoopDeleter;
> OptimizationRemarkEmitter &ORE;
> bool ApplyCodeSizeHeuristics;
>
> @@ -128,9 +155,10 @@ public:
> LoopInfo *LI, ScalarEvolution *SE,
> TargetLibraryInfo *TLI,
> const TargetTransformInfo *TTI,
> - const DataLayout *DL,
> + const DataLayout *DL, PMAbstraction
> &LoopDeleter,
> OptimizationRemarkEmitter &ORE)
> - : AA(AA), DT(DT), LI(LI), SE(SE), TLI(TLI), TTI(TTI), DL(DL),
> ORE(ORE) {}
> + : AA(AA), DT(DT), LI(LI), SE(SE), TLI(TLI), TTI(TTI), DL(DL),
> + LoopDeleter(LoopDeleter), ORE(ORE) {}
>
> bool runOnLoop(Loop *L);
>
> @@ -144,6 +172,8 @@ private:
> bool HasMemset;
> bool HasMemsetPattern;
> bool HasMemcpy;
> + bool HasMemCmp;
> + bool HasBCmp;
>
> /// Return code for isLegalStore()
> enum LegalStoreKind {
> @@ -186,6 +216,32 @@ private:
>
> bool runOnNoncountableLoop();
>
> + struct CmpLoopStructure {
> + Value *BCmpValue, *LatchCmpValue;
> + BasicBlock *HeaderBrEqualBB, *HeaderBrUnequalBB;
> + BasicBlock *LatchBrFinishBB, *LatchBrContinueBB;
> + };
> + bool matchBCmpLoopStructure(CmpLoopStructure &CmpLoop) const;
> + struct CmpOfLoads {
> + ICmpInst::Predicate BCmpPred;
> + Value *LoadSrcA, *LoadSrcB;
> + Value *LoadA, *LoadB;
> + };
> + bool matchBCmpOfLoads(Value *BCmpValue, CmpOfLoads &CmpOfLoads)
> const;
> + bool recognizeBCmpLoopControlFlow(const CmpOfLoads &CmpOfLoads,
> + CmpLoopStructure &CmpLoop)
> const;
> + bool recognizeBCmpLoopSCEV(uint64_t BCmpTyBytes, CmpOfLoads
> &CmpOfLoads,
> + const SCEV *&SrcA, const SCEV *&SrcB,
> + const SCEV *&Iterations) const;
> + bool detectBCmpIdiom(ICmpInst *&BCmpInst, CmpInst *&LatchCmpInst,
> + LoadInst *&LoadA, LoadInst *&LoadB, const
> SCEV *&SrcA,
> + const SCEV *&SrcB, const SCEV *&NBytes)
> const;
> + BasicBlock *transformBCmpControlFlow(ICmpInst *ComparedEqual);
> + void transformLoopToBCmp(ICmpInst *BCmpInst, CmpInst
> *LatchCmpInst,
> + LoadInst *LoadA, LoadInst *LoadB, const
> SCEV *SrcA,
> + const SCEV *SrcB, const SCEV *NBytes);
> + bool recognizeBCmp();
> +
> bool recognizePopcount();
> void transformLoopToPopcount(BasicBlock *PreCondBB, Instruction
> *CntInst,
> PHINode *CntPhi, Value *Var);
> @@ -223,13 +279,14 @@ public:
> &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
> *L->getHeader()->getParent());
> const DataLayout *DL = &L->getHeader()->getModule()-
> >getDataLayout();
> + LegacyPMAbstraction LoopDeleter(LPM);
>
> // For the old PM, we can't use OptimizationRemarkEmitter as an
> analysis
> // pass. Function analyses need to be preserved across loop
> transformations
> // but ORE cannot be preserved (see comment before the pass
> definition).
> OptimizationRemarkEmitter ORE(L->getHeader()->getParent());
>
> - LoopIdiomRecognize LIR(AA, DT, LI, SE, TLI, TTI, DL, ORE);
> + LoopIdiomRecognize LIR(AA, DT, LI, SE, TLI, TTI, DL,
> LoopDeleter, ORE);
> return LIR.runOnLoop(L);
> }
>
> @@ -248,7 +305,7 @@ char LoopIdiomRecognizeLegacyPass::ID =
>
> PreservedAnalyses LoopIdiomRecognizePass::run(Loop &L,
> LoopAnalysisManager &AM,
> LoopStandardAnalysisRe
> sults &AR,
> - LPMUpdater &) {
> + LPMUpdater &Updater) {
> const auto *DL = &L.getHeader()->getModule()->getDataLayout();
>
> const auto &FAM =
> @@ -262,8 +319,9 @@ PreservedAnalyses LoopIdiomRecognizePass
> "LoopIdiomRecognizePass: OptimizationRemarkEmitterAnalysis
> not cached "
> "at a higher level");
>
> + NewPMAbstraction LoopDeleter(Updater);
> LoopIdiomRecognize LIR(&AR.AA, &AR.DT, &AR.LI, &AR.SE, &AR.TLI,
> &AR.TTI, DL,
> - *ORE);
> + LoopDeleter, *ORE);
> if (!LIR.runOnLoop(&L))
> return PreservedAnalyses::all();
>
> @@ -300,7 +358,8 @@ bool LoopIdiomRecognize::runOnLoop(Loop
>
> // Disable loop idiom recognition if the function's name is a
> common idiom.
> StringRef Name = L->getHeader()->getParent()->getName();
> - if (Name == "memset" || Name == "memcpy")
> + if (Name == "memset" || Name == "memcpy" || Name == "memcmp" ||
> + Name == "bcmp")
> return false;
>
> // Determine if code size heuristics need to be applied.
> @@ -310,8 +369,10 @@ bool LoopIdiomRecognize::runOnLoop(Loop
> HasMemset = TLI->has(LibFunc_memset);
> HasMemsetPattern = TLI->has(LibFunc_memset_pattern16);
> HasMemcpy = TLI->has(LibFunc_memcpy);
> + HasMemCmp = TLI->has(LibFunc_memcmp);
> + HasBCmp = TLI->has(LibFunc_bcmp);
>
> - if (HasMemset || HasMemsetPattern || HasMemcpy)
> + if (HasMemset || HasMemsetPattern || HasMemcpy || HasMemCmp ||
> HasBCmp)
> if (SE->hasLoopInvariantBackedgeTakenCount(L))
> return runOnCountableLoop();
>
> @@ -1150,7 +1211,7 @@ bool LoopIdiomRecognize::runOnNoncountab
> << "] Noncountable Loop %"
> << CurLoop->getHeader()->getName() << "\n");
>
> - return recognizePopcount() || recognizeAndInsertFFS();
> + return recognizeBCmp() || recognizePopcount() ||
> recognizeAndInsertFFS();
> }
>
> /// Check if the given conditional branch is based on the comparison
> between
> @@ -1824,3 +1885,804 @@ void LoopIdiomRecognize::transformLoopTo
> // loop. The loop would otherwise not be deleted even if it
> becomes empty.
> SE->forgetLoop(CurLoop);
> }
> +
> +bool LoopIdiomRecognize::matchBCmpLoopStructure(
> + CmpLoopStructure &CmpLoop) const {
> + ICmpInst::Predicate BCmpPred;
> +
> + // We are looking for the following basic layout:
> + // PreheaderBB: <preheader> ; preds = ???
> + // <...>
> + // br label %LoopHeaderBB
> + // LoopHeaderBB: <header,exiting> ; preds =
> %PreheaderBB,%LoopLatchBB
> + // <...>
> + // %BCmpValue = icmp <...>
> + // br i1 %BCmpValue, label %LoopLatchBB, label %Successor0
> + // LoopLatchBB: <latch,exiting> ; preds = %LoopHeaderBB
> + // <...>
> + // %LatchCmpValue = <are we done, or do next iteration?>
> + // br i1 %LatchCmpValue, label %Successor1, label %LoopHeaderBB
> + // Successor0: <exit> ; preds = %LoopHeaderBB
> + // <...>
> + // Successor1: <exit> ; preds = %LoopLatchBB
> + // <...>
> + //
> + // Successor0 and Successor1 may or may not be the same basic
> block.
> +
> + // Match basic frame-work of this supposedly-comparison loop.
> + using namespace PatternMatch;
> + if (!match(CurLoop->getHeader()->getTerminator(),
> + m_Br(m_CombineAnd(m_ICmp(BCmpPred, m_Value(),
> m_Value()),
> + m_Value(CmpLoop.BCmpValue)),
> + CmpLoop.HeaderBrEqualBB,
> CmpLoop.HeaderBrUnequalBB)) ||
> + !match(CurLoop->getLoopLatch()->getTerminator(),
> + m_Br(m_CombineAnd(m_Cmp(),
> m_Value(CmpLoop.LatchCmpValue)),
> + CmpLoop.LatchBrFinishBB,
> CmpLoop.LatchBrContinueBB))) {
> + LLVM_DEBUG(dbgs() << "Basic control-flow layout
> unrecognized.\n");
> + return false;
> + }
> + LLVM_DEBUG(dbgs() << "Recognized basic control-flow layout.\n");
> + return true;
> +}
> +
> +bool LoopIdiomRecognize::matchBCmpOfLoads(Value *BCmpValue,
> + CmpOfLoads &CmpOfLoads)
> const {
> + using namespace PatternMatch;
> + LLVM_DEBUG(dbgs() << "Analyzing header icmp " << *BCmpValue
> + << " as bcmp pattern.\n");
> +
> + // Match bcmp-style loop header cmp. It must be an eq-icmp of
> loads. Example:
> + // %v0 = load <...>, <...>* %LoadSrcA
> + // %v1 = load <...>, <...>* %LoadSrcB
> + // %CmpLoop.BCmpValue = icmp eq <...> %v0, %v1
> + // There won't be any no-op bitcasts between load and icmp,
> + // they would have been transformed into a load of bitcast.
> + // FIXME: {b,mem}cmp() calls have the same semantics as icmp.
> Match them too.
> + if (!match(BCmpValue,
> + m_ICmp(CmpOfLoads.BCmpPred,
> + m_CombineAnd(m_Load(m_Value(CmpOfLoads.LoadSrcA)
> ),
> + m_Value(CmpOfLoads.LoadA)),
> + m_CombineAnd(m_Load(m_Value(CmpOfLoads.LoadSrcB)
> ),
> + m_Value(CmpOfLoads.LoadB)))) ||
> + !ICmpInst::isEquality(CmpOfLoads.BCmpPred)) {
> + LLVM_DEBUG(dbgs() << "Loop header icmp did not match bcmp
> pattern.\n");
> + return false;
> + }
> + LLVM_DEBUG(dbgs() << "Recognized header icmp as bcmp pattern with
> loads:\n\t"
> + << *CmpOfLoads.LoadA << "\n\t" <<
> *CmpOfLoads.LoadB
> + << "\n");
> + // FIXME: handle memcmp pattern?
> + return true;
> +}
> +
> +bool LoopIdiomRecognize::recognizeBCmpLoopControlFlow(
> + const CmpOfLoads &CmpOfLoads, CmpLoopStructure &CmpLoop) const {
> + BasicBlock *LoopHeaderBB = CurLoop->getHeader();
> + BasicBlock *LoopLatchBB = CurLoop->getLoopLatch();
> +
> + // Be wary, comparisons can be inverted, canonicalize order.
> + // If this 'element' comparison passed, we expect to proceed to
> the next elt.
> + if (CmpOfLoads.BCmpPred != ICmpInst::Predicate::ICMP_EQ)
> + std::swap(CmpLoop.HeaderBrEqualBB, CmpLoop.HeaderBrUnequalBB);
> + // The predicate on loop latch does not matter, just canonicalize
> some order.
> + if (CmpLoop.LatchBrContinueBB != LoopHeaderBB)
> + std::swap(CmpLoop.LatchBrFinishBB, CmpLoop.LatchBrContinueBB);
> +
> + // Check that control-flow between blocks is as expected.
> + if (CmpLoop.HeaderBrEqualBB != LoopLatchBB ||
> + CmpLoop.LatchBrContinueBB != LoopHeaderBB) {
> + LLVM_DEBUG(dbgs() << "Loop control-flow not recognized.\n");
> + return false;
> + }
> +
> + SmallVector<BasicBlock *, 2> ExitBlocks;
> + CurLoop->getUniqueExitBlocks(ExitBlocks);
> + assert(ExitBlocks.size() <= 2U && "Can't have more than two exit
> blocks.");
> +
> + assert(!is_contained(ExitBlocks, CmpLoop.HeaderBrEqualBB) &&
> + is_contained(ExitBlocks, CmpLoop.HeaderBrUnequalBB) &&
> + !is_contained(ExitBlocks, CmpLoop.LatchBrContinueBB) &&
> + is_contained(ExitBlocks, CmpLoop.LatchBrFinishBB) &&
> + "Unexpected exit edges.");
> +
> + LLVM_DEBUG(dbgs() << "Recognized loop control-flow.\n");
> +
> + LLVM_DEBUG(dbgs() << "Performing side-effect analysis on the
> loop.\n");
> + assert(CurLoop->isLCSSAForm(*DT) && "Should only get LCSSA-form
> loops here.");
> + // No loop instructions must be used outside of the loop. Since we
> are in
> + // LCSSA form, we only need to check successor block's PHI nodes's
> incoming
> + // values for incoming blocks that are the loop basic blocks.
> + for (const BasicBlock *ExitBB : ExitBlocks) {
> + for (const PHINode &PHI : ExitBB->phis()) {
> + for (const BasicBlock *LoopBB :
> + make_filter_range(PHI.blocks(), [this](BasicBlock
> *PredecessorBB) {
> + return CurLoop->contains(PredecessorBB);
> + })) {
> + const auto *I =
> + dyn_cast<Instruction>(PHI.getIncomingValueForBlock(LoopB
> B));
> + if (I && CurLoop->contains(I)) {
> + LLVM_DEBUG(dbgs()
> + << "Loop contains instruction " << *I
> + << " which is used outside of the loop in
> basic block "
> + << ExitBB->getName() << " in phi node " <<
> PHI << "\n");
> + return false;
> + }
> + }
> + }
> + }
> + // Similarly, the loop should not have any observable side-effects
> + // other than the final comparison result.
> + for (BasicBlock *LoopBB : CurLoop->blocks()) {
> + for (Instruction &I : *LoopBB) {
> + if (isa<DbgInfoIntrinsic>(I)) // Ignore dbginfo.
> + continue; // FIXME: anything else? lifetime info?
> + if ((I.mayHaveSideEffects() || I.isAtomic() || I.isFenceLike()) &&
> + &I != CmpOfLoads.LoadA && &I != CmpOfLoads.LoadB) {
> + LLVM_DEBUG(
> + dbgs() << "Loop contains instruction with potential side-effects: "
> + << I << "\n");
> + return false;
> + }
> + }
> + }
> + LLVM_DEBUG(dbgs() << "No loop instructions deemed to have side-effects.\n");
> + return true;
> +}
> +
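For anyone following the control-flow and side-effect checks quoted above, here is a minimal sketch in plain C++ (not LLVM IR) of the loop shape they appear to accept: a "header" that leaves the loop on the first mismatch, a "latch" that leaves once all elements were compared, and a body that does nothing but load and compare. The names loop_equal and call_equal are made up for illustration, and the n >= 1 precondition is my assumption about the rotated loop form; neither comes from the patch.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical hand-written equality loop. The body only loads and compares,
// so it has no side effects besides the final result.
// Precondition (assumed): n >= 1, since the loop body always runs once.
bool loop_equal(const unsigned char *a, const unsigned char *b,
                std::size_t n) {
  std::size_t i = 0;
  for (;;) {
    if (a[i] != b[i]) // "header": exit as soon as one element differs
      return false;
    if (++i == n)     // "latch": exit once every element has been compared
      return true;
  }
}

// The single library call such a loop is equivalent to.
bool call_equal(const unsigned char *a, const unsigned char *b,
                std::size_t n) {
  return std::memcmp(a, b, n) == 0;
}
```

A loop that also stored to memory or called an opaque function inside the body would not fit this shape, which is what the side-effect scan is rejecting.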
> +bool LoopIdiomRecognize::recognizeBCmpLoopSCEV(uint64_t BCmpTyBytes,
> + CmpOfLoads
> &CmpOfLoads,
> + const SCEV *&SrcA,
> + const SCEV *&SrcB,
> + const SCEV
> *&Iterations) const {
> + // Try to compute SCEV of the loads, for this loop's scope.
> + const auto *ScevForSrcA = dyn_cast<SCEVAddRecExpr>(
> + SE->getSCEVAtScope(CmpOfLoads.LoadSrcA, CurLoop));
> + const auto *ScevForSrcB = dyn_cast<SCEVAddRecExpr>(
> + SE->getSCEVAtScope(CmpOfLoads.LoadSrcB, CurLoop));
> + if (!ScevForSrcA || !ScevForSrcB) {
> + LLVM_DEBUG(dbgs() << "Failed to get SCEV expressions for load
> sources.\n");
> + return false;
> + }
> +
> + LLVM_DEBUG(dbgs() << "Got SCEV expressions (at loop scope) for
> loads:\n\t"
> + << *ScevForSrcA << "\n\t" << *ScevForSrcB <<
> "\n");
> +
> + // Loads must have the following SCEV exprs:
> + // {%ptr,+,BCmpTyBytes}<%LoopHeaderBB>
> + const SCEV *RecStepForA = ScevForSrcA->getStepRecurrence(*SE);
> + const SCEV *RecStepForB = ScevForSrcB->getStepRecurrence(*SE);
> + if (!ScevForSrcA->isAffine() || !ScevForSrcB->isAffine() ||
> + ScevForSrcA->getLoop() != CurLoop || ScevForSrcB->getLoop() !=
> CurLoop ||
> + RecStepForA != RecStepForB || !isa<SCEVConstant>(RecStepForA)
> ||
> + cast<SCEVConstant>(RecStepForA)->getAPInt() != BCmpTyBytes) {
> + LLVM_DEBUG(dbgs() << "Unsupported SCEV expressions for loads.
> Only support "
> + "affine SCEV expressions originating in the
> loop we "
> + "are analysing with identical constant
> positive step, "
> + "equal to the count of bytes compared.
> Got:\n\t"
> + << *RecStepForA << "\n\t" << *RecStepForB <<
> "\n");
> + return false;
> + // FIXME: can support BCmpTyBytes > Step.
> + // But will need to account for the extra bytes compared at the
> end.
> + }
> +
> + SrcA = ScevForSrcA->getStart();
> + SrcB = ScevForSrcB->getStart();
> + LLVM_DEBUG(dbgs() << "Got SCEV expressions for load sources:\n\t"
> << *SrcA
> + << "\n\t" << *SrcB << "\n");
> +
> + // The load sources must be loop-invariants that dominate the loop header.
> + if (SrcA == SE->getCouldNotCompute() || SrcB == SE->getCouldNotCompute() ||
> + !SE->isAvailableAtLoopEntry(SrcA, CurLoop) ||
> + !SE->isAvailableAtLoopEntry(SrcB, CurLoop)) {
> + LLVM_DEBUG(dbgs() << "Unsupported SCEV expressions for loads, unavailable "
> + "prior to loop header.\n");
> + return false;
> + }
> +
> + LLVM_DEBUG(dbgs() << "SCEV expressions for loads are
> acceptable.\n");
> +
> + // For how many iterations is the loop guaranteed not to exit via LoopLatch?
> + // This is one less than the maximal number of comparisons, and is: n + -1
> + const SCEV *LoopExitCount =
> + SE->getExitCount(CurLoop, CurLoop->getLoopLatch());
> + LLVM_DEBUG(dbgs() << "Got SCEV expression for loop latch exit
> count: "
> + << *LoopExitCount << "\n");
> + // Exit count, similarly, must be a loop-invariant that dominates the
> + // loop header.
> + if (LoopExitCount == SE->getCouldNotCompute() ||
> + !LoopExitCount->getType()->isIntOrPtrTy() ||
> + !SE->isAvailableAtLoopEntry(LoopExitCount, CurLoop)) {
> + LLVM_DEBUG(dbgs() << "Unsupported SCEV expression for loop latch
> exit.\n");
> + return false;
> + }
> +
> + // LoopExitCount is always one less than the actual count of
> iterations.
> + // Do this before cast, else we will be stuck with 1 + zext(-1 +
> n)
> + Iterations = SE->getAddExpr(
> + LoopExitCount, SE->getOne(LoopExitCount->getType()),
> SCEV::FlagNUW);
> + assert(Iterations != SE->getCouldNotCompute() &&
> + "Shouldn't fail to increment by one.");
> +
> + LLVM_DEBUG(dbgs() << "Computed iteration count: " << *Iterations
> << "\n");
> + return true;
> +}
> +
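The "exit count plus one" step in the quoted function is easy to get backwards, so here is a small sketch, again in plain C++ rather than SCEV, that counts how often the backedge is taken when all elements compare equal. The helper names latch_exit_count and iterations_from_exit_count are hypothetical, and the n >= 1 precondition is an assumption on my side.

```cpp
#include <cstddef>
#include <cstdint>

// Counts how often the backedge (latch -> header) is taken for a loop of the
// recognized shape. When all n elements compare equal the loop leaves through
// the latch and the result is n - 1. Precondition (assumed): n >= 1.
std::uint64_t latch_exit_count(const unsigned char *a, const unsigned char *b,
                               std::size_t n) {
  std::uint64_t backedges = 0;
  std::size_t i = 0;
  for (;;) {
    if (a[i] != b[i]) // header exit: a mismatch was found
      break;
    if (++i == n)     // latch exit: everything was compared
      break;
    ++backedges;      // backedge taken, the header runs again
  }
  return backedges;
}

// The number of comparisons is one more than the backedge-taken count,
// mirroring the "LoopExitCount + 1" computation in the quoted code.
std::uint64_t iterations_from_exit_count(std::uint64_t exit_count) {
  return exit_count + 1;
}
```

For two equal 8-byte buffers the backedge is taken 7 times, so 8 comparisons (and, for one-byte elements, 8 bytes) are involved.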
> +/// Return true iff the bcmp idiom is detected in the loop.
> +///
> +/// Additionally:
> +/// 1) \p BCmpInst is set to the root byte-comparison instruction.
> +/// 2) \p LatchCmpInst is set to the comparison that controls the
> latch.
> +/// 3) \p LoadA is set to the first LoadInst.
> +/// 4) \p LoadB is set to the second LoadInst.
> +/// 5) \p SrcA is set to the first source location that is being
> compared.
> +/// 6) \p SrcB is set to the second source location that is being
> compared.
> +/// 7) \p NBytes is set to the number of bytes to compare.
> +bool LoopIdiomRecognize::detectBCmpIdiom(ICmpInst *&BCmpInst,
> + CmpInst *&LatchCmpInst,
> + LoadInst *&LoadA, LoadInst
> *&LoadB,
> + const SCEV *&SrcA, const
> SCEV *&SrcB,
> + const SCEV *&NBytes) const
> {
> + LLVM_DEBUG(dbgs() << "Recognizing bcmp idiom\n");
> +
> + // Give up if the loop is not in normal form, or has more than 2
> blocks.
> + if (!CurLoop->isLoopSimplifyForm() || CurLoop->getNumBlocks() > 2)
> {
> + LLVM_DEBUG(dbgs() << "Basic loop structure unrecognized.\n");
> + return false;
> + }
> + LLVM_DEBUG(dbgs() << "Recognized basic loop structure.\n");
> +
> + CmpLoopStructure CmpLoop;
> + if (!matchBCmpLoopStructure(CmpLoop))
> + return false;
> +
> + CmpOfLoads CmpOfLoads;
> + if (!matchBCmpOfLoads(CmpLoop.BCmpValue, CmpOfLoads))
> + return false;
> +
> + if (!recognizeBCmpLoopControlFlow(CmpOfLoads, CmpLoop))
> + return false;
> +
> + // FIXME: is there no way to combine these casts with the m_Value() matcher?
> + BCmpInst = cast<ICmpInst>(CmpLoop.BCmpValue);
> + LatchCmpInst = cast<CmpInst>(CmpLoop.LatchCmpValue);
> + LoadA = cast<LoadInst>(CmpOfLoads.LoadA);
> + LoadB = cast<LoadInst>(CmpOfLoads.LoadB);
> +
> + Type *BCmpValTy = BCmpInst->getOperand(0)->getType();
> + LLVMContext &Context = BCmpValTy->getContext();
> + uint64_t BCmpTyBits = DL->getTypeSizeInBits(BCmpValTy);
> + static constexpr uint64_t ByteTyBits = 8;
> +
> + LLVM_DEBUG(dbgs() << "Got comparison between values of type " <<
> *BCmpValTy
> + << " of size " << BCmpTyBits
> + << " bits (while byte = " << ByteTyBits << "
> bits).\n");
> + // bcmp()/memcmp() minimal unit of work is a byte. Therefore we
> must check
> + // that we are dealing with a multiple of a byte here.
> + if (BCmpTyBits % ByteTyBits != 0) {
> + LLVM_DEBUG(dbgs() << "Value size is not a multiple of byte.\n");
> + return false;
> + // FIXME: could still be done under a run-time check that the total bit
> + // count is a multiple of a byte, I guess? Or handle remainder separately?
> + }
> +
> + // Each comparison is done on this many bytes.
> + uint64_t BCmpTyBytes = BCmpTyBits / ByteTyBits;
> + LLVM_DEBUG(dbgs() << "Size is exactly " << BCmpTyBytes
> + << " bytes, eligible for bcmp conversion.\n");
> +
> + const SCEV *Iterations;
> + if (!recognizeBCmpLoopSCEV(BCmpTyBytes, CmpOfLoads, SrcA, SrcB,
> Iterations))
> + return false;
> +
> + // bcmp / memcmp take length argument as size_t, do promotion now.
> + Type *CmpFuncSizeTy = DL->getIntPtrType(Context);
> + Iterations = SE->getNoopOrZeroExtend(Iterations, CmpFuncSizeTy);
> + assert(Iterations != SE->getCouldNotCompute() && "Promotion
> failed.");
> + // Note that it didn't do ptrtoint cast, we will need to do it
> manually.
> +
> + // We will be comparing *bytes*, not BCmpTy, so we need to recalculate size.
> + // It's a multiplication, and it *could* overflow. But for it to overflow
> + // we'd want to compare more bytes than could be represented by size_t. But
> + // allocation functions also take size_t. So how'd you produce such a buffer?
> + // FIXME: we likely need to actually check that we know this won't overflow,
> + // via llvm::computeOverflowForUnsignedMul().
> + NBytes = SE->getMulExpr(
> + Iterations, SE->getConstant(CmpFuncSizeTy, BCmpTyBytes), SCEV::FlagNUW);
> + assert(NBytes != SE->getCouldNotCompute() &&
> + "Shouldn't fail to multiply.");
> +
> + LLVM_DEBUG(dbgs() << "Computed total byte count: " << *NBytes <<
> "\n");
> +
> + if (LoadA->getPointerAddressSpace() != LoadB->getPointerAddressSpace() ||
> + LoadA->getPointerAddressSpace() != 0 || !LoadA->isSimple() ||
> + !LoadB->isSimple()) {
> + StringLiteral L("Unsupported loads in idiom - only support identical, "
> + "simple loads from address space 0.\n");
> + LLVM_DEBUG(dbgs() << L);
> + ORE.emit([&]() {
> + return OptimizationRemarkMissed(DEBUG_TYPE,
> "BCmpIdiomUnsupportedLoads",
> + BCmpInst->getDebugLoc(),
> + CurLoop->getHeader())
> + << L;
> + });
> + return false; // FIXME
> + }
> +
> + LLVM_DEBUG(dbgs() << "Recognized bcmp idiom\n");
> + ORE.emit([&]() {
> + return OptimizationRemarkAnalysis(DEBUG_TYPE,
> "RecognizedBCmpIdiom",
> + CurLoop->getStartLoc(),
> + CurLoop->getHeader())
> + << "Loop recognized as a bcmp idiom";
> + });
> +
> + return true;
> +}
> +
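The FIXME in the quoted function about the byte-count multiplication possibly overflowing can be illustrated outside of LLVM. The following is only a sketch of what a checked multiplication looks like on plain size_t values; checked_byte_count is a hypothetical helper (C++17 for std::optional), and it does not use llvm::computeOverflowForUnsignedMul(), which is what the FIXME actually names.

```cpp
#include <cstddef>
#include <limits>
#include <optional>

// Returns iterations * elem_bytes, or std::nullopt if the product would not
// fit in size_t. Hypothetical helper, not part of the patch.
std::optional<std::size_t> checked_byte_count(std::size_t iterations,
                                              std::size_t elem_bytes) {
  const std::size_t max = std::numeric_limits<std::size_t>::max();
  // The product overflows exactly when iterations > max / elem_bytes.
  if (elem_bytes != 0 && iterations > max / elem_bytes)
    return std::nullopt;
  return iterations * elem_bytes;
}
```

With 8 iterations over 4-byte elements this yields 32, while the largest size_t times 2 is reported as an overflow instead of silently wrapping.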
> +BasicBlock *
> +LoopIdiomRecognize::transformBCmpControlFlow(ICmpInst
> *ComparedEqual) {
> + LLVM_DEBUG(dbgs() << "Transforming control-flow.\n");
> + SmallVector<DominatorTree::UpdateType, 8> DTUpdates;
> +
> + BasicBlock *PreheaderBB = CurLoop->getLoopPreheader();
> + BasicBlock *HeaderBB = CurLoop->getHeader();
> + BasicBlock *LoopLatchBB = CurLoop->getLoopLatch();
> + SmallString<32> LoopName = CurLoop->getName();
> + Function *Func = PreheaderBB->getParent();
> + LLVMContext &Context = Func->getContext();
> +
> + // Before doing anything, drop SCEV info.
> + SE->forgetLoop(CurLoop);
> +
> + // Here we start with: (0/6)
> + // PreheaderBB: <preheader> ; preds = ???
> + // <...>
> + // %memcmp = call i32 @memcmp(i8* %LoadSrcA, i8* %LoadSrcB, i64
> %Nbytes)
> + // %ComparedEqual = icmp eq <...> %memcmp, 0
> + // br label %LoopHeaderBB
> + // LoopHeaderBB: <header,exiting> ; preds =
> %PreheaderBB,%LoopLatchBB
> + // <...>
> + // br i1 %<...>, label %LoopLatchBB, label %Successor0BB
> + // LoopLatchBB: <latch,exiting> ; preds = %LoopHeaderBB
> + // <...>
> + // br i1 %<...>, label %Successor1BB, label %LoopHeaderBB
> + // Successor0BB: <exit> ; preds = %LoopHeaderBB
> + // %S0PHI = phi <...> [ <...>, %LoopHeaderBB ]
> + // <...>
> + // Successor1BB: <exit> ; preds = %LoopLatchBB
> + // %S1PHI = phi <...> [ <...>, %LoopLatchBB ]
> + // <...>
> + //
> + // Successor0 and Successor1 may or may not be the same basic
> block.
> +
> + // Decouple the edge between loop preheader basic block and loop
> header basic
> + // block. Thus the loop has become unreachable.
> + assert(cast<BranchInst>(PreheaderBB->getTerminator())->isUnconditional() &&
> + PreheaderBB->getTerminator()->getSuccessor(0) == HeaderBB &&
> + "Preheader bb must end with an unconditional branch to header bb.");
> + PreheaderBB->getTerminator()->eraseFromParent();
> + DTUpdates.push_back({DominatorTree::Delete, PreheaderBB, HeaderBB});
> +
> + // Create a new preheader basic block before loop header basic
> block.
> + auto *PhonyPreheaderBB = BasicBlock::Create(
> + Context, LoopName + ".phonypreheaderbb", Func, HeaderBB);
> + // And insert an unconditional branch from phony preheader basic
> block to
> + // loop header basic block.
> + IRBuilder<>(PhonyPreheaderBB).CreateBr(HeaderBB);
> + DTUpdates.push_back({DominatorTree::Insert, PhonyPreheaderBB,
> HeaderBB});
> +
> + // Create a *single* new empty block that we will substitute as a
> + // successor basic block for the loop's exits. This one is
> temporary.
> + // Much like phony preheader basic block, it is not connected.
> + auto *PhonySuccessorBB =
> + BasicBlock::Create(Context, LoopName + ".phonysuccessorbb",
> Func,
> + LoopLatchBB->getNextNode());
> + // That block must have *some* non-PHI instruction, or else
> deleteDeadLoop()
> + // will mess up cleanup of dbginfo, and verifier will complain.
> + IRBuilder<>(PhonySuccessorBB).CreateUnreachable();
> +
> + // Create two new empty blocks that we will use to preserve the original
> + // loop exit control-flow, and preserve the incoming values in the PHI nodes
> + // in loop's successor exit blocks. These will live on.
> + auto *ComparedUnequalBB =
> + BasicBlock::Create(Context, ComparedEqual->getName() +
> ".unequalbb", Func,
> + PhonySuccessorBB->getNextNode());
> + auto *ComparedEqualBB =
> + BasicBlock::Create(Context, ComparedEqual->getName() +
> ".equalbb", Func,
> + PhonySuccessorBB->getNextNode());
> +
> + // By now we have: (1/6)
> + // PreheaderBB: ; preds = ???
> + // <...>
> + // %memcmp = call i32 @memcmp(i8* %LoadSrcA, i8* %LoadSrcB, i64
> %Nbytes)
> + // %ComparedEqual = icmp eq <...> %memcmp, 0
> + // [no terminator instruction!]
> + // PhonyPreheaderBB: <preheader> ; No preds, UNREACHABLE!
> + // br label %LoopHeaderBB
> + // LoopHeaderBB: <header,exiting> ; preds = %PhonyPreheaderBB,
> %LoopLatchBB
> + // <...>
> + // br i1 %<...>, label %LoopLatchBB, label %Successor0BB
> + // LoopLatchBB: <latch,exiting> ; preds = %LoopHeaderBB
> + // <...>
> + // br i1 %<...>, label %Successor1BB, label %LoopHeaderBB
> + // PhonySuccessorBB: ; No preds, UNREACHABLE!
> + // unreachable
> + // EqualBB: ; No preds, UNREACHABLE!
> + // [no terminator instruction!]
> + // UnequalBB: ; No preds, UNREACHABLE!
> + // [no terminator instruction!]
> + // Successor0BB: <exit> ; preds = %LoopHeaderBB
> + // %S0PHI = phi <...> [ <...>, %LoopHeaderBB ]
> + // <...>
> + // Successor1BB: <exit> ; preds = %LoopLatchBB
> + // %S1PHI = phi <...> [ <...>, %LoopLatchBB ]
> + // <...>
> +
> + // What is the mapping/replacement basic block for exiting out of the loop
> + // from either of the old loop's basic blocks?
> + auto GetReplacementBB = [this, ComparedEqualBB,
> + ComparedUnequalBB](const BasicBlock
> *OldBB) {
> + assert(CurLoop->contains(OldBB) && "Only for loop's basic
> blocks.");
> + if (OldBB == CurLoop->getLoopLatch()) // "all elements compared
> equal".
> + return ComparedEqualBB;
> + if (OldBB == CurLoop->getHeader()) // "element compared
> unequal".
> + return ComparedUnequalBB;
> + llvm_unreachable("Only had two basic blocks in loop.");
> + };
> +
> + // What are the exits out of this loop?
> + SmallVector<Loop::Edge, 2> LoopExitEdges;
> + CurLoop->getExitEdges(LoopExitEdges);
> + assert(LoopExitEdges.size() == 2 && "Should have exactly two exit edges.");
> +
> + // Populate new basic blocks, update the exiting control-flow, PHI
> nodes.
> + for (const Loop::Edge &Edge : LoopExitEdges) {
> + auto *OldLoopBB = const_cast<BasicBlock *>(Edge.first);
> + auto *SuccessorBB = const_cast<BasicBlock *>(Edge.second);
> + assert(CurLoop->contains(OldLoopBB) && !CurLoop->contains(SuccessorBB) &&
> + "Unexpected edge.");
> +
> + // If we would exit the loop from this loop's basic block,
> + // what semantically would that mean? Did comparison succeed or fail?
> + BasicBlock *NewBB = GetReplacementBB(OldLoopBB);
> + assert(NewBB->empty() && "Should not get same new basic block here twice.");
> + IRBuilder<> Builder(NewBB);
> + Builder.SetCurrentDebugLocation(OldLoopBB->getTerminator()->getDebugLoc());
> + Builder.CreateBr(SuccessorBB);
> + DTUpdates.push_back({DominatorTree::Insert, NewBB, SuccessorBB});
> + // Also, be *REALLY* careful with PHI nodes in successor basic block,
> + // update them to receive the same input value, but not from current loop's
> + // basic block, but from new basic block instead.
> + SuccessorBB->replacePhiUsesWith(OldLoopBB, NewBB);
> + // Also, change loop control-flow. This loop's basic block shall no longer
> + // exit from the loop to its original successor basic block, but to our new
> + // phony successor basic block. Note that new successor will be unique exit.
> + OldLoopBB->getTerminator()->replaceSuccessorWith(SuccessorBB,
> + PhonySuccessorBB);
> + DTUpdates.push_back({DominatorTree::Delete, OldLoopBB, SuccessorBB});
> + DTUpdates.push_back({DominatorTree::Insert, OldLoopBB, PhonySuccessorBB});
> + }
> +
> + // Inform DomTree about edge changes. Note that LoopInfo is still
> out-of-date.
> + assert(DTUpdates.size() == 8 && "Update count prediction
> failed.");
> + DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
> + DTU.applyUpdates(DTUpdates);
> + DTUpdates.clear();
> +
> + // By now we have: (2/6)
> + // PreheaderBB: ; preds = ???
> + // <...>
> + // %memcmp = call i32 @memcmp(i8* %LoadSrcA, i8* %LoadSrcB, i64
> %Nbytes)
> + // %ComparedEqual = icmp eq <...> %memcmp, 0
> + // [no terminator instruction!]
> + // PhonyPreheaderBB: <preheader> ; No preds, UNREACHABLE!
> + // br label %LoopHeaderBB
> + // LoopHeaderBB: <header,exiting> ; preds = %PhonyPreheaderBB,
> %LoopLatchBB
> + // <...>
> + // br i1 %<...>, label %LoopLatchBB, label %PhonySuccessorBB
> + // LoopLatchBB: <latch,exiting> ; preds = %LoopHeaderBB
> + // <...>
> + // br i1 %<...>, label %PhonySuccessorBB, label %LoopHeaderBB
> + // PhonySuccessorBB: <uniq. exit> ; preds = %LoopHeaderBB,
> %LoopLatchBB
> + // unreachable
> + // EqualBB: ; No preds, UNREACHABLE!
> + // br label %Successor1BB
> + // UnequalBB: ; No preds, UNREACHABLE!
> + // br label %Successor0BB
> + // Successor0BB: ; preds = %UnequalBB
> + // %S0PHI = phi <...> [ <...>, %UnequalBB ]
> + // <...>
> + // Successor1BB: ; preds = %EqualBB
> + // %S1PHI = phi <...> [ <...>, %EqualBB ]
> + // <...>
> +
> + // *Finally*, zap the original loop. Record its parent loop though.
> + Loop *ParentLoop = CurLoop->getParentLoop();
> + LLVM_DEBUG(dbgs() << "Deleting old loop.\n");
> + LoopDeleter.markLoopAsDeleted(CurLoop); // Mark as deleted
> *BEFORE* deleting!
> + deleteDeadLoop(CurLoop, DT, SE, LI); // And actually delete the
> loop.
> + CurLoop = nullptr;
> +
> + // By now we have: (3/6)
> + // PreheaderBB: ; preds = ???
> + // <...>
> + // %memcmp = call i32 @memcmp(i8* %LoadSrcA, i8* %LoadSrcB, i64
> %Nbytes)
> + // %ComparedEqual = icmp eq <...> %memcmp, 0
> + // [no terminator instruction!]
> + // PhonyPreheaderBB: ; No preds, UNREACHABLE!
> + // br label %PhonySuccessorBB
> + // PhonySuccessorBB: ; preds = %PhonyPreheaderBB
> + // unreachable
> + // EqualBB: ; No preds, UNREACHABLE!
> + // br label %Successor1BB
> + // UnequalBB: ; No preds, UNREACHABLE!
> + // br label %Successor0BB
> + // Successor0BB: ; preds = %UnequalBB
> + // %S0PHI = phi <...> [ <...>, %UnequalBB ]
> + // <...>
> + // Successor1BB: ; preds = %EqualBB
> + // %S1PHI = phi <...> [ <...>, %EqualBB ]
> + // <...>
> +
> + // Now, actually restore the CFG.
> +
> + // Insert an unconditional branch from an actual preheader basic
> block to
> + // phony preheader basic block.
> + IRBuilder<>(PreheaderBB).CreateBr(PhonyPreheaderBB);
> + DTUpdates.push_back({DominatorTree::Insert, PhonyPreheaderBB,
> HeaderBB});
> + // Insert proper conditional branch from phony successor basic
> block to the
> + // "dispatch" basic blocks, which were used to preserve incoming
> values in
> + // original loop's successor basic blocks.
> + assert(isa<UnreachableInst>(PhonySuccessorBB->getTerminator()) &&
> + "Yep, that's the one we created to keep deleteDeadLoop()
> happy.");
> + PhonySuccessorBB->getTerminator()->eraseFromParent();
> + {
> + IRBuilder<> Builder(PhonySuccessorBB);
> + Builder.SetCurrentDebugLocation(ComparedEqual->getDebugLoc());
> + Builder.CreateCondBr(ComparedEqual, ComparedEqualBB,
> ComparedUnequalBB);
> + }
> + DTUpdates.push_back(
> + {DominatorTree::Insert, PhonySuccessorBB, ComparedEqualBB});
> + DTUpdates.push_back(
> + {DominatorTree::Insert, PhonySuccessorBB, ComparedUnequalBB});
> +
> + BasicBlock *DispatchBB = PhonySuccessorBB;
> + DispatchBB->setName(LoopName + ".bcmpdispatchbb");
> +
> + assert(DTUpdates.size() == 3 && "Update count prediction
> failed.");
> + DTU.applyUpdates(DTUpdates);
> + DTUpdates.clear();
> +
> + // By now we have: (4/6)
> + // PreheaderBB: ; preds = ???
> + // <...>
> + // %memcmp = call i32 @memcmp(i8* %LoadSrcA, i8* %LoadSrcB, i64
> %Nbytes)
> + // %ComparedEqual = icmp eq <...> %memcmp, 0
> + // br label %PhonyPreheaderBB
> + // PhonyPreheaderBB: ; preds = %PreheaderBB
> + // br label %DispatchBB
> + // DispatchBB: ; preds = %PhonyPreheaderBB
> + // br i1 %ComparedEqual, label %EqualBB, label %UnequalBB
> + // EqualBB: ; preds = %DispatchBB
> + // br label %Successor1BB
> + // UnequalBB: ; preds = %DispatchBB
> + // br label %Successor0BB
> + // Successor0BB: ; preds = %UnequalBB
> + // %S0PHI = phi <...> [ <...>, %UnequalBB ]
> + // <...>
> + // Successor1BB: ; preds = %EqualBB
> + // %S1PHI = phi <...> [ <...>, %EqualBB ]
> + // <...>
> +
> + // The basic CFG has been restored! Now let's merge redundant
> basic blocks.
> +
> + // Merge phony successor basic block into its only predecessor,
> + // phony preheader basic block. It is fully pointlessly redundant.
> + MergeBasicBlockIntoOnlyPred(DispatchBB, &DTU);
> +
> + // By now we have: (5/6)
> + // PreheaderBB: ; preds = ???
> + // <...>
> + // %memcmp = call i32 @memcmp(i8* %LoadSrcA, i8* %LoadSrcB, i64
> %Nbytes)
> + // %ComparedEqual = icmp eq <...> %memcmp, 0
> + // br label %DispatchBB
> + // DispatchBB: ; preds = %PreheaderBB
> + // br i1 %ComparedEqual, label %EqualBB, label %UnequalBB
> + // EqualBB: ; preds = %DispatchBB
> + // br label %Successor1BB
> + // UnequalBB: ; preds = %DispatchBB
> + // br label %Successor0BB
> + // Successor0BB: ; preds = %UnequalBB
> + // %S0PHI = phi <...> [ <...>, %UnequalBB ]
> + // <...>
> + // Successor1BB: ; preds = %EqualBB
> + // %S1PHI = phi <...> [ <...>, %EqualBB ]
> + // <...>
> +
> + // Was this loop nested?
> + if (!ParentLoop) {
> + // If the loop was *NOT* nested, then let's also merge phony successor
> + // basic block into its only predecessor, preheader basic block.
> + // Also, here we need to update LoopInfo.
> + LI->removeBlock(PreheaderBB);
> + MergeBasicBlockIntoOnlyPred(DispatchBB, &DTU);
> +
> + // By now we have: (6/6)
> + // DispatchBB: ; preds = ???
> + // <...>
> + // %memcmp = call i32 @memcmp(i8* %LoadSrcA, i8* %LoadSrcB,
> i64 %Nbytes)
> + // %ComparedEqual = icmp eq <...> %memcmp, 0
> + // br i1 %ComparedEqual, label %EqualBB, label %UnequalBB
> + // EqualBB: ; preds = %DispatchBB
> + // br label %Successor1BB
> + // UnequalBB: ; preds = %DispatchBB
> + // br label %Successor0BB
> + // Successor0BB: ; preds = %UnequalBB
> + // %S0PHI = phi <...> [ <...>, %UnequalBB ]
> + // <...>
> + // Successor1BB: ; preds = %EqualBB
> + // %S1PHI = phi <...> [ <...>, %EqualBB ]
> + // <...>
> +
> + return DispatchBB;
> + }
> +
> + // Otherwise, we need to "preserve" the LoopSimplify form of the
> deleted loop.
> + // To achieve that, we shall keep the preheader basic block
> (mainly so that
> + // the loop header block will be guaranteed to have a predecessor
> outside of
> + // the loop), and create a phony loop with all these new three
> basic blocks.
> + Loop *PhonyLoop = LI->AllocateLoop();
> + ParentLoop->addChildLoop(PhonyLoop);
> + PhonyLoop->addBasicBlockToLoop(DispatchBB, *LI);
> + PhonyLoop->addBasicBlockToLoop(ComparedEqualBB, *LI);
> + PhonyLoop->addBasicBlockToLoop(ComparedUnequalBB, *LI);
> +
> + // But we only have a preheader basic block, a header basic block and
> + // two exiting basic blocks. For a proper loop we also need a backedge from
> + // non-header basic block to header bb.
> + // Let's just add a never-taken branch from both of the exiting basic blocks.
> + for (BasicBlock *BB : {ComparedEqualBB, ComparedUnequalBB}) {
> + BranchInst *OldTerminator = cast<BranchInst>(BB->getTerminator());
> + assert(OldTerminator->isUnconditional() && "That's the one we created.");
> + BasicBlock *SuccessorBB = OldTerminator->getSuccessor(0);
> +
> + IRBuilder<> Builder(OldTerminator);
> + Builder.SetCurrentDebugLocation(OldTerminator->getDebugLoc());
> + Builder.CreateCondBr(ConstantInt::getTrue(Context), SuccessorBB,
> + DispatchBB);
> + OldTerminator->eraseFromParent();
> + // Yes, the backedge will never be taken. The control-flow is
> redundant.
> + // If it can be simplified further, other passes will take care.
> + DTUpdates.push_back({DominatorTree::Delete, BB, SuccessorBB});
> + DTUpdates.push_back({DominatorTree::Insert, BB, SuccessorBB});
> + DTUpdates.push_back({DominatorTree::Insert, BB, DispatchBB});
> + }
> + assert(DTUpdates.size() == 6 && "Update count prediction
> failed.");
> + DTU.applyUpdates(DTUpdates);
> + DTUpdates.clear();
> +
> + // By now we have: (6/6)
> + // PreheaderBB: <preheader> ; preds = ???
> + // <...>
> + // %memcmp = call i32 @memcmp(i8* %LoadSrcA, i8* %LoadSrcB, i64
> %Nbytes)
> + // %ComparedEqual = icmp eq <...> %memcmp, 0
> + // br label %BCmpDispatchBB
> + // BCmpDispatchBB: <header> ; preds = %PreheaderBB
> + // br i1 %ComparedEqual, label %EqualBB, label %UnequalBB
> + // EqualBB: <latch,exiting> ; preds = %BCmpDispatchBB
> + // br i1 %true, label %Successor1BB, label %BCmpDispatchBB
> + // UnequalBB: <latch,exiting> ; preds = %BCmpDispatchBB
> + // br i1 %true, label %Successor0BB, label %BCmpDispatchBB
> + // Successor0BB: ; preds = %UnequalBB
> + // %S0PHI = phi <...> [ <...>, %UnequalBB ]
> + // <...>
> + // Successor1BB: ; preds = %EqualBB
> + // %S1PHI = phi <...> [ <...>, %EqualBB ]
> + // <...>
> +
> + // Finally fully DONE!
> + return DispatchBB;
> +}
> +
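To make the end state of the (0/6) to (6/6) walk-through easier to picture, here is a plain C++ stand-in for the non-nested result: one block computing the comparison and branching, two empty blocks, and a PHI in the exit block. The function name dispatch is hypothetical; the true/false incoming values are taken from the bcmp-basic.ll CHECK lines further down in the quoted diff, where the exit block's PHI takes false from the unequal block and true from the equal block.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical C++ rendering of the final control flow for a loop that was
// not nested. Block names in the comments refer to the quoted diagrams.
bool dispatch(const void *a, const void *b, std::size_t n_bytes) {
  // DispatchBB: the call and the "compared equal" test live here.
  const bool compared_equal = std::memcmp(a, b, n_bytes) == 0;
  bool phi;
  if (compared_equal)
    phi = true;  // EqualBB: all elements compared equal
  else
    phi = false; // UnequalBB: some element compared unequal
  // Exit block: the PHI picks the value that belongs to the incoming edge.
  return phi;
}
```

The two intermediate blocks exist only so that the PHI nodes in the original exit blocks keep distinct incoming edges for the "equal" and "unequal" outcomes.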
> +void LoopIdiomRecognize::transformLoopToBCmp(ICmpInst *BCmpInst,
> + CmpInst *LatchCmpInst,
> + LoadInst *LoadA,
> LoadInst *LoadB,
> + const SCEV *SrcA, const
> SCEV *SrcB,
> + const SCEV *NBytes) {
> + // We will be inserting before the terminator instruction of
> preheader block.
> + IRBuilder<> Builder(CurLoop->getLoopPreheader()->getTerminator());
> +
> + LLVM_DEBUG(dbgs() << "Transforming bcmp loop idiom into a
> call.\n");
> + LLVM_DEBUG(dbgs() << "Emitting new instructions.\n");
> +
> + // Expand the SCEV expressions for both sources to compare, and
> produce value
> + // for the byte len (beware of Iterations potentially being a
> pointer, and
> + // account for element size being BCmpTyBytes bytes, which may be
> not 1 byte)
> + Value *PtrA, *PtrB, *Len;
> + {
> + SCEVExpander SExp(*SE, *DL, "LoopToBCmp");
> + SExp.setInsertPoint(&*Builder.GetInsertPoint());
> +
> + auto HandlePtr = [&SExp](LoadInst *Load, const SCEV *Src) {
> + SExp.SetCurrentDebugLocation(DebugLoc());
> + // If the pointer operand of original load had dbgloc - use it.
> + if (const auto *I = dyn_cast<Instruction>(Load->getPointerOperand()))
> + SExp.SetCurrentDebugLocation(I->getDebugLoc());
> + return SExp.expandCodeFor(Src);
> + };
> + PtrA = HandlePtr(LoadA, SrcA);
> + PtrB = HandlePtr(LoadB, SrcB);
> +
> + // For len calculation let's use dbgloc for the loop's latch
> condition.
> + Builder.SetCurrentDebugLocation(LatchCmpInst->getDebugLoc());
> + SExp.SetCurrentDebugLocation(LatchCmpInst->getDebugLoc());
> + Len = SExp.expandCodeFor(NBytes);
> +
> + Type *CmpFuncSizeTy = DL->getIntPtrType(Builder.getContext());
> + assert(SE->getTypeSizeInBits(Len->getType()) ==
> + DL->getTypeSizeInBits(CmpFuncSizeTy) &&
> + "Len should already have the correct size.");
> +
> + // Make sure that iteration count is a number, insert ptrtoint
> cast if not.
> + if (Len->getType()->isPointerTy())
> + Len = Builder.CreatePtrToInt(Len, CmpFuncSizeTy);
> + assert(Len->getType() == CmpFuncSizeTy && "Should have correct
> type now.");
> +
> + Len->setName(Len->getName() + ".bytecount");
> +
> + // There is no legality check needed. We want to check that the memory
> + // regions [PtrA, PtrA+Len) and [PtrB, PtrB+Len) are fully identical.
> + // For them to be fully equal, they must match bit-by-bit. And likewise,
> + // for them to *NOT* be fully equal, they have to differ by at least one bit.
> + // The step of comparison (bits compared at once) simply does not matter.
> + }
> +
> + // For the rest of new instructions, dbgloc should point at the
> value cmp.
> + Builder.SetCurrentDebugLocation(BCmpInst->getDebugLoc());
> +
> + // Emit the comparison itself.
> + auto *CmpCall =
> + cast<CallInst>(HasBCmp ? emitBCmp(PtrA, PtrB, Len, Builder,
> *DL, TLI)
> + : emitMemCmp(PtrA, PtrB, Len, Builder,
> *DL, TLI));
> + // FIXME: add {B,Mem}CmpInst with MemoryCompareInst
> + // (based on MemIntrinsicBase) as base?
> + // FIXME: propagate metadata from loads? (alignments, AS, TBAA,
> ...)
> +
> + // {b,mem}cmp return 0 if the regions were equal, or non-zero if not.
> + auto *ComparedEqual = cast<ICmpInst>(Builder.CreateICmpEQ(
> + CmpCall, ConstantInt::get(CmpCall->getType(), 0),
> + PtrA->getName() + ".vs." + PtrB->getName() + ".eqcmp"));
> +
> + BasicBlock *BB = transformBCmpControlFlow(ComparedEqual);
> + Builder.ClearInsertionPoint();
> +
> + // We're done.
> + LLVM_DEBUG(dbgs() << "Transformed loop bcmp idiom into a
> call.\n");
> + ORE.emit([&]() {
> + return OptimizationRemark(DEBUG_TYPE,
> "TransformedBCmpIdiomToCall",
> + CmpCall->getDebugLoc(), BB)
> + << "Transformed bcmp idiom into a call to "
> + << ore::NV("NewFunction", CmpCall->getCalledFunction())
> + << "() function";
> + });
> + ++NumBCmp;
> +}
> +
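The "step of comparison does not matter" argument in the quoted comment can be checked with a small standalone sketch: comparing two buffers four bytes at a time for equality gives the same answer as one memcmp over the whole byte range. The names elementwise_equal and bytewise_equal are made up for illustration, and std::uint32_t is chosen here because it has no padding bits; that choice is my assumption, not something the patch states.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Equality computed with a 4-byte step, like a loop over i32 elements.
bool elementwise_equal(const std::uint32_t *a, const std::uint32_t *b,
                       std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    if (a[i] != b[i])
      return false;
  return true;
}

// Equality computed over the same region as raw bytes. The length is the
// element count times the element size, as with NBytes in the quoted code.
bool bytewise_equal(const std::uint32_t *a, const std::uint32_t *b,
                    std::size_t count) {
  return std::memcmp(a, b, count * sizeof(std::uint32_t)) == 0;
}
```

Only the equal versus not-equal outcome is preserved this way; the sign of a non-zero memcmp result carries no meaning for the transformed loop.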
> +/// Recognizes a bcmp idiom in a non-countable loop.
> +///
> +/// If detected, transforms the relevant code to issue the bcmp (or memcmp)
> +/// library function call, and returns true; otherwise, returns false.
> +bool LoopIdiomRecognize::recognizeBCmp() {
> + if (!HasMemCmp && !HasBCmp)
> + return false;
> +
> + ICmpInst *BCmpInst;
> + CmpInst *LatchCmpInst;
> + LoadInst *LoadA, *LoadB;
> + const SCEV *SrcA, *SrcB, *NBytes;
> + if (!detectBCmpIdiom(BCmpInst, LatchCmpInst, LoadA, LoadB, SrcA,
> SrcB,
> + NBytes)) {
> + LLVM_DEBUG(dbgs() << "bcmp idiom recognition failed.\n");
> + return false;
> + }
> +
> + transformLoopToBCmp(BCmpInst, LatchCmpInst, LoadA, LoadB, SrcA,
> SrcB, NBytes);
> + return true;
> +}
>
> Modified: llvm/trunk/test/Transforms/LoopIdiom/bcmp-basic.ll
> URL:
> https://protect2.fireeye.com/url?k=4cbc9876-10689035-4cbcd8ed-86a1150bc3ba-9c5a050ca9d4dae5&q=1&u=http%3A%2F%2Fllvm.org%2Fviewvc%2Fllvm-project%2Fllvm%2Ftrunk%2Ftest%2FTransforms%2FLoopIdiom%2Fbcmp-basic.ll%3Frev%3D374662%26r1%3D374661%26r2%3D374662%26view%3Ddiff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopIdiom/bcmp-basic.ll (original)
> +++ llvm/trunk/test/Transforms/LoopIdiom/bcmp-basic.ll Sat Oct 12
> 08:35:32 2019
> @@ -1,5 +1,5 @@
> ; NOTE: Assertions have been autogenerated by
> utils/update_test_checks.py
> -; RUN: opt -loop-idiom < %s -S | FileCheck %s
> +; RUN: opt -loop-idiom -verify -verify-each -verify-dom-info
> -verify-loop-info < %s -S | FileCheck %s
>
> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-
> i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-
> s0:64:64-f80:128:128-n8:16:32:64"
>
> @@ -239,24 +239,17 @@ target datalayout = "e-p:64:64:64-i1:8:8
>
> define i1 @_Z39pointer_iteration_const_size_no_overlapPKc(i8* %ptr)
> {
> ; CHECK-LABEL: @_Z39pointer_iteration_const_size_no_overlapPKc(
> -; CHECK-NEXT: entry:
> +; CHECK-NEXT: for.body.i.i.bcmpdispatchbb:
> ; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 8
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[ADD_PTR]],
> [[ENTRY:%.*]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I_IDX:%.*]] = phi i64 [
> [[__FIRST1_ADDR_06_I_I_ADD:%.*]], [[FOR_INC_I_I]] ], [ 0, [[ENTRY]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I_PTR:%.*]] = getelementptr
> inbounds i8, i8* [[PTR]], i64 [[__FIRST1_ADDR_06_I_I_IDX]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8*
> [[__FIRST1_ADDR_06_I_I_PTR]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I_ADD]] = add nuw nsw i64
> [[__FIRST1_ADDR_06_I_I_IDX]], 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i64
> [[__FIRST1_ADDR_06_I_I_ADD]], 8
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]], label [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 8)
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> -; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RETVAL_0_I_I]]
> ;
> entry:
> @@ -285,24 +278,17 @@ _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
>
> define i1 @_Z44pointer_iteration_const_size_partial_overlapPKc(i8*
> %ptr) {
> ; CHECK-LABEL: @_Z44pointer_iteration_const_size_partial_overlapPKc(
> -; CHECK-NEXT: entry:
> +; CHECK-NEXT: for.body.i.i.bcmpdispatchbb:
> ; CHECK-NEXT: [[ADD_PTR1:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 8
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[ADD_PTR1]],
> [[ENTRY:%.*]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I_IDX:%.*]] = phi i64 [
> [[__FIRST1_ADDR_06_I_I_ADD:%.*]], [[FOR_INC_I_I]] ], [ 0, [[ENTRY]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I_PTR:%.*]] = getelementptr
> inbounds i8, i8* [[PTR]], i64 [[__FIRST1_ADDR_06_I_I_IDX]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8*
> [[__FIRST1_ADDR_06_I_I_PTR]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I_ADD]] = add nuw nsw i64
> [[__FIRST1_ADDR_06_I_I_IDX]], 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i64
> [[__FIRST1_ADDR_06_I_I_ADD]], 16
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]], label [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR1]], i64 16)
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR1_EQCMP]], label
> [[PTR_VS_ADD_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]]
> +; CHECK: ptr.vs.add.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> -; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RETVAL_0_I_I]]
> ;
> entry:
> @@ -331,23 +317,16 @@ _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
>
> define i1
> @_Z44pointer_iteration_const_size_overlap_unknownPKcS0_(i8* %ptr0,
> i8* %ptr1) {
> ; CHECK-LABEL:
> @_Z44pointer_iteration_const_size_overlap_unknownPKcS0_(
> -; CHECK-NEXT: entry:
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[PTR1:%.*]],
> [[ENTRY:%.*]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I_IDX:%.*]] = phi i64 [
> [[__FIRST1_ADDR_06_I_I_ADD:%.*]], [[FOR_INC_I_I]] ], [ 0, [[ENTRY]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I_PTR:%.*]] = getelementptr
> inbounds i8, i8* [[PTR0:%.*]], i64 [[__FIRST1_ADDR_06_I_I_IDX]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8*
> [[__FIRST1_ADDR_06_I_I_PTR]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I_ADD]] = add nuw nsw i64
> [[__FIRST1_ADDR_06_I_I_IDX]], 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i64
> [[__FIRST1_ADDR_06_I_I_ADD]], 8
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]], label [[FOR_BODY_I_I]]
> +; CHECK-NEXT: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0:%.*]],
> i8* [[PTR1:%.*]], i64 8)
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> -; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RETVAL_0_I_I]]
> ;
> entry:
> @@ -376,25 +355,19 @@ _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> define i1 @_Z42pointer_iteration_variable_size_no_overlapPKcm(i8*
> %ptr, i64 %count) {
> ; CHECK-LABEL: @_Z42pointer_iteration_variable_size_no_overlapPKcm(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT:%.*]]
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT]], 0
> -; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> -; CHECK: for.body.i.i.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[ADD_PTR]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[PTR]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT_BYTECOUNT:%.*]]
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT]],
> 0
> +; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> ; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ true, [[ENTRY:%.*]]
> ], [ [[RETVAL_0_I_I_PH]],
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ]
> @@ -427,27 +400,21 @@ _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> define i1
> @_Z47pointer_iteration_variable_size_partial_overlapPKcm(i8* %ptr,
> i64 %count) {
> ; CHECK-LABEL:
> @_Z47pointer_iteration_variable_size_partial_overlapPKcm(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[MUL:%.*]] = shl i64 [[COUNT:%.*]], 1
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[MUL]]
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[MUL]], 0
> -; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> -; CHECK: for.body.i.i.preheader:
> +; CHECK-NEXT: [[MUL_BYTECOUNT:%.*]] = shl i64 [[COUNT:%.*]], 1
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[MUL_BYTECOUNT]]
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[MUL_BYTECOUNT]], 0
> +; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> ; CHECK-NEXT: [[ADD_PTR1:%.*]] = getelementptr inbounds i8, i8*
> [[PTR]], i64 [[COUNT]]
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[ADD_PTR1]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[PTR]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR1]], i64 [[MUL_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR1_EQCMP]], label
> [[PTR_VS_ADD_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> +; CHECK: ptr.vs.add.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> ; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ true, [[ENTRY:%.*]]
> ], [ [[RETVAL_0_I_I_PH]],
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ]
> @@ -485,25 +452,19 @@ _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> define i1
> @_Z47pointer_iteration_variable_size_overlap_unknownPKcS0_m(i8*
> %ptr0, i8* %ptr1, i64 %count) {
> ; CHECK-LABEL:
> @_Z47pointer_iteration_variable_size_overlap_unknownPKcS0_m(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT:%.*]]
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT]], 0
> -; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> -; CHECK: for.body.i.i.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[PTR0]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT_BYTECOUNT:%.*]]
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT]],
> 0
> +; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0]], i8*
> [[PTR1:%.*]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> ; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ true, [[ENTRY:%.*]]
> ], [ [[RETVAL_0_I_I_PH]],
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ]
> @@ -535,23 +496,17 @@ _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
>
> define i1 @_Z40index_iteration_eq_const_size_no_overlapPKc(i8* %ptr)
> {
> ; CHECK-LABEL: @_Z40index_iteration_eq_const_size_no_overlapPKc(
> -; CHECK-NEXT: entry:
> +; CHECK-NEXT: for.body.bcmpdispatchbb:
> ; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 8
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.cond:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC:%.*]], 8
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_013:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [
> [[INC]], [[FOR_COND:%.*]] ]
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8*
> [[PTR]], i64 [[I_013]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[ARRAYIDX]]
> -; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8*
> [[ADD_PTR]], i64 [[I_013]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[ARRAYIDX1]]
> -; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_013]], 1
> -; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_COND]], label
> [[CLEANUP]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 8)
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> -; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_COND]] ]
> +; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RES]]
> ;
> entry:
> @@ -579,23 +534,17 @@ cleanup:
>
> define i1 @_Z45index_iteration_eq_const_size_partial_overlapPKc(i8*
> %ptr) {
> ; CHECK-LABEL:
> @_Z45index_iteration_eq_const_size_partial_overlapPKc(
> -; CHECK-NEXT: entry:
> +; CHECK-NEXT: for.body.bcmpdispatchbb:
> ; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 8
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.cond:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC:%.*]], 16
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_013:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [
> [[INC]], [[FOR_COND:%.*]] ]
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8*
> [[PTR]], i64 [[I_013]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[ARRAYIDX]]
> -; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8*
> [[ADD_PTR]], i64 [[I_013]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[ARRAYIDX1]]
> -; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_013]], 1
> -; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_COND]], label
> [[CLEANUP]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 16)
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> -; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_COND]] ]
> +; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RES]]
> ;
> entry:
> @@ -623,22 +572,16 @@ cleanup:
>
> define i1
> @_Z45index_iteration_eq_const_size_overlap_unknownPKcS0_(i8* %ptr0,
> i8* %ptr1) {
> ; CHECK-LABEL:
> @_Z45index_iteration_eq_const_size_overlap_unknownPKcS0_(
> -; CHECK-NEXT: entry:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.cond:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC:%.*]], 8
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_08:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [
> [[INC]], [[FOR_COND:%.*]] ]
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[I_08]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[ARRAYIDX]]
> -; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8*
> [[PTR1:%.*]], i64 [[I_08]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[ARRAYIDX1]]
> -; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_08]], 1
> -; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_COND]], label
> [[CLEANUP]]
> +; CHECK-NEXT: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0:%.*]],
> i8* [[PTR1:%.*]], i64 8)
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> -; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_COND]] ]
> +; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RES]]
> ;
> entry:
> @@ -666,25 +609,19 @@ cleanup:
> define i1 @_Z43index_iteration_eq_variable_size_no_overlapPKcm(i8*
> %ptr, i64 %count) {
> ; CHECK-LABEL: @_Z43index_iteration_eq_variable_size_no_overlapPKcm(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT:%.*]]
> -; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64 [[COUNT]], 0
> -; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_PREHEADER:%.*]]
> -; CHECK: for.body.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.cond:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC:%.*]], [[COUNT]]
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP_LOOPEXIT:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_015:%.*]] = phi i64 [ [[INC]], [[FOR_COND:%.*]]
> ], [ 0, [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8*
> [[PTR]], i64 [[I_015]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[ARRAYIDX]]
> -; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8*
> [[ADD_PTR]], i64 [[I_015]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[ARRAYIDX1]]
> -; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: [[INC]] = add nuw i64 [[I_015]], 1
> -; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_COND]], label
> [[CLEANUP_LOOPEXIT]]
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT_BYTECOUNT:%.*]]
> +; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT]], 0
> +; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT]]
> ; CHECK: cleanup.loopexit:
> -; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_COND]] ]
> +; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> ; CHECK-NEXT: [[RES:%.*]] = phi i1 [ true, [[ENTRY:%.*]] ], [
> [[RES_PH]], [[CLEANUP_LOOPEXIT]] ]
> @@ -718,25 +655,19 @@ define i1 @_Z48index_iteration_eq_variab
> ; CHECK-LABEL:
> @_Z48index_iteration_eq_variable_size_partial_overlapPKcm(
> ; CHECK-NEXT: entry:
> ; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT:%.*]]
> -; CHECK-NEXT: [[MUL:%.*]] = shl i64 [[COUNT]], 1
> -; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64 [[MUL]], 0
> -; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_PREHEADER:%.*]]
> -; CHECK: for.body.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.cond:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC:%.*]], [[MUL]]
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP_LOOPEXIT:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_015:%.*]] = phi i64 [ [[INC]], [[FOR_COND:%.*]]
> ], [ 0, [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8*
> [[PTR]], i64 [[I_015]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[ARRAYIDX]]
> -; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8*
> [[ADD_PTR]], i64 [[I_015]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[ARRAYIDX1]]
> -; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: [[INC]] = add nuw i64 [[I_015]], 1
> -; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_COND]], label
> [[CLEANUP_LOOPEXIT]]
> +; CHECK-NEXT: [[MUL_BYTECOUNT:%.*]] = shl i64 [[COUNT]], 1
> +; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64 [[MUL_BYTECOUNT]], 0
> +; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 [[MUL_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT]]
> ; CHECK: cleanup.loopexit:
> -; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_COND]] ]
> +; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> ; CHECK-NEXT: [[RES:%.*]] = phi i1 [ true, [[ENTRY:%.*]] ], [
> [[RES_PH]], [[CLEANUP_LOOPEXIT]] ]
> @@ -770,24 +701,18 @@ cleanup:
> define i1
> @_Z48index_iteration_eq_variable_size_overlap_unknownPKcS0_m(i8*
> %ptr0, i8* %ptr1, i64 %count) {
> ; CHECK-LABEL:
> @_Z48index_iteration_eq_variable_size_overlap_unknownPKcS0_m(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[CMP8:%.*]] = icmp eq i64 [[COUNT:%.*]], 0
> -; CHECK-NEXT: br i1 [[CMP8]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_PREHEADER:%.*]]
> -; CHECK: for.body.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.cond:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC:%.*]], [[COUNT]]
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP_LOOPEXIT:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_09:%.*]] = phi i64 [ [[INC]], [[FOR_COND:%.*]]
> ], [ 0, [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[I_09]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[ARRAYIDX]]
> -; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8*
> [[PTR1:%.*]], i64 [[I_09]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[ARRAYIDX1]]
> -; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: [[INC]] = add nuw i64 [[I_09]], 1
> -; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_COND]], label
> [[CLEANUP_LOOPEXIT]]
> +; CHECK-NEXT: [[CMP8:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT:%.*]],
> 0
> +; CHECK-NEXT: br i1 [[CMP8]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0:%.*]],
> i8* [[PTR1:%.*]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT]]
> ; CHECK: cleanup.loopexit:
> -; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_COND]] ]
> +; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> ; CHECK-NEXT: [[RES:%.*]] = phi i1 [ true, [[ENTRY:%.*]] ], [
> [[RES_PH]], [[CLEANUP_LOOPEXIT]] ]
> @@ -818,22 +743,18 @@ cleanup:
>
> define i1 @_Z38index_iteration_starting_from_negativePKcS0_(i8*
> %ptr0, i8* %ptr1) {
> ; CHECK-LABEL: @_Z38index_iteration_starting_from_negativePKcS0_(
> -; CHECK-NEXT: entry:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.cond:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT:%.*]],
> 4
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ -4, [[ENTRY:%.*]] ],
> [ [[INDVARS_IV_NEXT]], [[FOR_COND:%.*]] ]
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[INDVARS_IV]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[ARRAYIDX]]
> -; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8*
> [[PTR1:%.*]], i64 [[INDVARS_IV]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[ARRAYIDX2]]
> -; CHECK-NEXT: [[CMP4:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], 1
> -; CHECK-NEXT: br i1 [[CMP4]], label [[FOR_COND]], label
> [[CLEANUP]]
> +; CHECK-NEXT: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, i8*
> [[PTR0:%.*]], i64 -4
> +; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, i8*
> [[PTR1:%.*]], i64 -4
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[SCEVGEP]],
> i8* [[SCEVGEP1]], i64 8)
> +; CHECK-NEXT: [[SCEVGEP_VS_SCEVGEP1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[SCEVGEP_VS_SCEVGEP1_EQCMP]], label
> [[SCEVGEP_VS_SCEVGEP1_EQCMP_EQUALBB:%.*]], label
> [[SCEVGEP_VS_SCEVGEP1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: scevgep.vs.scevgep1.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP:%.*]]
> +; CHECK: scevgep.vs.scevgep1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> -; CHECK-NEXT: [[RET:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_COND]] ]
> +; CHECK-NEXT: [[RET:%.*]] = phi i1 [ false,
> [[SCEVGEP_VS_SCEVGEP1_EQCMP_UNEQUALBB]] ], [ true,
> [[SCEVGEP_VS_SCEVGEP1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RET]]
> ;
> entry:
> @@ -860,25 +781,17 @@ cleanup:
>
> define i1 @_Z43combined_iteration_eq_const_size_no_overlapPKc(i8*
> %ptr) {
> ; CHECK-LABEL: @_Z43combined_iteration_eq_const_size_no_overlapPKc(
> -; CHECK-NEXT: entry:
> +; CHECK-NEXT: for.body.bcmpdispatchbb:
> ; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 8
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_015:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [
> [[INC:%.*]], [[FOR_INC:%.*]] ]
> -; CHECK-NEXT: [[PTR1_014:%.*]] = phi i8* [ [[ADD_PTR]], [[ENTRY]]
> ], [ [[INCDEC_PTR3:%.*]], [[FOR_INC]] ]
> -; CHECK-NEXT: [[PTR0_013:%.*]] = phi i8* [ [[PTR]], [[ENTRY]] ],
> [ [[INCDEC_PTR:%.*]], [[FOR_INC]] ]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[PTR0_013]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[PTR1_014]]
> -; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_INC]], label
> [[CLEANUP:%.*]]
> -; CHECK: for.inc:
> -; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_015]], 1
> -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8*
> [[PTR0_013]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR3]] = getelementptr inbounds i8, i8*
> [[PTR1_014]], i64 1
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], 8
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 8)
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> -; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_INC]] ]
> +; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RES]]
> ;
> entry:
> @@ -908,25 +821,17 @@ cleanup:
>
> define i1
> @_Z48combined_iteration_eq_const_size_partial_overlapPKc(i8* %ptr) {
> ; CHECK-LABEL:
> @_Z48combined_iteration_eq_const_size_partial_overlapPKc(
> -; CHECK-NEXT: entry:
> +; CHECK-NEXT: for.body.bcmpdispatchbb:
> ; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 8
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_015:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [
> [[INC:%.*]], [[FOR_INC:%.*]] ]
> -; CHECK-NEXT: [[PTR1_014:%.*]] = phi i8* [ [[ADD_PTR]], [[ENTRY]]
> ], [ [[INCDEC_PTR3:%.*]], [[FOR_INC]] ]
> -; CHECK-NEXT: [[PTR0_013:%.*]] = phi i8* [ [[PTR]], [[ENTRY]] ],
> [ [[INCDEC_PTR:%.*]], [[FOR_INC]] ]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[PTR0_013]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[PTR1_014]]
> -; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_INC]], label
> [[CLEANUP:%.*]]
> -; CHECK: for.inc:
> -; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_015]], 1
> -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8*
> [[PTR0_013]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR3]] = getelementptr inbounds i8, i8*
> [[PTR1_014]], i64 1
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], 16
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 16)
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> -; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_INC]] ]
> +; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RES]]
> ;
> entry:
> @@ -956,24 +861,16 @@ cleanup:
>
> define i1
> @_Z48combined_iteration_eq_const_size_overlap_unknownPKcS0_(i8*
> %ptr0, i8* %ptr1) {
> ; CHECK-LABEL:
> @_Z48combined_iteration_eq_const_size_overlap_unknownPKcS0_(
> -; CHECK-NEXT: entry:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_010:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [
> [[INC:%.*]], [[FOR_INC:%.*]] ]
> -; CHECK-NEXT: [[PTR1_ADDR_09:%.*]] = phi i8* [ [[PTR1:%.*]],
> [[ENTRY]] ], [ [[INCDEC_PTR3:%.*]], [[FOR_INC]] ]
> -; CHECK-NEXT: [[PTR0_ADDR_08:%.*]] = phi i8* [ [[PTR0:%.*]],
> [[ENTRY]] ], [ [[INCDEC_PTR:%.*]], [[FOR_INC]] ]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[PTR0_ADDR_08]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[PTR1_ADDR_09]]
> -; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_INC]], label
> [[CLEANUP:%.*]]
> -; CHECK: for.inc:
> -; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_010]], 1
> -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8*
> [[PTR0_ADDR_08]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR3]] = getelementptr inbounds i8, i8*
> [[PTR1_ADDR_09]], i64 1
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], 8
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP]]
> +; CHECK-NEXT: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0:%.*]],
> i8* [[PTR1:%.*]], i64 8)
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> -; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_INC]] ]
> +; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: ret i1 [[RES]]
> ;
> entry:
> @@ -1003,27 +900,19 @@ cleanup:
> define i1
> @_Z46combined_iteration_eq_variable_size_no_overlapPKcm(i8* %ptr, i64
> %count) {
> ; CHECK-LABEL:
> @_Z46combined_iteration_eq_variable_size_no_overlapPKcm(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64 [[COUNT:%.*]], 0
> -; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_PREHEADER:%.*]]
> -; CHECK: for.body.preheader:
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT]]
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_017:%.*]] = phi i64 [ [[INC:%.*]],
> [[FOR_INC:%.*]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[PTR1_016:%.*]] = phi i8* [ [[INCDEC_PTR3:%.*]],
> [[FOR_INC]] ], [ [[ADD_PTR]], [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[PTR0_015:%.*]] = phi i8* [ [[INCDEC_PTR:%.*]],
> [[FOR_INC]] ], [ [[PTR]], [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[PTR0_015]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[PTR1_016]]
> -; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_INC]], label
> [[CLEANUP_LOOPEXIT:%.*]]
> -; CHECK: for.inc:
> -; CHECK-NEXT: [[INC]] = add nuw i64 [[I_017]], 1
> -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8*
> [[PTR0_015]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR3]] = getelementptr inbounds i8, i8*
> [[PTR1_016]], i64 1
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], [[COUNT]]
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP_LOOPEXIT]]
> +; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64
> [[COUNT_BYTECOUNT:%.*]], 0
> +; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT_BYTECOUNT]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT]]
> ; CHECK: cleanup.loopexit:
> -; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_INC]] ]
> +; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> ; CHECK-NEXT: [[RES:%.*]] = phi i1 [ true, [[ENTRY:%.*]] ], [
> [[RES_PH]], [[CLEANUP_LOOPEXIT]] ]
> @@ -1061,28 +950,20 @@ cleanup:
> define i1
> @_Z51combined_iteration_eq_variable_size_partial_overlapPKcm(i8*
> %ptr, i64 %count) {
> ; CHECK-LABEL:
> @_Z51combined_iteration_eq_variable_size_partial_overlapPKcm(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[MUL:%.*]] = shl i64 [[COUNT:%.*]], 1
> -; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64 [[MUL]], 0
> -; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_PREHEADER:%.*]]
> -; CHECK: for.body.preheader:
> +; CHECK-NEXT: [[MUL_BYTECOUNT:%.*]] = shl i64 [[COUNT:%.*]], 1
> +; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64 [[MUL_BYTECOUNT]], 0
> +; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> ; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT]]
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_017:%.*]] = phi i64 [ [[INC:%.*]],
> [[FOR_INC:%.*]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[PTR1_016:%.*]] = phi i8* [ [[INCDEC_PTR3:%.*]],
> [[FOR_INC]] ], [ [[ADD_PTR]], [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[PTR0_015:%.*]] = phi i8* [ [[INCDEC_PTR:%.*]],
> [[FOR_INC]] ], [ [[PTR]], [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[PTR0_015]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[PTR1_016]]
> -; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_INC]], label
> [[CLEANUP_LOOPEXIT:%.*]]
> -; CHECK: for.inc:
> -; CHECK-NEXT: [[INC]] = add nuw i64 [[I_017]], 1
> -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8*
> [[PTR0_015]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR3]] = getelementptr inbounds i8, i8*
> [[PTR1_016]], i64 1
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], [[MUL]]
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP_LOOPEXIT]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 [[MUL_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT:%.*]]
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT]]
> ; CHECK: cleanup.loopexit:
> -; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_INC]] ]
> +; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> ; CHECK-NEXT: [[RES:%.*]] = phi i1 [ true, [[ENTRY:%.*]] ], [
> [[RES_PH]], [[CLEANUP_LOOPEXIT]] ]
> @@ -1121,26 +1002,18 @@ cleanup:
> define i1
> @_Z51combined_iteration_eq_variable_size_overlap_unknownPKcS0_m(i8*
> %ptr0, i8* %ptr1, i64 %count) {
> ; CHECK-LABEL:
> @_Z51combined_iteration_eq_variable_size_overlap_unknownPKcS0_m(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[CMP8:%.*]] = icmp eq i64 [[COUNT:%.*]], 0
> -; CHECK-NEXT: br i1 [[CMP8]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_PREHEADER:%.*]]
> -; CHECK: for.body.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_011:%.*]] = phi i64 [ [[INC:%.*]],
> [[FOR_INC:%.*]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[PTR1_ADDR_010:%.*]] = phi i8* [
> [[INCDEC_PTR3:%.*]], [[FOR_INC]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[PTR0_ADDR_09:%.*]] = phi i8* [
> [[INCDEC_PTR:%.*]], [[FOR_INC]] ], [ [[PTR0:%.*]],
> [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[PTR0_ADDR_09]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[PTR1_ADDR_010]]
> -; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_INC]], label
> [[CLEANUP_LOOPEXIT:%.*]]
> -; CHECK: for.inc:
> -; CHECK-NEXT: [[INC]] = add nuw i64 [[I_011]], 1
> -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8*
> [[PTR0_ADDR_09]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR3]] = getelementptr inbounds i8, i8*
> [[PTR1_ADDR_010]], i64 1
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], [[COUNT]]
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP_LOOPEXIT]]
> +; CHECK-NEXT: [[CMP8:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT:%.*]],
> 0
> +; CHECK-NEXT: br i1 [[CMP8]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0:%.*]],
> i8* [[PTR1:%.*]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT]]
> ; CHECK: cleanup.loopexit:
> -; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_INC]] ]
> +; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> ; CHECK-NEXT: [[RES:%.*]] = phi i1 [ true, [[ENTRY:%.*]] ], [
> [[RES_PH]], [[CLEANUP_LOOPEXIT]] ]
> @@ -1174,25 +1047,19 @@ cleanup:
> define i1
> @_Z55negated_pointer_iteration_variable_size_overlap_unknownPKcS0_m(i8*
> %ptr0, i8* %ptr1, i64 %count) {
> ; CHECK-LABEL:
> @_Z55negated_pointer_iteration_variable_size_overlap_unknownPKcS0_m(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT:%.*]]
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT]], 0
> -; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> -; CHECK: for.body.i.i.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[PTR0]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[T0:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[T1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[T0]], [[T1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT_BYTECOUNT:%.*]]
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT]],
> 0
> +; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0]], i8*
> [[PTR1:%.*]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ true,
> [[FOR_BODY_I_I]] ], [ false, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ true,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ false,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> ; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ false,
> [[ENTRY:%.*]] ], [ [[RETVAL_0_I_I_PH]],
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ]
> @@ -1227,23 +1094,24 @@ define i1 @_Z55integer_pointer_iteration
> ; CHECK-NEXT: entry:
> ; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i32, i32*
> [[PTR0:%.*]], i64 [[COUNT:%.*]]
> ; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT]], 0
> -; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKIS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> -; CHECK: for.body.i.i.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i32* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i32* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[PTR0]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[T0:%.*]] = load i32, i32*
> [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[T1:%.*]] = load i32, i32*
> [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i32 [[T0]], [[T1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKIS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i32,
> i32* [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i32,
> i32* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i32* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKIS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]]
> +; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKIS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: [[TMP0:%.*]] = shl nsw i64 [[COUNT]], 2
> +; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], -4
> +; CHECK-NEXT: [[TMP2:%.*]] = lshr i64 [[TMP1]], 2
> +; CHECK-NEXT: [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 2
> +; CHECK-NEXT: [[DOTBYTECOUNT:%.*]] = add i64 [[TMP3]], 4
> +; CHECK-NEXT: [[CSTR:%.*]] = bitcast i32* [[PTR0]] to i8*
> +; CHECK-NEXT: [[CSTR1:%.*]] = bitcast i32* [[PTR1:%.*]] to i8*
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[CSTR]], i8*
> [[CSTR1]], i64 [[DOTBYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKIS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKIS2_EEBT_S3_T0__EXIT_LOOPEXIT]]
> ; CHECK: _ZNSt3__15equalIPKiS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKIS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKiS2_EEbT_S3_T0_.exit:
> ; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ true, [[ENTRY:%.*]]
> ], [ [[RETVAL_0_I_I_PH]],
> [[_ZNST3__15EQUALIPKIS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ]
> @@ -1277,25 +1145,18 @@ define i1 @_Z21small_index_iterationPKcS
> ; CHECK-LABEL: @_Z21small_index_iterationPKcS0_i(
> ; CHECK-NEXT: entry:
> ; CHECK-NEXT: [[CMP8:%.*]] = icmp sgt i32 [[COUNT:%.*]], 0
> -; CHECK-NEXT: br i1 [[CMP8]], label [[FOR_BODY_PREHEADER:%.*]],
> label [[CLEANUP:%.*]]
> -; CHECK: for.body.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_011:%.*]] = phi i32 [ [[INC:%.*]],
> [[FOR_INC:%.*]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[PTR1_ADDR_010:%.*]] = phi i8* [
> [[INCDEC_PTR3:%.*]], [[FOR_INC]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[PTR0_ADDR_09:%.*]] = phi i8* [
> [[INCDEC_PTR:%.*]], [[FOR_INC]] ], [ [[PTR0:%.*]],
> [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[T0:%.*]] = load i8, i8* [[PTR0_ADDR_09]]
> -; CHECK-NEXT: [[T1:%.*]] = load i8, i8* [[PTR1_ADDR_010]]
> -; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[T0]], [[T1]]
> -; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_INC]], label
> [[CLEANUP_LOOPEXIT:%.*]]
> -; CHECK: for.inc:
> -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_011]], 1
> -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8*
> [[PTR0_ADDR_09]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR3]] = getelementptr inbounds i8, i8*
> [[PTR1_ADDR_010]], i64 1
> -; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[COUNT]]
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP_LOOPEXIT]]
> +; CHECK-NEXT: br i1 [[CMP8]], label
> [[FOR_BODY_BCMPDISPATCHBB:%.*]], label [[CLEANUP:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[DOTBYTECOUNT:%.*]] = zext i32 [[COUNT]] to i64
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0:%.*]],
> i8* [[PTR1:%.*]], i64 [[DOTBYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT]]
> ; CHECK: cleanup.loopexit:
> -; CHECK-NEXT: [[T2_PH:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_INC]] ]
> +; CHECK-NEXT: [[T2_PH:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> ; CHECK-NEXT: [[T2:%.*]] = phi i1 [ true, [[ENTRY:%.*]] ], [
> [[T2_PH]], [[CLEANUP_LOOPEXIT]] ]
> @@ -1329,24 +1190,22 @@ cleanup:
> define i1 @_Z23three_pointer_iterationPKcS0_S0_(i8* %ptr0, i8*
> %ptr0_end, i8* %ptr1) {
> ; CHECK-LABEL: @_Z23three_pointer_iterationPKcS0_S0_(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i8* [[PTR0:%.*]],
> [[PTR0_END:%.*]]
> -; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> -; CHECK: for.body.i.i.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[PTR0]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[T0:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[T1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[T0]], [[T1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[PTR0_END]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[PTR01:%.*]] = ptrtoint i8* [[PTR0:%.*]] to i64
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i8* [[PTR0]],
> [[PTR0_END:%.*]]
> +; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: [[TMP0:%.*]] = sub i64 0, [[PTR01]]
> +; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, i8*
> [[PTR0_END]], i64 [[TMP0]]
> +; CHECK-NEXT: [[DOTBYTECOUNT:%.*]] = ptrtoint i8* [[SCEVGEP]] to
> i64
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0]], i8*
> [[PTR1:%.*]], i64 [[DOTBYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> ; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ true, [[ENTRY:%.*]]
> ], [ [[RETVAL_0_I_I_PH]],
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ]
> @@ -1378,25 +1237,19 @@ _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> define i32 @_Z17value_propagationPKcS0_mii(i8* %ptr0, i8* %ptr1, i64
> %count, i32 %on_equal, i32 %on_unequal) {
> ; CHECK-LABEL: @_Z17value_propagationPKcS0_mii(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT:%.*]]
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT]], 0
> -; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> -; CHECK: for.body.i.i.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[PTR0]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[T0:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[T1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[T0]], [[T1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT_BYTECOUNT:%.*]]
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT]],
> 0
> +; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0]], i8*
> [[PTR1:%.*]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[T2_PH:%.*]] = phi i32 [ [[ON_UNEQUAL:%.*]],
> [[FOR_BODY_I_I]] ], [ [[ON_EQUAL:%.*]], [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[T2_PH:%.*]] = phi i32 [ [[ON_UNEQUAL:%.*]],
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ [[ON_EQUAL:%.*]],
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> ; CHECK-NEXT: [[T2:%.*]] = phi i32 [ [[ON_EQUAL]], [[ENTRY:%.*]]
> ], [ [[T2_PH]], [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ]
> @@ -1429,23 +1282,17 @@ _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> define void @_Z20multiple_exit_blocksPKcS0_m(i8* %ptr0, i8* %ptr1,
> i64 %count) {
> ; CHECK-LABEL: @_Z20multiple_exit_blocksPKcS0_m(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT:%.*]]
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT]], 0
> -; CHECK-NEXT: br i1 [[CMP5_I_I]], label [[IF_END:%.*]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> -; CHECK: for.body.i.i.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[PTR0]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[T0:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[T1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[T0]], [[T1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[IF_THEN:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label [[IF_END_LOOPEXIT:%.*]],
> label [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT_BYTECOUNT:%.*]]
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT]],
> 0
> +; CHECK-NEXT: br i1 [[CMP5_I_I]], label [[IF_END:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0]], i8*
> [[PTR1:%.*]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[IF_END_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[IF_THEN:%.*]]
> ; CHECK: if.then:
> ; CHECK-NEXT: tail call void @_Z17callee_on_unequalv()
> ; CHECK-NEXT: br label [[RETURN:%.*]]
> @@ -1493,26 +1340,20 @@ declare void @_Z17callee_on_successv()
> define void @_Z13multiple_phisPKcS0_mS0_S0_S0_S0_PS0_S1_(i8* %ptr0,
> i8* %ptr1, i64 %count, i8* %v0, i8* %v1, i8* %v2, i8* %v3, i8**
> %out0, i8** %out1) {
> ; CHECK-LABEL: @_Z13multiple_phisPKcS0_mS0_S0_S0_S0_PS0_S1_(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT:%.*]]
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT]], 0
> -; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> -; CHECK: for.body.i.i.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[PTR0]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[T0:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[T1:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[T0]], [[T1]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[COUNT_BYTECOUNT:%.*]]
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT]],
> 0
> +; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0]], i8*
> [[PTR1:%.*]], i64 [[COUNT_BYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[T2_PH:%.*]] = phi i8* [ [[V2:%.*]],
> [[FOR_BODY_I_I]] ], [ [[V0:%.*]], [[FOR_INC_I_I]] ]
> -; CHECK-NEXT: [[T3_PH:%.*]] = phi i8* [ [[V3:%.*]],
> [[FOR_BODY_I_I]] ], [ [[V1:%.*]], [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[T2_PH:%.*]] = phi i8* [ [[V2:%.*]],
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ [[V0:%.*]],
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> +; CHECK-NEXT: [[T3_PH:%.*]] = phi i8* [ [[V3:%.*]],
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ [[V1:%.*]],
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> ; CHECK-NEXT: [[T2:%.*]] = phi i8* [ [[V0]], [[ENTRY:%.*]] ], [
> [[T2_PH]], [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ]
> @@ -1564,28 +1405,24 @@ define void @_Z16loop_within_loopmPPKcS1
> ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8*, i8**
> [[PTR0:%.*]], i64 [[I_012]]
> ; CHECK-NEXT: [[T0:%.*]] = load i8*, i8** [[ARRAYIDX]]
> ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64,
> i64* [[COUNT:%.*]], i64 [[I_012]]
> -; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[ARRAYIDX2]]
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[T0]], i64 [[T1]]
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[T1]], 0
> +; CHECK-NEXT: [[T1_BYTECOUNT:%.*]] = load i64, i64* [[ARRAYIDX2]]
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[T0]], i64 [[T1_BYTECOUNT]]
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[T1_BYTECOUNT]], 0
> ; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> ; CHECK: for.body.i.i.preheader:
> ; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8*,
> i8** [[PTR1:%.*]], i64 [[I_012]]
> ; CHECK-NEXT: [[T2:%.*]] = load i8*, i8** [[ARRAYIDX3]]
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[T2]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[T0]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[T3:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[T4:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[T3]], [[T4]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[T0]], i8*
> [[T2]], i64 [[T1_BYTECOUNT]])
> +; CHECK-NEXT: [[T0_VS_T2_EQCMP:%.*]] = icmp eq i32 [[MEMCMP]], 0
> +; CHECK-NEXT: br label [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: br i1 [[T0_VS_T2_EQCMP]], label
> [[T0_VS_T2_EQCMP_EQUALBB:%.*]], label
> [[T0_VS_T2_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: t0.vs.t2.eqcmp.equalbb:
> +; CHECK-NEXT: br i1 true, label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB]]
> +; CHECK: t0.vs.t2.eqcmp.unequalbb:
> +; CHECK-NEXT: br i1 true, label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> +; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[T0_VS_T2_EQCMP_UNEQUALBB]] ], [ true, [[T0_VS_T2_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]]
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> ; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ true, [[FOR_BODY]]
> ], [ [[RETVAL_0_I_I_PH]],
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ]
> @@ -1651,26 +1488,22 @@ define void @_Z42loop_within_loop_with_m
> ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8*, i8**
> [[PTR0:%.*]], i64 [[I_012]]
> ; CHECK-NEXT: [[T0:%.*]] = load i8*, i8** [[ARRAYIDX]]
> ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64,
> i64* [[COUNT:%.*]], i64 [[I_012]]
> -; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[ARRAYIDX2]]
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[T0]], i64 [[T1]]
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[T1]], 0
> +; CHECK-NEXT: [[T1_BYTECOUNT:%.*]] = load i64, i64* [[ARRAYIDX2]]
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[T0]], i64 [[T1_BYTECOUNT]]
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[T1_BYTECOUNT]], 0
> ; CHECK-NEXT: br i1 [[CMP5_I_I]], label [[IF_END]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]]
> ; CHECK: for.body.i.i.preheader:
> ; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8*,
> i8** [[PTR1:%.*]], i64 [[I_012]]
> ; CHECK-NEXT: [[T2:%.*]] = load i8*, i8** [[ARRAYIDX3]]
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]]
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[T2]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[T0]],
> [[FOR_BODY_I_I_PREHEADER]] ]
> -; CHECK-NEXT: [[T3:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]]
> -; CHECK-NEXT: [[T4:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]]
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[T3]], [[T4]]
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[IF_THEN:%.*]]
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]]
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label [[IF_END_LOOPEXIT:%.*]],
> label [[FOR_BODY_I_I]]
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[T0]], i8*
> [[T2]], i64 [[T1_BYTECOUNT]])
> +; CHECK-NEXT: [[T0_VS_T2_EQCMP:%.*]] = icmp eq i32 [[MEMCMP]], 0
> +; CHECK-NEXT: br label [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: br i1 [[T0_VS_T2_EQCMP]], label
> [[T0_VS_T2_EQCMP_EQUALBB:%.*]], label
> [[T0_VS_T2_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: t0.vs.t2.eqcmp.equalbb:
> +; CHECK-NEXT: br i1 true, label [[IF_END_LOOPEXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB]]
> +; CHECK: t0.vs.t2.eqcmp.unequalbb:
> +; CHECK-NEXT: br i1 true, label [[IF_THEN:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB]]
> ; CHECK: if.then:
> ; CHECK-NEXT: tail call void @_Z17callee_on_unequalv()
> ; CHECK-NEXT: br label [[CLEANUP]]
> @@ -1740,19 +1573,17 @@ define void @_Z21endless_loop_if_equalPi
> ; CHECK: for.cond.loopexit:
> ; CHECK-NEXT: br label [[FOR_COND]]
> ; CHECK: for.cond:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.cond1:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV_NEXT:%.*]],
> 4
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[FOR_COND_LOOPEXIT:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND]] ], [
> [[INDVARS_IV_NEXT]], [[FOR_COND1:%.*]] ]
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32*
> [[A:%.*]], i64 [[INDVARS_IV]]
> -; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]]
> -; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32,
> i32* [[B:%.*]], i64 [[INDVARS_IV]]
> -; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[ARRAYIDX3]]
> -; CHECK-NEXT: [[CMP4:%.*]] = icmp eq i32 [[TMP0]], [[TMP1]]
> -; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64
> [[INDVARS_IV]], 1
> -; CHECK-NEXT: br i1 [[CMP4]], label [[FOR_COND1]], label
> [[RETURN:%.*]]
> +; CHECK-NEXT: [[CSTR:%.*]] = bitcast i32* [[A:%.*]] to i8*
> +; CHECK-NEXT: [[CSTR1:%.*]] = bitcast i32* [[B:%.*]] to i8*
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[CSTR]], i8*
> [[CSTR1]], i64 16)
> +; CHECK-NEXT: [[A_VS_B_EQCMP:%.*]] = icmp eq i32 [[MEMCMP]], 0
> +; CHECK-NEXT: br label [[FOR_BODY_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: br i1 [[A_VS_B_EQCMP]], label
> [[A_VS_B_EQCMP_EQUALBB:%.*]], label [[A_VS_B_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: a.vs.b.eqcmp.equalbb:
> +; CHECK-NEXT: br i1 true, label [[FOR_COND_LOOPEXIT:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB]]
> +; CHECK: a.vs.b.eqcmp.unequalbb:
> +; CHECK-NEXT: br i1 true, label [[RETURN:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB]]
> ; CHECK: return:
> ; CHECK-NEXT: ret void
> ;
> @@ -1784,27 +1615,19 @@ define i1 @_Z21load_of_bitcastsPKcPKfm(i
> ; CHECK-LABEL: @_Z21load_of_bitcastsPKcPKfm(
> ; CHECK-NEXT: entry:
> ; CHECK-NEXT: [[CMP13:%.*]] = icmp eq i64 [[COUNT:%.*]], 0
> -; CHECK-NEXT: br i1 [[CMP13]], label [[CLEANUP3:%.*]], label
> [[FOR_BODY_PREHEADER:%.*]]
> -; CHECK: for.body.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[PTR0_ADDR_016:%.*]] = phi i8* [ [[ADD_PTR:%.*]],
> [[FOR_INC:%.*]] ], [ [[PTR0:%.*]], [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[I_015:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_INC]]
> ], [ 0, [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[PTR1_ADDR_014:%.*]] = phi float* [
> [[INCDEC_PTR:%.*]], [[FOR_INC]] ], [ [[PTR1:%.*]],
> [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[V0_0__SROA_CAST:%.*]] = bitcast i8*
> [[PTR0_ADDR_016]] to i32*
> -; CHECK-NEXT: [[V0_0_COPYLOAD:%.*]] = load i32, i32*
> [[V0_0__SROA_CAST]]
> -; CHECK-NEXT: [[V1_0__SROA_CAST:%.*]] = bitcast float*
> [[PTR1_ADDR_014]] to i32*
> -; CHECK-NEXT: [[V1_0_COPYLOAD:%.*]] = load i32, i32*
> [[V1_0__SROA_CAST]]
> -; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[V0_0_COPYLOAD]],
> [[V1_0_COPYLOAD]]
> -; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_INC]], label
> [[CLEANUP3_LOOPEXIT:%.*]]
> -; CHECK: for.inc:
> -; CHECK-NEXT: [[INC]] = add nuw i64 [[I_015]], 1
> -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8*
> [[PTR0_ADDR_016]], i64 4
> -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds float,
> float* [[PTR1_ADDR_014]], i64 1
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], [[COUNT]]
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP3_LOOPEXIT]]
> +; CHECK-NEXT: br i1 [[CMP13]], label [[CLEANUP3:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[DOTBYTECOUNT:%.*]] = shl nuw i64 [[COUNT]], 2
> +; CHECK-NEXT: [[CSTR:%.*]] = bitcast float* [[PTR1:%.*]] to i8*
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0:%.*]],
> i8* [[CSTR]], i64 [[DOTBYTECOUNT]])
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP3_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP3_LOOPEXIT]]
> ; CHECK: cleanup3.loopexit:
> -; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_INC]] ]
> +; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP3]]
> ; CHECK: cleanup3:
> ; CHECK-NEXT: [[RES:%.*]] = phi i1 [ true, [[ENTRY:%.*]] ], [
> [[RES_PH]], [[CLEANUP3_LOOPEXIT]] ]
> @@ -1898,23 +1721,17 @@ cleanup4:
> define i1 @exit_block_is_not_dedicated(i8* %ptr0, i8* %ptr1) {
> ; CHECK-LABEL: @exit_block_is_not_dedicated(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: br i1 true, label [[FOR_BODY_PREHEADER:%.*]], label
> [[CLEANUP:%.*]]
> -; CHECK: for.body.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_08:%.*]] = phi i64 [ [[INC:%.*]],
> [[FOR_COND:%.*]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8*
> [[PTR0:%.*]], i64 [[I_08]]
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[ARRAYIDX]]
> -; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8*
> [[PTR1:%.*]], i64 [[I_08]]
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[ARRAYIDX1]]
> -; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[V0]], [[V1]]
> -; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_08]], 1
> -; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_COND]], label
> [[CLEANUP_LOOPEXIT:%.*]]
> -; CHECK: for.cond:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], 8
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP_LOOPEXIT]]
> +; CHECK-NEXT: br i1 true, label [[FOR_BODY_BCMPDISPATCHBB:%.*]],
> label [[CLEANUP:%.*]]
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR0:%.*]],
> i8* [[PTR1:%.*]], i64 8)
> +; CHECK-NEXT: [[PTR0_VS_PTR1_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0
> +; CHECK-NEXT: br i1 [[PTR0_VS_PTR1_EQCMP]], label
> [[PTR0_VS_PTR1_EQCMP_EQUALBB:%.*]], label
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT:%.*]]
> +; CHECK: ptr0.vs.ptr1.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT]]
> ; CHECK: cleanup.loopexit:
> -; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ true, [[FOR_COND]] ], [
> false, [[FOR_BODY]] ]
> +; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ true,
> [[PTR0_VS_PTR1_EQCMP_EQUALBB]] ], [ false,
> [[PTR0_VS_PTR1_EQCMP_UNEQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP]]
> ; CHECK: cleanup:
> ; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [
> [[RES_PH]], [[CLEANUP_LOOPEXIT]] ]
>
> Modified: llvm/trunk/test/Transforms/LoopIdiom/bcmp-debugify-remarks.ll
> URL:
> https://protect2.fireeye.com/url?k=dfb96a5a-836d6219-dfb92ac1-86a1150bc3ba-c519507d4c2144a1&q=1&u=http%3A%2F%2Fllvm.org%2Fviewvc%2Fllvm-project%2Fllvm%2Ftrunk%2Ftest%2FTransforms%2FLoopIdiom%2Fbcmp-debugify-remarks.ll%3Frev%3D374662%26r1%3D374661%26r2%3D374662%26view%3Ddiff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopIdiom/bcmp-debugify-remarks.ll (original)
> +++ llvm/trunk/test/Transforms/LoopIdiom/bcmp-debugify-remarks.ll Sat Oct 12 08:35:32 2019
> @@ -1,5 +1,5 @@
> ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
> -; RUN: opt -debugify -loop-idiom < %s -S 2>&1 | FileCheck %s
> +; RUN: opt -debugify -loop-idiom -pass-remarks=loop-idiom -pass-remarks-analysis=loop-idiom -verify -verify-each -verify-dom-info -verify-loop-info < %s -S 2>&1 | FileCheck %s
>
> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
>
> @@ -23,38 +23,37 @@ target datalayout = "e-p:64:64:64-i1:8:8
> ; sink(std::equal(ptr0[i], ptr0[i] + count[i], ptr1[i]));
> ; }
>
> +; CHECK: remark: <stdin>:13:1: Loop recognized as a bcmp idiom
> +; CHECK: remark: <stdin>:11:1: Transformed bcmp idiom into a call to memcmp() function
> +; CHECK: remark: <stdin>:29:1: Loop recognized as a bcmp idiom
> +; CHECK: remark: <stdin>:34:1: Transformed bcmp idiom into a call to memcmp() function
> +
> define i1 @_Z43index_iteration_eq_variable_size_no_overlapPKcm(i8*
> nocapture %ptr, i64 %count) {
> ; CHECK-LABEL: @_Z43index_iteration_eq_variable_size_no_overlapPKcm(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT:%.*]], !dbg !22
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[PTR:%.*]], i64 [[COUNT_BYTECOUNT:%.*]], !dbg !22
> ; CHECK-NEXT: call void @llvm.dbg.value(metadata i8* [[ADD_PTR]],
> metadata !9, metadata !DIExpression()), !dbg !22
> -; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64 [[COUNT]], 0, !dbg !23
> +; CHECK-NEXT: [[CMP14:%.*]] = icmp eq i64 [[COUNT_BYTECOUNT]], 0,
> !dbg !23
> ; CHECK-NEXT: call void @llvm.dbg.value(metadata i1 [[CMP14]],
> metadata !11, metadata !DIExpression()), !dbg !23
> -; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_PREHEADER:%.*]], !dbg !24
> -; CHECK: for.body.preheader:
> -; CHECK-NEXT: br label [[FOR_BODY:%.*]], !dbg !25
> -; CHECK: for.cond:
> -; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC:%.*]], [[COUNT]],
> !dbg !26
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i1 [[CMP]],
> metadata !13, metadata !DIExpression()), !dbg !26
> -; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label
> [[CLEANUP_LOOPEXIT:%.*]], !dbg !27
> -; CHECK: for.body:
> -; CHECK-NEXT: [[I_015:%.*]] = phi i64 [ [[INC]], [[FOR_COND:%.*]]
> ], [ 0, [[FOR_BODY_PREHEADER]] ], !dbg !28
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[I_015]],
> metadata !14, metadata !DIExpression()), !dbg !28
> -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8*
> [[PTR]], i64 [[I_015]], !dbg !29
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8*
> [[ARRAYIDX]], metadata !15, metadata !DIExpression()), !dbg !29
> -; CHECK-NEXT: [[V0:%.*]] = load i8, i8* [[ARRAYIDX]], !dbg !30
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8 [[V0]],
> metadata !16, metadata !DIExpression()), !dbg !30
> -; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8*
> [[ADD_PTR]], i64 [[I_015]], !dbg !31
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8*
> [[ARRAYIDX1]], metadata !17, metadata !DIExpression()), !dbg !31
> -; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[ARRAYIDX1]], !dbg !32
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8 [[V1]],
> metadata !18, metadata !DIExpression()), !dbg !32
> -; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[V0]], [[V1]], !dbg !33
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i1 [[CMP3]],
> metadata !19, metadata !DIExpression()), !dbg !33
> -; CHECK-NEXT: [[INC]] = add nuw i64 [[I_015]], 1, !dbg !34
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[INC]],
> metadata !20, metadata !DIExpression()), !dbg !34
> -; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_COND]], label
> [[CLEANUP_LOOPEXIT]], !dbg !25
> +; CHECK-NEXT: br i1 [[CMP14]], label [[CLEANUP:%.*]], label
> [[FOR_BODY_BCMPDISPATCHBB:%.*]], !dbg !24
> +; CHECK: for.body.bcmpdispatchbb:
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[PTR]], i8*
> [[ADD_PTR]], i64 [[COUNT_BYTECOUNT]]), !dbg !25
> +; CHECK-NEXT: [[PTR_VS_ADD_PTR_EQCMP:%.*]] = icmp eq i32
> [[MEMCMP]], 0, !dbg !25
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !14, metadata !DIExpression()), !dbg !26
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !15, metadata !DIExpression()), !dbg !27
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !16, metadata !DIExpression()), !dbg !28
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !17, metadata !DIExpression()), !dbg !29
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !18, metadata !DIExpression()), !dbg !30
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !19, metadata !DIExpression()), !dbg !25
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !20, metadata !DIExpression()), !dbg !31
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !13, metadata !DIExpression()), !dbg !32
> +; CHECK-NEXT: br i1 [[PTR_VS_ADD_PTR_EQCMP]], label
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB:%.*]], label
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB:%.*]], !dbg !25
> +; CHECK: ptr.vs.add.ptr.eqcmp.equalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT:%.*]], !dbg !33
> +; CHECK: ptr.vs.add.ptr.eqcmp.unequalbb:
> +; CHECK-NEXT: br label [[CLEANUP_LOOPEXIT]], !dbg !34
> ; CHECK: cleanup.loopexit:
> -; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false, [[FOR_BODY]] ], [
> true, [[FOR_COND]] ]
> +; CHECK-NEXT: [[RES_PH:%.*]] = phi i1 [ false,
> [[PTR_VS_ADD_PTR_EQCMP_UNEQUALBB]] ], [ true,
> [[PTR_VS_ADD_PTR_EQCMP_EQUALBB]] ]
> ; CHECK-NEXT: br label [[CLEANUP]], !dbg !35
> ; CHECK: cleanup:
> ; CHECK-NEXT: [[RES:%.*]] = phi i1 [ true, [[ENTRY:%.*]] ], [
> [[RES_PH]], [[CLEANUP_LOOPEXIT]] ], !dbg !36
> @@ -106,11 +105,11 @@ define void @_Z16loop_within_loopmPPKcS1
> ; CHECK-NEXT: call void @llvm.dbg.value(metadata i8* [[T0]],
> metadata !42, metadata !DIExpression()), !dbg !66
> ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64,
> i64* [[COUNT:%.*]], i64 [[I_012]], !dbg !67
> ; CHECK-NEXT: call void @llvm.dbg.value(metadata i64*
> [[ARRAYIDX2]], metadata !43, metadata !DIExpression()), !dbg !67
> -; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[ARRAYIDX2]], !dbg !68
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[T1]],
> metadata !44, metadata !DIExpression()), !dbg !68
> -; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[T0]], i64 [[T1]], !dbg !69
> +; CHECK-NEXT: [[T1_BYTECOUNT:%.*]] = load i64, i64*
> [[ARRAYIDX2]], !dbg !68
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i64
> [[T1_BYTECOUNT]], metadata !44, metadata !DIExpression()), !dbg !68
> +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8*
> [[T0]], i64 [[T1_BYTECOUNT]], !dbg !69
> ; CHECK-NEXT: call void @llvm.dbg.value(metadata i8* [[ADD_PTR]],
> metadata !45, metadata !DIExpression()), !dbg !69
> -; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[T1]], 0, !dbg !70
> +; CHECK-NEXT: [[CMP5_I_I:%.*]] = icmp eq i64 [[T1_BYTECOUNT]], 0,
> !dbg !70
> ; CHECK-NEXT: call void @llvm.dbg.value(metadata i1 [[CMP5_I_I]],
> metadata !46, metadata !DIExpression()), !dbg !70
> ; CHECK-NEXT: br i1 [[CMP5_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]], label
> [[FOR_BODY_I_I_PREHEADER:%.*]], !dbg !62
> ; CHECK: for.body.i.i.preheader:
> @@ -118,39 +117,35 @@ define void @_Z16loop_within_loopmPPKcS1
> ; CHECK-NEXT: call void @llvm.dbg.value(metadata i8**
> [[ARRAYIDX3]], metadata !47, metadata !DIExpression()), !dbg !71
> ; CHECK-NEXT: [[T2:%.*]] = load i8*, i8** [[ARRAYIDX3]], !dbg !72
> ; CHECK-NEXT: call void @llvm.dbg.value(metadata i8* [[T2]],
> metadata !48, metadata !DIExpression()), !dbg !72
> -; CHECK-NEXT: br label [[FOR_BODY_I_I:%.*]], !dbg !73
> -; CHECK: for.body.i.i:
> -; CHECK-NEXT: [[__FIRST2_ADDR_07_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR1_I_I:%.*]], [[FOR_INC_I_I:%.*]] ], [ [[T2]],
> [[FOR_BODY_I_I_PREHEADER]] ], !dbg !74
> -; CHECK-NEXT: [[__FIRST1_ADDR_06_I_I:%.*]] = phi i8* [
> [[INCDEC_PTR_I_I:%.*]], [[FOR_INC_I_I]] ], [ [[T0]],
> [[FOR_BODY_I_I_PREHEADER]] ], !dbg !75
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8*
> [[__FIRST2_ADDR_07_I_I]], metadata !49, metadata !DIExpression()),
> !dbg !74
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8*
> [[__FIRST1_ADDR_06_I_I]], metadata !50, metadata !DIExpression()),
> !dbg !75
> -; CHECK-NEXT: [[T3:%.*]] = load i8, i8* [[__FIRST1_ADDR_06_I_I]],
> !dbg !76
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8 [[T3]],
> metadata !51, metadata !DIExpression()), !dbg !76
> -; CHECK-NEXT: [[T4:%.*]] = load i8, i8* [[__FIRST2_ADDR_07_I_I]],
> !dbg !77
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8 [[T4]],
> metadata !52, metadata !DIExpression()), !dbg !77
> -; CHECK-NEXT: [[CMP_I_I_I:%.*]] = icmp eq i8 [[T3]], [[T4]], !dbg
> !78
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i1
> [[CMP_I_I_I]], metadata !53, metadata !DIExpression()), !dbg !78
> -; CHECK-NEXT: br i1 [[CMP_I_I_I]], label [[FOR_INC_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]], !dbg !79
> -; CHECK: for.inc.i.i:
> -; CHECK-NEXT: [[INCDEC_PTR_I_I]] = getelementptr inbounds i8, i8*
> [[__FIRST1_ADDR_06_I_I]], i64 1, !dbg !80
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8*
> [[INCDEC_PTR_I_I]], metadata !54, metadata !DIExpression()), !dbg !80
> -; CHECK-NEXT: [[INCDEC_PTR1_I_I]] = getelementptr inbounds i8,
> i8* [[__FIRST2_ADDR_07_I_I]], i64 1, !dbg !81
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i8*
> [[INCDEC_PTR1_I_I]], metadata !55, metadata !DIExpression()), !dbg
> !81
> -; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8* [[INCDEC_PTR_I_I]],
> [[ADD_PTR]], !dbg !82
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i1 [[CMP_I_I]],
> metadata !56, metadata !DIExpression()), !dbg !82
> -; CHECK-NEXT: br i1 [[CMP_I_I]], label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I]], !dbg !83
> +; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(i8* [[T0]], i8*
> [[T2]], i64 [[T1_BYTECOUNT]]), !dbg !73
> +; CHECK-NEXT: [[T0_VS_T2_EQCMP:%.*]] = icmp eq i32 [[MEMCMP]], 0,
> !dbg !73
> +; CHECK-NEXT: br label [[FOR_BODY_I_I_BCMPDISPATCHBB:%.*]]
> +; CHECK: for.body.i.i.bcmpdispatchbb:
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !49, metadata !DIExpression()), !dbg !74
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !50, metadata !DIExpression()), !dbg !75
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !51, metadata !DIExpression()), !dbg !76
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !52, metadata !DIExpression()), !dbg !77
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !53, metadata !DIExpression()), !dbg !73
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !54, metadata !DIExpression()), !dbg !78
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !55, metadata !DIExpression()), !dbg !79
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 undef,
> metadata !56, metadata !DIExpression()), !dbg !80
> +; CHECK-NEXT: br i1 [[T0_VS_T2_EQCMP]], label
> [[T0_VS_T2_EQCMP_EQUALBB:%.*]], label
> [[T0_VS_T2_EQCMP_UNEQUALBB:%.*]], !dbg !73
> +; CHECK: t0.vs.t2.eqcmp.equalbb:
> +; CHECK-NEXT: br i1 true, label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT:%.*]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB]], !dbg !81
> +; CHECK: t0.vs.t2.eqcmp.unequalbb:
> +; CHECK-NEXT: br i1 true, label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]], label
> [[FOR_BODY_I_I_BCMPDISPATCHBB]], !dbg !82
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit.loopexit:
> -; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[FOR_BODY_I_I]] ], [ true, [[FOR_INC_I_I]] ]
> -; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]], !dbg !84
> +; CHECK-NEXT: [[RETVAL_0_I_I_PH:%.*]] = phi i1 [ false,
> [[T0_VS_T2_EQCMP_UNEQUALBB]] ], [ true, [[T0_VS_T2_EQCMP_EQUALBB]] ]
> +; CHECK-NEXT: br label
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT]], !dbg !83
> ; CHECK: _ZNSt3__15equalIPKcS2_EEbT_S3_T0_.exit:
> -; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ true, [[FOR_BODY]]
> ], [ [[RETVAL_0_I_I_PH]],
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ], !dbg !85
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i1
> [[RETVAL_0_I_I]], metadata !57, metadata !DIExpression()), !dbg !85
> -; CHECK-NEXT: tail call void @_Z4sinkb(i1 [[RETVAL_0_I_I]]), !dbg
> !84
> -; CHECK-NEXT: [[INC]] = add nuw i64 [[I_012]], 1, !dbg !86
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[INC]],
> metadata !58, metadata !DIExpression()), !dbg !86
> -; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[INC]], [[OUTER_COUNT]],
> !dbg !87
> -; CHECK-NEXT: call void @llvm.dbg.value(metadata i1 [[CMP]],
> metadata !59, metadata !DIExpression()), !dbg !87
> -; CHECK-NEXT: br i1 [[CMP]], label
> [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[FOR_BODY]], !dbg !88
> +; CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i1 [ true, [[FOR_BODY]]
> ], [ [[RETVAL_0_I_I_PH]],
> [[_ZNST3__15EQUALIPKCS2_EEBT_S3_T0__EXIT_LOOPEXIT]] ], !dbg !84
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i1
> [[RETVAL_0_I_I]], metadata !57, metadata !DIExpression()), !dbg !84
> +; CHECK-NEXT: tail call void @_Z4sinkb(i1 [[RETVAL_0_I_I]]), !dbg
> !83
> +; CHECK-NEXT: [[INC]] = add nuw i64 [[I_012]], 1, !dbg !85
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[INC]],
> metadata !58, metadata !DIExpression()), !dbg !85
> +; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[INC]], [[OUTER_COUNT]],
> !dbg !86
> +; CHECK-NEXT: call void @llvm.dbg.value(metadata i1 [[CMP]],
> metadata !59, metadata !DIExpression()), !dbg !86
> +; CHECK-NEXT: br i1 [[CMP]], label
> [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[FOR_BODY]], !dbg !87
> ;
> entry:
> %cmp11 = icmp eq i64 %outer_count, 0
>
> Modified: llvm/trunk/test/Transforms/LoopIdiom/bcmp-widening.ll
> URL:
> https://protect2.fireeye.com/url?k=295a002b-758e0868-295a40b0-86a1150bc3ba-0192ac886f53f618&q=1&u=http%3A%2F%2Fllvm.org%2Fviewvc%2Fllvm-project%2Fllvm%2Ftrunk%2Ftest%2FTransforms%2FLoopIdiom%2Fbcmp-widening.ll%3Frev%3D374662%26r1%3D374661%26r2%3D374662%26view%3Ddiff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopIdiom/bcmp-widening.ll (original)
> +++ llvm/trunk/test/Transforms/LoopIdiom/bcmp-widening.ll Sat Oct 12 08:35:32 2019
> @@ -1,5 +1,5 @@
> ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
> -; RUN: opt -loop-idiom < %s -S | FileCheck %s
> +; RUN: opt -loop-idiom -verify -verify-each -verify-dom-info -verify-loop-info < %s -S | FileCheck %s
>
> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
>
>
>