[llvm] 30463bc - [SLP]Do not count perfect diamond matches for gathers several times.

Fri May 21 05:20:10 PDT 2021

The problem is that, without this patch, other benchmarks regress by
about 20-30%. I already know the root cause and a fix is in progress.
I hope to have it ready in 1-2 hours; I am just polishing the final
details.

-------------
Best regards,
Alexey Bataev

5/21/2021 8:11 AM, Alexander Kornienko wrote:
> Given that the change causes performance regressions in the range of
> 20-30% for some benchmarks, would you agree to roll back the patch
> while you're working on a fix? It would be helpful for us if there
> were a cleaner version in mainline in the meantime.
>
> Thanks!
>
> On Thu, May 20, 2021 at 10:11 PM Alexey.Bataev <a.bataev at outlook.com>
> wrote:
>
>     I checked the regression. The fix is correct, but I need to
>     prepare another patch for better match detection in the
>     vectorization tree. I hope to commit it in a day or two.
>
>     -------------
>     Best regards,
>     Alexey Bataev
>
>     5/20/2021 12:03 PM, Alexander Kornienko wrote:
>>     We see performance regressions after this patch. A number of
>>     benchmarks regressed by more than 10%. One example is
>>     flops-6.c from the LLVM test-suite. An isolated test based on
>>     that benchmark:
>>
>>     $ cat flops-6.c
>>     extern int printf (const char *__restrict __format, ...);
>>     double T[36];
>>     double sa,sb,sc,sd,one,two;
>>     double four,piref;
>>     double scale;
>>     double A1 = -0.1666666666671334;
>>     double A2 = 0.833333333809067E-2;
>>     double A3 = 0.198412715551283E-3;
>>     double A4 = 0.27557589750762E-5;
>>     double A5 = 0.2507059876207E-7;
>>     double A6 = 0.164105986683E-9;
>>     double B1 = -0.4999999999982;
>>     double B2 = 0.4166666664651E-1;
>>     double B3 = -0.1388888805755E-2;
>>     double B4 = 0.24801428034E-4;
>>     double B5 = -0.2754213324E-6;
>>     double B6 = 0.20189405E-8;
>>     int main()
>>     {
>>        double s,u,v,w,x;
>>        long loops;
>>        register long i, m, n;
>>        printf("\n");
>>        printf("   FLOPS C Program (Double Precision), V2.0 18 Dec 1992\n\n");
>>        loops = 15625;
>>        piref = 3.14159265358979324;
>>        one = 1.0;
>>        two = 2.0;
>>        four = 4.0;
>>        scale = one;
>>        printf("   Module     Error        RunTime      MFLOPS\n");
>>        printf("                            (usec)\n");
>>        m = loops*10000;
>>        x = piref / ( four * (double)m );
>>        s = 0.0;
>>        v = 0.0;
>>        for( i = 1 ; i <= m-1 ; i++ )
>>        {
>>        u = (double)i * x;
>>        w = u * u;
>>        v = u * ((((((A6*w+A5)*w+A4)*w+A3)*w+A2)*w+A1)*w+one);
>>        s = s + v*(w*(w*(w*(w*(w*(B6*w+B5)+B4)+B3)+B2)+B1)+one);
>>        }
>>        u = piref / four;
>>        w = u * u;
>>        sa = u*((((((A6*w+A5)*w+A4)*w+A3)*w+A2)*w+A1)*w+one);
>>        sb = w*(w*(w*(w*(w*(B6*w+B5)+B4)+B3)+B2)+B1)+one;
>>        sa = sa * sb;
>>        sa = x * ( sa + two * s ) / two;
>>        sb = 0.25;
>>        sc = sa - sb;
>>        printf("     6   %13.4lf  %10.4lf  %10.4lf\n",
>>               sc* 1e-30,
>>               0* 1e-30 ,
>>               0* 1e-30);
>>        return 0;
>>     }
>>     $ clang-base -O3 -maes -m64 -mcx16 -msse4.2 -mpclmul '-mprefer-vector-width=128' flops-6.c -o flops-6-base
>>     $ clang-new -O3 -maes -m64 -mcx16 -msse4.2 -mpclmul '-mprefer-vector-width=128' flops-6.c -o flops-6-new
>>     $ for i in $(seq 5) ; do time ./flops-6-base ; done
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.705s
>>     user    0m0.700s
>>     sys     0m0.004s
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.706s
>>     user    0m0.704s
>>     sys     0m0.001s
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.706s
>>     user    0m0.705s
>>     sys     0m0.001s
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.706s
>>     user    0m0.704s
>>     sys     0m0.001s
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.707s
>>     user    0m0.705s
>>     sys     0m0.001s
>>     $ for i in $(seq 5) ; do time ./flops-6-new ; done
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.899s
>>     user    0m0.898s
>>     sys     0m0.000s
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.899s
>>     user    0m0.898s
>>     sys     0m0.000s
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.900s
>>     user    0m0.899s
>>     sys     0m0.000s
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.899s
>>     user    0m0.898s
>>     sys     0m0.000s
>>          6          0.0000      0.0000      0.0000
>>
>>     real    0m0.899s
>>     user    0m0.898s
>>     sys     0m0.000s
>>
>>     Can you take a look at this and maybe revert in the meantime?
>>
>>     Thanks!
>>
>>     -- Alex
>>
>>     On Mon, May 10, 2021 at 4:10 PM Alexey Bataev via llvm-commits
>>     <llvm-commits at lists.llvm.org> wrote:
>>
>>
>>         Author: Alexey Bataev
>>         Date: 2021-05-10T07:08:07-07:00
>>         New Revision: 30463bc3f1839e8a238be4c137e2356f3cca2771
>>
>>         URL:
>>         https://github.com/llvm/llvm-project/commit/30463bc3f1839e8a238be4c137e2356f3cca2771
>>         DIFF:
>>         https://github.com/llvm/llvm-project/commit/30463bc3f1839e8a238be4c137e2356f3cca2771.diff
>>
>>         LOG: [SLP]Do not count perfect diamond matches for gathers several times.
>>
>>         Need to remove the old code for avoiding double counting of the gather
>>         nodes with perfect diamond matches within the tree after we started
>>         detecting perfect/shuffled matching in the previous patch D100495. We
>>         may skip the cost for such nodes completely.
>>
>>         Differential Revision: https://reviews.llvm.org/D102023
>>
>>         Added:
>>
>>
>>         Modified:
>>             llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>             llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
>>
>>         Removed:
>>
>>
>>
>>         ################################################################################
>>         diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>         index 22e090fd1d7c..e656b189c779 100644
>>         --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>         +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>         @@ -4233,27 +4233,6 @@ InstructionCost BoUpSLP::getTreeCost() {
>>            for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
>>              TreeEntry &TE = *VectorizableTree[I].get();
>>
>>         -    // We create duplicate tree entries for gather sequences that have multiple
>>         -    // uses. However, we should not compute the cost of duplicate sequences.
>>         -    // For example, if we have a build vector (i.e., insertelement sequence)
>>         -    // that is used by more than one vector instruction, we only need to
>>         -    // compute the cost of the insertelement instructions once. The redundant
>>         -    // instructions will be eliminated by CSE.
>>         -    //
>>         -    // We should consider not creating duplicate tree entries for gather
>>         -    // sequences, and instead add additional edges to the tree representing
>>         -    // their uses. Since such an approach results in fewer total entries,
>>         -    // existing heuristics based on tree size may yield different results.
>>         -    //
>>         -    if (TE.State == TreeEntry::NeedToGather &&
>>         -        std::any_of(std::next(VectorizableTree.begin(), I + 1),
>>         -                    VectorizableTree.end(),
>>         -                    [TE](const std::unique_ptr<TreeEntry> &EntryPtr) {
>>         -                      return EntryPtr->State == TreeEntry::NeedToGather &&
>>         -                             EntryPtr->isSame(TE.Scalars);
>>         -                    }))
>>         -      continue;
>>         -
>>              InstructionCost C = getEntryCost(&TE);
>>              Cost += C;
>>              LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
>>
>>         diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
>>         index 31c63d31f4df..57db62ace206 100644
>>         --- a/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
>>         +++ b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
>>         @@ -10,7 +10,7 @@ target triple = "aarch64--linux-gnu"
>>          ; REMARK-LABEL: Function: gather_multiple_use
>>          ; REMARK:       Args:
>>          ; REMARK-NEXT:    - String: 'Vectorized horizontal reduction with cost '
>>         -; REMARK-NEXT:    - Cost: '-16'
>>         +; REMARK-NEXT:    - Cost: '-7'
>>          ;
>>          ; REMARK-NOT: Function: gather_load
>>
>>
>>
>>
>>         _______________________________________________
>>         llvm-commits mailing list
>>         llvm-commits at lists.llvm.org
>>         https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
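
The loop check that the quoted commit removes from BoUpSLP::getTreeCost()
can be modeled in isolation. What follows is a minimal sketch, not LLVM
code: ToyEntry, toyTreeCost and the SkipDuplicateGathers flag are
hypothetical names invented for illustration, and the integer costs are
made up. It models only the removed check. Per the log message, duplicate
detection now relies on the perfect/shuffled matching from D100495, which
this sketch does not attempt to reproduce.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical, simplified stand-in for a vectorization tree entry. The real
// TreeEntry in SLPVectorizer.cpp carries far more state than this.
struct ToyEntry {
  bool NeedToGather;        // true for gather (build-vector) nodes
  std::vector<int> Scalars; // stand-in IDs for the scalar values
  int Cost;                 // stand-in for the result of getEntryCost()
};

// Sums the entry costs. When SkipDuplicateGathers is set, a gather entry is
// skipped if a *later* gather entry has the same scalars, which mirrors the
// shape of the check deleted by the commit.
int toyTreeCost(const std::vector<ToyEntry> &Tree, bool SkipDuplicateGathers) {
  int Cost = 0;
  for (std::size_t I = 0, E = Tree.size(); I < E; ++I) {
    const ToyEntry &TE = Tree[I];
    if (SkipDuplicateGathers && TE.NeedToGather &&
        std::any_of(std::next(Tree.begin(),
                              static_cast<std::ptrdiff_t>(I + 1)),
                    Tree.end(), [&TE](const ToyEntry &Other) {
                      return Other.NeedToGather &&
                             Other.Scalars == TE.Scalars;
                    }))
      continue; // An identical gather appears later; count it only once.
    Cost += TE.Cost;
  }
  return Cost;
}
```

With a toy tree of one vectorized node (cost -8) and two identical gather
nodes (cost 2 each), the flag changes the total from -4 to -6. That is the
kind of shift in the profitability estimate that a change to duplicate
counting can produce.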
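
The size of the flops-6 regression can be read off the quoted `time`
output. As a rough sketch, the helper below averages the five "real"
timings for each binary and reports the relative slowdown. The function
names mean and slowdownPercent are hypothetical and not part of any
benchmark tooling mentioned in the thread.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of timings, in seconds.
double mean(const std::vector<double> &Samples) {
  assert(!Samples.empty() && "need at least one sample");
  return std::accumulate(Samples.begin(), Samples.end(), 0.0) /
         static_cast<double>(Samples.size());
}

// Relative slowdown of New versus Base, in percent. Positive means slower.
double slowdownPercent(const std::vector<double> &Base,
                       const std::vector<double> &New) {
  return (mean(New) / mean(Base) - 1.0) * 100.0;
}
```

Feeding in the base timings (0.705, 0.706, 0.706, 0.706, 0.707) and the
new timings (0.899, 0.899, 0.900, 0.899, 0.899) gives a slowdown of
roughly 27%, which falls within the 20-30% range discussed in the thread.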