[llvm] 30463bc - [SLP]Do not count perfect diamond matches for gathers several times.
Alexey.Bataev via llvm-commits
llvm-commits at lists.llvm.org
Fri May 21 05:20:10 PDT 2021
The problem is that, without this patch, other benchmarks show a
perf regression of about 20-30%. I already know the root cause and
the fix is in progress. I hope to have it ready in 1-2 hours; I am
polishing some final things.
-------------
Best regards,
Alexey Bataev
5/21/2021 8:11 AM, Alexander Kornienko wrote:
> Given that the change causes performance regressions in the range of
> 20-30% for some benchmarks, would you agree to roll back the patch
> while you're working on a fix? It would be helpful for us if there
> were a cleaner version in mainline in the meantime.
>
> Thanks!
>
> On Thu, May 20, 2021 at 10:11 PM Alexey.Bataev <a.bataev at outlook.com> wrote:
>
> I checked the regression. The fix is correct, but I need to prepare
> another patch for better match detection in the vectorization
> tree. I hope to commit it in a day or two.
>
> -------------
> Best regards,
> Alexey Bataev
>
> 5/20/2021 12:03 PM, Alexander Kornienko wrote:
>> We see performance regressions after this patch. A number of
>> benchmarks regressed by more than 10%. One example is
>> flops-6.c from the LLVM test-suite. An isolated test based on
>> that benchmark:
>>
>> $ cat flops-6.c
>> extern int printf (const char *__restrict __format, ...);
>> double T[36];
>> double sa,sb,sc,sd,one,two;
>> double four,piref;
>> double scale;
>> double A1 = -0.1666666666671334;
>> double A2 = 0.833333333809067E-2;
>> double A3 = 0.198412715551283E-3;
>> double A4 = 0.27557589750762E-5;
>> double A5 = 0.2507059876207E-7;
>> double A6 = 0.164105986683E-9;
>> double B1 = -0.4999999999982;
>> double B2 = 0.4166666664651E-1;
>> double B3 = -0.1388888805755E-2;
>> double B4 = 0.24801428034E-4;
>> double B5 = -0.2754213324E-6;
>> double B6 = 0.20189405E-8;
>> int main()
>> {
>> double s,u,v,w,x;
>> long loops;
>> register long i, m, n;
>> printf("\n");
>> printf(" FLOPS C Program (Double Precision), V2.0 18 Dec 1992\n\n");
>> loops = 15625;
>> piref = 3.14159265358979324;
>> one = 1.0;
>> two = 2.0;
>> four = 4.0;
>> scale = one;
>> printf(" Module Error RunTime MFLOPS\n");
>> printf(" (usec)\n");
>> m = loops*10000;
>> x = piref / ( four * (double)m );
>> s = 0.0;
>> v = 0.0;
>> for( i = 1 ; i <= m-1 ; i++ )
>> {
>> u = (double)i * x;
>> w = u * u;
>> v = u * ((((((A6*w+A5)*w+A4)*w+A3)*w+A2)*w+A1)*w+one);
>> s = s + v*(w*(w*(w*(w*(w*(B6*w+B5)+B4)+B3)+B2)+B1)+one);
>> }
>> u = piref / four;
>> w = u * u;
>> sa = u*((((((A6*w+A5)*w+A4)*w+A3)*w+A2)*w+A1)*w+one);
>> sb = w*(w*(w*(w*(w*(B6*w+B5)+B4)+B3)+B2)+B1)+one;
>> sa = sa * sb;
>> sa = x * ( sa + two * s ) / two;
>> sb = 0.25;
>> sc = sa - sb;
>> printf(" 6 %13.4lf %10.4lf %10.4lf\n",
>> sc* 1e-30,
>> 0* 1e-30 ,
>> 0* 1e-30);
>> return 0;
>> }
>> $ clang-base -O3 -maes -m64 -mcx16 -msse4.2 -mpclmul '-mprefer-vector-width=128' flops-6.c -o flops-6-base
>> $ clang-new -O3 -maes -m64 -mcx16 -msse4.2 -mpclmul '-mprefer-vector-width=128' flops-6.c -o flops-6-new
>> $ for i in $(seq 5) ; do time ./flops-6-base ; done
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.705s
>> user 0m0.700s
>> sys 0m0.004s
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.706s
>> user 0m0.704s
>> sys 0m0.001s
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.706s
>> user 0m0.705s
>> sys 0m0.001s
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.706s
>> user 0m0.704s
>> sys 0m0.001s
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.707s
>> user 0m0.705s
>> sys 0m0.001s
>> $ for i in $(seq 5) ; do time ./flops-6-new ; done
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.899s
>> user 0m0.898s
>> sys 0m0.000s
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.899s
>> user 0m0.898s
>> sys 0m0.000s
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.900s
>> user 0m0.899s
>> sys 0m0.000s
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.899s
>> user 0m0.898s
>> sys 0m0.000s
>> 6 0.0000 0.0000 0.0000
>>
>> real 0m0.899s
>> user 0m0.898s
>> sys 0m0.000s
>>
>> Can you take a look at this and maybe revert in the meantime?
>>
>> Thanks!
>>
>> -- Alex
>>
>> On Mon, May 10, 2021 at 4:10 PM Alexey Bataev via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>>
>>
>> Author: Alexey Bataev
>> Date: 2021-05-10T07:08:07-07:00
>> New Revision: 30463bc3f1839e8a238be4c137e2356f3cca2771
>>
>> URL:
>> https://github.com/llvm/llvm-project/commit/30463bc3f1839e8a238be4c137e2356f3cca2771
>> DIFF:
>> https://github.com/llvm/llvm-project/commit/30463bc3f1839e8a238be4c137e2356f3cca2771.diff
>>
>> LOG: [SLP]Do not count perfect diamond matches for gathers several times.
>>
>> Need to remove the old code for avoiding double counting of the
>> gather nodes with perfect diamond matches within the tree after we
>> started detecting perfect/shuffled matching in the previous patch
>> D100495. We may skip the cost for such nodes completely.
>>
>> Differential Revision: https://reviews.llvm.org/D102023
>>
>> Added:
>>
>>
>> Modified:
>> llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
>>
>> Removed:
>>
>>
>>
>> ################################################################################
>> diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> index 22e090fd1d7c..e656b189c779 100644
>> --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> @@ -4233,27 +4233,6 @@ InstructionCost BoUpSLP::getTreeCost() {
>> for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
>> TreeEntry &TE = *VectorizableTree[I].get();
>>
>> - // We create duplicate tree entries for gather sequences that have multiple
>> - // uses. However, we should not compute the cost of duplicate sequences.
>> - // For example, if we have a build vector (i.e., insertelement sequence)
>> - // that is used by more than one vector instruction, we only need to
>> - // compute the cost of the insertelement instructions once. The redundant
>> - // instructions will be eliminated by CSE.
>> - //
>> - // We should consider not creating duplicate tree entries for gather
>> - // sequences, and instead add additional edges to the tree representing
>> - // their uses. Since such an approach results in fewer total entries,
>> - // existing heuristics based on tree size may yield different results.
>> - //
>> - if (TE.State == TreeEntry::NeedToGather &&
>> - std::any_of(std::next(VectorizableTree.begin(), I + 1),
>> - VectorizableTree.end(),
>> - [TE](const std::unique_ptr<TreeEntry> &EntryPtr) {
>> - return EntryPtr->State == TreeEntry::NeedToGather &&
>> - EntryPtr->isSame(TE.Scalars);
>> - }))
>> - continue;
>> -
>> InstructionCost C = getEntryCost(&TE);
>> Cost += C;
>> LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
>>
>> diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
>> index 31c63d31f4df..57db62ace206 100644
>> --- a/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
>> +++ b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
>> @@ -10,7 +10,7 @@ target triple = "aarch64--linux-gnu"
>> ; REMARK-LABEL: Function: gather_multiple_use
>> ; REMARK: Args:
>> ; REMARK-NEXT: - String: 'Vectorized horizontal reduction with cost '
>> -; REMARK-NEXT: - Cost: '-16'
>> +; REMARK-NEXT: - Cost: '-7'
>> ;
>> ; REMARK-NOT: Function: gather_load
>>
>>
>>
>>