[llvm] 30463bc - [SLP]Do not count perfect diamond matches for gathers several times.
Alexander Kornienko via llvm-commits
llvm-commits at lists.llvm.org
Thu May 20 09:03:06 PDT 2021
We are seeing performance regressions after this patch. A number of
benchmarks regressed by more than 10%. One example is flops-6.c from the
LLVM test-suite. Here is an isolated test based on that benchmark:
$ cat flops-6.c
extern int printf (const char *__restrict __format, ...);
double T[36];
double sa,sb,sc,sd,one,two;
double four,piref;
double scale;
double A1 = -0.1666666666671334;
double A2 = 0.833333333809067E-2;
double A3 = 0.198412715551283E-3;
double A4 = 0.27557589750762E-5;
double A5 = 0.2507059876207E-7;
double A6 = 0.164105986683E-9;
double B1 = -0.4999999999982;
double B2 = 0.4166666664651E-1;
double B3 = -0.1388888805755E-2;
double B4 = 0.24801428034E-4;
double B5 = -0.2754213324E-6;
double B6 = 0.20189405E-8;
int main()
{
  double s,u,v,w,x;
  long loops;
  register long i, m, n;
  printf("\n");
  printf(" FLOPS C Program (Double Precision), V2.0 18 Dec 1992\n\n");
  loops = 15625;
  piref = 3.14159265358979324;
  one = 1.0;
  two = 2.0;
  four = 4.0;
  scale = one;
  printf(" Module Error RunTime MFLOPS\n");
  printf(" (usec)\n");
  m = loops*10000;
  x = piref / ( four * (double)m );
  s = 0.0;
  v = 0.0;
  for( i = 1 ; i <= m-1 ; i++ )
  {
    u = (double)i * x;
    w = u * u;
    v = u * ((((((A6*w+A5)*w+A4)*w+A3)*w+A2)*w+A1)*w+one);
    s = s + v*(w*(w*(w*(w*(w*(B6*w+B5)+B4)+B3)+B2)+B1)+one);
  }
  u = piref / four;
  w = u * u;
  sa = u*((((((A6*w+A5)*w+A4)*w+A3)*w+A2)*w+A1)*w+one);
  sb = w*(w*(w*(w*(w*(B6*w+B5)+B4)+B3)+B2)+B1)+one;
  sa = sa * sb;
  sa = x * ( sa + two * s ) / two;
  sb = 0.25;
  sc = sa - sb;
  printf(" 6 %13.4lf %10.4lf %10.4lf\n",
         sc* 1e-30,
         0* 1e-30 ,
         0* 1e-30);
  return 0;
}
$ clang-base -O3 -maes -m64 -mcx16 -msse4.2 -mpclmul \
    '-mprefer-vector-width=128' flops-6.c -o flops-6-base
$ clang-new -O3 -maes -m64 -mcx16 -msse4.2 -mpclmul \
    '-mprefer-vector-width=128' flops-6.c -o flops-6-new
$ for i in $(seq 5) ; do time ./flops-6-base ; done
6 0.0000 0.0000 0.0000
real 0m0.705s
user 0m0.700s
sys 0m0.004s
6 0.0000 0.0000 0.0000
real 0m0.706s
user 0m0.704s
sys 0m0.001s
6 0.0000 0.0000 0.0000
real 0m0.706s
user 0m0.705s
sys 0m0.001s
6 0.0000 0.0000 0.0000
real 0m0.706s
user 0m0.704s
sys 0m0.001s
6 0.0000 0.0000 0.0000
real 0m0.707s
user 0m0.705s
sys 0m0.001s
$ for i in $(seq 5) ; do time ./flops-6-new ; done
6 0.0000 0.0000 0.0000
real 0m0.899s
user 0m0.898s
sys 0m0.000s
6 0.0000 0.0000 0.0000
real 0m0.899s
user 0m0.898s
sys 0m0.000s
6 0.0000 0.0000 0.0000
real 0m0.900s
user 0m0.899s
sys 0m0.000s
6 0.0000 0.0000 0.0000
real 0m0.899s
user 0m0.898s
sys 0m0.000s
6 0.0000 0.0000 0.0000
real 0m0.899s
user 0m0.898s
sys 0m0.000s
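For reference, the "real" times above work out to a slowdown of roughly 27%
on this test. A minimal sketch of that arithmetic follows; the helper names
(mean_seconds, slowdown_percent) and the constants are made up for
illustration, and the samples are copied from the runs above:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a set of wall-clock samples, in seconds.
double mean_seconds(const std::vector<double> &samples) {
  assert(!samples.empty() && "need at least one timing sample");
  return std::accumulate(samples.begin(), samples.end(), 0.0) /
         static_cast<double>(samples.size());
}

// Relative slowdown of `after` versus `before`, as a percentage.
double slowdown_percent(const std::vector<double> &before,
                        const std::vector<double> &after) {
  return (mean_seconds(after) / mean_seconds(before) - 1.0) * 100.0;
}

// "real" times from the five flops-6-base and five flops-6-new runs.
const std::vector<double> kBaseReal = {0.705, 0.706, 0.706, 0.706, 0.707};
const std::vector<double> kNewReal = {0.899, 0.899, 0.900, 0.899, 0.899};
```

Calling slowdown_percent(kBaseReal, kNewReal) gives the percentage. The
run-to-run spread is about 2ms, so the difference is well outside noise.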
Can you take a look at this and maybe revert in the meantime?
Thanks!
-- Alex
On Mon, May 10, 2021 at 4:10 PM Alexey Bataev via llvm-commits <
llvm-commits at lists.llvm.org> wrote:
>
> Author: Alexey Bataev
> Date: 2021-05-10T07:08:07-07:00
> New Revision: 30463bc3f1839e8a238be4c137e2356f3cca2771
>
> URL:
> https://github.com/llvm/llvm-project/commit/30463bc3f1839e8a238be4c137e2356f3cca2771
> DIFF:
> https://github.com/llvm/llvm-project/commit/30463bc3f1839e8a238be4c137e2356f3cca2771.diff
>
> LOG: [SLP]Do not count perfect diamond matches for gathers several times.
>
> Need to remove the old code for avoiding double counting of the gather
> nodes with perfect diamond matches within the tree after we started
> detecting perfect/shuffled matching in the previous patch D100495. We
> may skip the cost for such nodes completely.
>
> Differential Revision: https://reviews.llvm.org/D102023
>
> Added:
>
>
> Modified:
> llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
> llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
>
> Removed:
>
>
>
>
> ################################################################################
> diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
> index 22e090fd1d7c..e656b189c779 100644
> --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
> +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
> @@ -4233,27 +4233,6 @@ InstructionCost BoUpSLP::getTreeCost() {
> for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
> TreeEntry &TE = *VectorizableTree[I].get();
>
> - // We create duplicate tree entries for gather sequences that have multiple
> - // uses. However, we should not compute the cost of duplicate sequences.
> - // For example, if we have a build vector (i.e., insertelement sequence)
> - // that is used by more than one vector instruction, we only need to
> - // compute the cost of the insertelement instructions once. The redundant
> - // instructions will be eliminated by CSE.
> - //
> - // We should consider not creating duplicate tree entries for gather
> - // sequences, and instead add additional edges to the tree representing
> - // their uses. Since such an approach results in fewer total entries,
> - // existing heuristics based on tree size may yield different results.
> - //
> - if (TE.State == TreeEntry::NeedToGather &&
> - std::any_of(std::next(VectorizableTree.begin(), I + 1),
> - VectorizableTree.end(),
> - [TE](const std::unique_ptr<TreeEntry> &EntryPtr) {
> - return EntryPtr->State == TreeEntry::NeedToGather &&
> - EntryPtr->isSame(TE.Scalars);
> - }))
> - continue;
> -
> InstructionCost C = getEntryCost(&TE);
> Cost += C;
> LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
>
> diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
> index 31c63d31f4df..57db62ace206 100644
> --- a/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
> +++ b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll
> @@ -10,7 +10,7 @@ target triple = "aarch64--linux-gnu"
> ; REMARK-LABEL: Function: gather_multiple_use
> ; REMARK: Args:
> ; REMARK-NEXT: - String: 'Vectorized horizontal reduction with cost '
> -; REMARK-NEXT: - Cost: '-16'
> +; REMARK-NEXT: - Cost: '-7'
> ;
> ; REMARK-NOT: Function: gather_load
>
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
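To make the effect of the removed hunk concrete, here is a rough standalone
sketch of what that check did inside the cost loop. Entry, treeCost and the
SkipDuplicateGathers flag are hypothetical stand-ins, not LLVM API: Entry
mimics only the fields of TreeEntry that the check touched, Cost stands in
for the result of getEntryCost(&TE), and comparing Scalars with == is an
assumption about what isSame() amounts to. It models only the removed loop
check, not the perfect/shuffled match detection from D100495 that now lives
in the per-entry cost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a tree entry; only what the sketch needs.
struct Entry {
  bool NeedToGather;        // true for gather (build vector) entries
  std::vector<int> Scalars; // stand-in for the entry's scalar values
  int Cost;                 // stand-in for getEntryCost(&TE)
};

// Sums the entry costs. With SkipDuplicateGathers set, a gather entry is
// skipped when a later gather entry has the same scalars, which is what
// the removed block did. Without it, every entry's cost is added.
int treeCost(const std::vector<Entry> &Tree, bool SkipDuplicateGathers) {
  int Cost = 0;
  for (std::size_t I = 0, E = Tree.size(); I < E; ++I) {
    const Entry &TE = Tree[I];
    if (SkipDuplicateGathers && TE.NeedToGather &&
        std::any_of(std::next(Tree.begin(), I + 1), Tree.end(),
                    [&TE](const Entry &Other) {
                      return Other.NeedToGather &&
                             Other.Scalars == TE.Scalars;
                    }))
      continue; // an identical gather appears later; count it only once
    Cost += TE.Cost;
  }
  return Cost;
}
```

With one vectorized entry of cost -10 and two identical gathers of cost 2
each, the old check yields -8 and the plain sum yields -6. Whether the
per-entry cost now accounts for the duplicate correctly in the flops-6 case
is exactly what needs checking.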