[PATCH] D135282: [SLP]Improve costs of vectorized loads/stores by analyzing GEPs.
    Valeriy Dmitriev via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Thu Nov 17 18:31:45 PST 2022
    
    
  
vdmitrie added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6656
+            continue;
+          ScalarLdCost += TTI->getArithmeticInstrCost(Instruction::Add,
+                                                      Ptr->getType(), CostKind);
----------------
We see quite a significant performance regression related to this patch.
It does not look like the right adjustment. For x86 specifically, these GEPs cost nothing, as they end up merely as different displacement values in memory operands. So the bias towards vectorization isn't justified for plain loads and stores.
This can be seen even in the test case test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll
Vectorization makes the code less profitable here. That was already the case before this patch: cost modeling tipped over to vectorization, but the result was close enough to say "not profitable".
Now we have even more bias.
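To illustrate the concern, here is a minimal toy model of the bias. It is not the SLPVectorizer code: the struct, the function name and all cost values are hypothetical, and it assumes the add is charged for every pointer except the first one.

```cpp
#include <cstdint>

// Hypothetical toy model, NOT the SLPVectorizer implementation.
// All names and cost values are illustrative assumptions.
struct ToyCosts {
  int64_t ScalarLoad; // cost of one scalar load
  int64_t VectorLoad; // cost of one vector load covering all lanes
  int64_t PtrAdd;     // cost charged per extra GEP, if charged at all
};

// Returns vector cost minus scalar cost.
// A negative result means vectorization looks profitable.
inline int64_t costDelta(const ToyCosts &C, unsigned NumLanes,
                         bool ChargeGEPs) {
  int64_t Scalar = C.ScalarLoad * NumLanes;
  // Assumption: every pointer except the first is charged one add.
  if (ChargeGEPs && NumLanes > 1)
    Scalar += C.PtrAdd * (NumLanes - 1);
  return C.VectorLoad - Scalar;
}
```

With ToyCosts{1, 2, 1} and two lanes, the delta is 0 when the GEPs are treated as free and -1 when they are charged, so a break-even case tips over to "profitable" even though the displacements are folded into the memory operands.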
Vectorized code:
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.25                        subq  $136, %rsp
 1      0     0.17                        xorl  %ecx, %ecx
 1      0     0.17                        xorl  %eax, %eax
 1      5     0.50    *                   movq  (%rdi,%rcx), %xmm0
 1      5     0.50    *                   movq  16(%rdi,%rcx), %xmm1
 1      1     0.33                        paddd %xmm0, %xmm1
 1      2     1.00                        movd  %xmm1, %edx
 1      1     0.25                        addl  %eax, %edx
 2      1     1.00           *            movq  %xmm1, -128(%rsp,%rcx)
 1      1     1.00                        pshufd        $85, %xmm1, %xmm0
 1      2     1.00                        movd  %xmm0, %eax
 1      1     0.25                        addl  %edx, %eax
 1      1     0.25                        addq  $32, %rcx
 1      1     0.25                        cmpq  $256, %rcx
 1      1     0.50                        jne   .LBB0_1
 1      1     0.25                        addq  $136, %rsp
 3      7     1.00                  U     retq
Original:
[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.25                        subq  $136, %rsp
 1      0     0.17                        xorl  %eax, %eax
 1      1     0.25                        movq  $-256, %rcx
 1      5     0.50    *                   movl  272(%rdi,%rcx), %edx
 2      6     0.50    *                   addl  256(%rdi,%rcx), %edx
 1      1     1.00           *            movl  %edx, 128(%rsp,%rcx)
 1      5     0.50    *                   movl  276(%rdi,%rcx), %esi
 2      6     0.50    *                   addl  260(%rdi,%rcx), %esi
 1      1     1.00           *            movl  %esi, 132(%rsp,%rcx)
 1      1     0.25                        addl  %esi, %eax
 1      1     0.25                        addl  %edx, %eax
 1      1     0.25                        addq  $32, %rcx
 1      1     0.50                        jne   .LBB0_1
 1      1     0.25                        addq  $136, %rsp
 3      7     1.00                  U     retq
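As a rough cross-check, the #uOps column of the two listings above can be summed per loop iteration. This is only a sketch: the .LBB0_1 label is not shown in the output, so which instructions form the loop body is an assumption (first load through jne), and the tally ignores latency and port pressure.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// #uOps values copied from the llvm-mca listings above.
// Assumption: the loop body runs from the first load through jne.
constexpr std::array<int, 12> VectorizedLoopUops = {1, 1, 1, 1, 1, 2,
                                                    1, 1, 1, 1, 1, 1};
constexpr std::array<int, 10> ScalarLoopUops = {1, 2, 1, 1, 2,
                                                1, 1, 1, 1, 1};

// Sums the micro-op counts of one loop iteration.
template <std::size_t N>
int totalUops(const std::array<int, N> &Uops) {
  return std::accumulate(Uops.begin(), Uops.end(), 0);
}
```

Under that assumption the vectorized loop issues 13 uops per iteration against 12 for the original.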
Repository:
  rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135282/new/
https://reviews.llvm.org/D135282
    
    