[PATCH] D29900: [SLP] Fix for PR31879: vectorize repeated scalar ops that don't get put back into a vector

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 14 01:39:47 PST 2017


ABataev added inline comments.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1672
           Instruction *E = cast<Instruction>(VL[i]);
-          if (E->hasOneUse())
+          if (hasOneUse(E))
             // Take credit for instruction that will become dead.
----------------
hfinkel wrote:
> ABataev wrote:
> > hfinkel wrote:
> > > Why are we checking this use(r) count at all? If canReuseExtract is true, then don't we just care that all users are part of the to-be-vectorized tree?
> > > 
> > It does not check whether we can vectorize the extract; it checks whether this extract can be removed from the code or whether it still has other users. If we can remove the extract, we can treat the instruction as dead and subtract its cost; otherwise the instruction stays alive and its cost must be accounted for during vectorization.
> Right, but what's special in this regard about all of the uses belonging to the same instruction? What if the test case, for example, looked like this:
> 
> 
>   %x0 = extractelement <2 x float> %x, i32 0
>   %x1 = extractelement <2 x float> %x, i32 1
>   %x0x1 = fmul float %x0, %x1
>   %x1x1 = fmul float %x1, %x1
>   %add = fadd float %x0x1, %x1x1
>   ret float %add
> }
> 
> are the costs of all of these extracts still accounted for correctly? The final code presumably has a shuffle in place of one of the extracts?
> 
> 
Oh, now I see what you are talking about. But we may still run into something like this:
```
@a = global float 0.000000e+00, align 4

define float @f(<2 x float> %x) {
  %x0 = extractelement <2 x float> %x, i32 0
  %x1 = extractelement <2 x float> %x, i32 1
  %x0x0 = fmul float %x0, %x0
  %x1x1 = fmul float %x1, %x1
  %add = fadd float %x0x0, %x1x1
  store float %add, float* @a
  ret float %x0
}
```
where the `ret` is a user of `%x0` that is not part of the vectorized tree, so `%x0` stays alive even after vectorization. What we should do is check that all users of the `extractelement` instructions are going to be vectorized.
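
A minimal sketch of that check, using a toy model rather than the real LLVM classes. `ToyInst`, `ToyTree`, `allUsersVectorized` and `runExample` are hypothetical names made up for illustration, and the tree membership (only the two `fmul`s) is an assumption, not what SLPVectorizer.cpp actually builds:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for an IR instruction (hypothetical; NOT llvm::Instruction).
struct ToyInst {
  std::string Name;
  std::vector<const ToyInst *> Users; // one entry per use, so a user may repeat
};

// Toy stand-in for the set of scalars that the tree is going to vectorize.
using ToyTree = std::unordered_set<const ToyInst *>;

// An extract may be credited as dead only if every one of its users is part
// of the to-be-vectorized tree. The number of uses does not matter.
bool allUsersVectorized(const ToyInst &E, const ToyTree &Tree) {
  for (const ToyInst *U : E.Users) {
    assert(U != nullptr && "user list must not contain null entries");
    if (Tree.count(U) == 0)
      return false; // external user keeps the extract alive
  }
  return true;
}

struct ExampleResult {
  bool X0Dead;
  bool X1Dead;
};

// Models the IR above: %x0 feeds %x0x0 twice and the ret,
// %x1 feeds %x1x1 twice.
ExampleResult runExample() {
  ToyInst X0{"x0", {}}, X1{"x1", {}};
  ToyInst Mul0{"x0x0", {}}, Mul1{"x1x1", {}}, Ret{"ret", {}};
  X0.Users = {&Mul0, &Mul0, &Ret};
  X1.Users = {&Mul1, &Mul1};

  // Assumption for illustration: only the two fmuls are in the tree.
  ToyTree Tree = {&Mul0, &Mul1};

  return {allUsersVectorized(X0, Tree), allUsersVectorized(X1, Tree)};
}
```

In this model `%x1` has two uses, so a plain one-use test would refuse the credit, yet both uses belong to a vectorized instruction and the extract can be treated as dead. `%x0` must keep its cost because of the `ret`.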


https://reviews.llvm.org/D29900




