[LLVMdev] Vectorization: Next Steps

Hal Finkel hfinkel at anl.gov
Fri Feb 10 10:02:47 PST 2012


Carl-Philip,

The reason that this does not vectorize is that the pass cannot vectorize
the stores; that leaves only the mul-add chains (and some chains with
loads), and those only have a depth of 2 (the default threshold is 6).

If you give clang -mllvm -bb-vectorize-req-chain-depth=2 then it will
vectorize. The reason the heuristic has such a large default value is to
prevent cases where it costs more to permute all of the necessary values
into and out of the vector registers than is saved by vectorizing. Does
the code generated with -bb-vectorize-req-chain-depth=2 run faster than
the unvectorized code?
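
For concreteness, here is a hypothetical kernel along the lines of your
test case (the attachment is not reproduced here, so the exact code is a
guess), together with the flags mentioned above:

  // Hypothetical 4x4 matrix * 4-vector kernel; assumed to resemble the
  // attached test case. Compile with, for example:
  //   clang -O3 -mllvm -vectorize \
  //         -mllvm -bb-vectorize-req-chain-depth=2 -c matvec.c
  void matvec(const float m[4][4], const float v[4], float out[4]) {
    // Once fully unrolled, this is 16 multiplies, 12 adds, and 4 stores:
    // exactly the short mul-add chains described above.
    for (int i = 0; i < 4; ++i)
      out[i] = m[i][0] * v[0] + m[i][1] * v[1]
             + m[i][2] * v[2] + m[i][3] * v[3];
  }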

The heuristic can certainly be improved, and these kinds of test cases
are very important to that improvement process.

 -Hal

On Thu, 2012-02-09 at 13:27 +0100, Carl-Philip Hänsch wrote:
> I have a super-simple test case, a 4x4 matrix * 4-vector multiply, which
> gets correctly unrolled but is not vectorized by -bb-vectorize (I used
> LLVM 3.1svn). I attached the test case so you can see what is going
> wrong there.
> 
> 2012/2/3 Hal Finkel <hfinkel at anl.gov>
>         As some of you may know, I committed my basic-block autovectorization
>         pass a few days ago. I encourage anyone interested to try it out (pass
>         -vectorize to opt or -mllvm -vectorize to clang) and provide feedback.
>         Especially in combination with -unroll-allow-partial, I have observed
>         some significant benchmark speedups, but I have also observed some
>         significant slowdowns. I would like to share my thoughts, and hopefully
>         get feedback, on next steps.
>         
>         1. "Target Data" for vectorization - I think that in order to
>         improve
>         the vectorization quality, the vectorizer will need more
>         information
>         about the target. This information could be provided in the
>         form of a
>         kind of extended target data. This extended target data might
>         contain:
>          - What basic types can be vectorized, and how many of them
>         will fit
>         into (the largest) vector registers
>          - What classes of operations can be vectorized (division,
>         conversions /
>         sign extension, etc. are not always supported)
>          - What alignment is necessary for loads and stores
>          - Is scalar-to-vector free?
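> 
>         Purely as a sketch, a query interface for this extended target data
>         might look something like the following (all of these names are
>         hypothetical; nothing like this exists in the tree yet):
> 
>           // Hypothetical extended-target-data interface; every name here is
>           // made up for illustration.
>           class Type; // stand-in for the LLVM IR type class
> 
>           struct VectorTargetInfo {
>             // Width of the largest vector register, in bits (e.g. 128).
>             unsigned MaxVectorBits;
> 
>             // Can this scalar type be placed in a vector register, and if
>             // so, how many elements fit into the largest one?
>             bool isVectorizableType(Type *ScalarTy, unsigned &NumElts) const;
> 
>             // Is this class of operation (division, conversion, sign
>             // extension, ...) supported on vectors of the given element type?
>             bool isVectorizableOp(unsigned Opcode, Type *ScalarTy) const;
> 
>             // Alignment, in bytes, required for vector loads and stores.
>             unsigned getRequiredAlignment(Type *VecTy) const;
> 
>             // Is broadcasting/inserting a scalar into a vector register
>             // effectively free on this target?
>             bool isScalarToVectorFree(Type *ScalarTy) const;
>           };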
>         
>         2. Feedback between passes - We may want to implement a closer coupling
>         between optimization passes than currently exists. Specifically, I have
>         two things in mind:
>          - The vectorizer should communicate more closely with the loop
>         unroller. First, the loop unroller should try to unroll to preserve
>         maximal load/store alignments. Second, I think it would make a lot of
>         sense to be able to unroll and then keep the unrolled version in
>         preference to the original only if this helps vectorization (see the
>         sketch after this list). With basic-block vectorization, it is often
>         necessary to (partially) unroll in order to vectorize. Even when we
>         also have real loop vectorization, however, I still think that it will
>         be important for the loop unroller to communicate with the vectorizer.
>          - After vectorization, it would make sense for the vectorization pass
>         to request further simplification, but only on those parts of the code
>         that it modified.
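> 
>         A very rough sketch, with entirely made-up types and helper names, of
>         the unroll/vectorize coupling I have in mind for the first point:
> 
>           // Pseudocode sketch: unroll speculatively, try to vectorize, keep
>           // the unrolled loop only if vectorization succeeded, then
>           // re-simplify just the code that was modified.
>           struct Loop;                                  // stand-in IR loop
>           Loop *cloneLoop(Loop *L);                     // assumed helpers...
>           void unrollPartially(Loop *L, unsigned Factor);
>           unsigned chooseAlignmentFriendlyFactor(Loop *L);
>           bool runBBVectorizeOn(Loop *L);
>           void replaceLoopWith(Loop *L, Loop *Original);
>           void simplifyModifiedBlocksOnly(Loop *L);
> 
>           bool tryUnrollAndVectorize(Loop *L) {
>             Loop *Original = cloneLoop(L);        // keep a copy to revert to
>             unrollPartially(L, chooseAlignmentFriendlyFactor(L));
>             if (!runBBVectorizeOn(L)) {
>               replaceLoopWith(L, Original);       // unrolling did not pay off
>               return false;
>             }
>             simplifyModifiedBlocksOnly(L);        // second point above
>             return true;
>           }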
>         
>         3. Loop vectorization - It would be nice to have, in addition to
>         basic-block vectorization, a more-traditional loop vectorization pass.
>         I think that we'll need a better loop analysis pass in order for this
>         to happen. Some of this was started in LoopDependenceAnalysis, but that
>         pass is not yet finished. We'll need something like this to recognize
>         affine memory references, etc.
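> 
>         For example, this is the kind of loop (every memory reference affine
>         in the induction variable) that such an analysis would need to
>         recognize and a loop vectorizer could then handle directly:
> 
>           // a[i], b[i], and c[i] are all affine, stride-1 references in i;
>           // recognizing this access pattern is what LoopDependenceAnalysis
>           // (or its successor) needs to provide.
>           void scaled_add(float *a, const float *b, const float *c, int n) {
>             for (int i = 0; i < n; ++i)
>               a[i] = b[i] + 2.0f * c[i];
>           }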
>         
>         I look forward to hearing everyone's thoughts.
>         
>          -Hal
>         
>         --
>         Hal Finkel
>         Postdoctoral Appointee
>         Leadership Computing Facility
>         Argonne National Laboratory
>         
>         _______________________________________________
>         LLVM Developers mailing list
>         LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>         http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory



