[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Thu Jan 26 12:49:53 PST 2012

On Thu, 2012-01-26 at 14:34 -0600, Sebastian Pop wrote:
> On Tue, Jan 24, 2012 at 6:41 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> enabling vectorization gets the performance down by 80% on ARM.
> >> I will prepare a reduced testcase and try to find out the reason.
> >> As a first shot, I would say that this comes from the vectorization of
> >> code in a loop and the overhead of transfer between scalar and
> >> vector registers.
> >
> > This is good; as has been pointed out, we'll need to develop a
> > vectorization cost model for this kind of thing to really be successful,
> > and so we should start thinking about that.
> >
> > The pass, as implemented, has a semi-implicit cost model which says
> > that permutations followed by another vector operation are free, scalar
> > -> vector transfers are free, and vectorizing a memory operation is just
> > as good as vectorizing an arithmetic operation. Depending on the system,
> > these may all be untrue (although on some systems they are true).
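
To make those assumptions concrete, here is a rough sketch of what an
explicit cost check might look like. It is not code from the patch: the
struct, the function name, and every cost value are hypothetical. The
pass's current behavior corresponds roughly to setting the transfer and
permutation costs to zero.

```c
#include <assert.h>

/* Hypothetical per-target costs in arbitrary units (illustrative only). */
struct vec_costs {
    int scalar_op;    /* one scalar arithmetic operation */
    int vector_op;    /* one vector arithmetic operation */
    int insert_lane;  /* moving one scalar into a vector lane */
    int extract_lane; /* moving one vector lane back to a scalar register */
    int permute;      /* one vector permutation (shuffle) */
};

/*
 * Estimated benefit of replacing `width` scalar operations with a single
 * vector operation. A positive result means the vector form is cheaper.
 */
int vectorization_benefit(const struct vec_costs *c, int width,
                          int inserts, int extracts, int permutes)
{
    assert(c != 0 && width > 0);
    assert(inserts >= 0 && extracts >= 0 && permutes >= 0);

    int scalar_total = width * c->scalar_op;
    int vector_total = c->vector_op
                     + inserts * c->insert_lane
                     + extracts * c->extract_lane
                     + permutes * c->permute;
    return scalar_total - vector_total;
}
```

With zero-cost transfers, fusing a pair of operations always looks
profitable; once the insert and extract costs are nonzero, the same pair
can come out as a net loss.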
> >
> > If you can generate a test case, that would be great; I'd like to look
> > at it.
> 
> Here is the testcase with calls to gettimeofday to measure time spent
> in the kernel and not in the ini/fini phases.
> On ARM I saw around 5 to 6x slowdown in the vector version.
> I haven't tried this on x86 yet, but that should also produce slowdowns,
> as the cost of transfers between scalar and vector registers is nonzero
> there as well.
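
For readers without the attachment, the timing approach described above
can be sketched roughly as follows. This is not Sebastian's testcase:
the saxpy-style kernel and all function names are assumptions made for
illustration. Only the kernel loop sits between the two gettimeofday
calls, so setup and teardown are excluded from the measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, read with gettimeofday. */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Hypothetical kernel: y[i] += a * x[i]. */
static void saxpy(float *y, const float *x, float a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/*
 * Runs the kernel `reps` times and returns the elapsed wall-clock time
 * in seconds. Initialization of x and y is left to the caller so that
 * it is not included in the measured interval.
 */
double time_kernel(float *y, const float *x, float a, size_t n, int reps)
{
    assert(reps > 0);
    double start = now_seconds();
    for (int r = 0; r < reps; ++r)
        saxpy(y, x, a, n);
    return now_seconds() - start;
}
```

Comparing the value returned for a build with and without -mllvm
-vectorize gives the kernel-only slowdown.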

Thanks! Did you compile with any non-default flags other than -mllvm
-vectorize?

 -Hal

> 
> Sebastian
> --
> Qualcomm Innovation Center, Inc is a member of Code Aurora Forum

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory