[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Thu Jan 26 12:34:33 PST 2012

On Tue, Jan 24, 2012 at 6:41 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> enabling vectorization reduces performance by 80% on ARM.
>> I will prepare a reduced testcase and try to find out the reason.
>> As a first shot, I would say that this comes from the vectorization of
>> code in a loop and the overhead of transfer between scalar and
>> vector registers.
>
> This is good; as has been pointed out, we'll need to develop a
> vectorization cost model for this kind of thing to really be successful,
> and so we should start thinking about that.
>
> The pass, as implemented, has a semi-implicit cost model which says
> that permutations followed by another vector operation are free, scalar
> -> vector transfers are free, and vectorizing a memory operation is just
> as good as vectorizing an arithmetic operation. Depending on the system,
> these may all be untrue (although on some systems they are true).
>
> If you can generate a test case, that would be great; I'd like to look at
> it.
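
To make the assumptions listed above concrete, here is a minimal sketch in C
of what an explicit cost model could look like. It is not code from the pass:
the struct, the function name, and every cost value are hypothetical and
chosen only for illustration. It compares the cost of `width` scalar
operations against one vector operation plus whatever inserts, extracts, and
permutations the vector form needs.

```c
/* Hypothetical per-target costs in arbitrary units. The field names and
   any values plugged into them are illustrative assumptions, not
   measurements of a real target and not part of the pass. */
struct target_costs {
    int scalar_op;        /* one scalar arithmetic operation */
    int vector_op;        /* one vector arithmetic operation */
    int scalar_to_vector; /* moving one scalar into a vector lane */
    int vector_to_scalar; /* extracting one vector lane to a scalar */
    int permute;          /* one vector permutation */
};

/* Estimated saving from replacing `width` scalar operations with a single
   vector operation. A positive result means the vector form is expected
   to be cheaper; a negative result means it is expected to be slower. */
int vectorize_profit(const struct target_costs *c, int width,
                     int inserts, int extracts, int permutes)
{
    int scalar_cost = width * c->scalar_op;
    int vector_cost = c->vector_op
                    + inserts * c->scalar_to_vector
                    + extracts * c->vector_to_scalar
                    + permutes * c->permute;
    return scalar_cost - vector_cost;
}
```

With costs {1, 1, 0, 0, 0}, which mirrors the "transfers and permutations are
free" assumption, fusing 4 operations with 4 inserts, 4 extracts, and 1
permutation yields a profit of 3. With costs {1, 1, 3, 3, 1} the same
candidate yields -22, so a model of this shape would reject it.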

Here is the testcase, with calls to gettimeofday so that only the time spent
in the kernel is measured, not the initialization and finalization phases.
On ARM I saw a slowdown of around 5x to 6x in the vector version.
I haven't tried this on x86 yet, but it should also show a slowdown there,
since transfers between scalar and vector registers are not free on x86 either.
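
For readers without the attachment, the measurement approach described above
can be sketched as follows. This is not the attached test.c; the helper names
`elapsed_seconds` and `time_kernel` and the `reps` parameter are assumptions
made for illustration. The idea is only to sample gettimeofday immediately
before and after the kernel so that setup and teardown are excluded.

```c
#include <stddef.h>
#include <sys/time.h>

/* Difference between two gettimeofday() samples, in seconds.
   Hypothetical helper, not taken from the attached testcase. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Run `kernel` `reps` times and return the wall-clock time spent in it.
   Initialization and finalization should happen outside this call so
   that they do not contribute to the measurement. */
double time_kernel(void (*kernel)(void), int reps)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (int i = 0; i < reps; i++)
        kernel();
    gettimeofday(&end, NULL);

    return elapsed_seconds(start, end);
}
```

Building the same kernel once with and once without vectorization and
comparing the two `time_kernel` results would give the kind of ratio quoted
above. gettimeofday reports wall-clock time, so the numbers are sensitive to
other load on the machine.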

Sebastian
--
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: text/x-csrc
Size: 891 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120126/08013f8f/attachment.c>