[PATCH] Calculate vectorization factor using the narrowest type instead of widest type

Cong Hou congh at google.com
Tue Apr 14 11:25:14 PDT 2015


On Tue, Apr 14, 2015 at 8:49 AM, Chandler Carruth <chandlerc at google.com>
wrote:

> I've replied to some of the higher level concerns already, but I wanted to
> point out one specific thing:
>
> On Fri, Apr 10, 2015 at 3:30 AM Cong Hou <congh at google.com> wrote:
>
>> LLVM uses the widest type to calculate the maximum vectorization factor,
>> which greatly limits the bandwidth of calculations and loads/stores
>> achievable with SIMD instructions. One example is converting 8-bit
>> integers to 32-bit integers from arrays in a loop: currently the VF of
>> this simple loop is decided by the 32-bit integer type, so for SSE2 it
>> will be 4. Then we have 1 load and 1 store in every 4 iterations. If we
>> calculate the VF based on the 8-bit integer type, it will be 16, and we
>> will have 1 load and 4 stores in every 16 iterations, saving many loads.
>>
>
> While I'm generally in favor of this kind of change, I think the test case
> you're looking at is actually a separate issue that I've written up several
> times w.r.t. our vectorizer.
>

You mean fp64_to_uint32-cost-model.ll? I think you are right. My patch
invalidates this test, which is why I need to change the test criteria.
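
For concreteness, the kind of u8-to-u32 conversion loop described above
would look like this (a minimal sketch; the function and array names are
illustrative, not from the patch):

#include <stdint.h>

/* With the VF decided by the widest type (i32), SSE2 gets VF = 4: one
   4-byte load and one 16-byte store per 4 iterations, i.e. 4 loads and
   4 stores per 16 elements. With the VF decided by the narrowest type
   (i8), VF = 16: one 16-byte load and four 16-byte stores per 16
   elements, saving 3 of every 4 loads. */
void widen(uint32_t *dst, const uint8_t *src, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = src[i];
}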


>
> Because C does integer promotion from 8-bit integer types to 32-bit
> integer types, we very commonly see things that are vectorized with 32-bit
> integer math when they don't need to be.
>

Yes, the promotion to 32-bit integers is quite annoying to the vectorizer:
too many packing/unpacking instructions are generated, and they could be
eliminated if doing the operations directly on 8-bit integers would not
affect the results.
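
For example (a hypothetical C loop, not taken from the patch; any
byte-wise operation whose result is closed in 8 bits behaves the same way):

#include <stdint.h>

/* C promotes a[i] and b[i] to int, so the IR performs the add in 32 bits
   and truncates the result back to 8 bits at the store. Vectorizing the
   32-bit add means unpacking the 8-bit lanes to 32 bits and packing them
   back, even though an 8-bit add yields the same truncated result, since
   the low 8 bits of the sum do not depend on the upper bits. */
void add_bytes(uint8_t *c, const uint8_t *a, const uint8_t *b, int n) {
  for (int i = 0; i < n; ++i)
    c[i] = a[i] + b[i];  /* promoted to int, truncated by the store */
}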


>
> The IR can't narrow these operations from 32-bit integer operations to
> 8-bit integer operations without losing information, because in 8 bits the
> operations might overflow. But when vectorizing, we don't care about this.
> We should aggressively narrow operations above a trunc, hoisting the trunc
> above them by stripping overflow flags while building the vectorizable
> operation tree, so that we can fit more operations into a single vector.
> Does that make sense?
>

That is also what I am thinking about. If LLVM supported pattern
recognition (as GCC does), we could recognize this
type-promotion-then-demotion idiom as a pattern and then generate better
vectorized code. The pattern recognizer could also help generate better
SIMD code for dot-product/SAD/widening operations. I am not sure how the
SAD patch is implemented, and I hope we can have a general way to detect
those patterns.
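
As a concrete example, a SAD reduction looks like this in C (a
hypothetical kernel, not taken from the SAD patch):

#include <stdint.h>
#include <stdlib.h>

/* The u8 elements are promoted to int for the subtraction, abs() and the
   accumulation. A pattern recognizer could match this whole
   promote-subtract-abs-accumulate idiom and emit a SAD instruction
   (e.g. psadbw on x86) instead of widening every lane to 32 bits. */
int sad(const uint8_t *a, const uint8_t *b, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += abs(a[i] - b[i]);
  return sum;
}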

Cong


>
> -Chandler
>
>
>>
>> This patch mainly changes the function getWidestType() to
>> getNarrowestType(), and uses it to calculate VF.
>>
>> http://reviews.llvm.org/D8943
>>
>> Files:
>>   lib/Target/X86/X86TargetTransformInfo.cpp
>>   lib/Transforms/Vectorize/LoopVectorize.cpp
>>   test/Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll
>>   test/Transforms/LoopVectorize/X86/vector_ptr_load_store.ll
>>
>