[LLVMdev] loop vectorizer

Frank Winter fwinter at jlab.org
Wed Nov 6 07:42:25 PST 2013


On 06/11/13 08:54, Arnold wrote:
>
>
> On Nov 5, 2013, at 7:39 PM, Frank Winter <fwinter at jlab.org> wrote:
>
>> Good that you bring this up. I still have no solution to this 
>> vectorization problem.
>>
>> However, I can rewrite the code and insert a second loop, which 
>> eliminates the 'urem' and 'udiv' instructions in the index 
>> calculations. In that case the inner loop's trip count would equal 
>> the SIMD length and the loop vectorizer would ignore the loop. 
>> Unrolling the loop and using SLP is not an option, since the loop 
>> body can get lengthy.
>>
>> What would be quicker to implement:
>>
>> a) Teach the loop vectorizer the 'urem' and 'div' instructions, or
>
> This would probably be harder because your individual accesses are 
> consecutive within a stride.
>
> a[0] a[1] a[2] a[3]  a[8] a[9] a[10] a[11]
>
> Not something the loop vectorizer currently understands.
>> b) have the loop vectorizer process loops with trip count equal to 
>> the vector length?
>
> You should be able to change "TinyTripCountVectorThreshold" in 
> LoopVectorize.cpp
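
For reference, here is a minimal sketch of the second-loop rewrite described 
above, which is the case where this threshold matters. It assumes the 
original loop runs over i = 0..n-1 with n a multiple of 4; variable names are 
illustrative. The inner loop's trip count equals the SIMD width and the 
index computation needs no urem/udiv:

  for (std::uint64_t block = 0; block < n / 4; ++block) {
    const std::uint64_t base = 8 * block;             // 8*(i/4)
    for (std::uint64_t lane = 0; lane < 4; ++lane) {  // trip count == SIMD width
      const std::uint64_t ir0 = base + lane;          // i%4 + 8*(i/4)
      c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
    }
  }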

I managed to set this option when using the 'opt' tool. Is there a way to 
set it when using the API, without changing the default value in the source 
code and recompiling LLVM?
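
One possibility, sketched under the assumption that the threshold is exposed 
as a hidden cl::opt (spelled "vectorizer-min-trip-count" here, which may 
differ between LLVM versions), is to feed the flag through LLVM's 
command-line machinery from the embedding application:

  #include "llvm/Support/CommandLine.h"

  // Sketch only: set a hidden loop-vectorizer cl::opt from an embedding
  // application instead of the 'opt' command line. The flag name
  // "vectorizer-min-trip-count" is an assumption; check the cl::opt
  // declared next to TinyTripCountVectorThreshold in LoopVectorize.cpp.
  void lowerVectorizerTripCountThreshold() {
    const char *Args[] = { "prog", "-vectorizer-min-trip-count=4" };
    // Call once, early, before building the pass pipeline.
    llvm::cl::ParseCommandLineOptions(2, Args);
  }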

>>
>> One of the two solutions will be needed, I guess.
>>
>> Frank
>>
>>
>>
>> On 05/11/13 22:12, Andrew Trick wrote:
>>>
>>> On Oct 30, 2013, at 11:21 PM, Renato Golin <renato.golin at linaro.org> wrote:
>>>
>>>> On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote:
>>>>
>>>>           const std::uint64_t ir0 = (i+0)%4;  // not working
>>>>
>>>>
>>>> I thought this would be the case when I saw the original 
>>>> expression. Maybe we need to teach modulo arithmetic to SCEV?
>>>
>>> I let this thread get stale, so here’s the background again:
>>>
>>> source:
>>>
>>>       const std::uint64_t ir0 = i%4 + 8*(i/4);
>>>       c[ ir0 ]         = a[ ir0 ]         + b[ ir0 ];
>>>
>>> before instcombine:
>>>
>>>   %4 = urem i64 %i.0, 4
>>>   %5 = udiv i64 %i.0, 4
>>>   %6 = mul i64 8, %5
>>>   %7 = add i64 %4, %6
>>>   %8 = getelementptr inbounds float* %a, i64 %7
>>>
>>> after instcombine:
>>>
>>>   %2 = and i64 %i.04, 3
>>>   %3 = lshr i64 %i.04, 2
>>>   %4 = shl i64 %3, 3
>>>   %5 = or i64 %4, %2
>>>   %11 = getelementptr inbounds float* %c, i64 %5
>>>   store float %10, float* %11, align 4, !tbaa !0
>>>
>>> Honestly, I don't understand why InstCombine "anti-canonicalizes" 
>>> add->or. I think that transformation should be deferred until we 
>>> begin target-specific lowering (e.g. an InstOptimize pass).
>>>
>>> Given that we aren't going to change that any time soon, SCEV could 
>>> probably be taught to recognize the specific pattern:
>>>
>>> Instructions (or (and %a, C1), (shl %b, C2)) -> SCEV (add %a, %b)
>>>
>>> -Andy
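
As a rough illustration of that recognition (written with llvm::PatternMatch 
purely for exposition; the real change would live inside ScalarEvolution, and 
the names below are hypothetical):

  #include "llvm/IR/PatternMatch.h"  // llvm/Support/PatternMatch.h in older trees
  using namespace llvm;
  using namespace llvm::PatternMatch;

  // Illustrative only: detect (or (shl %b, C2), (and %a, C1)), the operand
  // order instcombine produced above, where the masked bits lie entirely
  // below the shift amount, so the 'or' cannot carry and is equivalent to
  // an 'add' of its two operands.
  static bool orBehavesLikeAdd(Value *V) {
    Value *A, *B;
    ConstantInt *Mask, *ShAmt;
    if (!match(V, m_Or(m_Shl(m_Value(B), m_ConstantInt(ShAmt)),
                       m_And(m_Value(A), m_ConstantInt(Mask)))))
      return false;  // a real implementation would also try the other order
    // e.g. mask 3 (bits 0-1) against a shift of 3 (bits 3 and up): disjoint.
    return ShAmt->getZExtValue() < 64 &&
           (Mask->getZExtValue() >> ShAmt->getZExtValue()) == 0;
  }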

