[PATCH] Improve Cost model for SLPVectorizer when we have a vector division by power of 2

Thu Aug 21 10:16:58 PDT 2014

On Thu, Aug 21, 2014 at 5:47 PM, Sanjay Patel <spatel at rotateright.com> wrote:
>>>! In D4971#29, @andreadb wrote:
>>
>> We can safely do this only for UDIV. UDIV by pow-2 is always converted to an SRL. This is true for all targets.
>
> Does it make sense to split this patch into 2 pieces then? One that handles UDIV universally, and then a follow-on for SDIV. I was going to suggest additional test cases for each op anyway. :)
>

I think it makes perfect sense :-).
I have a (maybe stupid) question: do we really have to worry about the
case of UDIV by powers-of-2 in the vectorizer?
I am asking this because the optimizer would always convert a UDIV by
a power-of-2 into an SRL. So, by the time we run the vectorizer, all the
foldable UDIVs by a power-of-2 have already been optimized into SRLs.
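
To make the equivalence concrete, here is a minimal C++ sketch (the
helper name udiv_pow2 is made up for illustration, it is not an LLVM
API). It assumes a 32-bit unsigned value and a shift amount below 32:

```cpp
#include <cstdint>

// Hypothetical helper: unsigned division by 2^k expressed as a single
// logical shift right (SRL). Assumes 0 <= k < 32.
inline std::uint32_t udiv_pow2(std::uint32_t x, unsigned k) {
    return x >> k;  // same result as x / (1u << k) for unsigned x
}
```

For example, udiv_pow2(40, 3) gives the same value as 40 / 8. No
rounding fix-up is needed because unsigned division and SRL both
round toward zero.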

In general I don't have a strong opinion about what to do in this
case; I'll let people who know the vectorizer better than I do
decide :-).

>
>> However, we cannot guarantee that SDIV is treated the same way by all targets.
>> How SDIV gets expanded in the backend is always target specific.
>> By default, SDIV is expanded into a sequence of shifts+adds (this is the behavior on X86). Other targets may not implement that same default behavior.
>> For example, AArch64 custom expands SDIV by Pow2 in a different way (see AArch64TargetLowering::BuildSDIVPow2).
>
> Right - AArch64 has extra goodness via the rounding constant in "usra"; this is shown in the bug ( http://llvm.org/bugs/show_bug.cgi?id=20714 ) that Jim filed.
>
> But we could still have a conservative upper cost bound for all targets? As you noted, division by exactly "2" costs one inst less, and there's one more instruction if we change this code to handle negative divisors, so we're not getting an exact cost value in any case.
>

If the default SRA+SRL+ADD+SRA was meant to be a "worst case
scenario", then yes, we can hoist the code from X86TTI.
I honestly don't know if that was intended to be the case though :-(.
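
For reference, this is roughly what that four-instruction sequence
computes, written as a scalar C++ sketch. The helper name sdiv_pow2 is
made up for illustration, and the code assumes a 32-bit value, a
positive divisor 2^k with 1 <= k <= 31, and a compiler that implements
>> on negative signed values as an arithmetic shift (guaranteed only
from C++20 on, but the usual behavior in practice):

```cpp
#include <cstdint>

// Hypothetical helper: signed division by 2^k, rounding toward zero,
// using the SRA + SRL + ADD + SRA pattern. Assumes 1 <= k <= 31 and an
// arithmetic right shift for negative signed operands.
inline std::int32_t sdiv_pow2(std::int32_t x, unsigned k) {
    // SRA: replicate the sign bit (all ones if x < 0, else zero).
    std::int32_t sign = x >> 31;
    // SRL: keep the low k bits of the mask, i.e. 2^k - 1 if x < 0, else 0.
    std::uint32_t bias = static_cast<std::uint32_t>(sign) >> (32 - k);
    // ADD: bias negative inputs so the final shift rounds toward zero.
    std::int32_t biased = x + static_cast<std::int32_t>(bias);
    // SRA: the actual division by 2^k.
    return biased >> k;
}
```

With k == 1 the first SRA is not needed, since a logical shift of x by
31 already yields the bias; that matches the remark above that division
by exactly 2 costs one instruction less.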

>> Also, some targets may want to define TLI.isPow2DivCheap... so, as you can see, the problem is complicated.
>
> I think PPC is the only arch that sets it (which seems like a bug to me, but I'm probably missing the reason; the PPC backend turns scalar signed int pow2div into sra/addze anyway).
>
> I assume the intent of that flag is to say that the HW itself recognizes pow2div (signed or unsigned?) and can do it just as fast as a shift. But I'm not aware of any vector ISA that even includes an integer division instruction.
>
> http://reviews.llvm.org/D4971