[PATCH] Improve Cost model for SLPVectorizer when we have a vector division by power of 2

Thu Aug 21 09:47:42 PDT 2014

>>! In D4971#29, @andreadb wrote:
> 
> We can safely do this only for UDIV. UDIV by pow-2 is always converted to an SRL. This is true for all targets.

Does it make sense to split this patch into 2 pieces then? One that handles UDIV universally, and then a follow-on for SDIV. I was going to suggest additional test cases for each op anyway. :)
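To make the UDIV half concrete, here is a minimal sketch of the identity being relied on: an unsigned divide by 2^k gives the same result as a logical shift right by k. The helper name `udiv_pow2` is made up for illustration and is not anything in the patch or in LLVM.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM code): unsigned x / 2^k written as a single
// logical shift right. Assumes 0 <= k < 32 so the shift is well defined.
uint32_t udiv_pow2(uint32_t x, unsigned k) {
  return x >> k;  // srl
}
```

Since this is a single shift for any k and any unsigned input, charging the cost of one vector shift seems safe regardless of target.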

> However, we cannot guarantee that SDIV is treated the same way by all targets.
> How SDIV gets expanded in the backend is always target specific.
> By default, SDIV is expanded into a sequence of shifts+adds (this is the behavior on X86). Other targets may not implement that same default behavior.
> For example, AArch64 custom expands SDIV by Pow2 in a different way (see AArch64TargetLowering::BuildSDIVPow2).

Right - AArch64 has extra goodness via the rounding constant in "usra"; this is shown in the bug (http://llvm.org/bugs/show_bug.cgi?id=20714) that Jim filed.

But we could still have a conservative upper cost bound for all targets? As you noted, division by exactly "2" costs one inst less, and there's one more instruction if we change this code to handle negative divisors, so we're not getting an exact cost value in any case.
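As a rough illustration of where those instruction counts come from, here is a scalar sketch of the usual shift+add sequence for a signed divide by 2^k, the divide-by-2 special case, and the negative-divisor case. The function names are hypothetical, and this is meant to mirror the general shape of the default expansion, not the exact nodes any particular backend emits. It also assumes `>>` on a negative `int32_t` is an arithmetic shift (guaranteed from C++20, and what mainstream compilers do before that).

```cpp
#include <cstdint>

// Hypothetical helper: x / (1 << k) rounding toward zero, for 1 <= k <= 30.
// Four operations: sra, srl, add, sra.
int32_t sdiv_pow2(int32_t x, unsigned k) {
  int32_t sign = x >> 31;                                   // sra: 0 or -1
  uint32_t bias = static_cast<uint32_t>(sign) >> (32 - k);  // srl: 0 or 2^k - 1
  int32_t sum = x + static_cast<int32_t>(bias);             // add
  return sum >> k;                                          // sra
}

// Divide by exactly 2: the bias is just the sign bit, so the first sra
// is not needed. Three operations: srl, add, sra.
int32_t sdiv_by_2(int32_t x) {
  uint32_t bias = static_cast<uint32_t>(x) >> 31;           // srl: 0 or 1
  int32_t sum = x + static_cast<int32_t>(bias);             // add
  return sum >> 1;                                          // sra
}

// Negative power-of-2 divisor: same sequence plus one negate at the end.
int32_t sdiv_neg_pow2(int32_t x, unsigned k) {
  return -sdiv_pow2(x, k);                                  // sub from zero
}
```

So a conservative bound along these lines would sit somewhere between three and five simple vector ops, depending on which of those cases the cost model chooses to cover.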

> Also, some targets may want to define TLI.isPow2DivCheap... so, as you can see, the problem is complicated.

I think PPC is the only arch that sets it (which seems like a bug to me, but I'm probably missing the reason; the PPC backend turns scalar signed int pow2div into sra/addze anyway). 

I assume the intent of that flag is to say that the HW itself recognizes pow2div (signed or unsigned?) and can do it just as fast as a shift. But I'm not aware of any vector ISA that even includes an integer division instruction.

http://reviews.llvm.org/D4971