[LLVMdev] ARM vectorizer cost model
nrotem at apple.com
Thu Jan 10 15:00:39 PST 2013
On Jan 10, 2013, at 2:19 PM, Renato Golin Linaro <renato.golin at linaro.org> wrote:
> I'm also thinking about the individual instructions cost (getArithmeticInstrCost, getShuffleCost, etc). That can be a simple and easily parallelized task. I got the A9 manual that has the cost of all instructions (including NEON and VFP), that should give us a head start.
Thanks for working on this!
Some of the costs for the arithmetic operations should be handled automatically by the BasicTTI (which asks TargetLowering if the type and operations are legal). We need to have cost tables for things like "trunc <4 x i64> to <4 x i8>" because even TLI does not know how custom operations get lowered.
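To illustrate what such a cost table could look like, here is a minimal standalone sketch. It does not use the real LLVM API: the names Op, VecTy, ConversionCostEntry and getConversionCost are hypothetical, and the cost numbers are placeholders, not values from the A9 manual. The idea is to check the table first and fall back to a generic estimate (the role BasicTTI plays) when no entry matches.

```cpp
// Hypothetical operation and vector type kinds, for illustration only.
// These are not LLVM's opcode or value-type enums.
enum class Op { Trunc, SExt, ZExt };
enum class VecTy { v4i8, v4i16, v4i32, v4i64 };

// One row of the table: a conversion from 'src' to 'dst' and its cost.
struct ConversionCostEntry {
  Op op;
  VecTy dst;
  VecTy src;
  unsigned cost;
};

// Placeholder costs (assumed values, not measured on any ARM core).
static const ConversionCostEntry ConversionCosts[] = {
    {Op::Trunc, VecTy::v4i8, VecTy::v4i64, 3},
    {Op::Trunc, VecTy::v4i16, VecTy::v4i64, 2},
    {Op::SExt, VecTy::v4i64, VecTy::v4i8, 4},
};

// Return the table cost if an entry matches; otherwise return the
// caller-supplied generic estimate.
unsigned getConversionCost(Op op, VecTy dst, VecTy src, unsigned genericCost) {
  for (const ConversionCostEntry &e : ConversionCosts)
    if (e.op == op && e.dst == dst && e.src == src)
      return e.cost;
  return genericCost;
}
```

With this table, getConversionCost(Op::Trunc, VecTy::v4i8, VecTy::v4i64, 1) picks up the table entry, while a conversion that is not listed simply gets the generic cost passed in by the caller.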
> I'm guessing the cost you already have for Intel and the BasicTTI is in "ideal cycle count", not taking into consideration the time available to get the results or pipeline stalls, etc.
Yes. Throughput numbers, assuming that all branches are predicted and all memory is in L1.
> Yes, this direct access is very convenient. For now, I'll focus on A9 and later we can add the subtleties of each sub-target.