[PATCH 2/3] ARM cost model: Address computation in vector mem ops not free

Thu Feb 7 11:47:29 PST 2013

This must be code from an older patch.

The latest patch has this:
@@ -3223,10 +3232,12 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) {
   // TODO: We need to estimate the cost of intrinsic calls.
   switch (I->getOpcode()) {
   case Instruction::GetElementPtr:
-    // We mark this instruction as zero-cost because scalar GEPs are usually
-    // lowered to the intruction addressing mode. At the moment we don't
-    // generate vector geps.
+    // We mark this instruction as zero-cost because the cost of GEPs in
+    // vectorized code depends on whether the corresponding memory instruction
+    // is scalarized or not. Therefore, we handle GEPs with the memory
+    // instruction cost.
     return 0;
+

I attached the latest patch again. It has to be rebased against master (due to the memory cost refactoring).
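
To make the comment in the hunk above concrete, here is a minimal standalone sketch of the idea: the GEP itself is reported as zero-cost, and the address computation is charged together with the load or store that consumes it, with the amount depending on whether that memory instruction is scalarized. Everything in it (Opcode, ToyTargetCosts, toyInstructionCost, and the cost numbers used with it) is a hypothetical stand-in for illustration, not the LLVM API and not the code in the patch.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; none of these names are LLVM APIs.
enum class Opcode { GetElementPtr, Load, Store, Add };

struct ToyTargetCosts {
  unsigned ScalarAddressCost; // address computation for one scalar access
  unsigned VectorAddressCost; // address computation for one wide access
  unsigned ScalarMemOpCost;   // one scalar load/store
  unsigned VectorMemOpCost;   // one wide load/store
};

// Cost of a single instruction at vectorization factor VF.
unsigned toyInstructionCost(Opcode Op, unsigned VF, bool Scalarized,
                            const ToyTargetCosts &T) {
  assert(VF >= 1 && "vectorization factor must be positive");
  switch (Op) {
  case Opcode::GetElementPtr:
    // Zero here: the address computation is charged with the memory
    // instruction that consumes it.
    return 0;
  case Opcode::Load:
  case Opcode::Store:
    if (VF == 1 || Scalarized)
      // One address computation plus one scalar access per lane.
      return VF * (T.ScalarAddressCost + T.ScalarMemOpCost);
    // A single wide access with a single address computation.
    return T.VectorAddressCost + T.VectorMemOpCost;
  case Opcode::Add:
    return 1;
  }
  return 0; // not reached for the opcodes above
}
```

With made-up costs {1, 0, 1, 2} and VF = 4, a scalarized load is charged 4 * (1 + 1) while a wide load is charged 0 + 2, and the GEP contributes nothing in either case.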



On Feb 7, 2013, at 1:42 PM, Nadav Rotem <nrotem at apple.com> wrote:

> +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp
> @@ -3045,7 +3045,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) {
>      // We mark this instruction as zero-cost because scalar GEPs are usually
>      // lowered to the intruction addressing mode. At the moment we don't
>      // generate vector geps.
> -    return 0;
> +    return TTI.getAddressComputationCost(VectorTy);
> +
> 
> We include the cost of GEPs when we calculate the Load/Store costs. Are you worried about cases where a GEP is not consumed by loads/stores?
> 
> Thanks,
> Nadav
> 
> 
> 
> On Feb 7, 2013, at 11:32 AM, Renato Golin <renato.golin at linaro.org> wrote:
> 
>> On 7 February 2013 14:31, Arnold <aschwaighofer at apple.com> wrote:
>> I agree with you, it is unfortunate. However, I am trying to model an idiosyncrasy of the processor that has a big implication on performance. It is very expensive on Swift if you happen to load into an S register or a D sub-lane. Two such instructions are not pipelined but executed sequentially.
>> 
>> In that case, the cost will be much more than 2 or 3, no?
>> 
>> 
>> Stride has the value of the isConsecutivePtr method:
>> 
>> Ok, in the original code you had:
>> 
>> if (Stride < 0)
>>   return parent::cost();
>> return Cost;
>> 
>> In this you have:
>> 
>> if (Stride > 0)
>>   return Cost;
>> return parent::cost();
>> 
>> It seems you're missing the case where it's == 0, but I can't tell which way it should go.
>> 
>> 
>> I don't think we need a function call for the value 3 here. It is a value just like any other that is returned by TTI.
>> 
>> What I'm trying to say is that this value seems to come out of the blue. I could be wrong, obviously, but it seems to me that you're experimenting with a micro-benchmark and fine-tuning to your particular example, which is dangerous from a wider perspective.
>> 
>> I understand that this might be a big hit on a set of examples, but we should get some constants out, just to make it clear that we're not talking about "idealized cycle count", but something else entirely.
>> 
>> Like:
>> 
>> const int AVOID_AT_ALL_COSTS = 100;
>> const int DANGEROUS_IN_MOST_CASES = 10;
>> const int NOT_GOOD_BUT_COULD_BE_OK = 5;
>> 
>> etc...
>> 
>> cheers,
>> --renato
> 
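
The Stride question quoted above can be checked with a small standalone sketch. It assumes that Stride takes the values 1 (consecutive), -1 (consecutive and decreasing) and 0 (unknown or non-consecutive), which is how I understand isConsecutivePtr to report its result. TunedCost and ParentCost are hypothetical placeholders for "Cost" and "parent::cost()" in the quoted snippets; only the value 3 is taken from the thread.

```cpp
#include <cassert>

// Hypothetical placeholder costs for illustration only.
const unsigned TunedCost = 3;  // stands in for "Cost" in the quoted snippets
const unsigned ParentCost = 1; // stands in for "parent::cost()"

// First quoted form: only negative strides fall back to the parent cost.
unsigned originalVariant(int Stride) {
  assert(Stride >= -1 && Stride <= 1 && "expected -1, 0 or 1");
  if (Stride < 0)
    return ParentCost;
  return TunedCost;
}

// Second quoted form: only positive strides get the tuned cost.
unsigned revisedVariant(int Stride) {
  assert(Stride >= -1 && Stride <= 1 && "expected -1, 0 or 1");
  if (Stride > 0)
    return TunedCost;
  return ParentCost;
}
```

Under these assumptions the two forms agree for Stride == 1 and Stride == -1 and differ only for Stride == 0: the first returns the tuned cost, the second the parent cost. Which of the two is intended is exactly the open question in the quoted text.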

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ARM-cost-model-Address-computation-in-vector-mem-ops.patch
Type: application/octet-stream
Size: 15454 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130207/5c082118/attachment.obj>