[PATCH] D30732: LoopVectorizer: let target limit memory intensive loops

Thu Mar 9 04:15:38 PST 2017

hfinkel added a comment.

In https://reviews.llvm.org/D30732#696453, @jonpa wrote:

> In https://reviews.llvm.org/D30732#695820, @mssimpso wrote:
>
> ...
>
> > Let me make sure that I understand: Are you trying to limit the number of store instructions/loop or are you trying to limit the number of stores/cycle?
>
> It's about limiting the general input of stores into the processor.

Please be very clear. Let me explain:

1. Imagine a processor that has a special cache for loop instructions, and this loop cache can only hold N stores. In this case, it is really important that the loop have no more than N store instructions.
2. Imagine a processor that cannot sustain one store/cycle because of limited bandwidth or other resources. Over N cycles, only M (< N) stores can be processed. This might or might not be related to the size of the store.

I think you're in situation (2) where the size of the store does not matter (since you described some kind of tag resource). In that case, please don't count stores in the loop. You'll want to get the anticipated loop cost (which is the closest thing we have to a cycle count) for each VF and go from there. You might also limit the VF based on the number of stores that would appear in a row, because a long enough run of consecutive stores will always exceed your limit and stall something.
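To make the situation (2) suggestion concrete, here is a minimal sketch of a rate-based check. It is not LoopVectorizer code: `VFCandidate`, `pick_vf`, and the numbers in `CANDIDATES` are hypothetical, and the per-VF loop cost is assumed to be supplied by the cost model and used as a stand-in for cycles. It also assumes stores issue at most one per cycle, so that a run of more than M consecutive stores must exceed a budget of M stores per N cycles.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class VFCandidate:
    vf: int                 # vectorization factor
    stores: int             # store instructions in one vectorized iteration
    loop_cost: int          # anticipated cost of one iteration (cycle proxy)
    longest_store_run: int  # most stores appearing back to back


def pick_vf(candidates, max_stores, per_cycles):
    """Return the largest VF that respects the store budget, or None.

    The budget is max_stores stores per per_cycles cycles (M per N, M < N).
    """
    budget = Fraction(max_stores, per_cycles)
    best = None
    for c in candidates:
        if c.loop_cost <= 0:
            continue  # no usable cost estimate for this VF
        if Fraction(c.stores, c.loop_cost) > budget:
            continue  # sustained store rate exceeds what the target can retire
        if c.longest_store_run > max_stores:
            continue  # a burst this long stalls regardless of the average rate
        if best is None or c.vf > best.vf:
            best = c
    return best.vf if best is not None else None


# Hypothetical numbers, for illustration only.
CANDIDATES = [
    VFCandidate(vf=1, stores=2, loop_cost=8, longest_store_run=1),
    VFCandidate(vf=4, stores=2, loop_cost=6, longest_store_run=2),
    VFCandidate(vf=8, stores=4, loop_cost=6, longest_store_run=4),
]
```

With a budget of 2 stores per 4 cycles, `pick_vf(CANDIDATES, 2, 4)` rejects VF 8 (4 stores over a cost of 6 exceeds the budget) and settles on VF 4. The point of the sketch is only that the decision depends on stores relative to loop cost, not on the raw store count.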

If you're in (1), then your limit probably makes sense only for loops that will fit in the relevant cache.
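For situation (1), the same idea can be sketched as a guard: the store limit is consulted only when the vectorized loop would fit in the loop cache at all. The function name and all four parameters below are hypothetical, and measuring cache capacity in instructions is an assumption about how such a cache would be described.

```python
def vf_allowed(loop_instructions, loop_stores,
               loop_cache_capacity, max_cached_stores):
    """Decide whether a VF is acceptable under a loop-cache store limit.

    loop_instructions / loop_stores describe the vectorized loop body;
    loop_cache_capacity / max_cached_stores describe the (hypothetical) cache.
    """
    if loop_instructions > loop_cache_capacity:
        # The loop will not run from the loop cache, so the cache's
        # store limit says nothing about this VF.
        return True
    # The loop fits, so it must also respect the cache's store limit.
    return loop_stores <= max_cached_stores
```

For example, a 40-instruction loop with 6 stores is not constrained by a 32-instruction cache that holds 4 stores, while a 24-instruction loop with 6 stores is rejected.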

In short, vectorized loops can be quite large and, whatever you're doing, just counting stores sounds like an insufficiently precise model.

https://reviews.llvm.org/D30732