[PATCH] D75981: [LV] Allow large RT checks, if they are a fraction of the scalar cost.

Evgeniy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 17 02:54:19 PDT 2021


ebrevnov added a comment.

In D75981#2947904 <https://reviews.llvm.org/D75981#2947904>, @fhahn wrote:

> Rebased again after recent changes.
>
>>>>> That's a good point, but I think that would be better as separate change, because that's a more aggressive change than replacing existing limit. IIUC that's more in line with your D71053 <https://reviews.llvm.org/D71053>.
>>>
>>> You are right, I'm essentially asking to follow D71053 <https://reviews.llvm.org/D71053>. First of all, for the sake of making progress I'm not going to block this change if you promise to continue working on the cost-model-driven approach.
>>> But I personally think it would save a lot of time if we went with the cost-model-based approach in the first place, because the most time-consuming part would be fixing performance regressions, not the implementation itself. I will leave it to you to decide :-).
>
> The way I see it, the current patch is already using a cost-model-based approach: it computes the cost of the runtime checks and the cost of the scalar loop and compares them.
>
> The formula used in the patch is conservative initially, I think, in that it allows larger runtime checks only if their failing adds just a small overhead to the total cost of the scalar loop.
>
> Of course we can choose other formulas, e.g. computing the cost of all vector iterations + RT checks and comparing it against the cost of all scalar iterations. This is more optimistic, as it assumes the runtime checks succeed.
>
> The main reason I went for the conservative approach initially is that we are already seeing regressions in benchmarks caused by runtime checks for vector loops that are never taken. I don't want to make this problem worse for now.
>
> Personally I'd prefer to start with a more conservative heuristic, see how it goes & iron out issues in the infrastructure.
>
> How does that sound?

I'm fine with going with the more conservative heuristic. The change I would like to see is to move the runtime-check cost calculation inside the cost model. That way CM.selectVectorizationFactor would return a VF with the cost of the runtime checks already taken into account. It would be a bit inconvenient to "merge" this with the existing limits for runtime checks, though. What if we effectively disable the current limits by putting them under an option for now, and delete them entirely eventually?
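For illustration only, the two formulas discussed above could be sketched roughly as follows. The struct, function names, and the 1/128 fraction are hypothetical placeholders for this sketch, not the actual LoopVectorize code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost units for one loop; names are illustrative only.
struct LoopCosts {
  uint64_t RTCheckCost;    // cost of executing the runtime checks once
  uint64_t ScalarIterCost; // cost of one scalar loop iteration
  uint64_t VectorIterCost; // cost of one vector loop iteration
  uint64_t TripCount;      // known trip count
  uint64_t VF;             // vectorization factor
};

// Conservative formula: allow the checks only if a *failing* check adds at
// most a small fraction (here 1/128, an arbitrary illustrative threshold)
// of the total scalar loop cost.
bool allowChecksConservative(const LoopCosts &C) {
  uint64_t ScalarLoopCost = C.ScalarIterCost * C.TripCount;
  return C.RTCheckCost * 128 <= ScalarLoopCost;
}

// Optimistic formula: assume the checks succeed and compare the total vector
// cost (all vector iterations + the checks) against the total scalar cost.
bool allowChecksOptimistic(const LoopCosts &C) {
  uint64_t VectorLoopCost =
      C.VectorIterCost * (C.TripCount / C.VF) + C.RTCheckCost;
  uint64_t ScalarLoopCost = C.ScalarIterCost * C.TripCount;
  return VectorLoopCost < ScalarLoopCost;
}
```

With a small trip count the conservative rule rejects checks that the optimistic rule would still accept, which matches the concern about runtime checks guarding vector loops that are never taken.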

>>>> I'm not sure there's much benefit at the moment, because there will be no changes. The focus of those tests seems to be more about vectorizing small-trip-count loops with an epilogue than about the cost of memory runtime checks (there are no memory runtime checks in the test, I think).
>>
>> I believe both test cases have vectorization with runtime checks. Look for "; CHECK:       vector.memcheck:"
>
> Ah I see, thanks! I think I meant that that patch lifts a different but related limitation (tiny-trip-count vectorization with an epilogue), whereas this patch deals with the runtime-check threshold.

I think I understand now. Indeed, for the tests to make sense we would need to allow vectorization of short-trip-count loops with runtime checks. The question is withdrawn.

>>> But clearly, the current cut-offs are just bogusly low.
>>> But I also guess simply bumping them won't really solve the problem,
>>> so I guess we need to redefine them. But what is the right metric,
>>> especially if the trip count is not constant?
>>> Cost of a single scalar loop iteration?
>>
>> Maybe not the best approach, but taking some reasonable average across applications works well enough. Another metric to consider is the benefit from vectorization: loops with an expected 3x improvement should be more likely to be vectorized than those with only 1.1x.
>
> Agreed; for cases where the trip count is unknown we will still have to make an educated guess. It should still be better/more informed than the single-number cut-off for the number of runtime checks. But as I said, I think we should start with the cases where the trip count is known, make sure it works well there, and move on from that. This also gives us time to iron out any issues with the infrastructure.

Agreed. This is unrelated to the current patch.
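The benefit-based metric suggested above (letting loops with a larger expected speedup tolerate a larger runtime-check budget) could be sketched like this; the function names and the linear scaling rule are illustrative assumptions, not anything from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimated speedup from vectorizing, as a ratio of
// per-element costs (scalar iteration cost vs. vector iteration cost / VF).
double expectedSpeedup(uint64_t ScalarIterCost, uint64_t VectorIterCost,
                       uint64_t VF) {
  return static_cast<double>(ScalarIterCost) /
         (static_cast<double>(VectorIterCost) / static_cast<double>(VF));
}

// Illustrative rule: scale a base runtime-check budget linearly by the
// expected speedup, so a loop with an expected 3x improvement tolerates a
// larger check cost than one with only 1.1x.
uint64_t runtimeCheckBudget(uint64_t BaseBudget, double Speedup) {
  return static_cast<uint64_t>(static_cast<double>(BaseBudget) * Speedup);
}
```

Whether the scaling should be linear, and what the base budget should be, would of course have to be tuned against real benchmarks.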


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75981/new/

https://reviews.llvm.org/D75981


