[PATCH] D71053: [LV] Take overhead of run-time checks into account during vectorization.

Wed Mar 11 04:30:39 PDT 2020

fhahn added a comment.

In D71053#1878524 <https://reviews.llvm.org/D71053#1878524>, @ebrevnov wrote:

> @fhahn , Thanks for taking the time to watch!
>
> In D71053#1878424 <https://reviews.llvm.org/D71053#1878424>, @fhahn wrote:
>
> > > While 1) may look like a better approach I see it as very problematic in practice. That's why I decided to go with the 2).
> >
> > Sorry if I missed an earlier discussion about this, but why do you think that expanding the RT checks is very problematic in practice? Is it the cost of expanding the RT checks early (and maybe throwing them away later)?
> >  I am currently experimenting with this approach to see if it is feasible to use size estimates to better decide on the number of runtime checks (including mem ptr checks), regardless of the vectorisation mode.
>
>
> You slightly misread the approach suggested in 1). The wild idea was to not to generate any instructions at all. The suggestion is to make dry-run of SCEV but instead of generating code we could evaluate cost of instructions requested to be built. To make it less invasive we could for example pass custom Builder which will do that (not really sure if this is possible though). Another issue is that TTI still needs an instruction to evaluate cost.
>
> IIUC you suggest to actually built IR generated by SCEVExpander for run-time checks, evaluate it's cost and then disregard if not needed. I would be more than happy if we could do that. But  I believe current infrastructure doesn't allow to generate  side IR to play with and then disregard. AFAIK this limitation was the main motivation for VPlan introduction.

I finally had time to wrap up a proof-of-concept patch that creates the runtime checks up-front and drops them if not required: D75980 <https://reviews.llvm.org/D75980>. It seems to work quite well in practice.

> 
> 
>> The approach in this patch seems to delay & interleave parts of cost-modelling with codegen and adds quite a bit of additional plumbing through the codebase to do so.
> 
> Agree. It will be really great if we can find something more straightforward. Just to note that some of complexity of the current patch is necessity to support both current and new ways of doing cost modeling. In addition at the moment current patch targets short trip count loops only. We could streamline the code a bit more if make it default for all loops.
> 
>> IIUC, the basic idea is to enable vectorisation , if the cost of `runtime checks + vector cost < scalar cost`?  I think this might be too relaxed, as the cost of the RT checks have to be paid, even if they fail and we execute the scalar loop. And I guess the larger the number of checks, the larger the probability of one failing. Something related also came up in https://bugs.llvm.org/show_bug.cgi?id=44662 . I am not yet sure what better alternatives are there though.
> 
> Agree. Unfortunately don't have a good idea either. Just a note that we currently disregard this for trip count check which is always generated for non constant trip count loops.

Building on D75980 <https://reviews.llvm.org/D75980>  if have put up D75981 <https://reviews.llvm.org/D75981> which allows larger trip counts if the cost of the trip count is less than 0.5% of the expected scalar cost of the loop. The threshold is quite arbitrarily chosen (and would need further analysis/benchmarking to tune), but it might be a viable general idea.

I think it would be relatively straight forward to implement the logic in this patch on top of D75980 <https://reviews.llvm.org/D75980> as well, if there is agreement that building the RT checks up-front is preferable.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71053/new/

https://reviews.llvm.org/D71053