[PATCH] D100381: [RFC] Improve loop distribute cost model

Wed Apr 21 05:32:54 PDT 2021

fhahn added a comment.

In D100381#2704777 <https://reviews.llvm.org/D100381#2704777>, @sanwou01 wrote:

> Looking at TSVC a bit, @xbolva00 :
>
> - s221 <https://github.com/UoB-HPC/TSVC_2/blob/badf9adb2974867ac0937718d85a44dec6dec95a/src/tsvc.c#L1019> won't distribute because the second read of `a[i]` is removed by EarlyCSE, so there is no unique load instruction for a second loop. For this loop I'm not convinced that distribution is likely to help performance; it's a trade-off between (some) vectorization and re-loading both `a[i]` and `d[i]`.
> - s222 <https://github.com/UoB-HPC/TSVC_2/blob/badf9adb2974867ac0937718d85a44dec6dec95a/src/tsvc.c#L1061> also gets mangled by EarlyCSE, but the result would still be distributable if it weren't for the order of the stores to `e[i]` and `a[i]`. This runs into a limitation of LoopAccessAnalysis, which can't reorder instructions. Perhaps it would help to do a bit of scheduling on the IR?
> - s2275 <https://github.com/UoB-HPC/TSVC_2/blob/badf9adb2974867ac0937718d85a44dec6dec95a/src/tsvc.c#L1794>, as mentioned above, runs into another LoopAccessAnalysis limitation: it only handles innermost loops. I'm not sure how easy it would be (if it is possible at all) to lift that restriction.
>
> So, unfortunately, it looks like we can't handle these loops without some pretty fundamental changes to LoopAccessAnalysis. Thoughts?

Yes, those are known limitations. I would recommend focusing on demonstrating loop-distribute's value with the current LAA.

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D100381