[PATCH] D100381: [RFC] Improve loop distribute cost model

Wed Apr 21 05:09:35 PDT 2021

sanwou01 added a comment.

Looking at TSVC a bit, @xbolva00:

- s221 <https://github.com/UoB-HPC/TSVC_2/blob/badf9adb2974867ac0937718d85a44dec6dec95a/src/tsvc.c#L1019> won't distribute because the second read of `a[i]` is removed by EarlyCSE, so there is no unique load instruction for a second loop. For this loop I'm not convinced that distribution is likely to help performance; it's a trade-off between (some) vectorization and re-loading both `a[i]` and `d[i]`.
- s222 <https://github.com/UoB-HPC/TSVC_2/blob/badf9adb2974867ac0937718d85a44dec6dec95a/src/tsvc.c#L1061> also gets mangled by EarlyCSE, but the result would still be distributable if it weren't for the order of the stores to `e[i]` and `a[i]`. This runs into a limitation of LoopAccessAnalysis, which can't reorder instructions. Perhaps it could help to do a bit of scheduling on IR?
- s2275 <https://github.com/UoB-HPC/TSVC_2/blob/badf9adb2974867ac0937718d85a44dec6dec95a/src/tsvc.c#L1794>: as mentioned above, this runs into another LoopAccessAnalysis limitation, namely that it only handles innermost loops. I'm not sure how easy (if at all possible) it would be to lift that restriction.
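
To make the s221 trade-off concrete, here is a rough sketch of what distribution would look like at the source level. It assumes the s221 body is `a[i] += c[i] * d[i]; b[i] = b[i - 1] + a[i] + d[i];`, which is my reading of the linked TSVC source rather than something checked against the IR. The function names, the array size `N`, and the test data are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N 16 /* hypothetical size, chosen only for the sketch */

/* Fused form, modeled on the assumed s221 body: a[i] is produced and
 * consumed in the same iteration, so one load of a[i] and d[i] suffices. */
void s221_fused(float *a, float *b, const float *c, const float *d, size_t n) {
    for (size_t i = 1; i < n; i++) {
        a[i] += c[i] * d[i];
        b[i] = b[i - 1] + a[i] + d[i];
    }
}

/* Hand-distributed form: the first loop has no loop-carried dependence,
 * the second keeps the b[i - 1] recurrence and must re-load a[i] and d[i]. */
void s221_distributed(float *a, float *b, const float *c, const float *d, size_t n) {
    for (size_t i = 1; i < n; i++)
        a[i] += c[i] * d[i];
    for (size_t i = 1; i < n; i++)
        b[i] = b[i - 1] + a[i] + d[i];
}

/* Runs both forms on identical small-integer inputs (exact in float)
 * and returns the number of elements that differ. */
int count_mismatches(void) {
    float a1[N], b1[N], a2[N], b2[N], c[N], d[N];
    for (size_t i = 0; i < N; i++) {
        a1[i] = a2[i] = (float)(i % 3);
        b1[i] = b2[i] = 1.0f;
        c[i] = (float)(i % 4);
        d[i] = 2.0f;
    }
    s221_fused(a1, b1, c, d, N);
    s221_distributed(a2, b2, c, d, N);

    int bad = 0;
    for (size_t i = 0; i < N; i++)
        if (a1[i] != a2[i] || b1[i] != b2[i])
            bad++;
    return bad;
}

/* Convenience check: distribution must not change the results. */
void self_check(void) {
    assert(count_mismatches() == 0);
}
```

Under that assumption the two forms compute the same values, but only the first of the distributed loops is a vectorization candidate. The second still carries the `b[i - 1]` recurrence and has to read `a[i]` and `d[i]` back from memory, which is the cost side of the trade-off.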

So, unfortunately, it looks like we can't handle these loops without some pretty fundamental changes to LoopAccessAnalysis. Thoughts?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100381/new/

https://reviews.llvm.org/D100381