[PATCH] D86262: [LoopIdiomRecognizePass] Options to disable part or the entire Loop Idiom Recognize Pass

Nemanja Ivanovic via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 20 15:54:41 PDT 2020


nemanjai added a comment.

In D86262#2228156 <https://reviews.llvm.org/D86262#2228156>, @fhahn wrote:

> In D86262#2228123 <https://reviews.llvm.org/D86262#2228123>, @nemanjai wrote:
>
>> I agree that the justification provided seems somewhat inadequate - using `memset/memcpy` as a canonical form and teaching DA to handle it seems quite reasonable. However, there are libraries that are highly tuned wrt. when to call these functions and when not to. So it seems to me perfectly reasonable to provide an option to disable this for such uses. For example, a function may perform a memcpy/memset operation but be known to be called only from range-limited sites. The compiler (without PGO+LTO, which are often not an option) cannot know that it should expand such calls in the back end.
>
> The issue you raise seems to be a cost-modeling issue, right? Ideally the compiler would know whether it is faster to use a loop or a memcpy. IIUC you are worried about the cases where the compiler replaces a loop with a memset, but the loop would be faster? Is there a way to improve the cost-modeling? IMO adding options like this potentially leads to papering over cost-modeling issues, rather than addressing the underlying issue. Also, such options tend to not work well together with LTO.

I suppose we can certainly treat this (at least in part) as a cost-modeling issue - in that, AFAICT, LIR does not do any cost modeling at all. If it is able to transform a loop into a `memcpy`, it does. If `memcpy` is the canonical form, that is fine for a canonicalization pass. However, given that we don't really get a chance to expand these calls back into loops later on, it is a problem.
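
For instance (a minimal sketch of the kind of loop the pass recognizes - the function name is made up, and it is not code from the patch), LIR rewrites a plain zero-initialization loop as a `memset` call regardless of how small the trip count may turn out to be at run time:

  // A unit-stride loop storing a loop-invariant byte value: LIR rewrites
  // this as a call to memset with no cost check.
  void clear_bytes(char *p, unsigned n) {
    for (unsigned i = 0; i < n; i++)
      p[i] = 0;
  }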

For `memcmp`, we added code (3a7578c6589b910f9a04bae7f7f121dfe3281578) to expand such calls, which ultimately got moved out into a separate pass (063bed9baff63a0d716a5c9533cf2601dafbe0e0). I don't really remember how much handling it has for non-constant lengths, but presumably it has some (or at least we have an obvious place to add it).
However, I do not think we have a similar pass for `memcpy` and `memset`. Once we get into the SDAG, we have `TargetLowering::findOptimalMemOpLowering()`, which can only handle constant lengths and presumably cannot expand into loops (being basic-block local and all).
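
For illustration (this is my understanding of the expansion, so treat the details as unverified, and the function names are made up), the constant-length case below is the easy one for such an expansion; the non-constant-length case is where a loop-based expansion would be needed:

  #include <string.h>

  // Constant length: an expansion pass can lower this to a couple of wide
  // loads and compares instead of a library call.
  int eq8(const void *a, const void *b) {
    return memcmp(a, b, 8) == 0;
  }

  // Non-constant length: expanding this well needs a loop (or a runtime
  // check on n), which is the part I am not sure we handle.
  int eqn(const void *a, const void *b, unsigned n) {
    return memcmp(a, b, n) == 0;
  }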

When we combine all of the above, code like this ends up with a call to `memcpy`, even though `len % 4` bounds the copy to at most three elements:

  void smallcp(unsigned *__restrict a, unsigned *__restrict b, unsigned len) {
    for (unsigned i = 0; i < (len % 4); i++)
      a[i] = b[i];
  }

And my comment regarding "highly tuned libraries" was really meant to suggest a situation where, for example, the `rem` is on the caller side in a different module, so even a perfect cost model wouldn't help (well, not without LTO anyway).
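
Concretely (a made-up two-file sketch - the names are hypothetical), the callee's translation unit gives the compiler no hint that the length is tiny, so LIR forms the `memcpy`, while the information a cost model would need lives in the caller's module:

  // callee.c - compiled on its own; LIR sees an arbitrary-length copy
  // loop and turns it into a memcpy call.
  void copy_words(unsigned *__restrict dst, const unsigned *__restrict src,
                  unsigned len) {
    for (unsigned i = 0; i < len; i++)
      dst[i] = src[i];
  }

  // caller.c - a different module; the rem that bounds the length to at
  // most 3 is invisible when the callee is compiled.
  void copy_words(unsigned *__restrict dst, const unsigned *__restrict src,
                  unsigned len);

  void caller(unsigned *dst, const unsigned *src, unsigned n) {
    copy_words(dst, src, n % 4);
  }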


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86262/new/

https://reviews.llvm.org/D86262


