[all-commits] [llvm/llvm-project] 7f5520: [SCEV] Extend trip count to avoid overflow by default

Mon Oct 11 09:56:10 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 7f55209cee55fa2f7d5954f7ec7df77d90585a7b
      https://github.com/llvm/llvm-project/commit/7f55209cee55fa2f7d5954f7ec7df77d90585a7b
  Author: Philip Reames <listmail at philipreames.com>
  Date:   2021-10-11 (Mon, 11 Oct 2021)

  Changed paths:
    M llvm/include/llvm/Analysis/ScalarEvolution.h
    M llvm/lib/Analysis/ScalarEvolution.cpp
    M llvm/lib/Transforms/Scalar/LoopFlatten.cpp

  Log Message:
  -----------
  [SCEV] Extend trip count to avoid overflow by default

As a brief reminder, an "exit count" is the number of times the backedge executes before some event. It can be zero if we exit before the backedge is reached. A "trip count" is the number of times the loop header is entered if we branch into the loop. In general, TC = BTC + 1 and thus a zero trip count is ill defined

There is a cornercases which we don't handle well. Let's assume i8 for our examples to keep things simple. If BTC = 255, then the correct trip count is 256. However, 256 is not representable in i8.

In theory, code which needs to reason about trip counts is responsible for checking for this cornercase, and either bailing out, or handling it correctly. Historically, we don't have a great track record about actually doing so.

When reviewing D109676, I found myself asking a basic question. Was there any good reason to preserve the current wrap-to-zero behavior when converting from backedge taken counts to trip counts? After reviewing existing code, I could not find a single case which appears to correctly and precisely handle the overflow case.

This patch changes the default behavior to extend instead of wrap. That is, if the result might be 256, we return a value of i9 type to ensure we interpret the count correctly. I did leave the legacy behavior as an option since a) loop-flatten stops triggering if I extend due to weirdly specific pattern matching I didn't understand and b) we could reasonably use the mode if we'd externally established a lack of overflow.

I want to emphasize that this change is *not* NFC. There are two call sites (one in ScalarEvolution.cpp, one in LoopCacheAnalysis.cpp) which are switched to the extend semantics. The former appears imprecise (but correct) for a constant 255 BTC. The later appears incorrect, though I don't have a test case.

Differential Revision: https://reviews.llvm.org/D110587