[all-commits] [llvm/llvm-project] 6253b7: [SLP] Respect control dependence within a block du...

Sat Mar 19 13:36:40 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 6253b77da9f3fd7ee3286a2b4d39e87b0f0f55f5
      https://github.com/llvm/llvm-project/commit/6253b77da9f3fd7ee3286a2b4d39e87b0f0f55f5
  Author: Philip Reames <listmail at philipreames.com>
  Date:   2022-03-19 (Sat, 19 Mar 2022)

  Changed paths:
    M llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
    M llvm/test/Transforms/SLPVectorizer/X86/control-dependence.ll

  Log Message:
  -----------
  [SLP] Respect control dependence within a block during scheduling

This fixes an active miscompile visible in the test changes.  The basic problem is that the scheduling dependency graph didn't have any edges for control dependence within a single basic block.  The result is that we could (and in some rare cases *did*) perform reorderings within a block which could introduce new undefined behavior along paths which didn't previously contain any.

Impact wise, we have two major cases where control is not guaranteed to reach a later instruction in the block: may throw calls, and calls containing infinite loops.
* The former case was mostly covered by the memory dependencies, and to trigger require a function which can throw, but not write to memory.  In theory, such a case is possible, but not likely in practice.
* The later case is likely more of an issue in practice.  After this code was first written, we changed the IR semantics to allow well defined infinite loops without satisifying mustprogress.  Even for C/C++ - which do imply mustprogress - recent changes to how we treat atomics (e.g. an atomic read does not always imply a write) could expose this issue.  I'm a bit shocked we don't seem to have a bug report which hit this in real code actually.

Compile time wise, this results in a single extra scan of the scheduling window in the common case.  Since we stop scanning at the next instruction which isn't guaranteed to execute, no matter what order we traverse instructions in, we scan the block once.  The exception to this is that when we extend the scheduling window downwards, we invalidate all dependencies, and thus rescan.  So the potentially expensive case is when we a call in a big schedule window which is frequently extended.  We could optimize this case (by caching the last instruction not guaranteeed to transfer execution and scanning only the extended window) and starting there), but I decided to leave the complexity until it mattered.  That same case is already degenerate with memory dependences which is more expensive than the control dependence scan.

We could also consider combining the memory dependence and control dependence sets to reduce memory usage, but since it complicates the code slightly and makes debugging a bit harder, I went with the simplest scheme for now.

This was noticed while trying to understand the failures reported against D118538, but is not otherwise related to that change.