[all-commits] [llvm/llvm-project] bd7535: [LV] Fix a conceptual mistake around meaning of un...

Thu Jul 21 15:53:18 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: bd75350180a2b830daf8cd406b9c9aee44e4e18c
      https://github.com/llvm/llvm-project/commit/bd75350180a2b830daf8cd406b9c9aee44e4e18c
  Author: Philip Reames <preames at rivosinc.com>
  Date:   2022-07-21 (Thu, 21 Jul 2022)

  Changed paths:
    M llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
    M llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
    M llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
    M llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll

  Log Message:
  -----------
  [LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst

This code confuses LV's "Uniform" and LVL/LAI's "Uniform".  Despite the
common name, these are different.
* LVs notion means that only the first lane *of each unrolled part* is
  required.  That is, lanes within a single unroll factor are considered
  uniform.  This allows e.g. widenable memory ops to be considered
  uses of uniform computations.
* LVL and LAI's notion refers to all lanes across all unrollings.

IsUniformMem is in turn defined in terms of LAI's notion.  Thus a
UniformMemOpmeans is a memory operation with a loop invariant address.
This means the same address is accessed in every iteration.

The tweaked piece of code was trying to match a uniform mem op (i.e.
fully loop invariant address), but instead checked for LV's notion of
uniformity.  In theory, this meant with UF > 1, we could speculate
a load which wasn't safe to execute.

This ends up being mostly silent in current code as it is nearly
impossible to create the case where this difference is visible.  The
closest I've come in the test case from 54cb87, but even then, the
incorrect result is only visible in the vplan debug output; before this
change we sink the unsafely speculated load back into the user's predicate
blocks before emitting IR.  Both before and after IR are correct so the
differences aren't "interesting".

The other test changes are uninteresting.  They're cases where LV's uniform
analysis is slightly weaker than SCEV isLoopInvariant.