[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org
Sun Aug 16 17:42:16 PDT 2015

This is related to an issue in loop strength reduction [1] that I've
been trying to fix on and off for a while.  [1] has a more detailed
description of the issue and an example, but briefly put, I want LSR
to consider formulae that have "Zext T" as base and/or scale
registers, and to appropriately rate such formulae.

My first attempt[2] at fixing this was buggy and had to be reverted.
While it is certainly possible to "fix" the bug in [2], the fixed
version is ugly and fragile, and I'd prefer having something cleaner.

The key issue here is that currently there is no way to represent
"Zext T" as a *distinct computation* within SCEV -- a SCEV expression
of the form "Zext T" (any SCEV expression, actually) is always
maximally simplified on construction.  This means I cannot "factor
out" a Zext from a base register (say) and force SCEV to maintain it
as a "SCEVZeroExtend T"; SCEV is free to re-simplify "SCEVZeroExtend
T" to something equivalent on construction.  This is not just an issue
in theory -- factoring out a 'SCEVZeroExtend' from an expression 'E'
requires us to prove that 'E' == 'SCEVZeroExtend T'.  SCEV can use
that exact same fact to "simplify" 'SCEVZeroExtend T' back to 'E' on
construction.  There is an analogous issue w.r.t. sign extension

There are two possibilities I see here -- the problem can be solved within
LoopStrengthReduce or within ScalarEvolution.

# Within LoopStrengthReduce

Solving it within LoopStrengthReduce is tricky: due to the
representation issue in SCEV mentioned above, the fact that a scale or
base register is actually a narrower than needed and has to be zero /
sign extended before use needs to be represented separately from the
SCEV*.  Every place in LoopStrengthReduce that analyzes and
manipulates registers needs to be updated to understand this bit of
side information ([2] was buggy because not every place in LSR had
been updated this way in the change).

The cleanest approach I could come up with is to introduce a `class
Register` that wraps a SCEV and some associated metadata; and change
LSR to reason about registers using this `Register` class (and not raw
SCEV pointers).  All of the "this i64 register is actually a zero
extended i32 register" can then be made to live within this `Register`
class, making the structure of the code more obvious.

# Within ScalarEvolution

Another way to fix the problem is to introduce a new kind of SCEV node
-- a `SCEVOpaqueExpr`.  `SCEVOpaqueExpr` wraps another SCEV
expression, and prevents optimizations from optimizing across the
`SCEVOpaqueExpr` boundary.  This will allow us to keep representing
registers as SCEV pointers as they are today while solving the zext
representation problem.  After factoring a SCEV expression 'E' into
'SCEVZeroExtend T', 'SCEVZeroExtend T' can be "frozen" by representing
it as 'SCEVZeroExtend (SCEVOpaqueExpr T)'.  Since 'SCEVZeroExtend
(SCEVOpaqueExpr T)' computes the same value as 'SCEVZeroExtend T' ==
'E', there is no need to update the parts of LSR that analyze / modify
registers to preserve correctness.

Note: this type of "opaque" node also can be used to force the SCEV
expander to generate a particular sequence of expressions; something
that is achieved by the "expand(getSCEVUnknown(expand(Op)))" idiom in
LSR currently.

So far I'm leaning towards the second approach, but given that it would
be a substantial change (though potentially useful elsewhere), I'd like to
hear what others think before diving in.

-- Sanjoy

[1]: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html
[2]: http://reviews.llvm.org/rL243939

More information about the llvm-dev mailing list