[LLVMdev] SSA or not SSA?

Jonathan S. Shapiro shap at eros-os.com
Thu Jul 17 05:48:04 PDT 2008


[Note: moderately off-topic; this is an attempt to explain the rationale
of SSA that is not particular to LLVM. The relevance is that it may help
some new front-end writers grok the LLVM IR faster. ]

On Thu, 2008-07-17 at 08:13 -0400, Daniel Berlin wrote:
> On Thu, Jul 17, 2008 at 6:34 AM, Matthieu Moy <Matthieu.Moy at imag.fr> wrote:
> > Patrick Meredith <pmeredit at uiuc.edu> writes:
> > In short, "Single Assignment" means "Single '='", not "Single
> > 'store'".
> 
> Sure, because store is not an assignment.
> It would be impossible to correctly rename memory for all programs,
> since may-alias is undecidable at compile time and must-alias is
> uncomputable at compile time.
> The best you can do is conservative approximation, which would mean
> your SSA form would sometimes be wrong.

What you say is true in the general case, but there are a few languages
for which it does not hold, notably Haskell (because the I/O monad
remains purely functional from a semantics perspective). Also, in
strongly typed languages we can often induce a partition on the heap by
type or by potential interference. The SafeCode work uses both
techniques in a nicely integrated way.

Going back to the original question though, perhaps the following will
help explain what is going on with SSA conceptually:

SSA is a compromise. In an idealized (though perhaps boring) world, what
a compiler writer would like to see is an IR in which every identifier
is bound (note: not assigned) exactly once and is never mutated. Such an
intermediate form would be purely functional. This would make various
forms of dependency analysis exceptionally easy, and it would allow us
to substitute expressions for their variables everywhere (term
substitution), or conversely it would let us do that substitution in
reverse (a.k.a. common sub-expression elimination). I'll call this
"static single binding" form (SSB).

In practice, given the languages that we need to compile in the real
world, we can't get there. We need writable locations in the heap and
also writable variables on the stack.

Even though we need these things, restricting our IR graph so that it
only allows mutation in restricted places is still very useful. It
concentrates all of the hard parts of things like dependency analysis on
a few places, and it still lets us do term substitution and common
sub-expression elimination most of the time. It gives us an IR form in
which we can analyze which parts of the code depend on assignment and
which do not. For these reasons we want to stay as close to the "pure"
SSB form as we can, admitting only enough pragmatism to express the
languages we are trying to compile.

So a typical SSA form makes two compromises relative to SSB.

  1. Since we cannot (in general) analyze the heap for aliasing issues,
     we allow arbitrary updates in the heap. This is why stores through
     an i32* are okay while re-assignment of an i32 value is not.

     Additionally, stores to the heap are intimately tied up with data
     structure representations, and therefore with the semantics of
     locations, whereas stores to the stack are much less so. The issue
     of location semantics generally stops us from "renaming" heap
     locations in the kinds of ways that we can rename stack-based
     values.

  2. We allow updates (assignments) on stack locations, but only in
     specialized and carefully defined SSA graph nodes.


Don't know if any of that helps. None of it is particular to LLVM, but
perhaps it will help you think about how to generate IR more
effectively.


shap



