[llvm-dev] Design issues in LLVM IR

Thu Jun 10 13:44:59 PDT 2021

On Wed, Jun 9, 2021 at 6:19 PM Chris Lattner <clattner at nondot.org> wrote:

> Nikita Popov wrote a great blog post last week: “Design issues in LLVM IR
> <https://www.npopov.com/2021/06/02/Design-issues-in-LLVM-IR.html>” that I
> just found.  It is well framed and nicely written, and it seems like a good
> idea to discuss this on llvm-dev.  :-)
>
> Here are my 2c for what it is worth:
>
> a) I completely agree we should continue to invest in fixing the core of
> LLVM.  There are long-standing issues that we should fix, and not doing so
> slows things down, leads to worse quality of results, etc.
>
> b) I completely agree with his framing on canonicalization and its value.
> I think that LLVM has historically taken this a bit too far (e.g. loop
> transformations, the old IndVar/LSR dichotomy among others) but many of
> those have already been walked back.
>
> c) I completely agree we need to continue to march towards opaque
> pointers, I’m a fan of this work.
>
> d) I’m less enthused about eliminating type based GEP.  The post is right
> that indexing computations are expensive, but that is largely due to the
> algorithms used, not the IR structure.  If this was the thing to fix, then
> we should fix other aspects of the design.  The thing that I’m particularly
> concerned about is array indexes: I think we need to preserve the ability
> to do simple dependence analysis and other array subscript indexing
> analyses in the middle end.  I think the sweet spot is to drop types from
> pointers, but keep them on GEPs.  Alternatively, finish the typeless
> pointer migration and then evaluate what to do with GEPs only when that
> completes.
>

Right, I don't think it makes sense to address this while we still have
pointer element types. That would require encoding an extra result pointer
type argument for GEPs, which just makes things worse.

Once we have opaque pointers, I do think it's important to canonicalize
GEPs in some way, as we do see optimization failures related to different
GEP representations. Using raw offset arithmetic seems the easiest way to
do that to me, as generating a canonical type is fairly awkward for more
involved expressions like %p + 3 * %i1 + 4 * %i2 + 5, and also runs into
DataLayout dependence issues (you can technically give i8 an alignment
greater than 1, in which case there is no longer any convenient type for byte
addressing).
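To make the canonicalization argument concrete, here is a minimal Python sketch (not LLVM code) that flattens a typed GEP into the raw form "constant byte offset plus scaled indices". The toy type model, which ignores alignment padding and DataLayout entirely, and the names Array, Struct and gep_to_offset are assumptions for illustration. Two GEPs that are spelled differently but compute the same address flatten to the same value, which is the property a canonical form needs:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


# Toy type model (an assumption, not LLVM's DataLayout): a scalar is just
# its size in bytes, and struct fields are packed with no padding.
@dataclass(frozen=True)
class Array:
    count: int
    elem: "Type"


@dataclass(frozen=True)
class Struct:
    fields: Tuple["Type", ...]


Type = Union[int, Array, Struct]
Index = Union[int, str]  # a constant index, or the name of an SSA value


def size_of(ty: Type) -> int:
    if isinstance(ty, int):
        return ty
    if isinstance(ty, Array):
        return ty.count * size_of(ty.elem)
    return sum(size_of(f) for f in ty.fields)


def gep_to_offset(source: Type, indices: List[Index]) -> Tuple[int, Dict[str, int]]:
    """Flatten a typed GEP into (constant bytes, {value name: bytes per unit})."""
    const = 0
    scales: Dict[str, int] = {}

    def add(idx: Index, scale: int) -> None:
        nonlocal const
        if isinstance(idx, int):
            const += idx * scale
        else:
            scales[idx] = scales.get(idx, 0) + scale

    # The first index steps over whole objects of the source element type.
    add(indices[0], size_of(source))
    ty = source
    for idx in indices[1:]:
        if isinstance(ty, Array):
            ty = ty.elem
            add(idx, size_of(ty))
        elif isinstance(ty, Struct):
            # Struct field numbers must be constants; they contribute the
            # sizes of all preceding fields.
            if not isinstance(idx, int):
                raise ValueError("struct index must be a constant")
            const += sum(size_of(f) for f in ty.fields[:idx])
            ty = ty.fields[idx]
        else:
            raise ValueError("cannot index into a scalar")
    return const, scales


# gep [4 x i32], %p, 0, %i  and  gep i32, %p, %i  flatten identically.
print(gep_to_offset(Array(4, 4), [0, "i"]), gep_to_offset(4, ["i"]))
```

Both calls produce (0, {"i": 4}). In this representation an expression like %p + 3 * %i1 + 4 * %i2 + 5 is simply (5, {"i1": 3, "i2": 4}), with no need to search for a type whose layout happens to yield those scales.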

I don't think removing GEP types would make much difference when it comes
to analyzing arrays. As far as GEPs are concerned, types like [4 x i32] do
not restrict valid indices (both -1 and 5 would be legal indices for that
type, and possibly even inbounds), so in the end they only provide size
information anyway. (The one exception here is the "inrange" attribute,
which is only available for constant expression GEPs.)

> e) Constant Expressions are a disaster.  In addition to the problem
> identified, there are also many annoying cases to deal with, e.g. when
> constexprs appear in phi nodes, trapping constexprs, etc.  In my opinion,
> the fix is to eliminate them entirely, in a few steps:
>
>     1) Introduce a new “RelocatableConstant” object which is *not* a
> mirror of all the IR operations in LLVM, but is instead designed to be used
> in global variables and allows the standard “globalpointer+offset” pattern
> that object files support, and we should add a new MachoRelocatableConstant
> class to represent the “(gv1-gv2+offset)” relocations Mach-O supports.  The
> presence of this would make codegen and frontends easier to write, and get
> rid of all the fiddly pattern matching stuff.  I think we need to talk
> about whether “offset” is a byte offset, or whether it is a series of
> (constant integer) field indexes in a GEP-like operation.  I would argue
> for the latter to make interprocedural optimizations easier to write, but
> it is debatable.
>
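A rough sketch of what the proposed objects might look like, written in Python rather than C++ and with every name hypothetical (neither class exists in LLVM). It takes the byte-offset option for "offset"; the field-index alternative Chris favors would store a list of constant GEP-style indices instead. Each object is a flat record that maps onto one relocation, so adding a constant only updates the offset and no expression tree has to be pattern-matched:

```python
from dataclasses import dataclass, replace


def _with_offset(expr: str, offset: int) -> str:
    # Render "expr", "expr+N" or "expr-N" the way an assembler expects.
    if offset > 0:
        return f"{expr}+{offset}"
    if offset < 0:
        return f"{expr}{offset}"  # the minus sign comes from the number
    return expr


@dataclass(frozen=True)
class RelocatableConstant:
    """Hypothetical: the address of a global plus a byte offset."""
    global_name: str
    offset: int = 0

    def add_offset(self, delta: int) -> "RelocatableConstant":
        # Adding a constant just updates the offset; no expression tree grows.
        return replace(self, offset=self.offset + delta)

    def to_asm(self) -> str:
        return _with_offset(self.global_name, self.offset)


@dataclass(frozen=True)
class MachoRelocatableConstant:
    """Hypothetical: the Mach-O difference form gv1 - gv2 + offset."""
    plus: str
    minus: str
    offset: int = 0

    def add_offset(self, delta: int) -> "MachoRelocatableConstant":
        return replace(self, offset=self.offset + delta)

    def to_asm(self) -> str:
        return _with_offset(f"{self.plus}-{self.minus}", self.offset)


print(RelocatableConstant("a").add_offset(8).to_asm())  # prints a+8
```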

Something that isn't entirely clear to me is whether these two types of
constants cover everything that is supported. LLVM is happy to take
something like this:

@a = global i64 0
@g = global i64 sdiv (i64 ptrtoint (i64* getelementptr (i64, i64* @a, i64 1) to i64), i64 3)

And produce this kind of assembly from it:

g:
.quad (a+8)/3

The code that decides what is accepted in initializers is
https://github.com/llvm/llvm-project/blob/aaaeb4b160fe94e0ad3bcd6073eea4807f84a33a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp#L2445
and covers quite a few operations. Did this code just get over-generalized,
or is there some reason for the set of operations it supports?
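To illustrate how an initializer like @g can end up as (a+8)/3, here is a toy recursive lowering in Python. The node classes and the opcode table are hypothetical, not AsmPrinter's actual code; the sketch assumes the GEP has already been reduced to add(@a, 8) (one i64 is 8 bytes) and treats ptrtoint as a no-op. In a recursive lowering like this, supporting another opcode the assembler can express takes only one more table entry:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class GlobalRef:
    name: str  # symbol name, e.g. "a" for @a


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str  # IR opcode name, e.g. "add" or "sdiv"
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[GlobalRef, Int, BinOp]

# Hypothetical table of opcodes this toy lowering can print as assembler operators.
ASM_OPS = {"add": "+", "sub": "-", "mul": "*", "sdiv": "/"}


def lower(e: Expr) -> str:
    """Print a constant expression tree as an assembler expression."""
    if isinstance(e, GlobalRef):
        return e.name
    if isinstance(e, Int):
        return str(e.value)
    if e.op not in ASM_OPS:
        raise ValueError(f"cannot lower constant expression opcode {e.op!r}")

    def operand(x: Expr) -> str:
        # Parenthesize nested operations so precedence is preserved.
        text = lower(x)
        return f"({text})" if isinstance(x, BinOp) else text

    return f"{operand(e.lhs)}{ASM_OPS[e.op]}{operand(e.rhs)}"


# @g = sdiv(ptrtoint(gep i64, @a, 1), 3), with the GEP pre-reduced to a + 8.
g_init = BinOp("sdiv", BinOp("add", GlobalRef("a"), Int(8)), Int(3))
print(lower(g_init))  # prints (a+8)/3
```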

Regards,
Nikita

>     2) Move the general constant folding API off of ConstantExpr to
> somewhere else, it never should have been there for reasons pointed out in
> the blog.
>
>     3) Eliminate ConstExpr: after #1, we don’t need a mirror of the LLVM
> IR in constant nodes.  Constant folding should be a failable operation and
> would return primitive nodes like ConstantInt.  The asmparser / bitcode
> parser could auto-upgrade general unfolded constexprs to instructions
> when in a function and to [Macho]RelocatableConstant in global initializers.
>
> In any case, I’d love to see progress on any of these.  I’d personally
> love to see the typeless pointers land because we’re in an unfortunate
> in-between state, and we should close off partial transitions.
>
> -Chris
>
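A minimal sketch of "constant folding as a failable operation", written as a free Python function rather than a ConstantExpr method. The names are hypothetical and only i64 add, mul and sdiv are modeled. It returns a plain integer, standing in for a ConstantInt, when folding succeeds, and None otherwise; under the proposal the caller would then emit a real instruction inside a function, or a relocatable constant in an initializer:

```python
from typing import Optional

INT64_MIN = -(1 << 63)


def wrap_i64(x: int) -> int:
    """Reduce an unbounded Python int to a two's-complement i64 value."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x


def try_fold(op: str, lhs: object, rhs: object) -> Optional[int]:
    """Fold an i64 binary op, or return None if it cannot or must not be folded.

    Plain ints stand in for ConstantInt; anything else (here, a string such as
    "@a" standing in for a global's address) is not foldable.
    """
    if not isinstance(lhs, int) or not isinstance(rhs, int):
        return None
    if op == "add":
        return wrap_i64(lhs + rhs)
    if op == "mul":
        return wrap_i64(lhs * rhs)
    if op == "sdiv":
        # Division by zero and INT64_MIN / -1 are undefined behavior (they
        # trap on some targets), so leave the operation unfolded.
        if rhs == 0 or (lhs == INT64_MIN and rhs == -1):
            return None
        q = abs(lhs) // abs(rhs)
        return q if (lhs < 0) == (rhs < 0) else -q  # sdiv truncates toward zero
    return None  # opcode not handled by this sketch
```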