[llvm-dev] Design issues in LLVM IR

Thu Jun 10 03:01:31 PDT 2021

> a) I completely agree we should continue to invest in fixing the core of
> LLVM.  There are long standing issues that we should fix, and not doing so
> slows things down, leads to worse quality of results, etc.
>

Absolutely!

> b) I completely agree with his framing on canonicalization and its value.
> I think that LLVM has historically taken this a bit too far (e.g. loop
> transformations, the old IndVar/LSR dichotomy among others) but many of
> those have already been walked back.
>
> c) I completely agree we need to continue to march towards opaque
> pointers, I’m a fan of this work.
>

+1. Also +1 on the point that partial transitions are the worst :)

> d) I’m less enthused about eliminating type based GEP.  The post is right
> that indexing computations are expensive, but that is largely due to the
> algorithms used, not the IR structure.  If this was the thing to fix, then
> we should fix other aspects of the design.  The thing that I’m particularly
> concerned about is array indexes: I think we need to preserve the ability
> to do simple dependence analysis and other array subscript indexing
> analyses in the middle end.  I think the sweet spot is to drop types from
> pointers, but keep them on GEPs.  Alternatively, finish the typeless
> pointer migration and then evaluate what to do with GEPs only when that
> completes.
>

Yeah. I've been thinking it would be interesting to be able to do more
backend-ish work in LLVM IR, in which case the _ability_ to do raw pointer
arithmetic without bitcast hell would be nice. But I think typeless
pointers already give us that anyway: adding a pointer and an offset is
just a GEP with i8 as the source element type.
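To make the typed-vs-i8 GEP contrast concrete, here is a small Python model. It is an illustration only, not LLVM's implementation, and it assumes a simplified data layout in which every integer is naturally aligned to its own size (real targets take this from the DataLayout). `gep_offset` walks an index list the way a typed GEP does: the first index steps over whole objects, and each later index either selects a struct field or scales an array subscript. The i8 form of the same address carries only the final byte sum.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence, Tuple, Union


def align_to(n: int, a: int) -> int:
    """Round n up to a multiple of a."""
    return (n + a - 1) // a * a


@dataclass(frozen=True)
class Int:
    bits: int  # assumed to be a multiple of 8

    @property
    def size(self) -> int:
        return self.bits // 8

    @property
    def align(self) -> int:
        return self.size  # simplification: natural alignment


@dataclass(frozen=True)
class Array:
    elem: Type
    count: int

    @property
    def size(self) -> int:
        return self.elem.size * self.count

    @property
    def align(self) -> int:
        return self.elem.align


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Type, ...]

    @property
    def align(self) -> int:
        return max((f.align for f in self.fields), default=1)

    def field_offset(self, index: int) -> int:
        off = 0
        for i, f in enumerate(self.fields):
            off = align_to(off, f.align)  # padding before each field
            if i == index:
                return off
            off += f.size
        raise IndexError(index)

    @property
    def size(self) -> int:
        end = 0
        for f in self.fields:
            end = align_to(end, f.align) + f.size
        return align_to(end, self.align)  # tail padding


Type = Union[Int, Array, Struct]


def gep_offset(source: Type, indices: Sequence[int]) -> int:
    """Byte offset produced by a typed GEP over `source` with `indices`."""
    offset = indices[0] * source.size  # first index steps over whole objects
    ty = source
    for idx in indices[1:]:
        if isinstance(ty, Struct):
            offset += ty.field_offset(idx)  # select a field
            ty = ty.fields[idx]
        elif isinstance(ty, Array):
            offset += idx * ty.elem.size  # scale the subscript
            ty = ty.elem
        else:
            raise TypeError(f"cannot index into {ty}")
    return offset


S = Struct((Int(32), Array(Int(64), 4)))  # %S = type { i32, [4 x i64] }
A = Array(Array(Int(32), 8), 8)           # [8 x [8 x i32]]

print(gep_offset(S, [0, 1, 2]))                              # 24
print(gep_offset(A, [0, 3, 5]))                              # 116
print(gep_offset(A, [0, 1, 0]) == gep_offset(A, [0, 0, 8]))  # True
```

Under this layout, `getelementptr %S, ptr %p, i64 0, i32 1, i64 2` and `getelementptr i8, ptr %p, i64 24` address the same byte. The last line touches the concern in d): the subscript pairs (1, 0) and (0, 8) into `[8 x [8 x i32]]` flatten to the same byte offset, 32, so an analysis that only sees the byte offset has to delinearize (and reason about bounds) to get the subscripts back.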

> e) Constant Expressions are a disaster.  In addition to the problem
> identified, there are also many annoying cases to deal with, e.g. when
> constexprs exist in phi nodes, trapping constexprs, etc.  In my opinion,
> the fix is to eliminate them entirely, in a few steps:
>
>     1) Introduce a new “RelocatableConstant” object which is *not* a
> mirror of all the IR operations in LLVM, but is instead designed to be used
> in global variables and allows the standard “globalpointer+offset” pattern
> that object files support, and we should add a new MachoRelocatableConstant
> class to represent the “(gv1-gv2+offset)” relocations Mach-O supports.  The
> presence of this would make codegen and frontends easier to write, and get
> rid of all the fiddly pattern matching stuff.  I think we need to talk
> about whether “offset” is a byte offset, or whether it is a series of
> (constant integer) field indexes in a GEP like operation.  I would argue
> for the latter to make interprocedural optimizations easier to write, but
> it is debatable.
>
>     2) Move the general constant folding API off of ConstantExpr to
> somewhere else, it never should have been there for reasons pointed out in
> the blog.
>
>     3) Eliminate ConstExpr: after #1, we don’t need a mirror of the LLVM
> IR in constant nodes.  Constant folding should be a failable operation and
> would return the primitive nodes like ConstantInt.  The asmparser / bitcode
> parser could auto-upgrade general unfolded constexprs to instructions
> when in a function and to [Macho]RelocatableConstant otherwise.
>
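A rough Python sketch of what steps 1) and 3) in e) could look like as data, offered as an illustration rather than a design. `RelocatableConstant`, `MachoRelocatableConstant` and `fold_sdiv_i64` are hypothetical names; the offset is assumed to be a byte offset (one of the two options raised in 1)), and a plain Python int stands in for ConstantInt. The fold returns None ("could not fold") when `sdiv` would be undefined behavior (division by zero, or INT64_MIN / -1), the kind of expression that can otherwise end up as a trapping constexpr.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional

INT64_MIN = -(2 ** 63)


@dataclass(frozen=True)
class RelocatableConstant:
    """Hypothetical `@global + offset`, the pattern plain object files relocate."""
    global_name: str
    offset: int = 0  # assumed to be a byte offset

    def resolve(self, symbols: Mapping[str, int]) -> int:
        # What the linker would compute once symbol addresses are known.
        return symbols[self.global_name] + self.offset


@dataclass(frozen=True)
class MachoRelocatableConstant:
    """Hypothetical `@gv1 - @gv2 + offset`, the difference form Mach-O supports."""
    gv1: str
    gv2: str
    offset: int = 0

    def resolve(self, symbols: Mapping[str, int]) -> int:
        return symbols[self.gv1] - symbols[self.gv2] + self.offset


def fold_sdiv_i64(lhs: int, rhs: int) -> Optional[int]:
    """Failable fold of `sdiv i64 lhs, rhs` (inputs assumed to fit in i64).

    Returns None instead of a constant when the division is undefined
    behavior, rather than building an expression that may trap.
    """
    if rhs == 0 or (lhs == INT64_MIN and rhs == -1):
        return None
    q = abs(lhs) // abs(rhs)  # magnitude; sdiv truncates toward zero
    return q if (lhs < 0) == (rhs < 0) else -q


symbols = {"table": 0x1000, "base": 0x0F00}
print(RelocatableConstant("table", 16).resolve(symbols))              # 4112
print(MachoRelocatableConstant("table", "base", 8).resolve(symbols))  # 264
print(fold_sdiv_i64(-7, 2), fold_sdiv_i64(1, 0))                      # -3 None
```

Keeping the relocatable forms as small records, rather than a mirror of the instruction set, is what would let codegen match them directly instead of pattern-matching arbitrary constexpr trees.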

Right. I'd like to see more of the learnings of MLIR make it into LLVM IR.
It's quite unfortunate that the introduction of MLIR caused a sort of split
in the community. I understand why it happened: MLIR is a radical departure
that would never have made it through review as a modification of LLVM IR.
However, now that MLIR exists and many of its ideas have been proven out,
it may be time to go back and put (some of?) them into LLVM IR as a step
towards re-unification.

There are many of those ideas to choose from. One that I'm partial to is
the ability to have instructions with multiple result values. This would
allow us to fix a number of low-key annoyances, like the integration of
inline assembly with convergence analysis for GPU backends. Inline assembly
can produce multiple results, and there's no reason why all of them should
be divergent or all of them should be uniform; it could be a mix, and yet
because the inline asm instruction returns a single (struct) value, we
can't cleanly express this.
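A toy Python model of the inline-asm point, under the simplifying assumption that uniformity is tracked per SSA value and that extracting a field from a divergent aggregate is conservatively divergent. `SingleResultAsm` and `MultiResultAsm` are made-up names, not LLVM classes; the second stands for the hypothetical multi-result instruction.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Uniformity(Enum):
    UNIFORM = "uniform"
    DIVERGENT = "divergent"


@dataclass(frozen=True)
class SingleResultAsm:
    """Today's shape: all outputs packed into one (struct) SSA value."""
    output_uniformity: Tuple[Uniformity, ...]

    def result_uniformity(self) -> Uniformity:
        # One value, one fact: divergent if any output is divergent.
        if any(u is Uniformity.DIVERGENT for u in self.output_uniformity):
            return Uniformity.DIVERGENT
        return Uniformity.UNIFORM

    def extracted(self, index: int) -> Uniformity:
        # extractvalue inherits the aggregate's fact; `index` cannot help.
        return self.result_uniformity()


@dataclass(frozen=True)
class MultiResultAsm:
    """Hypothetical multi-result instruction: one SSA value per output."""
    output_uniformity: Tuple[Uniformity, ...]

    def result(self, index: int) -> Uniformity:
        return self.output_uniformity[index]


outputs = (Uniformity.UNIFORM, Uniformity.DIVERGENT)
print(SingleResultAsm(outputs).extracted(0))  # Uniformity.DIVERGENT
print(MultiResultAsm(outputs).result(0))      # Uniformity.UNIFORM
```

With a single struct result, the uniform first output is lost as soon as any other output is divergent; with one value per output, each keeps its own fact.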

Cheers,
Nicolai

> In any case, I’d love to see progress on any of these.  I’d personally
> love to see the typeless pointers land because we’re in an unfortunate
> in-between state, and we should close off partial transitions.
>
> -Chris
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

-- 
Learn what the world is really like,
but never forget how it ought to be.