[llvm-dev] Improving codegen for not-quite-constant-folding

Philip Reames via llvm-dev llvm-dev at lists.llvm.org
Wed Jul 7 10:26:09 PDT 2021


There's a bit of prior work in this area inspired by the java icmp and 
lcmp bytecode semantics which are fairly similar to C++ space ship 
operators.

See matchThreeWayIntCompare in InstCombine.  This recognizes the select 
form after all the control flow has been eliminated.  We relied on jump 
threading and simplify cfg to eliminate the control flow, so that 
doesn't do anything for the question you posed.

Your example also involves multiple element comparisons which 
complicates the matching significantly.  (I suspect you know this, just 
stating it for the record.)

On the partial PRE, fitting that into the existing logic in GVN wouldn't 
be terrible.  However, I really question the profitability of this 
transform in general.  It's effectively hoisting an instruction into 
path where it didn't previously execute.  Doing that without regards to 
relative frequencies can be quite painful.

You could also try to PRE the condition back into the predecessors.  We 
explicitly avoid that today due to profitability problems, but maybe you 
can find a sub-case which is profitable and enable it selectively?

Another observation is that we appear to not be canonicalizing the 
operands of the selects.  In the optimized IR, I see:

   %10 = icmp eq i64 %1, %3, !dbg !42
   %11 = icmp ult i64 %1, %3, !dbg !44
   %12 = select i1 %11, i8 -1, i8 1, !dbg !44
   %13 = select i1 %10, i8 0, i8 %12, !dbg !44
   br label %14, !dbg !44

14:                                               ; preds = %6, %9
   %15 = phi i8 [ %8, %6 ], [ %13, %9 ], !dbg !33
   %16 = icmp slt i8 %15, 0, !dbg !45

What's interesting here is that 1 and 0 are indistinguishable. Not sure 
what would happen if we rewrote the i8 1 to be i8 0, but maybe that 
unlocks some progress?  Actually, I think it does. Once that's done, 
anding the two conditions becomes more obviously profitable.  Maybe we 
could back propagate constant ranges and canonicalize constants in some 
way?  (Actually, now that I write that, why can't simplify demanded bits 
do exactly that?!)

Philip

On 6/1/21 7:56 PM, David Goldblatt via llvm-dev wrote:
> Hi all,
>
> Some Facebook colleagues and I are looking at poor codegen for the
> C++20 spaceship operator (i.e. operator <=>, which performs 3-way
> comparisons), when used to implement simple comparisons (e.g.
> operator<). A simple example is at
> https://gcc.godbolt.org/z/qhn6cf7nv, which desugars into something
> like https://gcc.godbolt.org/z/4z4e1jb6T. By comparison, a
> hand-written operator< would probably look like one of
> https://gcc.godbolt.org/z/zn1en8fGW, or
> https://gcc.godbolt.org/z/6oY5exTa8, or
> https://gcc.godbolt.org/z/j6734zWYz.
>
> A reduced example with the same issue is here:
> https://gcc.godbolt.org/z/6E1KxE83x. Broadly, a tree of select/phi
> nodes with constants at the leaves isn't subject to constant folding,
> even though the reduced example could be rewritten as something like:
> https://gcc.godbolt.org/z/356qr6rKK (this saves a multiply, and in
> "real" examples enables further simplifications).
>
> GCC's GVN/PRE pass does a better job here with -O2 -ftree-partial-pre
> ("partial pre" is not a typo; it's "partial partial redundancy
> elimination", in addition to regular partial redundancy elimination),
> or -O3 (which turns on -ftree-partial-pre). -ftree-partial-pre tracks
> partial anticipation of expressions (those that are evaluated on some
> path to exit) on a per-BB level, and inserts the necessary phis to
> ensure that a partially anticipated expression is available in a
> temporary if temporaries with the same value number are available in
> all a BB's predecessors. It does constant folding after phi
> translation before checking for availability, and regards all
> constants as always available. After phi-translating the result into
> each branch of the ternary operators, we get constants, so the
> optimization kicks in and we can replace each leaf constant with the
> result of multiplying it by 123.
>
> A simpler approach that could catch cases like this would be to find
> instances of trees in which the leaves are constants, and the non-leaf
> nodes are phis or selects. If the root of such a tree is the input
> into an operation we could otherwise constant fold (if that root were
> a constant), we constant-fold each leaf of the tree instead. This
> would miss some optimization opportunities that the GCC approach would
> catch, however.
>
> The former feels tricky to implement on top of GVN or NewGVN (and,
> it's unclear that it's always a net improvement). The latter seems
> reasonable, although it would require a one-off pass targeted at this
> relatively narrow case, which feels heavyweight. My inclination is to
> attempt the simpler approach, but wanted a sanity check that this
> seems reasonable and is not better done elsewhere (or with a different
> approach entirely).
>
> Best,
> David
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


More information about the llvm-dev mailing list