[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Mon Nov 30 10:07:45 PST 2015

----- Original Message -----
> From: "Krzysztof Parzyszek via llvm-dev" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Sent: Monday, November 30, 2015 9:30:40 AM
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection
> 
> On 11/19/2015 6:58 PM, Eric Christopher via llvm-dev wrote:
> > It'll be interesting to see how this is going to be developed and how to
> > keep the target independentness of the code generator with this new
> > scheme. I.e. this is basically turning (in my mind) into "every backend
> > for themselves" with very little target independent unification.
> 
> I really don't mind the "every backend for themselves" approach.  The
> instruction selection pass is about as target-specific as a common pass
> can get, and the more work the generic code tries to do, the more
> potential it has to be inflexible.  This is not to say that generic
> code will necessarily be bad, but that a target-centric approach has a
> better chance of working out well, even if it means that more work is
> required to implement instruction selection for a new target.
> 
> As someone mentioned in another email, the canonicalization currently
> done in the DAG combiner has a tendency to interfere with what
> individual targets may prefer.  One example of it that I remember for
> Hexagon was that the LLVM IR had a combination of shifts left and right
> to extract a bitfield from a longer integer.  Hexagon has an instruction
> to do that and it's quite simple to map the shifts into that
> instruction.  The combiner, however, would fold the shifts leaving only
> the minimum sequence of operations necessary to get the bitfield.  This
> seems to be better from the generic point of view, but it makes it
> practically impossible for us to match it to the "extract" instruction,
> and in practice the code turns out to be worse.  This is the only reason
> why we have the HexagonGenExtract pass---we detect the patterns in the
> LLVM IR and generate "extract" intrinsics before the combiner mangles
> them up into unrecognizable forms.

I sympathize with this, but this is not uniquely a backend consideration. Even though we generally canonicalize these patterns at the IR level in a roughly consistent way, if the backend is not really robust in its matching logic, you'll miss a lot just from input IR differences. There's ~1K lines of code in PPCISelDAGToDAG.cpp (look for BitPermutationSelector) to match these in a robust way, and I don't see any good way around that.
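
To make the matching problem concrete, here is a minimal sketch of two source-level shapes of the same bitfield extract. This is not code from LLVM or from either backend; the function names are hypothetical, and the shift-and-mask form is only one assumed shape that a fold might produce. A robust matcher has to recognize both (and more) as the same operation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not LLVM APIs. Both return `width` bits of `x`
// starting at bit `offset` (bit 0 is the least significant), zero-extended.
// Precondition for both: width >= 1 and offset + width <= 32.

// Shift-pair form: shift left to discard the bits above the field, then
// shift right (logically) to discard the bits below it.
inline uint32_t extractViaShifts(uint32_t x, unsigned offset, unsigned width) {
  assert(width >= 1 && offset + width <= 32);
  return (x << (32 - offset - width)) >> (32 - width);
}

// Shift-and-mask form: move the field down to bit 0, then mask off the rest.
inline uint32_t extractViaMask(uint32_t x, unsigned offset, unsigned width) {
  assert(width >= 1 && offset + width <= 32);
  const uint32_t mask =
      (width == 32) ? ~uint32_t{0} : ((uint32_t{1} << width) - 1);
  return (x >> offset) & mask;
}
```

Both functions compute the same value for every valid (offset, width) pair, which is exactly why a pattern matcher keyed to only one of the shapes will miss the other.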

> The same goes for replacing ADD with OR when the bits in the operands do
> not overlap.

This is true for other targets too, at least when the ADD is part of an addressing expression. PowerPC, for example, has code in various places to recognize ORs, in combination with some known-bits information, as surrogates for ADDs. It seems like, globally, we could do a better job here.
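
As a rough illustration of why an OR can stand in for an ADD here: when no bit position can be set in both operands, the addition never produces a carry, so the two results are identical. The sketch below is not the PowerPC code; the helper names are hypothetical, and the "may be set" masks are a simplified stand-in for real known-bits information.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: each argument is a mask of the bits that MAY be set
// in an operand (the complement of its known-zero bits). If the masks are
// disjoint, adding the operands can never carry.
inline bool bitsAreDisjoint(uint32_t maySetA, uint32_t maySetB) {
  return (maySetA & maySetB) == 0;
}

// Address computation written as an ADD.
inline uint32_t addressViaAdd(uint32_t base, uint32_t offset) {
  return base + offset;
}

// The same computation written as an OR; only valid when the operands
// share no set bits, which the assert checks for the concrete values.
inline uint32_t addressViaOr(uint32_t base, uint32_t offset) {
  assert((base & offset) == 0);
  return base | offset;
}
```

For example, a 16-byte-aligned base has its low four bits known to be zero, so any offset below 16 can be combined with either operation.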

> We have code that specifically undoes that, since for us, if the
> original code had an ADD, it is pretty much always better if it remains
> an ADD.
> 
> There were cases in the past when we had to disable parts of
> CodeGenPrepare, or else it would happily promote i32 into i64 where it
> wasn't strictly necessary.  I64 is a legal type on Hexagon, but it uses
> pairs of registers which, in practical terms, means that our register
> set is cut by half when 64-bit values are used.

This is a common problem, but is not unique to CGP. Parts of the mid-level optimizer (e.g. IndVarSimplify) also do integer promotion in ways that are inopportune for some targets. Also, CGP has a lot of target hooks to turn off things like this, and this should certainly be optional (in practice, this widening sometimes destroys information and thus is not always reversible).

> 
> On the other hand, having a relatively simple, generic IR makes it
> easier to simplify code that is no longer subjected to the LLVM IR's
> constraints (e.g. getelementptr expressed as +/*, etc.).  Hexagon has a
> lot of very specific complex/compound instructions and a given code can
> be written in many different ways.  This makes it harder to optimize
> code after the specific instructions have been selected.  For example, a
> pass that would try to simplify arithmetic code would need to deal with
> the dozens of variants of add/multiplication instructions, instead of
> simply looking at some generic GADD/GMPY.

I have mixed feelings about this. I agree that sometimes we do too much, and sometimes we make transformations where we should be providing better analysis instead, but I think there is still a lot of value in the common backend optimizations.

In this context, it is worth thinking about the various reasons why these opportunities exist in the first place (not a complete list):

 1. The process of lowering GEPs (and some other relatively-higher-level IR constructs) into instructions that explicitly represent the underlying computations being performed (accounting for target capabilities) exposes generic peephole/CSE opportunities.

 2. The process of introducing target-specific nodes to represent partial behaviors introduces generic peephole/CSE opportunities. What I mean by this is, for example, when a floating-point division or sqrt is lowered to a target-specific reciprocal estimate function plus some Newton iterations, those Newton iterations are generic floating-point adds, multiplies, etc. that can be further optimized.

 3. The process of type/operation legalization, especially when operations are split, promoted and/or turned into procedures involving stack loads and stores, presents opportunities not apparent at the IR level.

 4. Some generic optimizations, such as store merging, need more-detailed cost information than is available through TTI, and so are done in the backend.
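
Point 2 in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: `reciprocalEstimate` is a hypothetical stand-in for a target's low-precision estimate instruction (here simulated by deliberately perturbing the exact reciprocal by about 1%), and two refinement steps are an arbitrary choice. The point is that everything after the estimate is ordinary floating-point multiplies and subtracts that generic code can still optimize.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a target-specific reciprocal estimate
// instruction: returns 1/d with roughly 1% relative error on purpose.
inline float reciprocalEstimate(float d) {
  return (1.0f / d) * 1.01f;
}

// One Newton-Raphson step for f(x) = 1/x - d: x' = x * (2 - d * x).
// Each step roughly squares the relative error of the estimate.
inline float refineReciprocal(float d, float x) {
  return x * (2.0f - d * x);
}

// n / d lowered to "estimate + Newton iterations + multiply". Only the
// first line is target-specific; the rest is generic arithmetic.
inline float divideViaEstimate(float n, float d) {
  assert(d != 0.0f);
  float x = reciprocalEstimate(d);
  x = refineReciprocal(d, x);
  x = refineReciprocal(d, x);
  return n * x;
}
```

With a 1% starting error, two steps bring the relative error down to around single-precision rounding noise, so the result agrees closely with a plain division.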

In short, within our current framework, there are many reasons why we have DAGCombine and related code, and I think that while moving specific aspects into the targets might be good overall, leaving all of that to each target is too much. The fact that you can get a reasonable set of backend optimizations for normalish targets is a strong point of LLVM.

Thanks again,
Hal

> 
> -Krzysztof
> 
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory