[llvm-dev] [AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets
llvm-dev at lists.llvm.org
Tue Oct 8 07:43:18 PDT 2019
> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Joan Lluch
> via llvm-dev
> Sent: Monday, October 07, 2019 6:22 PM
> To: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] [AVR] [MSP430] Code gen improvements for 8 bit and 16
> bit targets
> Hi All,
> While implementing a custom 16 bit target for academic and demonstration
> purposes, I unexpectedly found that LLVM was not really ready for 8 bit
> and 16 bit targets. Let me explain why.
> Target backends can be divided into two major categories, with essentially
> nothing in between:
> Type 1: The big 32 or 64 bit targets. Heavily pipelined with expensive
> branches, running at clock frequencies up to the GHz range. Aimed at
> workstations, computers or smartphones. For example PowerPC, x86 and ARM.
> Type 2: The 8 or 16 bit targets. Non-pipelined processors, running at
> frequencies in the MHz range, generally fast access to memory, aimed at
> the embedded market or low consumption applications (they are virtually
> everywhere). LLVM currently implements an experimental AVR target and the
> LLVM does a great job for Type 1 targets, but it can be improved for Type 2
> targets.
> The essential target feature that makes one way of code generation better
> for either type 1 or type 2 targets is pipelining. For type 1 we want
> branching to be avoided as much as possible. Turning branching code
> into sequential instructions with the execution of speculative code is
> advantageous. These targets have instruction sets that help with that
> goal, in particular cheap ‘shifts’ and ‘cmove’ type instructions.
> Type 2 targets, on the contrary, have cheap branching. Their instruction
> set is not particularly designed to assist branching avoidance because
> that’s not required. In fact, branching on these targets is often
> desirable, as opposed to transforms creating expensive speculative
> execution. ‘Shifts’ move only a single bit at a time, and conditional
> execution instructions other than branches are not available.
> The current situation is that some LLVM target-independent optimisations
> are not really that ‘independent’ when we bring type 2 targets into the
> mix. Unfortunately, LLVM was apparently designed with type 1 targets in
> mind alone, which causes degraded code for type 2 targets. In relation to
> this, I posted a couple of bug reports that show some of these issues:
> The first bug has already been fixed by somebody who also suggested that I
> raise this subject on the llvm-dev mailing list, which I’m doing now.
> Incidentally, most code degradations happen in the DAGCombine code. It’s a
> bug because LLVM may transform code into instructions that are not
> Legal for some targets. Such transforms are detrimental on those targets.
> This bug won't show for most targets, but it particularly affects
> targets with no native shift support. The bug consists of the
> transformation of already relatively cheap code into expensive code. The
> fix prevents that.
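To make the cost difference concrete, here is a minimal sketch of two
equivalent ways to compute a 16-bit sign mask. It is not taken from the bug
reports; the function names and the step counter are made up for
illustration, and the loop only models a target whose arithmetic shift moves
one bit per instruction. Under that assumption the shift form needs 15 shift
steps, where the branch form is a compare and a jump.

```cpp
#include <cstdint>

// Branch form: a compare plus one conditional branch.
std::int16_t signMaskBranch(std::int16_t x) {
  if (x < 0)
    return -1;
  return 0;
}

// Shift form (x >> 15), modelled for a target that can only do
// single-bit arithmetic shifts. `steps` reports how many one-bit
// shifts were executed (a hypothetical cost measure).
std::int16_t signMaskShift(std::int16_t x, int &steps) {
  std::uint16_t bits = static_cast<std::uint16_t>(x);
  steps = 0;
  for (int i = 0; i < 15; ++i) {
    std::uint16_t sign = bits & 0x8000u;  // preserve the sign bit
    bits = static_cast<std::uint16_t>((bits >> 1) | sign);
    ++steps;
  }
  return static_cast<std::int16_t>(bits);
}
```

Both functions return the same value for every input; only the amount of
work differs, which is why rewriting the first form into the second hurts on
targets without multi-bit shifts.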
> Still, even once the above DAGCombine code is fixed, the poor code
> generation issue will REMAIN. In fact, the same kind of transformations
> are performed earlier as part of the IR optimisations, in the InstCombine
> pass. The result is that the IR /already/ incorporates the undesirable
> transformations for type 2 targets, which DAGCombine can't do anything
> about.
> At this point, reverse pattern matching looks like the obvious solution, but
> I think it’s not the right one because that would need to be implemented
> on every single current or future (type 2) target. It is also difficult to
> get rid of undesired transforms when they carry complexity, or are the
> result of consecutive combinations. Delegating the whole solution to
> reverse pattern matching code alone will just perpetuate the overall
> problem, which will continue affecting future target developments. Some
> reverse pattern matching is acceptable and desirable to deal with very
> specific target features, but not as a global solution to this problem.
> In a previous email, a statement was posted that in recent years attempts
> have been made to remove code from InstCombine and port it to DAGCombiner.
> I agree that this is a good thing to do, but it was reportedly difficult
> and associated with potential problems or unanticipated regressions. I
> understand those concerns and I acknowledge the involved work as
> challenging. However, in order to solve the presented problem, some work
> is still required in InstCombine.
> Therefore, I wondered if something in between could still be done, so this
> is my proposal: There are already many command line compiler options that
> modify IR output in several ways. Some options are even target dependent,
> and some targets even explicitly set them (in RenderTargetOptions). The
> InstCombine code itself has its own small set of options, for example
> "instcombine-maxarray-size" or "instcombine-code-sinking". Command line
> compiler options produce functionally equivalent IR output, while
> respecting established canonicalizations. In all cases, the output is just
> valid IR code in a proper form that depends on the selected options. As an
> example, -O0 produces very different output from -O3 or -Os, yet all of
> them are valid as input to any target backend. My suggestion would be to
> incorporate a compiler option acting on the InstCombine pass. The option
> would improve IR code aimed at Type 2 targets. Of course, this option
> would not be enabled by default so the IR output would remain exactly as
> it is today if not explicitly enabled.
An option is certainly one way to get this effect; another would be to
add some sort of target-specific query, which would drive the same choices
in the IR transforms. TargetTransformInfo appears to be full of these
sorts of queries.
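
As a rough sketch of what I mean by a query driving the choice, something
along these lines. To be clear, the struct, its fields, and the function
below are hypothetical names invented for this example; they are not
existing TargetTransformInfo hooks, and the code is a standalone model
rather than LLVM code.

```cpp
// Hypothetical stand-in for a target query interface. These names are
// illustrative only and are NOT part of LLVM's TargetTransformInfo.
struct TargetHooks {
  bool cheapBranches;   // a branch costs about as much as an ALU op
  bool multiBitShifts;  // the target can shift by N bits in one instruction
};

enum class SignMaskForm { KeepBranch, UseSelect, UseShift };

// Decide which form a combine should prefer for "x < 0 ? -1 : 0".
SignMaskForm chooseSignMaskForm(const TargetHooks &T) {
  if (T.multiBitShifts)
    return SignMaskForm::UseShift;    // the shift form is cheap here
  if (T.cheapBranches)
    return SignMaskForm::KeepBranch;  // leave the original control flow alone
  return SignMaskForm::UseSelect;     // avoid the branch, but not via shifts
}
```

The point of a query over an option is that the target answers once, and
every user gets the right behaviour without having to know about a flag.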
> What this option would need to do in practice is really easy and
> straightforward: just bypass (avoid) certain transformations that
> might be considered harmful for targets benefiting from it. I performed
> some simple tests, specifically directed at the InstCombineSelect
> transformations, and I found them to work great and to generate greatly
> improved code for both the MSP430 and AVR targets.
> Now, I am aware that this proposal might come as a bit unexpected and even
> be regarded as inelegant or undesirable, but maybe after some careful
> balancing of pros and cons, it is just what we need to do, if we really
> care about LLVM as a viable platform for 8 and 16 bit targets. As stated
> earlier, it’s easy to implement, it’s just an optional compiler setting
> not affecting major targets at all, and its future extent can be
> gradually defined or agreed upon as it is put into operation. Any views
> would be appreciated.