[llvm-dev] [AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets

Wed Nov 13 15:05:49 PST 2019

As before, I'm not convinced that we want to allow target-based
enable/disable in instcombine for performance. That undermines having a
target-independent canonical form in the 1st place.

It's not clear to me what the remaining motivating cases look like. If you
could post those here or as bugs, I think you'd have a better chance of
finding an answer.

Let's take a minimal example starting in C and compiling for MSP430 since
that's what we have used as a public approximation of your target:

short isNotNegativeUsingBigShift(short x) {
  return (unsigned short)(~x) >> 15;
}

short isNotNegativeUsingCmp(short x) {
  return x > -1;
}

Currently, we will canonicalize these to the shift form (but you could
argue that is backwards).
Alive proof for logical equivalence:
https://rise4fun.com/Alive/uGH

If we disable the instcombine for this, we would have IR like this:

define signext i16 @isNotNegativeUsingShift(i16 signext %x) {
  %signbit = lshr i16 %x, 15
  %r = xor i16 %signbit, 1
  ret i16 %r
}

define signext i16 @isNotNegativeUsingCmp(i16 signext %x) {
  %cmp = icmp sgt i16 %x, -1
  %r = zext i1 %cmp to i16
  ret i16 %r
}

And compile that for MSP430:
$ ./llc -o - -mtriple=msp430 shift.ll
isNotNegativeUsingShift:                ; @isNotNegativeUsingShift
; %bb.0:
inv r12
swpb r12
mov.b r12, r12
clrc
rrc r12
rra r12
rra r12
rra r12
rra r12
rra r12
rra r12
ret

isNotNegativeUsingCmp:                  ; @isNotNegativeUsingCmp
; %bb.0:
mov r12, r13
mov #1, r12
tst r13
jge .LBB1_2
; %bb.1:
clr r12
.LBB1_2:
ret

How do you intend to optimize code that is written in the 1st form? Or is
that not allowed in some way?

On Wed, Nov 13, 2019 at 5:52 AM Joan Lluch via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi Roman,
>
> Thanks for your input.
>
> The subject of reverse transformations was discussed before (it’s even
> mentioned on my reference message below) and I think there was a general
> agreement that it’s best to avoid reversals if the issue can be dealt with
> in a better way from the origins. I also understood that there was a
> general support about /moving/ as much transformations as possible from
> InstCombine to DAGCombine, although this is a major goal and not the
> subject of this.
>
> This proposal does not aim to remove all shifts, this is just not possible
> or even desirable. All targets have shifts, however large amount shifts can
> be particularly expensive for some targets, as they require a large
> sequence of instructions to complete them.
>
> We only want to prevent NEW shifts that are emitted as a consequence of
> transformations. LLVM tends to act too easily at creating new shifts in
> circumstances where they are not desirable for some targets. We just want
> to improve on this.
>
> Finally, this does not aim at all to create a different canonical
> representation. This was also mentioned on the reference message.
>
> I understood from the very beginning that this proposal could be
> controversial, and I still think that the ultimate solution would be to
> move a lot of InstCombine into DAGCombine. However, the latter is a major
> goal with strong impacts on all targets that would require really strong
> support and hard work from many community members. I’m advocating for
> something in the middle that would solve the problem for the affected
> targets with ZERO impact for the non-affected ones.
>
> I hope this helps to clarify it.
>
> Thanks again,
>
> John
>
> On 13 Nov 2019, at 10:39, Roman Lebedev <lebedev.ri at gmail.com> wrote:
>
> On Wed, Nov 13, 2019 at 12:26 PM Joan Lluch via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>
> Hi All,
>
> In relation to the subject of this message I got my first round of patches
> successfully reviewed and committed. As a matter of reference, they are the
> following:
>
> https://reviews.llvm.org/D69116
> https://reviews.llvm.org/D69120
> https://reviews.llvm.org/D69326
> https://reviews.llvm.org/D70042
>
> They provided hooks in TargetLowering and DAGCombine that enable
> interested targets to implement a filter for expensive shift operations.
> The patches work by preventing certain transformations that would result in
> expensive code for these targets.
>
> I want to express my gratitude to the LLVM community and particularly to
> members @spatel and @asl who have directly followed, helped with, and
> reviewed these patches.
>
>
> This is half of what’s required to get the full benefits. As I exposed
> before, in order to get this fully functional, we need to do some work on
> InstCombine. This is because some of the transformations that we want to
> avoid, are created earlier in InstCombine, thus deactivating the patches
> above.
>
> My general proposal when I started this (quoted below for reference), was
> to implement a command line option that would act on InstCombine by
> bypassing (preventing) certain transformations. I still think that this is
> the easier and safer way to obtain the desired goals, but I want to subject
> that to the consideration of the community again to make sure I am on the
> right track.
>
> My current concrete proposal is to add a command line option (boolean)
> that I would name “enable-shift-relaxation” or just  “relax-shifts”. This
> option would act in several places in InstCombineCasts and in
> InstCombineSelect with the described effects.
>
> I'm not really sold on this part, for the reasons previously discussed.
>
> This is only going to avoid creating such shifts, in passes that will
> be adjusted.
> This will not completely ban such shifts, meaning they still can exist.
> Which means this will only partially prevent 'degrading' existing IR.
> What about the ones that were already present in the original input
> (from C code, e.g.)?
>
> I think you just want to add an inverse set of DAGCombine transforms,
> also guarded with that target hook you added. That way there's no chance
> to still end up with unfavorable shifts on your target, and no middle-end
> impact from having more than one canonical representation.
>
> I also need to ask about the best way to present tests cases for that. I
> learned how to create test files for codegen transforms (IR to Assembly),
> but now I will be working on the “target Independent” side. For my internal
> work, I have manually been testing C-code to IR generation, but I do not
> know how to create proper test cases for the llvm project. Any help on this
> would be appreciated.
>
> Thanks in advance
>
> John
>
> Roman
>
> On 8 Oct 2019, at 00:22, Joan Lluch <joan.lluch at icloud.com> wrote:
>
> Hi All,
>
> While implementing a custom 16 bit target for academical and demonstration
> purposes, I unexpectedly found that LLVM was not really ready for 8 bit and
> 16 bit targets. Let me expose why.
>
> Target backends can be divided into two major categories, with essentially
> nothing in between:
>
> Type 1: The big 32 or 64 bit targets. Heavily pipelined with expensive
> branches, running at clock frequencies up to the GHZ range. Aimed at
> workstations, computers or smartphones. For example PowerPC, x86 and ARM.
>
> Type 2: The 8 or 16 bit targets. Non-pipelined processors, running at
> frequencies on the MHz range, generally fast access to memory, aimed at the
> embedded marked or low consumption applications (they are virtually
> everywhere). LLVM currently implements an experimental AVR target and the
> MSP430.
>
> LLVM does a great for Type 1 targets, but it can be improved for Type 2
> targets.
>
> The essential target feature that makes one way of code generation better
> for either type 1 or type 2 targets, is pipelining. For type 1 we want
> branching to be avoided for as much as possible. Turning branching code
> into sequential instructions with the execution of speculative code is
> advantageous. These targets have instruction sets that help with that goal,
> in particular cheap ‘shifts’ and ‘cmove' type instructions.
>
> Type 2 targets, on the contrary, have cheap branching. Their instruction
> set is not particularly designed to assist branching avoidance because
> that’s not required. In fact, branching on these targets is often
> desirable, as opposed to transforms creating expensive speculative
> execution. ‘Shifts’ are only one-single-bit, and conditional execution
> instructions other than branches are not available.
>
> The current situation is that some LLVM target-independent optimisations
> are not really that ‘independent' when we bring type 2 targets into the
> mix. Unfortunately, LLVM was apparently designed with type 1 targets in
> mind alone, which causes degraded code for type 2 targets. In relation to
> this, I posted a couple of bug reports that show some of these issues:
>
> https://bugs.llvm.org/show_bug.cgi?id=43542
> https://bugs.llvm.org/show_bug.cgi?id=43559
>
> The first bug is already fixed by somebody who also suggested me to raise
> this subject on the llvm-dev mailing list, which I’m doing now.
>
> Incidentally, most code degradations happen on the DAGCombine code. It’s a
> bug because LLVM may create transforms into instructions that are not Legal
> for some targets. Such transforms are detrimental on those targets. This
> bug won't show for most targets, but it is nonetheless particularly
> affecting targets with no native shifts support. The bug consists on the
> transformation of already relatively cheap code to expensive one. The fix
> prevents that.
>
> Still, although the above DAGCombine code gets fixed, the poor code
> generation issue will REMAIN. In fact, the same kind of transformations are
> performed earlier as part of the IR optimisations, in the InstCombine pass.
> The result is that the IR /already/ incorporates the undesirable
> transformations for type 2 targets, which DAGCombine can't do anything
> about.
>
> At this point, reverse pattern matching looks as the obvious solution, but
> I think it’s not the right one because that would need to be implemented on
> every single current or future (type 2) target. It is also difficult to get
> rid of undesired transforms when they carry complexity, or are the result
> or consecutive combinations. Delegating the whole solution to only reverse
> pattern matching code, will just perpetuate the overall problem, which will
> continue affecting future target developments. Some reverse pattern
> matching is acceptable and desirable to deal with very specific target
> features, but not as a global solution to this problem.
>
> On a previous email, a statement was posted that in recent years attempts
> have been made to remove code from InstCombine and port it to DAGCombiner.
> I agree that this is a good thing to do, but it was reportedly difficult
> and associated with potential problems or unanticipated regressions. I
> understand those concerns and I acknowledge the involved work as
> challenging. However, in order to solve the presented problem, some work is
> still required in InstCombine.
>
> Therefore, I wondered if something in between could still be done, so this
> is my proposal: There are already many command line compiler options that
> modify IR output in several ways. Some options are even target dependent,
> and some targets even explicitly set them (In RenderTargetOptions). The
> InstCombine code, has itself its own small set of options, for example
> "instcombine-maxarray-size” or "instcombine-code-sinking”. Command line
> compiler options produce functionally equivalent IR output, while
> respecting stablished canonicalizations. In all cases, the output is just
> valid IR code in a proper form that depends on the selected options. As an
> example -O0 produces a very different output than -O3, or -Os, all of them
> are valid as the input to any target backend. My suggestion would be to
> incorporate a compiler option acting on the InstCombine pass. The option
> would improve IR code aimed at Type 2 targets. Of course, this option would
> not be enabled by default so the IR output would remain exactly as it is
> today if not explicitly enabled.
>
> What this option would need to do in practice is really easy and
> straightforward. Just bypassing (avoiding) certain transformations that
> might be considered harmful for targets benefiting from it. I performed
> some simple tests, specially directed at the InstCombineSelect
> transformations, and I found them to work great and generating greatly
> improved code for both the MSP430 and AVR targets.
>
> Now, I am aware that this proposal might come a bit unexpected and even
> regarded as inelegant or undesirable, but maybe after some careful
> balancing of pros and cons, it is just what we need to do, if we really
> care about LLVM as a viable platform for 8 and 16 bit targets. As stated
> earlier, It’s easy to implement, it’s just an optional compiler setting not
> affecting major targets at all, and the future extend of it can be
> gradually defined or agreed upon as it is put into operation.  Any views
> would be appreciated.
>
> John.
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191113/7f51eab2/attachment-0001.html>