[PATCH] Add custom lowering for add/sub with overflow ARM

Renato Golin renato.golin at linaro.org
Thu May 8 23:33:27 PDT 2014


LGTM. Thanks!

Cheers,
Renato
On 9 May 2014 01:10, "Louis Gerbarg" <lgg at apple.com> wrote:

> Okay, I am pretty sure the issue is that because of the swapping back and
> forth between CMN and CMP in some cases I end up with the negated compare
> from ARM64, which is why I am using HS rather than LO. If you have a
> specific case where the generated code fails let me know, but based on
> examination I believe this is correct, and I have a lot of random testing
> that seems to agree.
>
> Thanks,
> Louis
>
> On May 8, 2014, at 12:18 PM, Louis Gerbarg <lgg at apple.com> wrote:
>
> Off the top of my head it is not obvious what I am doing differently from
> ARM64 (unlike the CMN cases where I am doing different tests and it makes
> sense they are distinct). Having said that I came up with the flags on
> paper, and then tested them on device with both hand selected edge cases
> and random data points by comparing them to the code generated by platform
> independent lowering of the intrinsics. I have 2^36 random test cases I ran
> on a device to make sure it was working.
>
> I also just ran similar tests on the existing 64 bit code vs 64 bit code
> without platform specific intrinsic lowering and it also works… I don’t see
> anything obvious that would explain why they both work (and neither do if I
> swap the condition codes). My random tests all went through LowerXALUO, not
> LowerSELECT, and ARM64 does some clever stuff there around Inverting codes
> that I am not… I’ll stare at it a bit see if I can understand what js going
> on.
>
> Louis
>
> On May 7, 2014, at 3:31 PM, Juergen Ributzka <juergen at apple.com> wrote:
>
> I think ARMCC::HS is the wrong condition code for USUBO. I stumbled over
> the same problem with ARM64.
> -Juergen
>
> On May 7, 2014, at 3:13 PM, Louis Gerbarg <lgg at apple.com> wrote:
>
> Okay, here is a newer revision of the patch. Changes:
>
> 1) I only registered for i32s. I think there were probably some issues
> with i64s and i32 is going to be the more common case on ARM.
> 2) I updated all the tests to use regexps and track variables.
> 3) I spent some time trying to track through and get it to go to
> ADDS/SUBS, but in the end that proved to be more work than I thought it
> would be due to the differences between ARM and ARM64 in how the CPSR is
> handled, and the ARM models CMPs directly (where as ARM64 converts them to
> SUBS), and the fact that ARM backend only has working plumbing for CMP, not
> CMN. This patch is substantially better behavior than we currently have,
> and I would rather get it in now than wait trying to eek out that last bit
> of optimization.
>
> Louis
>
> <0001-Add-custom-lowering-for-add-sub-with-overflow-intrin.patch>
>
> On May 6, 2014, at 1:46 PM, Louis Gerbarg <lgg at apple.com> wrote:
>
>
> On May 2, 2014, at 11:40 PM, Tim Northover <t.p.northover at gmail.com>
> wrote:
>
> Hi Louis,
>
> On 3 May 2014 01:43, Louis Gerbarg <lgg at apple.com> wrote:
>
> The attached patch provides custom lowering of overflow intrinsics for the
> ARM architecture.
>
>
> As Pete said, a very good idea!
>
> I think I'd prefer it if the tests also tracked data dependencies as
> well as just the instructions used though. If you swapped some
> operands around the results might be disastrous.
>
>       add     r2, r0, r1
>       mov     r1, #1
>       cmp     r2, r0
>       movvc   r1, #0
>
>
> Couldn't this be simplified to:
>
>       adds r2, r0, r1
>       mov r1, #1
>       movvc r1, #0
>
> ? This would probably involve adding ARMISD::ADDS and ARMISD::SUBS
> nodes (see ARM64 for reference, it already has them).
>
>
> Yes, this can be simplified further. I took another look at this morning.
> The issue that prevented me from doing it in the first place is that the
> arm backend handles the CPSR somewhat differently than ARM64. There are
> already somewhat analogous nodes (ARMISD::ADDC and ARMISD::SUBC),  but they
> decide whether or not to set flags based on the liveness of the CPSR, so I
> need to figure out how to get that wired up. I don’t think adding explicit
> ADDS/SUBS nodes would make that any simpler because I suspect they would
> have to deal with equivalent CPSR issues in the tablegen files.
>
> Louis
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140509/e8a5b51b/attachment.html>


More information about the llvm-commits mailing list