[PATCH] Add custom lowering for add/sub with overflow ARM

Louis Gerbarg lgg at apple.com
Thu May 8 15:19:08 PDT 2014


Okay, I am pretty sure the issue is that because of the swapping back and forth between CMN and CMP in some cases I end up with the negated compare from ARM64, which is why I am using HS rather than LO. If you have a specific case where the generated code fails let me know, but based on examination I believe this is correct, and I have a lot of random testing that seems to agree.

Thanks,
Louis

On May 8, 2014, at 12:18 PM, Louis Gerbarg <lgg at apple.com> wrote:

> Off the top of my head it is not obvious what I am doing differently from ARM64 (unlike the CMN cases where I am doing different tests and it makes sense they are distinct). Having said that I came up with the flags on paper, and then tested them on device with both hand selected edge cases and random data points by comparing them to the code generated by platform independent lowering of the intrinsics. I have 2^36 random test cases I ran on a device to make sure it was working.
> 
> I also just ran similar tests on the existing 64 bit code vs 64 bit code without platform specific intrinsic lowering and it also works… I don’t see anything obvious that would explain why they both work (and neither do if I swap the condition codes). My random tests all went through LowerXALUO, not LowerSELECT, and ARM64 does some clever stuff there around Inverting codes that I am not… I’ll stare at it a bit see if I can understand what js going on.
> 
> Louis
> 
> On May 7, 2014, at 3:31 PM, Juergen Ributzka <juergen at apple.com> wrote:
> 
>> I think ARMCC::HS is the wrong condition code for USUBO. I stumbled over the same problem with ARM64.
>> -Juergen
>> 
>> On May 7, 2014, at 3:13 PM, Louis Gerbarg <lgg at apple.com> wrote:
>> 
>>> Okay, here is a newer revision of the patch. Changes:
>>> 
>>> 1) I only registered for i32s. I think there were probably some issues with i64s and i32 is going to be the more common case on ARM.
>>> 2) I updated all the tests to use regexps and track variables.
>>> 3) I spent some time trying to track through and get it to go to ADDS/SUBS, but in the end that proved to be more work than I thought it would be due to the differences between ARM and ARM64 in how the CPSR is handled, and the ARM models CMPs directly (where as ARM64 converts them to SUBS), and the fact that ARM backend only has working plumbing for CMP, not CMN. This patch is substantially better behavior than we currently have, and I would rather get it in now than wait trying to eek out that last bit of optimization.
>>> 
>>> Louis
>>> 
>>> <0001-Add-custom-lowering-for-add-sub-with-overflow-intrin.patch>
>>> 
>>> On May 6, 2014, at 1:46 PM, Louis Gerbarg <lgg at apple.com> wrote:
>>> 
>>>> 
>>>> On May 2, 2014, at 11:40 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>>>> 
>>>>> Hi Louis,
>>>>> 
>>>>> On 3 May 2014 01:43, Louis Gerbarg <lgg at apple.com> wrote:
>>>>>> The attached patch provides custom lowering of overflow intrinsics for the ARM architecture.
>>>>> 
>>>>> As Pete said, a very good idea!
>>>>> 
>>>>> I think I'd prefer it if the tests also tracked data dependencies as
>>>>> well as just the instructions used though. If you swapped some
>>>>> operands around the results might be disastrous.
>>>>> 
>>>>>>       add     r2, r0, r1
>>>>>>       mov     r1, #1
>>>>>>       cmp     r2, r0
>>>>>>       movvc   r1, #0
>>>>> 
>>>>> Couldn't this be simplified to:
>>>>> 
>>>>>       adds r2, r0, r1
>>>>>       mov r1, #1
>>>>>       movvc r1, #0
>>>>> 
>>>>> ? This would probably involve adding ARMISD::ADDS and ARMISD::SUBS
>>>>> nodes (see ARM64 for reference, it already has them).
>>>> 
>>>> Yes, this can be simplified further. I took another look at this morning. The issue that prevented me from doing it in the first place is that the arm backend handles the CPSR somewhat differently than ARM64. There are already somewhat analogous nodes (ARMISD::ADDC and ARMISD::SUBC),  but they decide whether or not to set flags based on the liveness of the CPSR, so I need to figure out how to get that wired up. I don’t think adding explicit ADDS/SUBS nodes would make that any simpler because I suspect they would have to deal with equivalent CPSR issues in the tablegen files.
>>>> 
>>>> Louis
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140508/122c729f/attachment.html>


More information about the llvm-commits mailing list