[PATCH] D11678: [CodeGen] Fixes absdiff intrinsic: LangRef doc/test case improvement and corresponding code change

Tue Aug 4 02:16:22 PDT 2015

Hi,

This all depends on how you define your intrinsic. It’s currently ambiguous - we have to choose what uabsdiff should do.

Given this expression:

call i8 @llvm.uabsdiff(i8 1, i8 130)

What should be the result?

If we expand, we get:

%1 = sub i8 1, i8 130   ; pure result is -129, but has to be truncated to fit into i8.

Neither unsigned nor signed comparisons will get the right result here. The result of the subtract after truncation is 127 in both signed and unsigned representations, which is greater than zero and is definitely not 129!

Instead, if we expand with range expansion:

%1 = zext i8 1 to i16 ; zext because we’re treating the inputs as unsigned
%2 = zext i8 130 to i16
%3 = sub i16 %1, %1 ; 0xff7f = -129
%4 = icmp sgt i16 %3, 0 ; false
%5 = sub i16 0, %3  ; 129
%6 = select i1 %4, i16 %3, %5 ; %5 = 129
%7 = trunc i16 %6 to i8 ; 129

So the end result is correct and in range, if we promote the intermediate calculations to a larger bit width.

So we need to decide: what should the behaviour of uabsdiff be? Should it behave as if there is larger intermediate precision or not?

I vote yes, because all the actual absdiff implementations I know of implement this behaviour.

Cheers,

James

On 4 Aug 2015, at 09:56, Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com<mailto:Asghar-ahmad.Shahid at amd.com>> wrote:

Hi Mikhail,

AFAIU, I think we need to use “ugt” as the “modulo 2^n” result of the difference for unsigned overflow
will be wrapped to zero.

But I too would be happy to be corrected if I'm wrong:)

Regards,
Shahid

From: Mikhail Zolotukhin [mailto:mzolotukhin at apple.com]
Sent: Tuesday, August 04, 2015 3:02 AM
To: James Molloy
Cc: reviews+D11678+public+e92bec0f352bb617 at reviews.llvm.org<mailto:reviews+D11678+public+e92bec0f352bb617 at reviews.llvm.org>; Shahid, Asghar-ahmad; james.molloy at arm.com<mailto:james.molloy at arm.com>; hfinkel at anl.gov<mailto:hfinkel at anl.gov>; llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
Subject: Re: [PATCH] D11678: [CodeGen] Fixes *absdiff* intrinsic: LangRef doc/test case improvement and corresponding code change

Hi James,

According to LLVM Language reference manual, arithmetic operations do perform truncation:
If the difference has unsigned overflow, the result returned is the mathematical result modulo 2n, where n is the bit width of the result.

So, I think that if we use "uge" for unsigned, we're actually fine here. But I'd be happy to be corrected if I'm wrong here:)

Michael

On Aug 3, 2015, at 1:09 PM, James Molloy <james at jamesmolloy.co.uk<mailto:james at jamesmolloy.co.uk>> wrote:

Hi Michael,

I think the complexity comes from the subtraction having as its domain two unsigned integers- so it's range must be a larger signed integer.

Signed comparison for unsigned values is clearly wrong as you say, but I could contruct a testcase that shows incorrect behaviour with an unsigned comparison too. I think the only correct behaviour is to extend the inputs first and truncate the result.

But I've been wrong before :)

James
On Mon, 3 Aug 2015 at 21:05, Michael Zolotukhin <mzolotukhin at apple.com<mailto:mzolotukhin at apple.com>> wrote:
mzolotukhin added inline comments.

================
Comment at: docs/LangRef.rst:10387-10390
@@ -10386,6 +10386,6 @@

     %sub = sub nsw <4 x i32> %a, %b
-    %ispos = icmp sgt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32 -1>
+    %ispos = icmp sge <4 x i32> %sub, zeroinitializer
     %neg = sub nsw <4 x i32> zeroinitializer, %sub
     %1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg

----------------
ashahid wrote:
> mzolotukhin wrote:
> > What's the difference between `llvm.uabsdiff` and `llvm.sabsdiff` then?
> The difference is the presence of NSW flag in case of llvm.sabsdiff.
I still don't think it's correct. NSW is just a hint to optimizers, but it doesn't add any additional logic. It does assert that the expression won't overflow, but the operations we execute are still the same. That is, currently the only difference between signed and unsigned version is that for signed version we could get an undefined behavior in some cases. This is clearly incorrect, because we should get different results without undefined behavior in some cases (e.g. `<-1,-1,-1,-1>` and `<1,1,1,1>` - it should give `<254,254,254,254>` for `uabsdiff.v4i8` and `<2,2,2,2>` for `sabsdiff.v4i8`).

What really should be the difference, as far is I understand, is condition code in the comparison:
```
%ispos = icmp sge <4 x i32> %sub, zeroinitializer
```
As far as I understand, we should use `uge` for unsigned and `sge` for signed case.



Repository:
  rL LLVM

http://reviews.llvm.org/D11678




_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits


-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590
ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150804/c19c30b7/attachment.html>