[PATCH] Add Integer Saturation Intrinsics.

Wed Apr 1 23:36:04 PDT 2015

I understand the optimizations that are possible at codegen.
But if we are talking about creating the ssat intrinsic at the IR level, as part of the InstCombine pass,
we need a strong target-dependent cost model, because folding a min-max sequence into ssat is not always profitable for X86.

>   (ssat (v8i32))
> into (the mid types get tricky, so this is best left to selection proper):
>   (v8i32 (PMOVSX (v8i16 (PACKSS (v8i32), undef))))
> Which might still be better than the full icmp+select sequence.

For this case, MIN+MAX is better than PMOVSX + PACKSS.
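
For reference, a minimal per-lane sketch in Python (an illustration, not code from the patch) of the two IR-level forms of the clamp under discussion: the icmp+select pair and the equivalent min/max pair. The 16-bit bounds and the function names are assumptions made for the example.

```python
# Assumed saturation range for the example: signed 16-bit.
I16_MIN, I16_MAX = -32768, 32767

def clamp_select(x, lo=I16_MIN, hi=I16_MAX):
    """Clamp written as two compare+select steps."""
    t = lo if x < lo else x      # icmp slt + select
    return hi if t > hi else t   # icmp sgt + select

def clamp_minmax(x, lo=I16_MIN, hi=I16_MAX):
    """The same clamp written as a max followed by a min."""
    return min(max(x, lo), hi)
```

Both functions return the same value for every input; which form is cheaper on a given target is exactly the cost-model question.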

-  Elena

-----Original Message-----
From: Ahmed Bougacha [mailto:ahmed.bougacha at gmail.com] 
Sent: Wednesday, April 01, 2015 17:51
To: Demikhovsky, Elena
Cc: reviews+D6976+public+4d709814889a9ede at reviews.llvm.org; weimingz at codeaurora.org; echristo at gmail.com; llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH] Add Integer Saturation Intrinsics.

On Tue, Mar 31, 2015 at 11:23 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> Unlike ARM, X86 can saturate (signed and unsigned) from 16 to 8 bits and from 32 to 16 bits, followed by truncation, and these instructions are SIMD.

Right, that's what I meant by pack with saturation, and I fully intend to support these =)

> The ideal form would be something like
>   <16 x i16> @llvm.ssat.v16i32(<16 x i32> %val)
> and the sequence is cmp, select, cmp, select, trunc.
>
> I'm thinking about how to define an intrinsic that is convenient for all targets.

With my patches, I have the intrinsics always return the same type, and for PACKSS/PACKUS I DAGCombine the straightforward:

   (concat_vectors (v8i16 (trunc (v8i32 (ssat (v8i32))))),
                   (v8i16 (trunc (v8i32 (ssat (v8i32))))))

into:

   (v16i16 (PACKSS (v8i32), (v8i32)))
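
As a sanity check of that combine, here is a small Python reference model (a sketch, not the DAGCombine code). It assumes ssat clamps each i32 lane to the signed 16-bit range, and it models PACKSS as an idealized "saturate both inputs and concatenate"; the 256-bit AVX2 form of PACKSSDW actually packs within each 128-bit lane, so the real element order differs, and the model ignores that. All function names are made up for the example.

```python
I16_MIN, I16_MAX = -(1 << 15), (1 << 15) - 1

def ssat16(lanes):
    """Clamp each lane to the signed 16-bit range (type stays 'i32')."""
    return [min(max(x, I16_MIN), I16_MAX) for x in lanes]

def trunc16(lanes):
    """Keep the low 16 bits of each lane and reinterpret them as signed."""
    out = []
    for x in lanes:
        low = x & 0xFFFF
        out.append(low - 0x10000 if low & 0x8000 else low)
    return out

def concat_trunc_ssat(a, b):
    """(concat_vectors (trunc (ssat a)), (trunc (ssat b)))"""
    return trunc16(ssat16(a)) + trunc16(ssat16(b))

def packss(a, b):
    """Idealized pack with signed saturation (lane ordering ignored)."""
    return [min(max(x, I16_MIN), I16_MAX) for x in a + b]
```

For any two 8-lane inputs, concat_trunc_ssat(a, b) and packss(a, b) produce the same 16 lanes in this model.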

But you can also do:

   (v8i16 (trunc (ssat (v8i32))))

into:

   (v16i16 (PACKSS (v8i32), undef))

And even:

   (ssat (v8i32))

into (the mid types get tricky, so this is best left to selection proper):

   (v8i32 (PMOVSX (v8i16 (PACKSS (v8i32), undef))))

Which might still be better than the full icmp+select sequence.
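
A per-lane Python sketch of why this last form is value-preserving (illustration only; it assumes saturation to the signed 16-bit range, and the helper names are hypothetical): the pack leaves a 16-bit pattern that already holds the saturated value, so sign-extending it back to 32 bits gives the same result as ssat on the original type.

```python
def ssat16(x):
    """Reference: clamp one i32 lane to the signed 16-bit range."""
    return min(max(x, -32768), 32767)

def packss_lane(x):
    """Saturating pack of one lane, returned as a raw 16-bit pattern."""
    return ssat16(x) & 0xFFFF

def pmovsx_lane(bits16):
    """Sign-extend a raw 16-bit pattern back to a signed integer."""
    return bits16 - 0x10000 if bits16 & 0x8000 else bits16

def ssat_via_pack(lanes):
    """(PMOVSX (PACKSS v, undef)), modeled lane by lane."""
    return [pmovsx_lane(packss_lane(x)) for x in lanes]
```

This only shows equivalence of values; it says nothing about whether the two-instruction sequence is cheaper than the alternatives.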

Anyway: from my IR-level testing it works fine, though I admit I don't think I've seen this happen in the test-suite, except perhaps some vectorized loop in paq8p (I'll have to check again with the fixed cost model).  Then again that's the same for saturation in general.

Does that sound good to you?

-Ahmed

>
> -  Elena
>
>
> -----Original Message-----
> From: Ahmed Bougacha [mailto:ahmed.bougacha at gmail.com]
> Sent: Monday, March 30, 2015 19:38
> To: Demikhovsky, Elena
> Cc: reviews+D6976+public+4d709814889a9ede at reviews.llvm.org; 
> weimingz at codeaurora.org; echristo at gmail.com; llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH] Add Integer Saturation Intrinsics.
>
> On Mon, Mar 30, 2015 at 4:24 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>> Hi,
>>
>> I'm interested in these intrinsics for X86.
>
> Great!  I have patches matching these intrinsics to generate add, sub, and pack with saturation.  Is that what you want, or is there something else I missed?
>
>> Do you plan to push them as target independent?
>
> Yes, whatever we decide (intrinsics or not), the solution will be target-independent (and I'm working on AArch64, ARM, and X86).
>
> -Ahmed
>
>> Thanks.
>>
>> -  Elena
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). Any review or distribution 
> by others is strictly prohibited. If you are not the intended 
> recipient, please contact the sender and delete all copies.