[LLVMdev] [RFC] Integer Saturation Intrinsics
Ahmed Bougacha
ahmed.bougacha at gmail.com
Wed Jan 14 16:16:06 PST 2015
On Thu, Jan 15, 2015 at 12:42 AM, Philip Reames
<listmail at philipreames.com> wrote:
> At a very high level, why do we need these intrinsics?
In short, to catch sequences you can't catch in the SelectionDAG.
> What is the use case? What are typical values for N?
Typically, you get this from (a little overlapping) compression, DSP,
or pixel-handling code.
Off the top of my head, this occurs in paq8p in the test-suite, as
well as a few other tests.
You'd have something like:
a = x + y;
if (a < -128)
a = -128;
if (a > 127)
a = 127;
> Have you looked at just generating the conditional IR and then pattern
> matching late? What's the performance trade off involved?
That's a valid concern. The original problem is, we can't catch this
kind of thing in the SelectionDAG, because we're limited by a single
basic block. I guess we could (and I gather that's the alternative
you're presenting?) canonicalize the control flow to the 2icmp+2select
sequence, but I wasn't sure that was "workable". Truth be told, I
didn't investigate this very thoroughly, as I didn't expect reluctance
on adding intrinsics! I'll look into it some more: avoid adding the
intrinsic, keep the codegen additions as is, match the pattern in CGP
instead of InstCombines.
> My default position is that we shouldn't add these unless there is a
> compelling use case. We *still* haven't gotten the *.with.overflow
> intrinsics piped through everywhere in the optimizer. I'm reluctant to keep
> adding new ones.
I get your point, and agree with the general idea, but I think
saturation is easier to reason about so wouldn't be as big a problem.
At least it was pretty fast to implement the usual suspects:
vectorizability, known bits, etc.. Do you have a specific part in
mind?
Anyway, thanks for the feedback!
-Ahmed
> Philip
>
>
> On 01/14/2015 02:08 PM, Ahmed Bougacha wrote:
>>
>> Hi all,
>>
>> The patches linked below introduce a new family of intrinsics, for
>> integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed).
>> Quoting the added documentation:
>>
>> %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n)
>>
>> is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself
>> implementable as the following IR:
>>
>> %min_sint_n = i32 ... ; the min. signed integer of bitwidth n,
>> -2^(n-1)
>> %max_sint_n = i32 ... ; the max. signed integer of bitwidth n,
>> 2^(n-1)-1
>> %0 = icmp slt i32 %x, %min_sint_n
>> %1 = select i1 %0, i32 %min_sint_n, i32 %x
>> %2 = icmp sgt i32 %1, %max_sint_n
>> %r = select i1 %2, i32 %max_sint_n, i32 %1
>>
>>
>> As a starting point, here are two patches:
>> - http://reviews.llvm.org/D6976 Add Integer Saturation Intrinsics.
>> - http://reviews.llvm.org/D6977 [CodeGen] Add legalization for
>> Integer Saturation Intrinsics.
>>
>> From there, we can generate several new instructions, more efficient
>> than their expanded counterpart. Locally, I have worked on:
>> - ARM: the SSAT/USAT instructions (scalar)
>> - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar
>> saturating arithmetic)
>> - X86: PACK SS/US (vector, saturate+truncate)
>> - X86: PADD/SUB S/US (vector, saturating arithmetic)
>>
>> Anyway, let's first agree on the intrinsics, so that further
>> development is done on trunk.
>>
>> Thanks!
>> -Ahmed
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
More information about the llvm-dev
mailing list