[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?
Andrew Kelley via llvm-dev
llvm-dev at lists.llvm.org
Sat Feb 9 12:56:25 PST 2019
On 2/9/19 2:05 PM, Craig Topper wrote:
> Something like this should work I think.
>
> ; ModuleID = 'test.ll'
> source_filename = "test.ll"
>
> define void @entry(<4 x i32>* %a, <4 x i32>* %b, <4 x i32>* %x) {
> Entry:
> %tmp = load <4 x i32>, <4 x i32>* %a, align 16
> %tmp1 = load <4 x i32>, <4 x i32>* %b, align 16
> %tmp2 = add <4 x i32> %tmp, %tmp1
> %tmpsign = icmp slt <4 x i32> %tmp, zeroinitializer
> %tmp1sign = icmp slt <4 x i32> %tmp1, zeroinitializer
> %sumsign = icmp slt <4 x i32> %tmp2, zeroinitializer
> %signsequal = icmp eq <4 x i1> %tmpsign, %tmp1sign
> %summismatch = icmp ne <4 x i1> %sumsign, %tmpsign
> %overflow = and <4 x i1> %signsequal, %summismatch
> %tmp5 = bitcast <4 x i1> %overflow to i4
> %tmp6 = icmp ne i4 %tmp5, 0
> br i1 %tmp6, label %OverflowFail, label %OverflowOk
>
> OverflowFail: ; preds = %Entry
> tail call fastcc void @panic()
> unreachable
>
> OverflowOk: ; preds = %Entry
> store <4 x i32> %tmp2, <4 x i32>* %x, align 16
> ret void
> }
>
> declare fastcc void @panic()
Thanks! I was able to get it working with your hint:
> %tmp5 = bitcast <4 x i1> %overflow to i4
(Thanks also to LebedevRI who pointed this out on IRC)
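For anyone following along, here is a rough scalar C model of what the quoted IR computes per lane. It is only a sketch, not something from Craig's message: the names `sign_check_overflow_mask` and `sign_check_demo` are made up for illustration, and the returned bitmask stands in for the `bitcast <4 x i1> %overflow to i4` followed by `icmp ne i4 %tmp5, 0`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adds four i32 lanes with wrapping semantics, stores
   the sums in out, and returns a 4-bit mask with bit i set if lane i
   overflowed (the scalar analogue of bitcasting <4 x i1> to i4). */
static unsigned sign_check_overflow_mask(const int32_t a[4],
                                         const int32_t b[4],
                                         int32_t out[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        /* Wrapping add done in unsigned arithmetic, like a plain IR `add`. */
        uint32_t bits = (uint32_t)a[i] + (uint32_t)b[i];
        int32_t sum;
        memcpy(&sum, &bits, sizeof sum);

        int a_neg = a[i] < 0;   /* %tmpsign  */
        int b_neg = b[i] < 0;   /* %tmp1sign */
        int sum_neg = sum < 0;  /* %sumsign  */

        /* Overflow iff the operands agree in sign and the sum does not. */
        int overflow = (a_neg == b_neg) && (sum_neg != a_neg);
        mask |= (unsigned)overflow << i;
        out[i] = sum;
    }
    return mask;
}

/* Hypothetical self-check: one clean vector pair, and one pair where
   lanes 0 and 2 overflow. */
static void sign_check_demo(void)
{
    const int32_t a[4] = {1, 2, 3, 4};
    const int32_t b[4] = {5, 6, 7, 8};
    const int32_t c[4] = {INT32_MAX, 0, INT32_MIN, 0};
    const int32_t d[4] = {1, 0, -1, 0};
    int32_t out[4];

    assert(sign_check_overflow_mask(a, b, out) == 0u);
    assert(sign_check_overflow_mask(c, d, out) == 0x5u);
}
```

A nonzero mask corresponds to taking the OverflowFail branch; a zero mask corresponds to OverflowOk.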
Until LLVM 9, when the llvm.*.with.overflow.* intrinsics gain vector
support, here's what I ended up with:
%a = alloca <4 x i32>, align 16
%b = alloca <4 x i32>, align 16
%x = alloca <4 x i32>, align 16
store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
%0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
%1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
%2 = sext <4 x i32> %0 to <4 x i33>, !dbg !59
%3 = sext <4 x i32> %1 to <4 x i33>, !dbg !59
%4 = add <4 x i33> %2, %3, !dbg !59
%5 = trunc <4 x i33> %4 to <4 x i32>, !dbg !59
%6 = sext <4 x i32> %5 to <4 x i33>, !dbg !59
%7 = icmp ne <4 x i33> %4, %6, !dbg !59
%8 = bitcast <4 x i1> %7 to i4, !dbg !59
%9 = icmp ne i4 %8, 0, !dbg !59
br i1 %9, label %OverflowFail, label %OverflowOk, !dbg !59
The idea: sign-extend the operands and do the operation with one extra
bit. Truncate to get the result. Sign-extend the result again and check
whether it equals the pre-truncation value; if it differs, the operation
overflowed.
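The same check can be modeled in scalar C, as a sketch under a few assumptions: it widens to `int64_t` rather than to i33 (C has no 33-bit type, and any wider type gives the same answer), and the names `widen_check_overflow_mask` and `widen_check_demo` are invented for illustration. The bitmask again plays the role of the `bitcast <4 x i1> %7 to i4`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adds four i32 lanes, stores the truncated sums in
   out, and returns a 4-bit mask with bit i set if lane i overflowed. */
static unsigned widen_check_overflow_mask(const int32_t a[4],
                                          const int32_t b[4],
                                          int32_t out[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        /* sext both operands and add in the wider type (cannot overflow). */
        int64_t wide = (int64_t)a[i] + (int64_t)b[i];

        /* trunc: keep the low 32 bits and reinterpret them as signed. */
        uint32_t bits = (uint32_t)wide;
        int32_t narrow;
        memcpy(&narrow, &bits, sizeof narrow);

        /* sext the truncated result again and compare with the wide sum. */
        int overflow = wide != (int64_t)narrow;
        mask |= (unsigned)overflow << i;
        out[i] = narrow;
    }
    return mask;
}

/* Hypothetical self-check using the constants from the IR above, plus a
   pair of vectors where lanes 0 and 2 overflow. */
static void widen_check_demo(void)
{
    const int32_t a[4] = {1, 2, 3, 4};
    const int32_t b[4] = {5, 6, 7, 8};
    const int32_t c[4] = {INT32_MAX, 0, INT32_MIN, 0};
    const int32_t d[4] = {1, 0, -1, 0};
    int32_t out[4];

    assert(widen_check_overflow_mask(a, b, out) == 0u);
    assert(widen_check_overflow_mask(c, d, out) == 0x5u);
}
```

The same shape should carry over to subtraction and multiplication, although multiplication needs twice the bits rather than one extra.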
This works pretty well unless the vector's integer element size is as
big as or larger than the native vector register. Here's a quick
performance test:
https://gist.github.com/andrewrk/b9734f9c310d8b79ec7271e7c0df4023
Summary: safety-checked integer addition with no optimizations

<4 x i32>:
  scalar = 893 MiB/s
  vector = 3.58 GiB/s

<16 x i128>:
  scalar = 3.6 GiB/s
  vector = 2.5 GiB/s