<div><div dir="auto">I should mention that the code I provided is equivalent to how the vector version of llvm.sadd.with.overflow is expanded in the backend by default. Having the vector intrinsic probably won’t improve the generated code over what you can achieve with IR, unless the target has some native way of detecting overflow on vectors, like the overflow flag on scalars.</div></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Feb 9, 2019 at 12:56 PM Andrew Kelley <<a href="mailto:andrew@ziglang.org">andrew@ziglang.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 2/9/19 2:05 PM, Craig Topper wrote:<br>
> Something like this should work I think.<br>
> <br>
> ; ModuleID = 'test.ll'<br>
> source_filename = "test.ll"<br>
> <br>
> define void @entry(<4 x i32>* %a, <4 x i32>* %b, <4 x i32>* %x) {<br>
> Entry:<br>
> %tmp = load <4 x i32>, <4 x i32>* %a, align 16<br>
> %tmp1 = load <4 x i32>, <4 x i32>* %b, align 16<br>
> %tmp2 = add <4 x i32> %tmp, %tmp1<br>
> %tmpsign = icmp slt <4 x i32> %tmp, zeroinitializer<br>
> %tmp1sign = icmp slt <4 x i32> %tmp1, zeroinitializer<br>
> %sumsign = icmp slt <4 x i32> %tmp2, zeroinitializer<br>
> %signsequal = icmp eq <4 x i1> %tmpsign, %tmp1sign<br>
> %summismatch = icmp ne <4 x i1> %sumsign, %tmpsign<br>
> %overflow = and <4 x i1> %signsequal, %summismatch<br>
> %tmp5 = bitcast <4 x i1> %overflow to i4<br>
> %tmp6 = icmp ne i4 %tmp5, 0<br>
> br i1 %tmp6, label %OverflowFail, label %OverflowOk<br>
> <br>
> OverflowFail: ; preds = %Entry<br>
> tail call fastcc void @panic()<br>
> unreachable<br>
> <br>
> OverflowOk: ; preds = %Entry<br>
> store <4 x i32> %tmp2, <4 x i32>* %x, align 16<br>
> ret void<br>
> }<br>
> <br>
> declare fastcc void @panic()<br>
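For readers following along, the sign-based check in the IR above can be sketched in scalar C; the function name and comments here are illustrative, not from the original thread:<br>

```c
#include <stdint.h>

/* Scalar sketch of the sign-based overflow check used in the IR above:
   a signed add overflows iff the operands share a sign and the sum's
   sign differs from theirs. */
int sadd_overflows(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;     /* wraparound-safe add  */
    int signs_equal  = ((a ^ b) >= 0);            /* operands same sign?  */
    int sum_mismatch = (((int32_t)sum ^ a) < 0);  /* sum's sign flipped?  */
    return signs_equal && sum_mismatch;
}
```

The vector IR performs exactly this per lane, with the three `icmp slt ... zeroinitializer` instructions extracting the sign bits.<br>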
<br>
<br>
Thanks! I was able to get it working with your hint:<br>
<br>
> %tmp5 = bitcast <4 x i1> %overflow to i4<br>
<br>
(Thanks also to LebedevRI who pointed this out on IRC)<br>
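The bitcast-to-i4 trick collapses the four per-lane overflow bits into one scalar so a single branch covers all lanes. A scalar C sketch of the same reduction (names illustrative):<br>

```c
/* Pack four per-lane flags into a 4-bit mask and test it once,
   mirroring `bitcast <4 x i1> %overflow to i4` followed by
   `icmp ne i4 %tmp5, 0`. */
int any_lane_overflowed(const int lane_flags[4]) {
    unsigned mask = 0;
    for (int i = 0; i < 4; i++)
        mask |= (unsigned)(lane_flags[i] != 0) << i;  /* one bit per lane */
    return mask != 0;  /* single branch for all four lanes */
}
```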
<br>
Until LLVM 9 when the llvm.*.with.overflow.* intrinsics gain vector<br>
support, here's what I ended up with:<br>
<br>
%a = alloca <4 x i32>, align 16<br>
%b = alloca <4 x i32>, align 16<br>
%x = alloca <4 x i32>, align 16<br>
store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55<br>
store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56<br>
%0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57<br>
%1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58<br>
%2 = sext <4 x i32> %0 to <4 x i33>, !dbg !59<br>
%3 = sext <4 x i32> %1 to <4 x i33>, !dbg !59<br>
%4 = add <4 x i33> %2, %3, !dbg !59<br>
%5 = trunc <4 x i33> %4 to <4 x i32>, !dbg !59<br>
%6 = sext <4 x i32> %5 to <4 x i33>, !dbg !59<br>
%7 = icmp ne <4 x i33> %4, %6, !dbg !59<br>
%8 = bitcast <4 x i1> %7 to i4, !dbg !59<br>
%9 = icmp ne i4 %8, 0, !dbg !59<br>
br i1 %9, label %OverflowFail, label %OverflowOk, !dbg !59<br>
<br>
The idea: sign-extend the operands and do the add with one extra bit.<br>
Truncate to get the result, then re-extend the truncated result and<br>
check whether it matches the wide pre-truncation value; any mismatch<br>
means the addition overflowed.<br>
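This widen/truncate/compare check can also be sketched in scalar C, using i64 as the wider type where the IR uses i33 (function name illustrative; the narrowing cast is implementation-defined in C but wraps on common two's-complement targets):<br>

```c
#include <stdint.h>

/* Widen-and-compare overflow check, mirroring the sext/add/trunc IR:
   add in a wider type, truncate, re-extend, and compare. */
int sadd_overflows_wide(int32_t a, int32_t b) {
    int64_t wide   = (int64_t)a + (int64_t)b;  /* add with extra bits   */
    int32_t narrow = (int32_t)wide;            /* truncate the result   */
    return (int64_t)narrow != wide;            /* re-extend and compare */
}
```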
<br>
This works pretty well unless the vector integer type is as big as or<br>
larger than the native vector register. Here's a quick performance test:<br>
<br>
<a href="https://gist.github.com/andrewrk/b9734f9c310d8b79ec7271e7c0df4023" rel="noreferrer" target="_blank">https://gist.github.com/andrewrk/b9734f9c310d8b79ec7271e7c0df4023</a><br>
<br>
Summary: safety-checked integer addition with no optimizations<br>
<br>
<4 x i32>:<br>
scalar = 893 MiB/s<br>
vector = 3.58 GiB/s<br>
<br>
<16 x i128>:<br>
scalar = 3.6 GiB/s<br>
vector = 2.5 GiB/s<br>
<br>
<br>
</blockquote></div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">~Craig</div>