[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?

Sat Feb 9 10:05:31 PST 2019

>>     On Sat, Feb 9, 2019 at 1:42 AM Craig Topper via llvm-dev
>>     <llvm-dev at lists.llvm.org> wrote:
>>
>>         I don't think I understand your pseudocode using
>>         llvm.experimental.vector.reduce.umax. All of the types you
>>         showed are scalar, but that intrinsic doesn't work on scalars
>>         so I'm having a hard time understanding what you're trying to
>>         do with it. llvm.experimental.vector.reduce.umax takes a
>>         vector input and returns a scalar result. Are you wanting to
>>         find if any of the additions overflowed or a mask of which
>>         addition overflowed?

Apologies for the confusion; let me try to clarify. Here is Zig frontend
code that works now:

export fn entry() void {
    var a: @Vector(4, i32) = []i32{ 1, 2, 3, 4 };
    var b: @Vector(4, i32) = []i32{ 5, 6, 7, 8 };
    var x = a +% b;
}

This generates the following LLVM IR code:

define void @entry() #2 !dbg !41 {
Entry:
  %a = alloca <4 x i32>, align 16
  %b = alloca <4 x i32>, align 16
  %x = alloca <4 x i32>, align 16
  store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
  call void @llvm.dbg.declare(metadata <4 x i32>* %a, metadata !45, metadata !DIExpression()), !dbg !55
  store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
  call void @llvm.dbg.declare(metadata <4 x i32>* %b, metadata !51, metadata !DIExpression()), !dbg !56
  %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
  %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
  %2 = add <4 x i32> %0, %1, !dbg !59
  store <4 x i32> %2, <4 x i32>* %x, align 16, !dbg !60
  call void @llvm.dbg.declare(metadata <4 x i32>* %x, metadata !53, metadata !DIExpression()), !dbg !60
  ret void, !dbg !61
}

However, I used the +% operator, which in Zig is wrapping addition. Now I
want to implement the + operator for vectors, which Zig defines to panic
if any of the elements overflows. Here is how the IR could look for this:

define void @entry() #2 !dbg !41 {
Entry:
  %a = alloca <4 x i32>, align 16
  %b = alloca <4 x i32>, align 16
  %x = alloca <4 x i32>, align 16
  store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
  store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
  %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
  %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
  %2 = call { <4 x i32>, <4 x i1> } @llvm.sadd.with.overflow.v4i32(<4 x i32> %0, <4 x i32> %1)
  %3 = extractvalue { <4 x i32>, <4 x i1> } %2, 0, !dbg !56
  %4 = extractvalue { <4 x i32>, <4 x i1> } %2, 1, !dbg !56
  %5 = call i1 @llvm.experimental.vector.reduce.umax.i1.v4i1(<4 x i1> %4)
  br i1 %5, label %OverflowFail, label %OverflowOk, !dbg !56
  br i1 %5, label %OverflowFail, label %OverflowOk, !dbg !56

OverflowFail:                                     ; preds = %Entry
  tail call fastcc void @panic(%"[]u8"* @2, %StackTrace* null), !dbg !56
  unreachable, !dbg !56

OverflowOk:                                       ; preds = %Entry
  store <4 x i32> %3, <4 x i32>* %x, align 16, !dbg !60
  ret void, !dbg !61
}

You can see that it depends on @llvm.sadd.with.overflow working on
vector types, and on @llvm.experimental.vector.reduce.umax. I will note
that my earlier strategy of sign extension followed by icmp would be a
semantically equivalent alternative to @llvm.sadd.with.overflow.
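
For reference, here is one way to spell out the sign-extension-plus-icmp
approach as a per-lane overflow mask; my exact earlier formulation may
differ. This is a minimal sketch: the function name @sadd_overflow_mask
is hypothetical, and it widens to i64 (i33 would also suffice), where the
sum of two sign-extended i32 values cannot wrap. A lane overflowed
exactly when the wide sum does not survive a round trip through i32.

```llvm
; Hypothetical helper: per-lane signed-overflow mask for <4 x i32> addition,
; computed without @llvm.sadd.with.overflow.
define <4 x i1> @sadd_overflow_mask(<4 x i32> %a, <4 x i32> %b) {
  ; Widen both operands so the addition itself cannot wrap.
  %wa = sext <4 x i32> %a to <4 x i64>
  %wb = sext <4 x i32> %b to <4 x i64>
  %wide = add <4 x i64> %wa, %wb
  ; Truncate back to i32 and re-extend; a lane that changes did not fit in i32.
  %narrow = trunc <4 x i64> %wide to <4 x i32>
  %back = sext <4 x i32> %narrow to <4 x i64>
  %ov = icmp ne <4 x i64> %wide, %back
  ret <4 x i1> %ov
}
```

The returned mask plays the role of %4 in the listing above, and %narrow
is the wrapped sum that %3 holds there.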

On 2/9/19 12:37 PM, Nikita Popov wrote:
> On Sat, Feb 9, 2019 at 6:25 PM Simon Pilgrim <llvm-dev at redking.me.uk>
> wrote:
>     Regarding the reduction functions - I think the integer intrinsics
>     at least are relatively stable and we can probably investigate
>     dropping the experimental tag before the next release (assuming
>     someone has the time to take on the work) - it'd be nice to have the
>     SLP vectorizer emit reduction intrinsics directly for these.
> 
> The vector reduction intrinsics still need quite a lot of work. Apart
> from SplitVecOp, all legalizations are currently missing. This is only
> noticeable on AArch64 right now, because all other targets expand vector
> reductions prior to codegen.
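
Given that, a frontend could in principle expand the reduction itself,
in the spirit of the pre-codegen expansion Nikita describes. On i1 lanes,
unsigned max is the same as OR, so for a <4 x i1> mask two
shuffle-and-or steps fold every lane into lane 0. A minimal sketch; the
name @any_lane_set_shuffle is hypothetical, and this is not necessarily
the exact sequence LLVM's own expansion emits.

```llvm
; Hypothetical helper: OR-reduce a <4 x i1> mask with a log2 shuffle tree.
define i1 @any_lane_set_shuffle(<4 x i1> %m) {
  ; Step 1: lanes {0,1} |= lanes {2,3}.
  %hi = shufflevector <4 x i1> %m, <4 x i1> %m, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
  %or1 = or <4 x i1> %m, %hi
  ; Step 2: lane 0 |= lane 1.
  %sw = shufflevector <4 x i1> %or1, <4 x i1> %or1, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
  %or2 = or <4 x i1> %or1, %sw
  ; Lane 0 now holds m0 | m1 | m2 | m3.
  %any = extractelement <4 x i1> %or2, i32 0
  ret i1 %any
}
```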

My follow-up question, then, is this:

What do you recommend, in terms of LLVM IR, in order to obtain the %5
value above?
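
For comparison, one candidate that avoids reduction intrinsics
altogether is to bitcast the <4 x i1> overflow mask to an i4 and compare
it with zero. Since the only question is whether any lane is set, the
mapping of lanes to bit positions does not matter. A minimal sketch; the
helper name @any_lane_set is hypothetical:

```llvm
; Hypothetical helper: true if any lane of the mask is set.
define i1 @any_lane_set(<4 x i1> %mask) {
  ; Reinterpret the four mask bits as a single 4-bit integer.
  %bits = bitcast <4 x i1> %mask to i4
  ; Any set lane makes the integer nonzero.
  %any = icmp ne i4 %bits, 0
  ret i1 %any
}
```

Inlined into the listing above, this would replace the reduce.umax call
with `%mask = bitcast <4 x i1> %4 to i4` followed by
`%5 = icmp ne i4 %mask, 0`.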

Thanks for the help,
Andrew
