[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?
Craig Topper via llvm-dev
llvm-dev at lists.llvm.org
Sat Feb 9 11:05:36 PST 2019
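Something like this should work, I think. It detects signed overflow by comparing the sign bits of the operands and the sum, then bitcasts the <4 x i1> mask to i4 to test whether any lane overflowed.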
; ModuleID = 'test.ll'
source_filename = "test.ll"

define void @entry(<4 x i32>* %a, <4 x i32>* %b, <4 x i32>* %x) {
Entry:
  %tmp = load <4 x i32>, <4 x i32>* %a, align 16
  %tmp1 = load <4 x i32>, <4 x i32>* %b, align 16
  %tmp2 = add <4 x i32> %tmp, %tmp1
  %tmpsign = icmp slt <4 x i32> %tmp, zeroinitializer
  %tmp1sign = icmp slt <4 x i32> %tmp1, zeroinitializer
  %sumsign = icmp slt <4 x i32> %tmp2, zeroinitializer
  %signsequal = icmp eq <4 x i1> %tmpsign, %tmp1sign
  %summismatch = icmp ne <4 x i1> %sumsign, %tmpsign
  %overflow = and <4 x i1> %signsequal, %summismatch
  %tmp5 = bitcast <4 x i1> %overflow to i4
  %tmp6 = icmp ne i4 %tmp5, 0
  br i1 %tmp6, label %OverflowFail, label %OverflowOk

OverflowFail:                                     ; preds = %Entry
  tail call fastcc void @panic()
  unreachable

OverflowOk:                                       ; preds = %Entry
  store <4 x i32> %tmp2, <4 x i32>* %x, align 16
  ret void
}

declare fastcc void @panic()
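
To sanity-check the sign logic, here is a rough scalar model of the same test in C. It is only a sketch: the name add_overflow_mask4 is made up for illustration and is not part of LLVM or Zig, and putting lane i in bit i assumes the little-endian lane order of the bitcast. An addition overflows exactly when both operands have the same sign and the wrapped sum has the other sign.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): returns a 4-bit mask with bit i
 * set when the signed 32-bit addition a[i] + b[i] overflows. This mirrors
 * the <4 x i1> %overflow value followed by the bitcast to i4. */
static unsigned add_overflow_mask4(const int32_t a[4], const int32_t b[4]) {
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        /* Wrapping add in unsigned arithmetic, so no undefined behavior. */
        uint32_t sum_bits = (uint32_t)a[i] + (uint32_t)b[i];

        bool a_neg = a[i] < 0;                /* %tmpsign  */
        bool b_neg = b[i] < 0;                /* %tmp1sign */
        bool sum_neg = (sum_bits >> 31) != 0; /* %sumsign  */

        /* %signsequal and %summismatch, combined as in %overflow. */
        bool overflow = (a_neg == b_neg) && (sum_neg != a_neg);
        mask |= (unsigned)overflow << i;
    }
    return mask;
}
```

A nonzero return value corresponds to taking the OverflowFail branch; zero corresponds to OverflowOk.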
~Craig
On Sat, Feb 9, 2019 at 10:05 AM Andrew Kelley <andrew at ziglang.org> wrote:
> >> On Sat, Feb 9, 2019 at 1:42 AM Craig Topper via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >> I don't think I understand your pseudocode using
> >> llvm.experimental.vector.reduce.umax. All of the types you
> >> showed are scalar, but that intrinsic doesn't work on scalars
> >> so I'm having a hard time understanding what you're trying to
> >> do with it. llvm.experimental.vector.reduce.umax takes a
> >> vector input and returns a scalar result. Are you wanting to
> >> find if any of the additions overflowed or a mask of which
> >> addition overflowed?
>
> Apologies for the confusion - let me try to clarify. Here is frontend
> code that works now:
>
> export fn entry() void {
> var a: @Vector(4, i32) = []i32{ 1, 2, 3, 4 };
> var b: @Vector(4, i32) = []i32{ 5, 6, 7, 8 };
> var x = a +% b;
> }
>
> This generates the following LLVM IR code:
>
> define void @entry() #2 !dbg !41 {
> Entry:
> %a = alloca <4 x i32>, align 16
> %b = alloca <4 x i32>, align 16
> %x = alloca <4 x i32>, align 16
> store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16,
> !dbg !55
> call void @llvm.dbg.declare(metadata <4 x i32>* %a, metadata !45,
> metadata !DIExpression()), !dbg !55
> store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16,
> !dbg !56
> call void @llvm.dbg.declare(metadata <4 x i32>* %b, metadata !51,
> metadata !DIExpression()), !dbg !56
> %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
> %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
> %2 = add <4 x i32> %0, %1, !dbg !59
> store <4 x i32> %2, <4 x i32>* %x, align 16, !dbg !60
> call void @llvm.dbg.declare(metadata <4 x i32>* %x, metadata !53,
> metadata !DIExpression()), !dbg !60
> ret void, !dbg !61
> }
>
> However I used the +% operator, which in Zig is wrapping addition. Now I
> want to implement the + operator for vectors, which Zig defines to panic
> if any of the elements overflowed. Here is how the IR could look for this:
>
> define void @entry() #2 !dbg !41 {
> Entry:
> %a = alloca <4 x i32>, align 16
> %b = alloca <4 x i32>, align 16
> %x = alloca <4 x i32>, align 16
> store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16,
> !dbg !55
> store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16,
> !dbg !56
> %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
> %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
> %2 = call { <4 x i32>, <4 x i1> } @llvm.sadd.with.overflow.v4i32(<4 x i32> %0,
> <4 x i32> %1)
> %3 = extractvalue { <4 x i32>, <4 x i1> } %2, 0, !dbg !56
> %4 = extractvalue { <4 x i32>, <4 x i1> } %2, 1, !dbg !56
> %5 = call i1 @llvm.experimental.vector.reduce.umax.i1.v4i1(<4 x i1> %4)
> br i1 %5, label %OverflowFail, label %OverflowOk, !dbg !56
>
> OverflowFail: ; preds = %Entry
> tail call fastcc void @panic(%"[]u8"* @2, %StackTrace* null), !dbg !56
> unreachable, !dbg !56
>
> OverflowOk: ; preds = %Entry
> store <4 x i32> %3, <4 x i32>* %x, align 16, !dbg !60
> ret void, !dbg !61
> }
>
> You can see that it depends on @llvm.sadd.with.overflow working on
> vector types, and it relies on @llvm.experimental.vector.reduce.umax. I
> will note that my strategy with sign extension and icmp would be a
> semantically equivalent alternative to @llvm.sadd.with.overflow.
>
> On 2/9/19 12:37 PM, Nikita Popov wrote:
> > On Sat, Feb 9, 2019 at 6:25 PM Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> > Regarding the reduction functions - I think the integer intrinsics
> > at least are relatively stable and we can probably investigate
> > dropping the experimental tag before the next release (assuming
> > someone has the time to take on the work) - it'd be nice to have the
> > SLP vectorizer emit reduction intrinsics directly for these.
> >
> > The vector reduction intrinsics still need quite a lot of work. Apart
> > from SplitVecOp, all legalizations are currently missing. This is only
> > noticeable on AArch64 right now, because all other targets expand vector
> > reductions prior to codegen.
>
> My follow-up question, then, is this:
>
> What do you recommend, in terms of LLVM IR, in order to obtain the %5
> value above?
>
> Thanks for the help,
> Andrew
>
>
>