[cfe-dev] [llvm-dev] Writing built-ins for instructions returning multiple operands
Martin J. O'Riordan via cfe-dev
cfe-dev at lists.llvm.org
Fri Sep 11 02:46:18 PDT 2015
Thanks for this feedback. Although my (contrived) example used an 'int', the actual instructions involved use vector operands, so it’s a bit trickier, but the approach you have outlined looks workable. I had been avoiding the notion of "pass-by-reference", but the transformations you have outlined should let me express this using pointers or references in C/C++, while still lowering to the intended instruction and eliminating the implied indirection.
All the best,
MartinO
-----Original Message-----
From: Dr D. Chisnall [mailto:dc552 at hermes.cam.ac.uk] On Behalf Of David Chisnall
Sent: 09 September 2015 11:43
To: mats petersson <mats at planetcatfish.com>
Cc: Martin J. O'Riordan <martin.oriordan at movidius.com>; llvm-dev <llvm-dev at lists.llvm.org>; cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Writing built-ins for instructions returning multiple operands
On 9 Sep 2015, at 11:31, mats petersson via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> However, if we have, say, an instruction that returns two distinct values (div that also gives the remainder, as a simple example), you will either have to return a (small) struct, or pass in a pointer to be filled in by the function [the latter is not ideal from an optimisation perspective, as the optimiser has a harder time knowing if the output is aliased with something else].
It’s important to differentiate the C builtin from the LLVM intrinsic. It’s generally more usable (and idiomatic) in C to have additional return values become out-parameters passed by pointer. It’s generally more useful in LLVM IR to have multiple return values as a struct. For an example, consider the overflow-checked builtins.
Consider the following C function, which multiplies two numbers and returns either the result or 0 on overflow:
unsigned int mul(unsigned int x, unsigned int y) {
    unsigned int result;
    return __builtin_umul_overflow(x, y, &result) == 0 ? 0 : result;
}
This becomes some fairly complex IR, with the key part being:
  %result = alloca i32, align 4
...
  %5 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %3, i32 %4)
...
  %7 = extractvalue { i32, i1 } %5, 0
  store i32 %7, i32* %result, align 4
SROA then happily turns this entire function into:
define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %4 = zext i1 %2 to i32
  %5 = icmp eq i32 %4, 0
  br i1 %5, label %6, label %7

; <label>:6                                       ; preds = %0
  br label %8

; <label>:7                                       ; preds = %0
  br label %8

; <label>:8                                       ; preds = %7, %6
  %9 = phi i32 [ 0, %6 ], [ %3, %7 ]
  ret i32 %9
}
SimplifyCFG then turns the branches into a single select:
define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %4 = zext i1 %2 to i32
  %5 = icmp eq i32 %4, 0
  %. = select i1 %5, i32 0, i32 %3
  ret i32 %.
}
And instcombine gets rid of the redundant zext / icmp:
define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %. = select i1 %2, i32 %3, i32 0
  ret i32 %.
}
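One detail worth keeping in mind when adapting this example: __builtin_umul_overflow returns true when the multiplication overflows, so the "== 0" test in the C source above selects 0 in the non-overflowing case, and the IR shown reflects the code exactly as written. A minimal sketch of a variant that returns 0 on overflow and the product otherwise follows. The name mul_or_zero is hypothetical, and the builtin is assumed to be available (Clang, or GCC 5 and later).

```c
/* Hypothetical variant of mul(): returns x * y, or 0 if the product
   does not fit in an unsigned int. */
unsigned int mul_or_zero(unsigned int x, unsigned int y) {
    unsigned int result;
    /* The builtin returns true (non-zero) when the multiplication
       overflows; result then holds the wrapped product, which we ignore. */
    return __builtin_umul_overflow(x, y, &result) ? 0 : result;
}
```

The same pass pipeline should apply to this form as well; only the sense of the final select would be expected to flip.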
TL;DR version: Just because you expose a builtin to C as something that takes a pointer doesn’t mean that the optimisers will struggle with it if you expose a sensible LLVM IR intrinsic.
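To tie this back to the div/remainder case raised earlier in the thread, here is a rough sketch of the two C-level shapes side by side. Everything in it is hypothetical: divrem_struct stands in for a struct-returning intrinsic, divrem_ptr for the pointer-taking builtin you would expose to C, and both are written in plain C rather than mapped to any real instruction.

```c
/* Hypothetical stand-in for a struct-returning intrinsic: both results
   come back by value in a small aggregate. */
struct divrem_result {
    unsigned int quot;
    unsigned int rem;
};

/* Precondition: d != 0. */
static inline struct divrem_result divrem_struct(unsigned int n,
                                                 unsigned int d) {
    struct divrem_result r = { n / d, n % d };
    return r;
}

/* Hypothetical C-facing wrapper: the quotient is the return value and
   the remainder is written through a pointer. */
static inline unsigned int divrem_ptr(unsigned int n, unsigned int d,
                                      unsigned int *rem) {
    struct divrem_result r = divrem_struct(n, d);
    *rem = r.rem;
    return r.quot;
}

/* Example caller: the address of the local rem does not escape, so once
   divrem_ptr is inlined the optimiser can typically promote it to a
   register, just as SROA did for result in the mul example. */
unsigned int sum_quot_rem(unsigned int n, unsigned int d) {
    unsigned int rem;
    unsigned int quot = divrem_ptr(n, d, &rem);
    return quot + rem;
}
```

The pointer only exists at the C level; whether the store survives depends on the caller, not on the builtin's signature.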
David