[cfe-dev] [llvm-dev] Writing built-ins for instructions returning multiple operands
David Chisnall via cfe-dev
cfe-dev at lists.llvm.org
Wed Sep 9 03:43:16 PDT 2015
On 9 Sep 2015, at 11:31, mats petersson via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> However, if we have, say, an instruction that returns two distinct values (div that also gives the remainder, as a simple example), you will either have to return a (small) struct, or pass in a pointer to be filled in by the function [the latter is not ideal from an optimisation perspective, as the optimiser has a harder time knowing if the output is aliased with something else].
It’s important to differentiate the C builtin from the LLVM intrinsic. In C it’s generally more usable (and idiomatic) for additional return values to become arguments returned by pointer. In LLVM IR it’s generally more useful to have multiple return values as a struct. For example, consider the overflow-checked builtins.
The following C function multiplies two numbers and uses the overflow flag returned by the builtin to choose between the product and 0:
unsigned int mul(unsigned int x, unsigned int y)
{
    unsigned int result;
    return __builtin_umul_overflow(x, y, &result) == 0 ? 0 : result;
}
The mul function compiles to fairly verbose IR, the key part being:
  %result = alloca i32, align 4
  ...
  %5 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %3, i32 %4)
  ...
  %7 = extractvalue { i32, i1 } %5, 0
  store i32 %7, i32* %result, align 4
SROA happily turns the whole mul function into:
define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %4 = zext i1 %2 to i32
  %5 = icmp eq i32 %4, 0
  br i1 %5, label %6, label %7

; <label>:6                                       ; preds = %0
  br label %8

; <label>:7                                       ; preds = %0
  br label %8

; <label>:8                                       ; preds = %7, %6
  %9 = phi i32 [ 0, %6 ], [ %3, %7 ]
  ret i32 %9
}
SimplifyCFG then turns the branches into a single select:
define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %4 = zext i1 %2 to i32
  %5 = icmp eq i32 %4, 0
  %. = select i1 %5, i32 0, i32 %3
  ret i32 %.
}
And instcombine gets rid of the redundant zext / icmp:
define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %. = select i1 %2, i32 %3, i32 0
  ret i32 %.
}
TL;DR version: Just because you expose a builtin to C as something that takes a pointer doesn’t mean that the optimisers will struggle with it if you expose a sensible LLVM IR intrinsic.
David