[cfe-dev] [llvm-dev] Writing built-ins for instructions returning multiple operands

Wed Sep 9 03:43:16 PDT 2015

On 9 Sep 2015, at 11:31, mats petersson via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> However, if we have, say, an instruction that returns two distinct values (div that also gives the remainder, as a simple example), you will either have to return a (small) struct, or pass in a pointer to be filled in by the function [the latter is not ideal from an optimisation perspective, as the optimiser has a harder time knowing if the output is aliased with something else].

It’s important to differentiate the C builtin from the LLVM intrinsic.  It’s generally more useable (and idiomatic) in C to have additional return values become arguments returned by pointer.  It’s generally more useful in LLVM IR to have multiple return values as a struct.  For an example, consider the overflow-checked builtins.

Consider the following C function, which multiplies two numbers with the overflow-checked builtin and returns either the result or 0, depending on the overflow flag:

unsigned int mul(unsigned int x, unsigned int y)
{
	unsigned int result;
	return __builtin_umul_overflow(x, y, &result) == 0 ? 0 : result;
}
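
One thing to note about the function above: __builtin_umul_overflow returns true when the multiplication overflowed, so comparing its result against 0 selects 0 in the non-overflowing case and the wrapped product otherwise. The IR below reflects the code exactly as written. A minimal sketch of the variant that returns 0 on overflow might look like this (mul_or_zero and mul_or_zero_example are names made up for illustration, and the builtin assumes a reasonably recent Clang or GCC):

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical variant of mul(): returns x * y, or 0 if the product
 * does not fit in an unsigned int. */
unsigned int mul_or_zero(unsigned int x, unsigned int y)
{
	unsigned int result;
	/* The builtin returns true (non-zero) when the multiplication overflowed. */
	return __builtin_umul_overflow(x, y, &result) ? 0 : result;
}

/* Small usage check; not part of the original example. */
void mul_or_zero_example(void)
{
	assert(mul_or_zero(6, 7) == 42);
	assert(mul_or_zero(UINT_MAX, 2) == 0);
}
```

Only the polarity of the select changes; the pointer-based shape of the builtin is the same.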

This becomes some fairly complex IR, with the key part being:

  %result = alloca i32, align 4
...
  %5 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %3, i32 %4)
...
  %7 = extractvalue { i32, i1 } %5, 0
  store i32 %7, i32* %result, align 4

The SROA pass happily turns this entire function into:

define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %4 = zext i1 %2 to i32
  %5 = icmp eq i32 %4, 0
  br i1 %5, label %6, label %7

; <label>:6                                       ; preds = %0
  br label %8

; <label>:7                                       ; preds = %0
  br label %8

; <label>:8                                       ; preds = %7, %6
  %9 = phi i32 [ 0, %6 ], [ %3, %7 ]
  ret i32 %9
}

SimplifyCFG then turns the branches into a single select:

define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %4 = zext i1 %2 to i32
  %5 = icmp eq i32 %4, 0
  %. = select i1 %5, i32 0, i32 %3
  ret i32 %.
}

And instcombine gets rid of the redundant zext / icmp:

define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %. = select i1 %2, i32 %3, i32 0
  ret i32 %.
}
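
For comparison, the { i32, i1 } shape of the intrinsic can be mirrored at the C level by a small struct-returning wrapper around the same builtin. This is only a sketch: umul_result and umul_with_overflow are names invented here, not existing Clang or GCC APIs, and it assumes a compiler that provides __builtin_umul_overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical C mirror of the intrinsic's { i32, i1 } result. */
struct umul_result {
	unsigned int value;  /* the (possibly wrapped) product */
	bool overflow;       /* true if the product did not fit */
};

/* Hypothetical wrapper: pointer-style builtin in, struct out. */
static inline struct umul_result umul_with_overflow(unsigned int x, unsigned int y)
{
	struct umul_result r;
	r.overflow = __builtin_umul_overflow(x, y, &r.value);
	return r;
}

/* Small usage check; not part of the original example. */
void umul_with_overflow_example(void)
{
	struct umul_result ok = umul_with_overflow(6, 7);
	assert(!ok.overflow && ok.value == 42);

	struct umul_result wrapped = umul_with_overflow(UINT_MAX, 2);
	/* UINT_MAX * 2 wraps to UINT_MAX - 1 modulo 2^N. */
	assert(wrapped.overflow && wrapped.value == UINT_MAX - 1);
}
```

Both shapes carry the same information, so once the wrapper is inlined one would expect the optimisers to reduce it to the same intrinsic call and extractvalue pair as above.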

TL;DR version: Just because you expose a builtin to C as something that takes a pointer doesn’t mean that the optimisers will struggle with it, as long as you lower it to a sensible LLVM IR intrinsic.

David