[LLVMdev] Plans considering first class structs and multiple return values

Fri May 30 09:11:02 PDT 2008

Hi all,

I've been implementing some stuff that uses the new structs-as-first-class
values code. Apart from some implementation problems, I'm spotting a few
structural problems that seem non-trivial to fix.

In particular, now that structs are first class values, the old way of
returning multiple values is a bit confusing. The old way had a variable
number of arguments to the return instruction, which would come out at the
caller end as some special aggregate value that could be indexed using
getresult.

Now that aggregates are first class, you should be able to simply return a
single struct and access the result in the caller using the new extractvalue
instruction.

However, both approaches define a function with a struct as a return type:
it's not possible to tell the two apart by looking at the function type. So
a caller cannot know for sure what to do with the result...
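
To make the ambiguity concrete, here is a sketch of the two styles side by
side. The function names are made up for illustration, the extractvalue index
is written in the same typed form as the insertvalue snippets in this mail,
and the getresult form is assumed from the LangRef of the time; none of this
has been run through llvm-as.

```llvm
; Old style: several operands to ret, read back with getresult.
define { i32, i32 } @multi() {
	ret i32 1, i32 2
}

; New style: one first class struct value, read back with extractvalue.
define { i32, i32 } @single() {
	ret { i32, i32 } zeroinitializer
}

; Both callees have the same function type, { i32, i32 } (), so the
; caller has to know out of band which accessor applies to which call.
define i32 @caller() {
	%m = call { i32, i32 } @multi()
	%m0 = getresult { i32, i32 } %m, 0
	%s = call { i32, i32 } @single()
	%s0 = extractvalue { i32, i32 } %s, i32 0
	%sum = add i32 %m0, %s0
	ret i32 %sum
}
```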

Also, there is still a lot of code that assumes that having a struct return
type means you're using a multiple value return, preventing things like

	define {i32, i32} @bar()
	{
		%res1 = insertvalue { i32, i32 } zeroinitializer, i32 1, i32 0
		%res2 = insertvalue { i32, i32 } %res1, i32 2, i32 1
		ret { i32, i32 } %res2
	}

from working (the verifier currently barfs on this).

The lack of this distinction also means it is not so trivial to "add" an extra
argument to a function: if the return type is an aggregate, should you add an
element to that aggregate, or create a new struct containing the previous
struct and the new value? And what about functions that return void and want
an extra argument? Messy.
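
For a function that already returns { i32, i32 }, the two options could look
roughly like this. The names and the extra i1 are purely illustrative.

```llvm
; Original function.
declare { i32, i32 } @f()

; Option 1: append the new value to the existing aggregate.
declare { i32, i32, i1 } @f.appended()

; Option 2: wrap the previous struct and the new value in a new struct.
declare { { i32, i32 }, i1 } @f.wrapped()
```

Which one is right depends on whether @f returned "two values" or "one
struct", and that is exactly what the function type does not say.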

The main cause of this is actually the special case for returning a single
value. Instead of returning a struct with one element, you just return the
element. You could make this more consistent by making a function always
return a struct, which most of the time will just contain a single field. I'm
not sure that this is really a usable approach (or what the ABI impact is),
but it could be useful. In particular, a function returning a struct would
then be declared as returning {{i32, i32}} (to distinguish it from a
function returning two values, which would be {i32, i32}). This is also
consistent with making a void function return {}. This kind of stuff could
make a lot of code a lot more regular, but might be a bit annoying to
implement.
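
Under that scheme, the return type alone would say how many values come back.
A sketch with hypothetical function names (this illustrates the proposal, not
anything the current code accepts or expects):

```llvm
; No return values (what is "void" today).
declare {} @returns_nothing()

; One i32 return value.
declare { i32 } @returns_one()

; Two i32 return values.
declare { i32, i32 } @returns_two()

; One return value, which happens to be a struct.
declare { { i32, i32 } } @returns_one_struct()
```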

Furthermore, as far as I've understood, the intention is to remove the
"multiple return value" support in favour of returning structs. I take it this
means that at least the getresult instruction will be removed, and possibly
the multiple operand return as well. This would partly solve some issues, but
will completely remove the concept of returning multiple values (unless you
adopt the above approach of always returning structs, even for single values).

Additionally, the current form of the ret instruction is still useful, for
making multiple return values readable. In particular, writing
	ret i32 1, i32 2
is a lot more readable than the (functionally identical) three line return
statement above. However, to make the first class aggregates even more
usable, it might be better to remove the multi operand return instruction and
add support for literal aggregates. Currently, I think these are only
supported as global constants. It would be useful if the following was valid:
	ret { i32, i32 } { i32 1, i32 2 } 
However, llvm-as rejects this. Interestingly, 'zeroinitializer' is a valid
operand to ret, which should be similar to '{ i32 1, i32 2 }' in that they are
both literal aggregates.

Going further, one would also like to be able to build non-constant structs
in a similar manner, i.e., writing
	ret { i32, i32 } { i32 %a, i32 %b }
would be a lot more useful than the current
	ret i32 %a, i32 %b
form, since in the first form the ret instruction still has a single operand
that is easy to work with.

Perhaps, if using a non-constant literal struct is not so trivial, an
instruction can be added for that instead, i.e.,
	%res = buildagg i32 %a, i32 %b
	ret { i32, i32 } %res

This is still a clean way of building a struct (a lot easier to work with than
nested insertvalues IMHO) but also leaves the ret instruction simple.
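
For comparison, building the same non-constant struct with insertvalue alone
would look something like the sketch below. The function name is made up, and
the index syntax follows the earlier snippet in this mail.

```llvm
define { i32, i32 } @pair(i32 %a, i32 %b) {
	; Start from undef and fill in one field per instruction.
	%t0 = insertvalue { i32, i32 } undef, i32 %a, i32 0
	%t1 = insertvalue { i32, i32 } %t0, i32 %b, i32 1
	ret { i32, i32 } %t1
}
```

That is one insertvalue per field, each threading the previous partial result
through, which is what makes the chain awkward to write and to pattern match.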

Anyway, if using a literal struct as an operand would work, then the multiple
return value ret instruction can probably be removed altogether.

I've attached a patch with some changes I made to get things working a bit,
but it's not really a decent patch yet. The changes to the Andersens pass are
plain wrong; things can break horribly when you fiddle with structs containing
pointers this way. The others prevent things from asserting and I think they
are consistent with the current state of the code, but they will need to
change depending on what happens to the return instruction.

Any enlightening thoughts on this issue? I've put down mine in a slightly
chaotic manner, hopefully things will still get across clearly :-)

Gr.

Matthijs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aggregrates.diff
Type: text/x-diff
Size: 2384 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080530/39dbc07f/attachment.diff>