[llvm-commits] [PATCH] Add bindings for walking the instruction stream.

Gordon Henriksen gordonhenriksen at mac.com
Fri Mar 21 07:35:50 PDT 2008


On 2008-03-21, at 03:22, Erick Tryzelaar wrote:

> I just grepped through all the llvm and clang code, and there were  
> literally 13 references to "SetInsertPoint", including the  
> definitions. So what does everyone do instead?

1. The constructor for LLVM{,Folding}Builder is a convenience for  
SetInsertPoint.
2. The instruction constructors all have overloads which accept  
Instruction *InsertBefore and BasicBlock *InsertAtEnd. This is more  
convenient when building one or two instructions, and is how  
LLVM{,Folding}Builder are implemented.

> On Thu, Mar 20, 2008 at 1:50 PM, Gordon Henriksen
> <gordonhenriksen at mac.com> wrote:
>
>> Actually, analysis and transformation algorithms would be well- 
>> served initial_instr/terminator bindings which raise Not_found  
>> instead of yielding None. This would serve well such clients since  
>> they would not have to explicitly handle the case of an empty block  
>> or one with with no terminator, neither of which occur in valid IR.
>
> Is that true? Because what if you're at the terminator and you take  
> "next_instr" on it?

Hm? Being able to write an algorithm that groks the CFG requires (1)  
knowing the terminator [which must always be present in valid IR] and  
its type and (2) discovering the predecessor blocks, which are  
operands to the terminator.

> Another thing that may or may not be interesting is that we could  
> plug the "next_instr" into an ocaml stream (they've got a  
> constructor for unit -> 'a option), and use the camlp4 pattern  
> matching on it. We'd have to figure out how to pattern match  
> instructions, though.

We'd probably have to write things like

| _ as i where is_call i & is_function (param 0 i) ->

I suppose it would be possible to deconstruct the instruction into a  
variant, but that would be rather elaborate and likely wasteful.

> Do you think it'd be useful to also be able to iterate from an  
> arbitrary starting position and to iterate in reverse?

It's possible to build those if necessary from the 4 basic navigation  
functions.

>> I was trying to avoid this n x m explosion, although it is probably  
>> bounded at 8 = {builder,position}_{before,after,at_end,at_start}.  
>> In its place, I advocate the composable concepts of:
>>
>> • IR navigation (first, next, last, prev). This is useful in other  
>> contexts.
>> • A pair of builder positioning functions (builder_at,  
>> position_builder) that are as expressive as  
>> LLVMBuilder(BasicBlock*, BasicBlock::iterator).
>>
>> Your first e-mail implied you were for such a concept, although I  
>> think you over engineered it slightly. A perfect functional concept  
>> for LLVM's ilist iterator would be thus:
>>
>> type 'a 'b llposition =
>> | Before of 'a
>> | At_end of 'b
>
> I actually changed my mind against this, because the usage of it  
> would be either: "position_at (Before i)" or "position_before i"  
> which is nearly the same.


Yup, they're similar in simple cases. The difference being that the  
iterator concept handles all the cases uniformly.

> The primitives "position_before" and "position_at_end" really  
> represent all possibilities, so we'd only have to bind those two.

This is what I'm getting at. This gets back to, in C++, the 2 possible  
states of an ilist iterator:

before an instruction: i != bb->end(), *i is a valid Instruction*; i  
is a thin wrapper around an Instruction*
at the end of a basic block: i == bb->end(), *i asserts; i wraps a non- 
Instruction tail node owned by the BasicBlock

These are the only two necessary states. The types (llbasicblock *  
llvalue option) and (Before llvalue | At_end llbasicblock) both  
encapsulate this as well, the former with the possibility for  
inconsistency if instr_parent i ≠ bb.

In C++, a (BasicBlock*, BasicBlock::iterator) tuple is used in places  
because the end iterator doesn't have a getParent facility.

> It'd also be faster than having to box and unbox instructions.  
> Everything else is convenience.
>
> Now, we could add a convenience function that was like:
>
> type 'a 'b llposition =
>  | Before of 'a
>  | After of 'a
>  | At_front of 'b
>  | At_end of 'b
>
> val position_builder: llbuilder -> (llvalue, llbasicblock)  
> llposition -> unit
>
> Which would use "position_before" and "position_at_end" to capture  
> all the variants. Is that what you intended?

Nope, the example I wrote is what I intended. :) Let me repaste it:

>> external builder : unit -> llbuilder
>> external position_builder : llvalue llbasicblock llposition ->  
>> llbuilder
>>
>> let builder_at ip =
>>   let b = builder () in
>>   position_builder ip b;
>>   b
>>
>> let builder_at_end bb = builder_at (At_end bb)
>> let builder_before i = builder_at (Before i)
>> let builder_at_start bb = builder_at (first_instr bb)
>> let builder_after i = builder_at (succ_instr i)
>> let position_at_end bb = position_builder (At_end bb)
>> let position_before i = position_builder (Before i)
>> let position_at_start bb = position_builder (first_instr bb)
>> let position_after i = position_builder (succ_instr i)
>>
>
> One thing though is that this still won't let you call  
> "position_builder b (first_instr bb)", you'd have to unbox the  
> option and rebox in "position_builder b (At_front i)".

If we provide 'a 'b position, then the navigation functions should  
return those instead of 'a option precisely so that such conversion is  
not necessary. See my example.

A different set of names might less imply that they actually return  
instructions directly. Perhaps

val instr_begin : llbasicblock -> llvalue llbasicblock llposition
val instr_end : llbasicblock -> llvalue llbasicblock llposition
val instr_succ : llvalue -> llvalue llbasicblock llposition

instr_pred is an oddity, though, now that I start to type it out. It's  
the reverse_iterator problem. C++ solves this by essentially saying…

type 'a 'b reverse_iterator =
| After of 'a option
| At_start of 'b option

val instr_end : llbasicblock -> llvalue llbasicblock reverse_iterator
val instr_pred : llvalue -> llvalue llbasicblock reverse_iterator

instr_end and instr_pred would need to operate on these values, and  
they spoil my utopian grand-unified iterator concept a bit. Although  
do note that Before i and At_end bb likely obviate the need to use  
these functions for anything but reverse iteration, and conversion  
between the forms is not complicated. Could work…

> Now we could make the type:
>
> type 'a 'b llposition =
>  | Before of 'a option
>  | After of 'a option
>  | At_front of 'b option
>  | At_end of 'b option
>
> But then it seems to be getting a little overly complicated.

It is.

Before (None) | After (None) | At_front (None) | At_end (None) are  
invalid.
After (Some _) | At_front (Some _) are redundant.

> it'll be a little more complicated to pull the data out of a variant  
> in C,

It's not bad at all with the minimal variant (Before of 'a | At_end of  
'b) that directly corresponds to the C++ iterator. I think this is the  
extent of it:

if (!Tag_val(i, 0))
   (LLVMValueRef) Op_val(Field(i, 0)); // Before of llvalue
else
   (LLVMBasicBlockRef) Op_val(Field(i, 0)); // At_end of llbasicblock



Thanks Erick.

— Gordon


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20080321/2394057b/attachment.html>


More information about the llvm-commits mailing list