[llvm-commits] [PATCH] Add bindings for walking the instruction stream.
Gordon Henriksen
gordonhenriksen at mac.com
Fri Mar 21 07:35:50 PDT 2008
On 2008-03-21, at 03:22, Erick Tryzelaar wrote:
> I just grepped through all the llvm and clang code, and there were
> literally 13 references to "SetInsertPoint", including the
> definitions. So what does everyone do instead?
1. The constructor for LLVM{,Folding}Builder is a convenience for
SetInsertPoint.
2. The instruction constructors all have overloads which accept
Instruction *InsertBefore and BasicBlock *InsertAtEnd. This is more
convenient when building one or two instructions, and is how
LLVM{,Folding}Builder are implemented.
> On Thu, Mar 20, 2008 at 1:50 PM, Gordon Henriksen
> <gordonhenriksen at mac.com> wrote:
>
>> Actually, analysis and transformation algorithms would be well-
>> served initial_instr/terminator bindings which raise Not_found
>> instead of yielding None. This would serve well such clients since
>> they would not have to explicitly handle the case of an empty block
>> or one with with no terminator, neither of which occur in valid IR.
>
> Is that true? Because what if you're at the terminator and you take
> "next_instr" on it?
Hm? Being able to write an algorithm that groks the CFG requires (1)
knowing the terminator [which must always be present in valid IR] and
its type and (2) discovering the predecessor blocks, which are
operands to the terminator.
> Another thing that may or may not be interesting is that we could
> plug the "next_instr" into an ocaml stream (they've got a
> constructor for unit -> 'a option), and use the camlp4 pattern
> matching on it. We'd have to figure out how to pattern match
> instructions, though.
We'd probably have to write things like
| _ as i where is_call i & is_function (param 0 i) ->
I suppose it would be possible to deconstruct the instruction into a
variant, but that would be rather elaborate and likely wasteful.
> Do you think it'd be useful to also be able to iterate from an
> arbitrary starting position and to iterate in reverse?
It's possible to build those if necessary from the 4 basic navigation
functions.
>> I was trying to avoid this n x m explosion, although it is probably
>> bounded at 8 = {builder,position}_{before,after,at_end,at_start}.
>> In its place, I advocate the composable concepts of:
>>
>> • IR navigation (first, next, last, prev). This is useful in other
>> contexts.
>> • A pair of builder positioning functions (builder_at,
>> position_builder) that are as expressive as
>> LLVMBuilder(BasicBlock*, BasicBlock::iterator).
>>
>> Your first e-mail implied you were for such a concept, although I
>> think you over engineered it slightly. A perfect functional concept
>> for LLVM's ilist iterator would be thus:
>>
>> type 'a 'b llposition =
>> | Before of 'a
>> | At_end of 'b
>
> I actually changed my mind against this, because the usage of it
> would be either: "position_at (Before i)" or "position_before i"
> which is nearly the same.
Yup, they're similar in simple cases. The difference being that the
iterator concept handles all the cases uniformly.
> The primitives "position_before" and "position_at_end" really
> represent all possibilities, so we'd only have to bind those two.
This is what I'm getting at. This gets back to, in C++, the 2 possible
states of an ilist iterator:
before an instruction: i != bb->end(), *i is a valid Instruction*; i
is a thin wrapper around an Instruction*
at the end of a basic block: i == bb->end(), *i asserts; i wraps a non-
Instruction tail node owned by the BasicBlock
These are the only two necessary states. The types (llbasicblock *
llvalue option) and (Before llvalue | At_end llbasicblock) both
encapsulate this as well, the former with the possibility for
inconsistency if instr_parent i ≠ bb.
In C++, a (BasicBlock*, BasicBlock::iterator) tuple is used in places
because the end iterator doesn't have a getParent facility.
> It'd also be faster than having to box and unbox instructions.
> Everything else is convenience.
>
> Now, we could add a convenience function that was like:
>
> type 'a 'b llposition =
> | Before of 'a
> | After of 'a
> | At_front of 'b
> | At_end of 'b
>
> val position_builder: llbuilder -> (llvalue, llbasicblock)
> llposition -> unit
>
> Which would use "position_before" and "position_at_end" to capture
> all the variants. Is that what you intended?
Nope, the example I wrote is what I intended. :) Let me repaste it:
>> external builder : unit -> llbuilder
>> external position_builder : llvalue llbasicblock llposition ->
>> llbuilder
>>
>> let builder_at ip =
>> let b = builder () in
>> position_builder ip b;
>> b
>>
>> let builder_at_end bb = builder_at (At_end bb)
>> let builder_before i = builder_at (Before i)
>> let builder_at_start bb = builder_at (first_instr bb)
>> let builder_after i = builder_at (succ_instr i)
>> let position_at_end bb = position_builder (At_end bb)
>> let position_before i = position_builder (Before i)
>> let position_at_start bb = position_builder (first_instr bb)
>> let position_after i = position_builder (succ_instr i)
>>
>
> One thing though is that this still won't let you call
> "position_builder b (first_instr bb)", you'd have to unbox the
> option and rebox in "position_builder b (At_front i)".
If we provide 'a 'b position, then the navigation functions should
return those instead of 'a option precisely so that such conversion is
not necessary. See my example.
A different set of names might less imply that they actually return
instructions directly. Perhaps
val instr_begin : llbasicblock -> llvalue llbasicblock llposition
val instr_end : llbasicblock -> llvalue llbasicblock llposition
val instr_succ : llvalue -> llvalue llbasicblock llposition
instr_pred is an oddity, though, now that I start to type it out. It's
the reverse_iterator problem. C++ solves this by essentially saying…
type 'a 'b reverse_iterator =
| After of 'a option
| At_start of 'b option
val instr_end : llbasicblock -> llvalue llbasicblock reverse_iterator
val instr_pred : llvalue -> llvalue llbasicblock reverse_iterator
instr_end and instr_pred would need to operate on these values, and
they spoil my utopian grand-unified iterator concept a bit. Although
do note that Before i and At_end bb likely obviate the need to use
these functions for anything but reverse iteration, and conversion
between the forms is not complicated. Could work…
> Now we could make the type:
>
> type 'a 'b llposition =
> | Before of 'a option
> | After of 'a option
> | At_front of 'b option
> | At_end of 'b option
>
> But then it seems to be getting a little overly complicated.
It is.
Before (None) | After (None) | At_front (None) | At_end (None) are
invalid.
After (Some _) | At_front (Some _) are redundant.
> it'll be a little more complicated to pull the data out of a variant
> in C,
It's not bad at all with the minimal variant (Before of 'a | At_end of
'b) that directly corresponds to the C++ iterator. I think this is the
extent of it:
if (!Tag_val(i, 0))
(LLVMValueRef) Op_val(Field(i, 0)); // Before of llvalue
else
(LLVMBasicBlockRef) Op_val(Field(i, 0)); // At_end of llbasicblock
Thanks Erick.
— Gordon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20080321/2394057b/attachment.html>
More information about the llvm-commits
mailing list