[LLVMdev] predicated execution

Thu Mar 8 02:25:25 PST 2007

Evan,

thanks for your detailed answer!

On Mar 8, 2007, at 10:03 AM, Evan Cheng wrote:
>> a performance critical issue will be proper use of predicated
>> execution. if-conversion can either be performed early in the code
>> generation process [1] (exposing larger basic blocks to the
>> optimizers) or deferred until code generation is almost
>> complete [2].
>
> Option 1 makes a lot of sense, most predication aware compilers go
> this route. I haven't read the paper. But I am guessing their system
> is rediscovering CFG etc. at link time, reverse compiling assembly
> code into some internal representation? That is an interesting
> approach for a company who is servicing clients who do not provide
> their source code.
>
> If you use LLVM and you build option 1, you can do something like
> option 2 without a lot of horribleness. :-) LLVM does link time
> optimization at bytecode level.
right, i agree option 1 might be favorable in many respects. however,
it would involve major changes in various components (instruction
selection, register allocation, probably various optimization passes)
and i don't have the time to do this now. also, it would be very hard
to predict at the llvm level what the code will look like after code
generation and register allocation, i.e., aggressive if-conversion
and hyperblock generation will probably require partial reverse
if-conversion to be effective.

>> for the time being, i'm planning to go with the second approach and
>> have a late optimization pass over the selected machine instructions
>> that
>> a] preliminarily schedules and bundles the selected instructions
>> b] speculatively executes instructions in the predecessor block if
>> there are unused resources
>> c] converts blocks B into an appropriately predicated version
>> (eliminating branches) if it's profitable for the particular
>> architecture.
>
> Ok. It doesn't sound like option 2 though (it has nothing to do with
> LTO). Item (b) sounds like meld scheduling, if that is what you are
> considering?
not exactly, and yes, it has nothing to do with LTO. the idea is just
to move instructions to the predecessor block (if there is only one)
if they can execute there (conditionally) for free, i.e., if there are
unused slots/nops that an instruction could fill. this shrinks the
blocks that (c) has to convert.
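
to make (b) a bit more concrete, here is a rough sketch as a toy model
in python, not llvm code. everything in it is an assumption made for
illustration: the Instr record, the helper names, a fixed issue width,
unit latencies (a value defined in bundle k can be read from bundle
k+1), reads happening before writes within a bundle, and a guard
register that holds the branch condition under which the successor
block would have executed. registers that the predecessor itself
touches are never redefined by a moved instruction, which is
deliberately conservative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    name: str
    defs: frozenset = frozenset()
    uses: frozenset = frozenset()
    guard: Optional[str] = None  # predicate register; None = unconditional


def ins(name, defs="", uses=""):
    """Hypothetical helper: build an Instr from space-separated register names."""
    return Instr(name, frozenset(defs.split()), frozenset(uses.split()))


def hoist_into_free_slots(pred_bundles, succ_instrs, width, guard):
    """Move instructions of the successor into free slots of its only
    predecessor, predicated on `guard`. Returns (new bundles, leftover)."""
    bundles = [list(b) for b in pred_bundles]
    pred_regs = set()  # registers read or written by the predecessor
    ready = {}         # register -> first bundle in which its value is readable
    last_read = {}     # register -> last bundle in which a moved instr reads it
    for k, bundle in enumerate(bundles):
        for i in bundle:
            pred_regs |= i.defs | i.uses
            for r in i.defs:
                ready[r] = k + 1
    stay_defs, stay_uses, remaining = set(), set(), []
    for i in succ_instrs:
        needs = i.uses | {guard}
        # must not clobber predecessor registers, and must not be reordered
        # across a dependent instruction that stays behind (RAW, WAR, WAW)
        movable = not (i.defs & pred_regs or needs & stay_defs
                       or i.defs & (stay_defs | stay_uses))
        slot = None
        if movable:
            earliest = max([ready.get(r, 0) for r in needs | i.defs]
                           + [last_read.get(r, 0) for r in i.defs])
            slot = next((k for k in range(earliest, len(bundles))
                         if len(bundles[k]) < width), None)
        if slot is None:  # not movable, or no free slot late enough
            remaining.append(i)
            stay_defs |= i.defs
            stay_uses |= i.uses
            continue
        bundles[slot].append(Instr(i.name, i.defs, i.uses, guard))
        for r in i.defs:
            ready[r] = slot + 1
        for r in needs:
            last_read[r] = max(last_read.get(r, 0), slot)
    return bundles, remaining


# predecessor with issue width 2 and one free slot in each of its last two bundles
pred = [[ins("cmp", "p0", "r1 r2"), ins("add", "r3", "r1")],
        [ins("mul", "r4", "r3")],
        [ins("br", "", "p0")]]
succ = [ins("ld", "r5", "r3"), ins("sub", "r6", "r5"), ins("st", "", "r6 r7")]
bundles, rest = hoist_into_free_slots(pred, succ, width=2, guard="p0")
```

in this made-up example the ld and the sub fill the two free slots
under p0 and only the st is left in the successor, so the block that
(c) has to predicate is down to one instruction. a real pass would
also need the target's latencies, resource classes and side-effect
rules, none of which are modeled here.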

> However, you are in luck! It's extremely likely we will be starting
> on predication work for the ARM backend in the very near future. ARM
> has an extremely limited predication model (just condition codes), so
> it's not yet clear how much of our work will be applicable to your
> needs. But we'll try to keep our work as target independent as  
> possible.
that's what i was hoping. for the time being, i need a quick
implementation that gives me an idea of the performance that can be
reached with the existing infrastructure (our target doesn't exist
in silicon yet, but we do have a simulator). i'll report first results
as they become available.

cheers,

-
dietmar