[PATCH] Add a Scalarize pass

Thu Nov 14 11:17:58 PST 2013

On Nov 14, 2013, at 4:16 AM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:

> Michael Ilseman <milseman at apple.com> writes:
>> This comment pertains purely to implementation details of the pass.
>> 
>> I notice that you seem to lazily expand values (which may exist very far
>> away from the current visit point) and just iterate over the blocks in
>> any given order. Would you perhaps be better off if you could instead
>> visit (almost) all the defs before the uses (caveat: backedges)? You
>> should be able to use the ReversePostOrderTraversal class to visit
>> (non-backedge) defs before their uses. For the backedge case, you could
>> just defer those Phis until the end. Then you don't ever need any lazy
>> expansion, and can just map vector values to scalar values.
> 
> The reason for using lazy creation was just to cut down on the amount
> of garbage IR, rather than because of traversal order.  I initially did
> it eagerly but wasn't happy with the number of dead extractelement
> instructions it created.  Obviously they'd be cleaned up by DCE, but it
> seemed better not to create them unless we knew they would be used.
> 
> E.g. if an argument is only used by a function call (which we don't
> scalarise), the eager form would generate a sequence of extractelements
> for the argument anyway, as soon as it sees the argument.  Same for
> function call results if the result is passed to another function,
> or used as the operand to a store (which we don't scalarise by default).
> 

Sounds good.

> It also felt more natural to handle insertelement like this rather than
> create a scattered form for every insertelement instruction.
> 
> I agree that using a reverse post-order traversal would cut down
> on the number of cases in which we create a sequence of extractelements
> for an instruction only for us later to scalarise the instruction itself.
> I'm just not sure whether the overhead of creating the traversal
> order would outweight the benefits there or not.  One thing in favour
> of the current approach is that it should be cheaper for functions that
> don't have vector operations.
> 

The overhead would be that up-front we traverse the CFG and make a set+stack of BBs, keeping the stack around for the duration of runOnFunction. I suppose it may depend largely on what constitutes your functions. This is a specialized pass for a specialized domain, so I would normally assume that domain has a fair number of vectors. So, maybe an RPOT would be cheaper than potentially generating lots of extra IR. However, it might be worse on code with few scalarizable vectors and tons of small BBs. As you mentioned earlier about lazy vs. eager, avoiding creating spurious IR is worth a little bit of analysis.

I might of missed this earlier in the thread, but what is your specific domain?

>> I notice inside Scatterer's operator [], you might return null at the
>> bottom of the function. Is this safe? It seems like users pass on the
>> result directly to the builder, which sounds dangerous.
> 
> No, it never returns null.  If the operand[] optimisations fail,
> the fallback is always to create an extractelement (for vectors)
> or a getelementptr (for pointers).

Ah, I misread the code, thanks for the clarification.

> 
>> There's no test in this patch with dynamic InsertValues, so I'm not
>> sure if it's a case you're planning on handling or not. Either way, a
>> comment/assert would be nice :)
> 
> Yeah, good catch, thanks.  There should definitely be a test for that.
> I've just uploaded an updated patch with this as f12 in basic.ll.
> 

Thanks!

> Thanks,
> Richard
>