[LLVMdev] Handling Masked Vector Operations

Thu May 2 10:07:27 PDT 2013

Nadav Rotem <nrotem at apple.com> writes:

>     It seems the only solution is to create an intrinsic:
>     
>     llvm_int_load_masked mask, [addr]
>     
>     But this unnecessarily shuts down optimization.
>
> I think that using intrinsics is the right solution. I imagine that
> most interesting load/store optimizations happen before vectorization,
> so I am not sure how much we can gain by optimizing masked
> load/stores. 

Perhaps that is true.  If this is the only intrinsic we need (plus a
matching masked store), maybe it's not too bad.
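To pin down what such a load/store pair would have to guarantee, here is a minimal scalar model in C. The function names, the `VLEN` width and the passthrough operand are hypothetical choices for illustration only; this is not LLVM's API. The key property is that a lane whose mask bit is clear is never dereferenced, so a partially out-of-bounds vector access cannot fault.

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN 8 /* hypothetical vector width for illustration */

/* Hypothetical model of a masked load: active lanes read addr[i],
   inactive lanes are never touched and keep the passthrough value. */
static void masked_load_i32(int32_t dst[VLEN], const int32_t *addr,
                            const uint8_t mask[VLEN],
                            const int32_t passthru[VLEN]) {
    for (size_t i = 0; i < VLEN; ++i)
        dst[i] = mask[i] ? addr[i] : passthru[i];
}

/* Hypothetical model of the matching masked store:
   only active lanes write to memory. */
static void masked_store_i32(int32_t *addr, const int32_t src[VLEN],
                             const uint8_t mask[VLEN]) {
    for (size_t i = 0; i < VLEN; ++i)
        if (mask[i])
            addr[i] = src[i];
}
```

For example, loading from a 4-element buffer with mask {1,1,1,1,0,0,0,0} is safe: lanes 4..7 come from the passthrough vector and memory past the buffer is never read.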

>     Similar problems exist with any trapping instruction (div, mod,
>     etc.).  It gets even worse when you consider that any floating-point
>     operation can trap on a signalling NaN input.
>
>
> For DIV/MOD you can blend the inputs BEFORE the operation. You can
> place ones or zeros depending on the operation. 

That's true, but it's inefficient.  I suppose we can write patterns to
match the input selects as well and just drop them, opting for the
masked operation.  But all of this requires that these
select/select/op/select sequences stay intact throughout LLVM so isel
can match them.  I'm not totally confident that's possible.
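The blend-the-inputs approach, and the select/select/op/select shape isel would have to recognize, can be sketched as a scalar C model of a masked signed division. The names and `VLEN` are hypothetical. Inactive lanes get dividend 0 and divisor 1, so the unmasked division cannot trap. A final select restores the passthrough value. The INT32_MIN / -1 overflow in *active* lanes is deliberately left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN 8 /* hypothetical vector width for illustration */

/* Hypothetical emulation of a masked sdiv built from unmasked pieces. */
static void masked_sdiv_i32(int32_t dst[VLEN],
                            const int32_t a[VLEN], const int32_t b[VLEN],
                            const uint8_t mask[VLEN],
                            const int32_t passthru[VLEN]) {
    int32_t safe_a[VLEN], safe_b[VLEN], q[VLEN];
    for (size_t i = 0; i < VLEN; ++i) {
        safe_a[i] = mask[i] ? a[i] : 0; /* select on the dividend */
        safe_b[i] = mask[i] ? b[i] : 1; /* select on the divisor: never 0 when inactive */
    }
    for (size_t i = 0; i < VLEN; ++i)
        q[i] = safe_a[i] / safe_b[i];   /* the op itself, on every lane */
    for (size_t i = 0; i < VLEN; ++i)
        dst[i] = mask[i] ? q[i] : passthru[i]; /* select on the result */
}
```

A target with native masked division would want isel to fold all three loops back into one masked instruction. That only works if the two input selects and the output select survive the optimizer unchanged, which is exactly the concern above.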

>     
>     So are there any ideas out there for how to efficiently handle this?
>     We've talked about LLVM and masks before, and it's clear that there
>     is strong resistance to adding masks to the IR.
>
> Yes. I think that the consensus is that we don't need to predicate the
> IR itself to support MIC-like processors. 

Perhaps not, but I think we need a little more than we have right now.
I'll ponder this some more, but in the meantime, please continue to add
thoughts and ideas.

                                -David