[LLVMdev] Handling Masked Vector Operations
dag at cray.com
Thu May 2 10:07:27 PDT 2013
Nadav Rotem <nrotem at apple.com> writes:
> It seems the only solution is to create an intrinsic:
> llvm_int_load_masked mask, [addr]
> But this unnecessarily shuts down optimization.
> I think that using intrinsics is the right solution. I imagine that
> most interesting load/store optimizations happen before vectorization,
> so I am not sure how much we can gain by optimizing masked
Perhaps that is true. If this is the only intrinsic we need (well, a
store too), maybe it's not too bad.
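To make the semantics concrete, here is a scalar sketch of what I'd expect
a masked load to mean. This is only a reference model, not a proposed
lowering. The lane count, the names, and the "passthru" operand for
inactive lanes are my own assumptions for illustration; the intrinsic
quoted above takes only a mask and an address.

```cpp
#include <array>
#include <cstddef>

// Assumed 4-lane vector of int, purely for illustration.
constexpr std::size_t kLanes = 4;
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Hypothetical reference semantics for a masked load: only lanes whose
// mask bit is set read memory; inactive lanes keep the passthru value
// and never touch addr[i], so they cannot fault.
Vec masked_load(const Mask& mask, const int* addr, const Vec& passthru) {
    Vec result = passthru;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (mask[i]) {
            result[i] = addr[i];
        }
    }
    return result;
}
```

The point is that the inactive lanes must not be accessed at all, which is
why a full-width load followed by a select is not an equivalent rewrite.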
> Similar problems exist with any trapping instruction (div, mod,
> It gets even worse when you consider that any floating point operation
> can trap on a signalling NaN input.
> For DIV/MOD you can blend the inputs BEFORE the operation. You can
> place ones or zeros depending on the operation.
That's true but it's inefficient. I suppose we can write patterns to
match the input selects as well and just drop them, opting for the
masked operation. But this all requires that these
select/select/op/select sequences stay intact throughout LLVM so isel
can match them. I'm not totally confident that's possible.
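For reference, the select/select/op/select shape I have in mind looks
roughly like the scalar model below for an integer divide. It is a sketch
under assumptions: the lane count, the helper names (blend, div_all_lanes,
masked_div) and the passthru operand are made up for illustration. Inactive
lanes get a dividend of 0 and a divisor of 1, so the full-width divide
cannot trap in those lanes. For integer division, blending the divisor
alone would already be enough; both inputs are blended here only to mirror
the pattern.

```cpp
#include <array>
#include <cstddef>

// Assumed 4-lane vector of int, purely for illustration.
constexpr std::size_t kLanes = 4;
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Lane-wise select: mask[i] ? a[i] : b[i].
Vec blend(const Mask& mask, const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        r[i] = mask[i] ? a[i] : b[i];
    }
    return r;
}

// Unmasked division: every lane executes, so every divisor must be safe.
Vec div_all_lanes(const Vec& x, const Vec& y) {
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        r[i] = x[i] / y[i];
    }
    return r;
}

// select / select / op / select: blend safe values into the inactive
// lanes of both inputs, divide at full width, then blend the result.
Vec masked_div(const Mask& mask, const Vec& x, const Vec& y,
               const Vec& passthru) {
    Vec zeros;
    zeros.fill(0);
    Vec ones;
    ones.fill(1);
    const Vec safe_x = blend(mask, x, zeros);   // input select 1
    const Vec safe_y = blend(mask, y, ones);    // input select 2
    const Vec q = div_all_lanes(safe_x, safe_y);  // the op
    return blend(mask, q, passthru);            // output select
}
```

On hardware with a real masked divide, all three blends are redundant,
which is why isel would want to recognize the whole sequence and fold it
into one masked instruction.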
> So are there any ideas out there for how to efficiently handle
> We've talked about llvm and masks before and it's clear that there is
> strong resistance to adding masks to the IR.
> Yes. I think that the consensus is that we don't need to predicate the
> IR itself to support MIC-like processors.
Perhaps not, but I think we need a little more than we have right now.
I'll ponder this some more but in the meantime, please continue to add
thoughts and ideas.