[llvm-dev] [RFC] Vector Predication

Thu Jan 31 23:52:27 PST 2019

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Thu, Jan 31, 2019 at 10:22 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> We're in the process of designing a RISC-V extension (http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html) that would have variable-length vectors of short vectors (1 to 4 elements each):
> <VL x <4 x float>>
> where each predicate bit masks out a whole short vector. We're using this extension to vectorize graphics code where variables in the pre-vectorization code are short vectors.
> So, vectorizing code like:
> for(int i = 0; i < 1000; i++)
> {
>     vec4 color = colors[i];
>     vec3 normal = normals[i];
>     color.rgb *= fmax(0.0, dot(normal, light_dir));
>     colors[i] = color;
> }
>
> I'm planning on passing already vectorized code into LLVM and using LLVM as a backend for optimization and JIT code generation.
>
> Do you think the EVL proposal would support an ISA like this as it's currently
> written (by pattern matching on predicate expansion and vector-length
> multiplication)?

whilst it may be tempting to suggest that a solution is to multiply up
the bits in the predicate (into groups of 3 or 4), there is a problem
with that: if operations that take vec3 or vec4 operands are
interspersed with predicated operations that do not, the code
realistically needs two separate predicates (one expanded, one not).
that means either cycles are wasted swapping predicates, OR the
architecture has to *allow* two separate predicate registers to be
selected.
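
to make the cost concrete, here is a minimal sketch of "multiplying up" a predicate. it is plain Python rather than anything LLVM or the proposed ISA provides, and the name expand_predicate is hypothetical, chosen only for illustration.

```python
def expand_predicate(pred, group):
    """Repeat each predicate bit `group` times, giving one bit per scalar element.

    `pred` has one bit per short vector; `group` is the short-vector width
    (3 for vec3, 4 for vec4). Hypothetical helper, for illustration only.
    """
    return [bit for bit in pred for _ in range(group)]


pred = [True, False, True]             # one bit per vec4, VL = 3
flat_pred = expand_predicate(pred, 4)  # one bit per float, 12 bits

# A vec4 operation on the flattened <12 x float> data needs flat_pred,
# while an operation producing one scalar per iteration (such as the
# result of a dot product) still needs the original 3-bit pred.
# Both predicates therefore have to be live at the same time.
print(len(pred), len(flat_pred))       # prints "3 12"
```

the two masks carry the same information but have different lengths, which is why mixing vec4 and scalar operations forces either a predicate swap or a second predicate register.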

consequently, it would be much, much better to be able to have a
single bit of a predicate apply to the *entire* vec3 or vec4 type, on
each iteration of the outer loop.
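
as a rough sketch of what that looks like, the shading loop from the quoted message can be modelled in plain Python with one predicate bit per iteration. the function names shade and dot are hypothetical, and leaving masked-off elements unchanged is an assumption made here for illustration, not something the proposal specifies.

```python
def dot(a, b):
    """Dot product of two equal-length short vectors."""
    return sum(x * y for x, y in zip(a, b))


def shade(colors, normals, light_dir, pred):
    """Apply the quoted shading loop, one predicate bit per iteration.

    The same bit gates the vec3 dot product, the scalar max, and the
    update of the vec4 colour, so no expanded predicate is needed.
    Assumption: masked-off colours are passed through unchanged.
    """
    out = []
    for color, normal, active in zip(colors, normals, pred):
        if not active:
            out.append(list(color))
            continue
        s = max(0.0, dot(normal, light_dir))             # scalar per iteration
        out.append([c * s for c in color[:3]] + [color[3]])  # rgb scaled, alpha kept
    return out


colors = [[1.0, 0.5, 0.25, 1.0], [0.2, 0.4, 0.6, 1.0]]
normals = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
light_dir = [0.0, 0.0, 0.5]
pred = [True, False]                   # one bit per whole short vector

result = shade(colors, normals, light_dir, pred)
print(result)
```

here a single 2-bit predicate covers the vec3 operand, the scalar intermediate and the vec4 result alike, which is the property being argued for.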

l.