[PATCH] D57504: RFC: Prototype & Roadmap for vector predication in LLVM

Wed Feb 12 08:33:28 PST 2020

craig.topper added a comment.

In D57504#1872374 <https://reviews.llvm.org/D57504#1872374>, @lkcl wrote:

> In D57504#1872310 <https://reviews.llvm.org/D57504#1872310>, @simoll wrote:
>
> > I think i wasn't clear: what i meant to say is that we will not decide how MVL is defined/queried/set in the scope of this RFC... potentially leading to the situation that every target comes with its own set of target intrinsics to do so.
>
>
> ah yes got you.
>
> >>> This would allow you (and ARM MVE/SVE , RISC-V V) to have their own mechanism for setting/querying `MVL`.
> >> 
> >> and x86-for-hinting-the-SIMD-length.
> > 
> > For x86 with scalable types, yes. For "classic" SIMD types `MVL == W` of `<W x type>`
>
> mmm... i don't believe that's a wise choice / decision / assumption.  i am partly-guessing-and-making-architectural-assumptions here: imagine that the (very-well-informed) programmer knows how the pipelines of a particular processor work (and i do mean very well), they know that there are a couple of separate pipelines, one which handles e.g. NxFP32, one which handles MxFP64, but that if you issue SIMD instructions of width N=Mx2, it will result in a "blockage" (stall) and under-utilisation.
>
> *however*... if you issue *half* the workload (i.e. MVL == W/2) for the FP32 instructions interleaved with "full" workload (MVL==W for the FP64 ops), *then*, because of the way that the architecture works the two suites of instructions *will* go to the separate pipelines, *will* get done in parallel, because you're not overloading the exact same 64-bit-wide pipeline entrypoint if you'd done... you get what i'm trying to say?
>
> i think what i'm trying to say works better for MMX (the instructions which shared the FP regfile with SIMD instructions, is that right? or is it SSE?) - there you definitely want control over how much of the regfile is allocated to SIMD and how much remains actual for scalar-FP usage, and if MVL == W as a hard-coded assumption, with no "hint", you could end up taking up far more of the FP regfile for SIMD MMX than is efficient / effective.

MMX does use the X87 FP register file, but they can't coexist at the same. The first use of MMX marks the X87 register stack as occupied. I can't remember if it alters the data or not. An explicit emms instruction has to be done at the end of the MMX code to erase the MMX data and make the registers usable for X87 again.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57504/new/

https://reviews.llvm.org/D57504