[PATCH] D57504: RFC: Prototype & Roadmap for vector predication in LLVM
Jacob Lifshay via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 4 10:03:11 PST 2020
programmerjake added a comment.
From what I recall, the plan is to implement this by using fixed-size vector types combined with VL-based ops. MVL would be the size of those vector types.
Quoting all of lkcl's email so it ends up in Phabricator:
On Tue, Feb 4, 2020 at 3:48 AM @lkcl wrote:
> In D57504#1856586 <https://reviews.llvm.org/D57504#1856586>, @simoll wrote:
>
> > In D57504#1856207 <https://reviews.llvm.org/D57504#1856207>, @andrew.w.kaylor wrote:
> >
> > > In D57504#1854330 <https://reviews.llvm.org/D57504#1854330>, @simoll wrote:
> > >
> > > > Exactly. The VE target strictly requires `VL <= MVL` or you'll get a
> > > > hardware exception. Enforcing strict UB here means VP-users have to
> > > > explicitly emit instructions that keep the VL within bounds. This means
> > > > that we can optimize the VL computation code and that it can be factored
> > > > into cost calculations, etc. With Options 2 & 3 this would happen only
> > > > very late in the backend when most scalar optimizations are already
> > > > done.
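
A minimal sketch of what strict UB for `%evl > W` asks of the IR producer, written as a Python model rather than LLVM IR. The names `vp_add` and `clamp_evl` are hypothetical, and the Python exception only stands in for the VE hardware exception; none of this is the proposal's actual API.

```python
def vp_add(a, b, evl):
    """Model of a VP add on vectors of static width W = len(a).

    Only the first `evl` lanes are computed. `evl > W` raises, standing in
    for the hardware exception a VE-like target would take.
    """
    w = len(a)
    if len(b) != w:
        raise ValueError("operands must have the same static width")
    if evl > w:
        raise OverflowError(f"evl={evl} exceeds static width W={w}")
    # Lanes at or beyond evl are left as None to mark them as undefined.
    return [a[i] + b[i] if i < evl else None for i in range(w)]


def clamp_evl(remaining, w):
    """The explicit bounds-keeping step the producer has to emit (hypothetical)."""
    return min(remaining, w)


W = 4
a, b = [1, 2, 3, 4], [10, 20, 30, 40]
remaining = 6                      # elements the loop still has to process
evl = clamp_evl(remaining, W)      # ordinary scalar code, visible to optimizers
result = vp_add(a, b, evl)
```

Because the clamp is ordinary scalar code in this model, it is the kind of thing that could be optimized and costed early rather than only in the backend.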
> > >
> > >
> > > I think I'm lost here. Which thing is VL and which is MVL in this
> > > scenario?
> >
> >
> > VL == %evl
> > MVL == W
> > Sorry for the vector speak :)
>
>
> ah. right. that bit of information was important, simon :) without
> clarification, i assumed W was the "required vector length at the
> program loop level", whoops..
>
> > I agree that, in the end, the semantics will be based solely on IR-types.
> > However, what that semantics should look like for the `%evl > W` case
> > depends on the way targets can handle this to make sure that whatever we
> > specify on IR-level is at least reasonable for all targets.
>
> okaaay, riight, so the purpose of the discussion is, e.g., to work out
> how to represent things like for-loops in the strcpy example here, is
> that right?
>
> https://www.sigarch.org/simd-instructions-considered-harmful/
>
> so for %evl > W (i.e. %evl > MVL) in RVV: the program *tries* to set
> %evl to the remaining loop length, and this is retried *in every loop
> iteration*. the implementation (in hardware) very very specifically -
> unbeknownst to the programmer (and to the IR writer) - hard-limits
> %evl *to* MVL.
>
> to be clear: although the programmer *tries* to set %evl > MVL, this
> *never happens*: %evl will *always* be actually set to <= MVL.
>
> it's quite clever.
>
> it is really really important - a critical part of the design of RVV
> loops - that the programmer (or LLVM compiler developer in this case)
> *not* even know or make any assumptions about what MVL will be. some
> hardware will actually have MVL equal to 1. some really unbelievably
> powerful and stupidly expensive hardware might have MVL equal to 65536
> (yes really, 65536 wide vector ALUs) and the critical thing is, the
> assembly code *does not care*. it still works perfectly on both,
> despite the fact that you have no idea, really, what value MVL is
> going to be.
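
The MVL-agnostic loop described above can be sketched as a small Python model. This is an illustration under assumptions, not RVV code: `setvl` here is a hypothetical stand-in that only models the hardware clamping the request to MVL, and the inner `for` stands in for a single vector instruction.

```python
def setvl(requested, mvl):
    """Model of an RVV-style setvl: the hardware clamps the request to MVL."""
    return min(requested, mvl)


def vector_add(dst, a, b, mvl):
    """Add two arrays without the loop body ever depending on MVL.

    Returns the number of passes through the loop, for illustration only.
    """
    n = len(a)
    i = 0
    passes = 0
    while i < n:
        vl = setvl(n - i, mvl)            # re-requested on every pass
        for lane in range(vl):            # stands in for one vector op
            dst[i + lane] = a[i + lane] + b[i + lane]
        i += vl
        passes += 1
    return passes


a = list(range(10))
b = [100] * 10
for mvl in (1, 4, 65536):
    dst = [0] * 10
    passes = vector_add(dst, a, b, mvl)
    print(mvl, passes, dst == [x + 100 for x in a])
```

The same loop produces the same result for every MVL; only the number of passes changes.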
>
> SimpleV is different in that you absolutely must explicitly declare,
> as part of any assembly loops (or any other instructions), precisely
> and exactly how large MVL is to be. this is because it is an
> "allocation of the number of **scalar** registers - from the *scalar*
> regfile - to be used for the vector operation".
>
> thus, for SimpleV, we do actually need a way in LLVM to represent
> (set) MVL, because it is quite literally an "explicit reservation of a
> certain size and number of registers".
>
> think of it as a way to say "hey y'know these upcoming SIMD
> instructions? yeah, we need to set them to all be of length 8 for this
> set. then, like, next we need to set all the upcoming SIMD
> instructions to 16, y'ken". actually they're not SIMD they're
> vector-ops but you get the idea.
>
> this we do with an *extra* parameter to the SV.SETVL instruction
> https://libre-riscv.org/simple_v_extension/appendix/#index8h1
>
> SV.SETVL a2, t4, 8 # MVL==8
>
> now, *if* we have a way to set MVL (through LLVM-IR), we can *also*
> use that for doing saving/restoring of entire scalar register files
> with a single instruction, as well as use it for function call
> register stack save/restore.
>
> basically when we have control over MVL through LLVM-IR, we get a
> "LD.MULTI" and "ST.MULTI" instruction "for free" as an accidental
> side-benefit.
>
> SV.SETMVL #32 ; tells the hardware that vector operations are to
>               ; use 32 *scalar* regs
> SV.LD a0, f0, #8 ; loads registers f0 thru f31 from the address at (a0+8)
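
A rough Python model of the "LD.MULTI for free" idea, to make the register-file effect concrete. Everything here is an assumption for illustration: the function `sv_ld`, the 8-byte element size, the dictionary used as memory, and in particular that VL is set to the full MVL of 32. It is not the SimpleV specification.

```python
def sv_ld(fregs, memory, base, offset, first_reg, vl, elem_bytes=8):
    """Model of a vectorised load: fill `vl` consecutive scalar registers.

    Register first_reg + i receives the value at base + offset + i * elem_bytes.
    """
    if first_reg + vl > len(fregs):
        raise IndexError("vector would run past the end of the register file")
    for i in range(vl):
        fregs[first_reg + i] = memory[base + offset + i * elem_bytes]


# Hypothetical memory image: 32 saved values starting at address a0 + 8.
a0 = 0x1000
memory = {a0 + 8 + 8 * i: float(i) for i in range(32)}
fregs = [None] * 32                # f0..f31

mvl = 32                           # models SV.SETMVL #32
vl = mvl                           # assumption: VL is set to the full MVL
sv_ld(fregs, memory, a0, 8, first_reg=0, vl=vl)   # models SV.LD a0, f0, #8
```

In this model a single call restores f0 through f31, which is the register save/restore use described above.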
>
> for SIMD systems such as x86 and ARM, the only way to keep loops as
> simple as in RVV and SV would be an instruction which, on the last
> run through the loop, leaves %evl at the fixed SIMD width and sets up
> a predicate mask *instead*... so that although the SIMD operation is
> still 4 (or 8, or 16) wide, the elements at the end are left alone
> (masked out).
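
The masked last pass on a fixed-width SIMD machine can be sketched in Python as follows. The width of 4 and the names `tail_mask` and `masked_add` are hypothetical; the point is only that the operation stays full width and a predicate, not a shorter length, protects the trailing elements.

```python
SIMD_WIDTH = 4  # fixed hardware width, assumed for illustration


def tail_mask(remaining, width=SIMD_WIDTH):
    """Predicate for one SIMD pass: lane i is active iff i < remaining."""
    return [i < remaining for i in range(width)]


def masked_add(dst, a, b):
    """Add two arrays with full-width passes and a predicate per pass."""
    n = len(a)
    for base in range(0, n, SIMD_WIDTH):
        mask = tail_mask(n - base)        # all True except on the last pass
        for lane in range(SIMD_WIDTH):    # stands in for one full-width SIMD op
            if mask[lane]:                # masked-out lanes are left alone
                dst[base + lane] = a[base + lane] + b[base + lane]
    return dst
```

If the mask cannot be produced by a single instruction, the equivalent of `tail_mask` has to be computed with extra instructions on every pass.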
>
> without such an instruction (one which sets up the predicate bitmask
> as not being all 1s on the last loop) you'd have to have a sequence of
> instructions that effectively do the same job, and those instructions
> will, clearly, impact performance due to them being executed on each
> and every loop.
>
> unless the above is expressly supported in a single instruction (one
> equivalent to SETVL, which sets up the predicate mask on the last
> loop), this is - i am sorry to have to use this particular phrase - a
> dog's dinner approach when compared to variable-run vectorisation,
> and it's why i keep warning that attempting to add support for
> fixed-power-of-two-%evl in this proposal is not a good idea.
>
> even if you _do_ have such an instruction (or a really really short
> sequence that's equivalent and does not impact the length of the loop
> too badly), you are screwed both coming and going: for
> high-performance the assembly code has to use 16 wide SIMD, which
> wastes ALU resources on short loops, but if you use 4 wide SIMD to
> stop wasting ALU resources you can't do high-performance. ultimately
> you have to resort to stripmining to properly solve it, and at that
> point we're *definitely* outside of the scope of this proposal [as i
> understand it].
>
> l.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57504/new/
https://reviews.llvm.org/D57504