[llvm-dev] [RFC] Vector/SIMD ISA Context Abstraction

Wed Aug 4 15:05:51 PDT 2021

On August 3, 2021 5:32:29 PM UTC, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>(renato thank you for cc'ing, due to digest subscription at the moment)
>
>On Tue, Aug 3, 2021 at 3:25 PM Renato Golin <rengolin at gmail.com> wrote:

>>  * For example, some reduction intrinsics were added to address
>>  bloat, but no target is forced to use them.
>
>excellent.  iteration and reduction (including fixed schedule
>paralleliseable reduction) is one of the intrinsics being added to
>SVP64.  

apologies to all for the follow-up, i realised i joined iteration and reduction together as if they were the same concept: they are not.

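Iterative Sum, when carried out as an add on a Vector containing all 1s, results in a Pascal Triangle Vector output: 1 2 3 4 ... on the first pass, then 1 3 6 10 ... if the same operation is repeated on that result, i.e. successive diagonals of the Triangle.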

example of existing hardware that has actual Iteration instructions: Section 8.15 of SX-Aurora ISA guide, p8-297, the pseudocode for Iterative Add:

for (i = 0 to VL-1) {
    Vx(i) ← Vy(i) + Vx(i-1), where Vx(-1)=Sy
}

where if Vx and Vy are the same register you get the Pascal Triangle effect.
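to make the effect concrete, here is a minimal Python sketch of the pseudocode above. it is only a model: the function name iterative_add and the use of Python lists as "vector registers" are my own assumptions for illustration, not anything from the SX-Aurora guide. passing the same list for both operands models "Vx and Vy are the same register".

```python
def iterative_add(vx, vy, sy):
    """Model of: Vx(i) <- Vy(i) + Vx(i-1), where Vx(-1) = Sy.

    vx and vy are plain lists standing in for vector registers
    (an assumption for illustration). Elements are processed strictly
    in order, so each result feeds the next element.
    """
    prev = sy                      # Vx(-1) = Sy
    for i in range(len(vy)):
        vx[i] = vy[i] + prev       # vy[i] is read before vx[i] is written
        prev = vx[i]
    return vx


v = [1, 1, 1, 1, 1]
iterative_add(v, v, 0)             # same register for Vx and Vy
first_pass = list(v)
iterative_add(v, v, 0)             # repeat on the result
second_pass = list(v)
print(first_pass)                  # [1, 2, 3, 4, 5]
print(second_pass)                 # [1, 3, 6, 10, 15]
```

the first pass is a running sum; repeating it gives the triangular numbers, which is the next diagonal of Pascal's Triangle.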

https://www.hpc.nec/documents/guide/pdfs/Aurora_ISA_guide.pdf

SVP64 does not have this *specifically* added: it is achieved incidentally by issuing an add where the src and dest registers differ by one (SVP64 sits on top of a rather large scalar regfile: 128 64-bit entries), so that each element's result is read as a source by the next element:

   sv.add r1, r1, r0

we did however need to add a "reverse gear" (for (i = VL-1 downto 0)). ironically this was needed for ffmpeg's MP3 CODEC, to *avoid* the Pascal Triangle effect (and to avoid having to copy a large batch of registers instead).
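a rough Python model of the difference between the two directions, for anyone who wants to see it. this is a simplified sketch under my own assumptions (a list standing in for the scalar regfile, a hypothetical helper sv_add, strictly sequential element execution); it is not the SVP64 specification.

```python
def sv_add(regs, rt, ra, rb, vl, reverse=False):
    """Simplified model of a vectorised add over a scalar regfile.

    For each element i: regs[rt+i] = regs[ra+i] + regs[rb+i].
    'reverse' selects the "reverse gear" (i = VL-1 downto 0).
    The helper name and arguments are hypothetical.
    """
    order = range(vl - 1, -1, -1) if reverse else range(vl)
    for i in order:
        regs[rt + i] = regs[ra + i] + regs[rb + i]
    return regs


# model of "sv.add r1, r1, r0" with VL=4 on a regfile of all 1s
forward = sv_add([1, 1, 1, 1, 1], rt=1, ra=1, rb=0, vl=4)
backward = sv_add([1, 1, 1, 1, 1], rt=1, ra=1, rb=0, vl=4, reverse=True)
print(forward)    # [1, 2, 3, 4, 5]  each result cascades into the next element
print(backward)   # [1, 2, 2, 2, 2]  each element only sees original values
```

going forwards, r1 is overwritten before it is read as a source for r2, so the sum cascades. going backwards, every source is read before it is overwritten, so the overlap is harmless and no copy of the registers is needed.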

can anyone say if LLVM SVE happened to add Iteration?

l.