[PATCH] D85558: [SVE] Implement fixed-width ZEXT lowering

Fri Aug 7 16:06:45 PDT 2020

paulwalker-arm added a comment.

It does seem like we're at risk of treading on each other's toes :) Perhaps it's worth syncing up so we don't duplicate effort. I had been trying to kill off D71767 <https://reviews.llvm.org/D71767> but instead I'll update it (probably tomorrow) to show the work I've already got downstream, where I just need to write tests. I doubt I'll get a chance to cover additional nodes over the next few weeks, so everything else should be fair game.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-zext.ll:78-85
+define <32 x i16> @zext_v32i8_v32i16(<32 x i8>* %in) #0 {
+; VBITS_GE_512-LABEL: zext_v32i8_v32i16:
+; VBITS_GE_512:    ptrue [[PG0:p[0-9]+]].b, vl32
+; VBITS_GE_512-NEXT:    ld1b { [[ZPR:z[0-9]+]].b }, [[PG0:p[0-9]+]]/z, [x0]
+; VBITS_GE_512-NEXT:    ptrue [[PG1:p[0-9]+]].h, vl32
+; VBITS_GE_512-NEXT:    and [[ZPR]].h, [[ZPR]].h, #0xff
+; VBITS_GE_512-NEXT:    st1h { [[ZPR]].h }, [[PG1]], [x8]
----------------
The test's output shows this is not really a zero extend.

The load will read 32 bytes into consecutive byte lanes. The and instruction will zero the odd-numbered bytes (i.e. zero-extend the even lanes). The store will write those 16 zero-extended even-numbered bytes along with 16 more zeros that result from the load zeroing byte 32 onward.
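To make the lane behaviour described above concrete, here is a rough Python model of the three-instruction sequence for a 512-bit (64-byte) register. It is only a sketch: the names VL_BYTES, emitted_sequence and expected_zext are invented for illustration, and the model assumes little-endian halfword lanes and a zeroing predicated load.

```python
# Illustrative lane-level model only; not LLVM or hardware code.
VL_BYTES = 64  # assumed 512-bit vector length


def emitted_sequence(src):
    """Model ld1b (vl32, /z) + and .h #0xff + st1h (vl32)."""
    assert len(src) == 32
    # ld1b: 32 bytes land in byte lanes 0..31, inactive lanes are zeroed.
    reg = list(src) + [0] * (VL_BYTES - len(src))
    # Reinterpret the register as 32 little-endian halfword lanes.
    halves = [reg[2 * i] | (reg[2 * i + 1] << 8) for i in range(VL_BYTES // 2)]
    # and z.h, z.h, #0xff: only the even-numbered byte of each pair survives.
    halves = [h & 0xFF for h in halves]
    # st1h with vl32 stores all 32 halfword lanes.
    return halves


def expected_zext(src):
    """What a real zext <32 x i8> -> <32 x i16> should produce."""
    return [b & 0xFF for b in src]


src = list(range(1, 33))
got = emitted_sequence(src)
want = expected_zext(src)
print(got[:16])   # even-numbered source bytes only
print(got[16:])   # zeros from the zeroed upper lanes
print(got == want)
```

In this model the first 16 stored halfwords hold only the even-numbered source bytes and the remaining 16 are zero, so half of the input is lost.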

I'm afraid to say that, currently, the extends are not going to be a cheap operation because they are effectively the reverse of the truncate operation. Truncates use a uzp1 sequence, so the extends require a uunpklo sequence.
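As a rough illustration of why an unpack is the inverse of the uzp1-based truncate, the following Python sketch models both operations on a 64-byte register. The helper names are hypothetical, and the semantics are a simplified reading of the instructions (unpack widens the low half of the byte lanes; uzp1 concatenates the even-numbered lanes of its two operands), not a definitive description.

```python
# Illustrative lane-level model only; helper names are hypothetical.
VL_BYTES = 64  # assumed 512-bit vector length


def uunpklo_b_to_h(reg_bytes):
    """Zero-extend the low half of the byte lanes to halfword lanes."""
    assert len(reg_bytes) == VL_BYTES
    return [b & 0xFF for b in reg_bytes[: VL_BYTES // 2]]


def uzp1_b(zn, zm):
    """Concatenate the even-numbered byte lanes of zn and zm."""
    return zn[0::2] + zm[0::2]


def halves_to_bytes(halves):
    """View halfword lanes as little-endian byte lanes."""
    out = []
    for h in halves:
        out += [h & 0xFF, (h >> 8) & 0xFF]
    return out


src = list(range(1, 33))
reg = src + [0] * (VL_BYTES - len(src))

extended = uunpklo_b_to_h(reg)          # 32 halfwords, one per source byte
as_bytes = halves_to_bytes(extended)    # same register viewed as 64 bytes
truncated = uzp1_b(as_bytes, as_bytes)[:32]  # back to the original 32 bytes

print(extended == src)
print(truncated == src)
```

Under these assumptions the unpack keeps every source byte, unlike the mask-only sequence, and the uzp1 step recovers the original bytes from the extended form.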

I've got a patch already to do this that I'll add you to.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85558/new/

https://reviews.llvm.org/D85558