[PATCH] D85558: [SVE] Implement fixed-width ZEXT lowering
Cameron McInally via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 7 16:34:53 PDT 2020
cameron.mcinally abandoned this revision.
cameron.mcinally added inline comments.
================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-zext.ll:78-85
+define <32 x i16> @zext_v32i8_v32i16(<32 x i8>* %in) #0 {
+; VBITS_GE_512-LABEL: zext_v32i8_v32i16:
+; VBITS_GE_512: ptrue [[PG0:p[0-9]+]].b, vl32
+; VBITS_GE_512-NEXT: ld1b { [[ZPR:z[0-9]+]].b }, [[PG0:p[0-9]+]]/z, [x0]
+; VBITS_GE_512-NEXT: ptrue [[PG1:p[0-9]+]].h, vl32
+; VBITS_GE_512-NEXT: and [[ZPR]].h, [[ZPR]].h, #0xff
+; VBITS_GE_512-NEXT: st1h { [[ZPR]].h }, [[PG1]], [x8]
----------------
paulwalker-arm wrote:
> The test's output shows this is not really a zero extend.
>
> The load will read 32 bytes into consecutive byte lanes. The and will zero the odd-numbered bytes (i.e. zero-extend the even lanes). The store will then write those 16 zero-extended even-numbered bytes along with 16 more zeros that come from the load zeroing bytes 32 onward.
>
> I'm afraid that, currently, the extends are not going to be a cheap operation, since they are effectively the reverse of the truncate operation. Truncates use a uzp1 sequence, so the extends require a uunpklo sequence.
>
> I've got a patch already to do this that I'll add you to.
Ah, they're packed. I got mixed up. I was thinking of it as:
`<n x 2 x i64> res = zext(<n x 2 x i32> x)`
But, yeah, I see the problem with that now. Thanks.
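For context, a rough sketch of the two sequences being contrasted above (illustrative only, not taken from this patch; register and predicate choices are made up), assuming a 512-bit vector length and the 32 input bytes loaded from [x0]:

  // Sequence matching the quoted test output (lanes stay packed, then masked):
  ld1b { z0.b }, p0/z, [x0]      // 32 bytes land in consecutive byte lanes 0..31
  and  z0.h, z0.h, #0xff         // zeroes the odd-numbered bytes in place
  st1h { z0.h }, p1, [x8]        // stores halfwords, but not the zext of the input

  // The kind of sequence an actual zero extend needs (reverse of the uzp1 truncate):
  ld1b    { z0.b }, p0/z, [x0]   // 32 bytes in byte lanes 0..31
  uunpklo z0.h, z0.b             // zero-extend byte lanes 0..31 into halfword lanes 0..31
  st1h    { z0.h }, p1, [x8]

When the extend can be folded into the load, an extending ld1b into .h lanes would do the same job, but the register-to-register case still needs the unpack sequence.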
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D85558/new/
https://reviews.llvm.org/D85558