[PATCH] D85558: [SVE] Implement fixed-width ZEXT lowering
Cameron McInally via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 7 16:34:53 PDT 2020
cameron.mcinally abandoned this revision.
cameron.mcinally added inline comments.
================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-zext.ll:78-85
+define <32 x i16> @zext_v32i8_v32i16(<32 x i8>* %in) #0 {
+; VBITS_GE_512-LABEL: zext_v32i8_v32i16:
+; VBITS_GE_512: ptrue [[PG0:p[0-9]+]].b, vl32
+; VBITS_GE_512-NEXT: ld1b { [[ZPR:z[0-9]+]].b }, [[PG0:p[0-9]+]]/z, [x0]
+; VBITS_GE_512-NEXT: ptrue [[PG1:p[0-9]+]].h, vl32
+; VBITS_GE_512-NEXT: and [[ZPR]].h, [[ZPR]].h, #0xff
+; VBITS_GE_512-NEXT: st1h { [[ZPR]].h }, [[PG1]], [x8]
----------------
paulwalker-arm wrote:
> The test's output shows this is not really a zero extend.
>
> The load will read 32 bytes into consecutive byte lanes. The and will zero the odd-numbered bytes (i.e. zero-extend the even lanes). The store will then write those 16 zero-extended even-numbered bytes along with 16 more zeros that come from the load zeroing bytes 32 onward.
>
> I'm afraid that, currently, the extends are not going to be a cheap operation, since they are effectively the reverse of the truncate operation. Truncates use a uzp1 sequence, so the extends require a uunpklo sequence.
>
> I've got a patch already to do this that I'll add you to.
Ah, they're packed. I got mixed up. I was thinking of it as:
`<n x 2 x i64> res = zext(<n x 2 x i32> x)`
But, yeah, I see the problem with that now. Thanks.
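For context, a rough sketch of the two sequences being contrasted above (illustrative only, not taken from this patch; register and predicate choices are made up), assuming a 512-bit vector length and the 32 input bytes loaded from [x0]:

  // Sequence matching the quoted test output (lanes stay packed, then masked):
  ld1b { z0.b }, p0/z, [x0]      // 32 bytes land in consecutive byte lanes 0..31
  and  z0.h, z0.h, #0xff         // zeroes the odd-numbered bytes in place
  st1h { z0.h }, p1, [x8]        // stores halfwords, but not the zext of the input

  // The kind of sequence an actual zero extend needs (reverse of the uzp1 truncate):
  ld1b    { z0.b }, p0/z, [x0]   // 32 bytes in byte lanes 0..31
  uunpklo z0.h, z0.b             // zero-extend byte lanes 0..31 into halfword lanes 0..31
  st1h    { z0.h }, p1, [x8]

When the extend can be folded into the load, an extending ld1b into .h lanes would do the same job, but the register-to-register case still needs the unpack sequence.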
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D85558/new/
https://reviews.llvm.org/D85558