[PATCH] D83135: [VectorCombine] Narrow ZExt that feed binop followed by trunc.

Fri Jul 3 09:06:50 PDT 2020

fhahn created this revision.
fhahn added reviewers: spatel, RKSimon, lebedev.ri, xbolva00.
Herald added subscribers: hiraditya, kristof.beyls.
Herald added a project: LLVM.
fhahn updated this revision to Diff 275408.
fhahn added a comment.

Move llvm/test/Transforms/VectorCombine/AArch64/lit.local.cfg to NFC test patch.

In the pattern below, the trunc can be eliminated by narrowing the
zexts, provided the narrowed casts are still zexts (that is, ty is wider
than the zext source type).

  trunc (binop (zext), (zext)) to ty -> binop (zext to ty) (zext to ty)

Initially limited to add/sub.

This transform is only performed if the shortened zexts are free (can be
folded into the binary op).

I am not entirely sure VectorCombine is the right place to do the
transform, but I think we want to limit it to cases where we know the
shorter zexts are free/legal on the target. I am not sure if we have an
easy way to check the latter though.

Alive proof sketches (scalar versions so we do not run into timeouts):

- add: https://alive2.llvm.org/ce/z/DgABb-
- add nuw: https://alive2.llvm.org/ce/z/yx5Vag
- add nsw: https://alive2.llvm.org/ce/z/yyVoRU
- sub: https://alive2.llvm.org/ce/z/bKj22_
- sub nuw: https://alive2.llvm.org/ce/z/H8soWR
- sub nsw: https://alive2.llvm.org/ce/z/QLVNDK
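
As a complement to the Alive links above, the flag-free add and sub cases
can also be checked by brute force on small scalars. The sketch below is
not part of the patch: the function names are hypothetical, and the widths
(i8 sources, i32 wide type, i16 result) are assumptions chosen so that
every input pair can be enumerated. It does not model the nuw/nsw variants.

```cpp
#include <cstdint>

// Wide form: trunc (add (zext a), (zext b)) to i16, with an i32 wide type.
uint16_t wideAdd(uint8_t a, uint8_t b) {
  uint32_t wide = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  return static_cast<uint16_t>(wide); // trunc i32 -> i16
}

// Narrow form: add (zext a to i16), (zext b to i16).
uint16_t narrowAdd(uint8_t a, uint8_t b) {
  // The operands promote to int; the final cast wraps modulo 2^16,
  // which matches i16 arithmetic.
  return static_cast<uint16_t>(static_cast<uint16_t>(a) +
                               static_cast<uint16_t>(b));
}

// Wide form: trunc (sub (zext a), (zext b)) to i16, with an i32 wide type.
uint16_t wideSub(uint8_t a, uint8_t b) {
  uint32_t wide = static_cast<uint32_t>(a) - static_cast<uint32_t>(b);
  return static_cast<uint16_t>(wide); // trunc i32 -> i16
}

// Narrow form: sub (zext a to i16), (zext b to i16).
uint16_t narrowSub(uint8_t a, uint8_t b) {
  return static_cast<uint16_t>(static_cast<uint16_t>(a) -
                               static_cast<uint16_t>(b));
}

// Counts the (a, b) pairs on which a wide form and its narrow form disagree.
int countMismatches() {
  int mismatches = 0;
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      auto x = static_cast<uint8_t>(a);
      auto y = static_cast<uint8_t>(b);
      if (wideAdd(x, y) != narrowAdd(x, y))
        ++mismatches;
      if (wideSub(x, y) != narrowSub(x, y))
        ++mismatches;
    }
  }
  return mismatches;
}
```

The forms agree because truncation keeps only the low bits of the result,
and the low bits of an add or sub depend only on the low bits of its
operands.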

On AArch64, codegen for the following input can be improved (this is
from hot code in SPEC2006/h264):

  define <8 x i32> @test(<8 x i16>* %p1, <8 x i16>* %p2) {
    %l.1 = load <8 x i16>, <8 x i16>* %p1, align 2
    %ext.1 = zext <8 x i16> %l.1 to <8 x i64>
    %l.2 = load <8 x i16>, <8 x i16>* %p2, align 2
    %ext.2 = zext <8 x i16> %l.2 to <8 x i64>
    %sub = sub nsw <8 x i64> %ext.1, %ext.2
    %t = trunc <8 x i64> %sub to <8 x i32>
    ret <8 x i32> %t
  }

Without patch:

  ldr     q0, [x0]
  ldr     q1, [x1]
  ushll2  v2.4s, v0.8h, #0
  ushll   v0.4s, v0.4h, #0
  ushll2  v3.4s, v1.8h, #0
  ushll   v1.4s, v1.4h, #0
  usubl2  v4.2d, v0.4s, v1.4s
  usubl   v0.2d, v0.2s, v1.2s
  usubl   v1.2d, v2.2s, v3.2s
  usubl2  v5.2d, v2.4s, v3.4s
  xtn     v1.2s, v1.2d
  xtn     v0.2s, v0.2d
  xtn2    v1.4s, v5.2d
  xtn2    v0.4s, v4.2d
  ret

With patch:

  ldr     q0, [x0]
  ldr     q2, [x1]
  usubl2  v1.4s, v0.8h, v2.8h
  usubl   v0.4s, v0.4h, v2.4h
  ret

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83135

Files:
  llvm/lib/Transforms/Vectorize/VectorCombine.cpp
  llvm/test/Transforms/VectorCombine/AArch64/shorten-extend-if-free.ll
