[PATCH] D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv

Thu Nov 4 16:41:12 PDT 2021

LemonBoy added subscribers: aemerson, eli.friedman, LemonBoy.
LemonBoy added a comment.

I'm not sure this transform is sound. If you take a simple case such as `uaddlv4h_from_v8i8` and perform the expansion by hand, you'll see wildly different results.
Given an input vector of `u8{1,1,1,1,1,1,1,1}`, one would expect `uaddlp` to turn it into `u16{2,2,2,2}` and `addv` to fold that into `u16{0x8}`. Replacing both with a single `uaddlv` operating on `v0.16b`, on the other hand, yields `u16{0x404}` because it skips the pairwise sum step.
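
For reference, the hand expansion of the two-step sequence can be sketched in a few lines of Python. This is only a minimal model of the lane arithmetic: the helper names `uaddlp_u8` and `addv_u16` are made up for illustration and are not LLVM or ACLE APIs, and the sketch only derives the expected value `0x8`; it does not try to reproduce the `0x404` result reported above.

```python
MASK16 = 0xFFFF  # u16 lanes wrap at 16 bits


def uaddlp_u8(lanes):
    """Pairwise widening add: each pair of adjacent u8 lanes becomes one u16 lane."""
    assert len(lanes) % 2 == 0
    return [(lanes[i] + lanes[i + 1]) & MASK16 for i in range(0, len(lanes), 2)]


def addv_u16(lanes):
    """Across-lanes add: sum every u16 lane into a single u16 result."""
    return sum(lanes) & MASK16


v = [1] * 8                # u8{1,1,1,1,1,1,1,1}
pairs = uaddlp_u8(v)       # u16{2,2,2,2}
total = addv_u16(pairs)    # u16{0x8}
print(pairs, hex(total))
```

Any replacement for the `uaddlp` + `addv` pair has to produce this same value for the same eight input bytes.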

We've observed some serious miscompilations after upgrading to LLVM 13, because the offending pattern is produced when expanding `ctpop`, as shown in the following example:

  target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
  target triple = "aarch64-unknown-linux-musl"

  define i16 @count([4 x i16]* %0) {
  Entry:
    %1 = bitcast [4 x i16]* %0 to <4 x i16>*
    %2 = load <4 x i16>, <4 x i16>* %1, align 2
    %3 = call <4 x i16> @llvm.ctpop.v4i16(<4 x i16> %2)
    %4 = call i16 @llvm.vector.reduce.add.v4i16(<4 x i16> %3)
    ret i16 %4
  }

  declare <4 x i16> @llvm.ctpop.v4i16(<4 x i16>)
  declare i16 @llvm.vector.reduce.add.v4i16(<4 x i16>)
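
To check a build against this reproducer, it helps to have the value `@count` should return for a given input. The Python sketch below models the IR semantics directly (per-lane population count followed by an add reduction). The second function, `count_bytewise`, is a hypothetical model that assumes the backend counts bits per byte and then widens pairwise before reducing; that is my assumption about the expansion, not something taken from the patch. The sample input is also made up.

```python
def ctpop16(x):
    """Population count of one 16-bit lane, as llvm.ctpop does per element."""
    return bin(x & 0xFFFF).count("1")


def count(values):
    """Reference result for @count: ctpop on each i16 lane, then reduce.add."""
    assert len(values) == 4
    return sum(ctpop16(v) for v in values) & 0xFFFF


def count_bytewise(values):
    """Assumed lowering: per-byte popcount, pairwise widening add, then reduce."""
    assert len(values) == 4
    lanes = []
    for v in values:
        lanes += [v & 0xFF, (v >> 8) & 0xFF]       # split each i16 into two bytes
    cnt = [bin(b).count("1") for b in lanes]       # popcount per byte
    pairs = [cnt[i] + cnt[i + 1] for i in range(0, len(cnt), 2)]
    return sum(pairs) & 0xFFFF


sample = [0xFFFF, 0x0001, 0x0000, 0x00F0]          # hypothetical test input
print(count(sample), count_bytewise(sample))
```

Both routes agree on the sample, so a compiled `@count` that returns anything else for the same memory contents is miscompiled.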

cc some more ARM people: @eli.friedman @aemerson

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D104236