[PATCH] D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv
LemonBoy via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 4 16:41:12 PDT 2021
LemonBoy added subscribers: aemerson, eli.friedman, LemonBoy.
LemonBoy added a comment.
I'm not sure this transform is sound. If you take a simple case such as `uaddlv4h_from_v8i8` and perform the expansion by hand, you'll see wildly different results.
Given an input vector of `u8{1,1,1,1,1,1,1,1}`, one would expect `uaddlp` to turn it into `u16{2,2,2,2}` and `addv` to fold it into `u16{0x8}`. On the other hand, replacing everything with a single `uaddlv` operating on a `v0.16b` yields `u16{0x404}`, as it skips the pairwise sum step.
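The hand expansion can be replayed with a small Python model of the two-step sequence named in the patch title (`uaddlp` followed by `addv`). This is only a sketch: the helper names are made up, and the last function is an assumption about where `0x404` comes from (summing the untouched input bytes as if they were already four little-endian 16-bit lanes), not something taken from the patch itself.

```python
# Lane-level model of the hand expansion. All helper names are hypothetical.

def uaddlp_8b(lanes):
    """Unsigned add long pairwise: 8 x u8 -> 4 x u16."""
    assert len(lanes) == 8
    return [(lanes[i] + lanes[i + 1]) & 0xFFFF for i in range(0, 8, 2)]

def addv_4h(lanes):
    """Add across vector: 4 x u16 -> one u16, wrapping modulo 2**16."""
    assert len(lanes) == 4
    return sum(lanes) & 0xFFFF

def as_u16_lanes(byte_lanes):
    """Reinterpret 8 bytes as 4 little-endian u16 lanes (no arithmetic)."""
    assert len(byte_lanes) == 8
    return [byte_lanes[i] | (byte_lanes[i + 1] << 8) for i in range(0, 8, 2)]

def reduce_without_pairwise(byte_lanes):
    """ASSUMPTION: sum the raw input as 16-bit lanes, skipping uaddlp.
    The result is truncated to 16 bits to match the u16 notation above."""
    return sum(as_u16_lanes(byte_lanes)) & 0xFFFF

ones = [1] * 8
print(uaddlp_8b(ones))                       # [2, 2, 2, 2]
print(hex(addv_4h(uaddlp_8b(ones))))         # 0x8
print(hex(reduce_without_pairwise(ones)))    # 0x404
```

Under that reading, each 16-bit lane of the untouched input holds `0x0101`, and four of them add up to `0x0404`.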
We've observed serious miscompilations after upgrading to LLVM 13, because the offending pattern is produced when expanding `ctpop`, as in the following example:
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-musl"
define i16 @count([4 x i16]* %0) {
Entry:
  %1 = bitcast [4 x i16]* %0 to <4 x i16>*
  %2 = load <4 x i16>, <4 x i16>* %1, align 2
  %3 = call <4 x i16> @llvm.ctpop.v4i16(<4 x i16> %2)
  %4 = call i16 @llvm.vector.reduce.add.v4i16(<4 x i16> %3)
  ret i16 %4
}
declare <4 x i16> @llvm.ctpop.v4i16(<4 x i16>)
declare i16 @llvm.vector.reduce.add.v4i16(<4 x i16>)
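To compare a compiled `count` against expected values, a reference model in Python might look like the sketch below. It only restates what the IR asks for (per-lane population count followed by an add reduction that wraps at 16 bits); the function names are hypothetical and merely mirror the IR.

```python
# Reference model for the IR above. Function names are hypothetical.

def ctpop16(x):
    """Population count of one 16-bit lane."""
    return bin(x & 0xFFFF).count("1")

def count(values):
    """ctpop on each of the 4 x i16 lanes, then an i16 add reduction."""
    assert len(values) == 4
    return sum(ctpop16(v) for v in values) & 0xFFFF

# Each lane 0x0101 has two bits set, so four lanes give 8.
print(count([0x0101] * 4))
```

Feeding the same four values to the compiled function and comparing against this model is one way to spot the miscompilation.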
cc some more ARM people: @eli.friedman @aemerson
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104236/new/
https://reviews.llvm.org/D104236