[PATCH] D38313: [InstCombine] Introducing Aggressive Instruction Combine pass
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 19 09:08:48 PST 2017
craig.topper added a comment.
Taking your first example and increasing the element count to get legal types:
define i16 @foo(<8 x i32> %X) {
  %A1 = zext <8 x i32> %X to <8 x i64>
  %B1 = mul <8 x i64> %A1, %A1
  %C1 = extractelement <8 x i64> %B1, i32 0
  %D1 = extractelement <8 x i64> %B1, i32 1
  %E1 = add i64 %C1, %D1
  %T = trunc i64 %E1 to i16
  ret i16 %T
}

define i16 @bar(<8 x i32> %X) {
  %A2 = trunc <8 x i32> %X to <8 x i16>
  %B2 = mul <8 x i16> %A2, %A2
  %C2 = extractelement <8 x i16> %B2, i32 0
  %D2 = extractelement <8 x i16> %B2, i32 1
  %T = add i16 %C2, %D2
  ret i16 %T
}
Running that through llc with AVX2, I get worse code for bar than for foo. Vector truncates on x86 aren't good: there is no truncate instruction until AVX-512, and even then it's 2 uops.
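
For reference, the two functions are interchangeable as far as the result goes, because the low 16 bits of a product or sum depend only on the low 16 bits of the operands. A minimal Python sketch of that equivalence, modeling the IR with masked integers (the list-of-ints vector representation and the mask constants are assumptions for illustration, not anything from the patch):

```python
MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def foo(x):
    """Model of @foo: widen to 64 bits, multiply, add lanes 0 and 1, truncate."""
    a = [v & MASK32 for v in x]          # zext <8 x i32> to <8 x i64>
    b = [(v * v) & MASK64 for v in a]    # mul <8 x i64>, wraps at 64 bits
    e = (b[0] + b[1]) & MASK64           # add i64 of lanes 0 and 1
    return e & MASK16                    # trunc i64 to i16


def bar(x):
    """Model of @bar: truncate to 16 bits first, then multiply and add."""
    a = [v & MASK16 for v in x]          # trunc <8 x i32> to <8 x i16>
    b = [(v * v) & MASK16 for v in a]    # mul <8 x i16>, wraps at 16 bits
    return (b[0] + b[1]) & MASK16        # add i16 of lanes 0 and 1


if __name__ == "__main__":
    x = [0xFFFFFFFF, 0x12345678, 7, 0, 0, 0, 0, 0]
    print(foo(x) == bar(x))
```

This only shows that the results agree; it says nothing about which form lowers to better x86 code, which is the concern here.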
https://reviews.llvm.org/D38313