[PATCH] D38313: [InstCombine] Introducing Aggressive Instruction Combine pass

Tue Dec 19 09:08:48 PST 2017

craig.topper added a comment.

Taking your first example and increasing the element count to get legal types:

  define i16 @foo(<8 x i32> %X) {
    %A1 = zext <8 x i32> %X to <8 x i64>
    %B1 = mul <8 x i64> %A1, %A1
    %C1 = extractelement <8 x i64> %B1, i32 0
    %D1 = extractelement <8 x i64> %B1, i32 1
    %E1 = add i64 %C1, %D1
    %T = trunc i64 %E1 to i16
    ret i16 %T
  }

  define i16 @bar(<8 x i32> %X) {
    %A2 = trunc <8 x i32> %X to <8 x i16>
    %B2 = mul <8 x i16> %A2, %A2
    %C2 = extractelement <8 x i16> %B2, i32 0
    %D2 = extractelement <8 x i16> %B2, i32 1
    %T = add i16 %C2, %D2
    ret i16 %T
  }

Running that through llc with avx2, I get worse code for bar than for foo. Vector truncates on x86 aren't good. There is no truncate instruction until avx512, and even then it's 2 uops.
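
For reference, foo and bar compute the same value, since only the low 16 bits of the result survive the final trunc, so the difference is purely one of codegen quality. A minimal Python sketch of the arithmetic for lanes 0 and 1 follows. It is an illustrative model, not part of the patch: the function names mirror the IR above, while the masks and the check helper are assumptions made for this example.

```python
import random

MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def foo(x):
    """Model of @foo: widen to 64 bits, multiply, add lanes 0 and 1, truncate."""
    a = [v & MASK32 for v in x]          # zext <8 x i32> to <8 x i64>
    b = [(v * v) & MASK64 for v in a]    # mul <8 x i64>
    e = (b[0] + b[1]) & MASK64           # add i64 of extracted lanes 0 and 1
    return e & MASK16                    # trunc i64 to i16


def bar(x):
    """Model of @bar: truncate to 16 bits first, then multiply and add."""
    a = [v & MASK16 for v in x]          # trunc <8 x i32> to <8 x i16>
    b = [(v * v) & MASK16 for v in a]    # mul <8 x i16>
    return (b[0] + b[1]) & MASK16        # add i16 of extracted lanes 0 and 1


def check(trials=1000, seed=0):
    """Compare both models on random 8-lane inputs (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.getrandbits(32) for _ in range(8)]
        if foo(x) != bar(x):
            return False
    return True
```

The two models agree because multiplication and addition modulo 2^16 depend only on the low 16 bits of their operands, which is what makes the narrowing legal even when it is not profitable on a given target.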

https://reviews.llvm.org/D38313