[all-commits] [llvm/llvm-project] 05c0d3: [X86][SSE] Prefer trunc(movd(x)) to pextrb(x, 0)

Fri Mar 13 11:43:32 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 05c0d3491822b3a74f49be2fe8c8273e436ab7ec
      https://github.com/llvm/llvm-project/commit/05c0d3491822b3a74f49be2fe8c8273e436ab7ec
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2020-03-13 (Fri, 13 Mar 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/avg.ll
    M llvm/test/CodeGen/X86/avx512-vec3-crash.ll
    M llvm/test/CodeGen/X86/bitcast-vector-bool.ll
    M llvm/test/CodeGen/X86/buildvec-insertvec.ll
    M llvm/test/CodeGen/X86/extract-concat.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-smax.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-smin.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-umin.ll
    M llvm/test/CodeGen/X86/scalar_widen_div.ll
    M llvm/test/CodeGen/X86/var-permute-128.ll
    M llvm/test/CodeGen/X86/var-permute-512.ll
    M llvm/test/CodeGen/X86/vector-bitreverse.ll
    M llvm/test/CodeGen/X86/vector-idiv-sdiv-128.ll
    M llvm/test/CodeGen/X86/vector-reduce-add.ll
    M llvm/test/CodeGen/X86/vector-reduce-and.ll
    M llvm/test/CodeGen/X86/vector-reduce-mul.ll
    M llvm/test/CodeGen/X86/vector-reduce-or.ll
    M llvm/test/CodeGen/X86/vector-reduce-smax.ll
    M llvm/test/CodeGen/X86/vector-reduce-smin.ll
    M llvm/test/CodeGen/X86/vector-reduce-umax.ll
    M llvm/test/CodeGen/X86/vector-reduce-umin.ll
    M llvm/test/CodeGen/X86/vector-reduce-xor.ll
    M llvm/test/CodeGen/X86/widen_bitops-0.ll

  Log Message:
  -----------
  [X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)

If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implicit zero extension of PEXTRB.

The biggest perf diff is on SLM targets where MOVD (uops=1, lat=3 tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4).

This matches what we already do for PEXTRW.

Differential Revision: https://reviews.llvm.org/D76138